X10SRL-F number of memory modules


doop

Member
Jan 16, 2015
Hello! We just ordered a dedicated server. dmidecode says the exact configuration is:
X10SRL-F
E5-1650 v3
2x32GB DDR4 (M393A4K40BB0-CPB)

Will it run slower with only 2 memory modules?
I see that ark.intel.com lists E5-1650 v3 with "Max # of Memory Channels 4".
In the Supermicro manual I could only find:

Populating these DIMM modules with a pair of memory modules of the same type and size will result in interleaved memory, which will improve memory performance.
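
To double-check exactly which slots and channels are populated, dmidecode already has the answer. Here's a rough sketch that parses it (needs root; it assumes the slot locators look like DIMMA1/DIMMB1 with the channel letter in them, which is the usual Supermicro naming, but check the DIMM map in your manual):

Code:
import re
import subprocess
from collections import defaultdict

# Count populated DIMM slots per channel from `dmidecode -t memory`.
# Assumes locators like "DIMMA1"/"DIMMB1" where the letter is the
# channel -- common on Supermicro boards, but verify against the manual.
out = subprocess.check_output(["dmidecode", "-t", "memory"], text=True)

channels = defaultdict(list)
size = None
for line in out.splitlines():
    line = line.strip()
    if line.startswith("Size:"):
        size = line.split(":", 1)[1].strip()
    elif line.startswith("Locator:"):
        locator = line.split(":", 1)[1].strip()
        if size and size != "No Module Installed":
            m = re.search(r"DIMM([A-D])", locator)
            if m:
                channels[m.group(1)].append(f"{locator} ({size})")
        size = None

for ch in sorted(channels):
    print(f"Channel {ch}: {', '.join(channels[ch])}")
print(f"{len(channels)} of 4 channels populated")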
 

doop

Member
Jan 16, 2015
Yes, it does run. My fear is that the system is running at half speed, as the CPU has 4 memory channels and I assume 2 memory modules are only capable of using 2 of those 4 channels.

If my understanding is correct, it would be best to use 8 modules, resulting in 2 memory modules per channel. This way we use all four memory channels and also get «memory interleaving» between the pair of modules on each channel, which I think may give better performance if one module isn't able to saturate the bandwidth of its channel (this probably depends on memory access patterns?).
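For reference, the raw numbers behind that reasoning — a minimal sketch, assuming DDR4-2133 (what those M393A4K40BB0-CPB RDIMMs are rated for) and a 64-bit data bus per channel; real sustained bandwidth will always be lower than this:

Code:
# Back-of-the-envelope peak bandwidth: channels x MT/s x 8 bytes per
# transfer (64-bit bus per channel). Assumes DDR4-2133; sustained
# real-world figures are lower.
MTS = 2133          # mega-transfers per second
BYTES_PER_XFER = 8  # 64-bit data bus per channel

for channels in (1, 2, 4):
    gbps = channels * MTS * BYTES_PER_XFER / 1000  # GB/s, decimal
    print(f"{channels} channel(s): ~{gbps:.1f} GB/s theoretical peak")
# 1 channel(s): ~17.1 GB/s
# 2 channel(s): ~34.1 GB/s
# 4 channel(s): ~68.3 GB/s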

I think this document supports the first claim (about #channels):
http://www.supermicro.com/support/resources/memory/X10_memory_config_guide.pdf said:
Question:
Do I have to populate all CPU memory channels? What if I only put one or two DIMMs per
processor?

Answer:
It’s ok to run with as little as one RDIMM/LRDIMM per system or per CPU (unbalanced configurations) but it will impact CPU performance. Socket R3 CPU’s have four memory channels and for best performance it’s recommended to have a minimum of one DIMM in per channel. A channel left unpopulated will reduce the memory bandwidth by 25%, so with only one RDIMM per CPU memory bandwidth performance is reduced by 75%.
However, the motherboard manual ([1], page 1-6) seems to indicate the motherboard can't do better than "Dual-channel memory", so I think I'm stuck at 50% memory bandwidth regardless of memory configuration?

Is there some truth to this? :p

[1] ftp://ftp.supermicro.com/CDR-X10-UP_1.13_for_Intel_X10_UP_platform/MANUALS/X10SRL-F.pdf
 

Peanuthead

Active Member
Jun 12, 2015
Doop, in short there is a small difference, since Haswell-EP supports quad-channel memory, but in practice it's imperceptible at best.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
Almost every benchmark has difficulty telling the difference between dual- and quad-channel RAM. I'd be very surprised if it made a non-negligible difference to a real-world workload.

Some synthetic benches like that Sandra memory bandwidth one will quite happily proclaim "quad channel doubled my megahurtz!!!!111one", but that doesn't translate into real-world performance except on exceptionally specialised code (or e-peen).
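
If you'd rather see the effect yourself than trust Sandra, here's a crude STREAM-style probe in Python/numpy. It's a sketch, not a calibrated benchmark — single-threaded numpy won't get anywhere near theoretical peak — but running it before and after adding DIMMs gives you a usable relative number:

Code:
import time
import numpy as np

# Crude STREAM-style probe of sustained memory bandwidth. Not a
# calibrated benchmark: numpy overhead, write-allocate traffic and
# single-threadedness all skew it low, but relative before/after
# numbers (2 vs 4 DIMMs) are still informative.
N = 50_000_000            # ~400 MB per array, well past the caches
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)

best = float("inf")
for _ in range(5):
    t0 = time.perf_counter()
    np.add(b, c, out=a)   # a = b + c : reads b and c, writes a
    best = min(best, time.perf_counter() - t0)

traffic = 3 * N * 8       # bytes touched per pass (ignores write-allocate)
print(f"~{traffic / best / 1e9:.1f} GB/s sustained (single thread)")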
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
Without anything in the way of numbers, details of the benchmark (is it likely CPU- or RAM-limited? If RAM-limited, by bandwidth or latency? Or simply by the amount of it?) or graphs of relative performance, there's little evidence to suggest that quad-channel RAM is the sole reason for something being faster. There's nothing more from this acasas chap than a report that, after he put quad channels into his xeon workstation, performance doubled.

If I'm reading it right, the sole argument there seems to be "I did the same run on computer X and computer Y; computer Y should be faster, but computer X is!". Someone saying that computer Y has more theoretical memory bandwidth is neither here nor there; the question is whether that bandwidth is actually used or needed. It's possible this simulation engine is incredibly bandwidth-hungry (which would put it into what I referred to above as specialised code), but there's more at play here than just theoretical memory bandwidth. Ignoring core count and speed, and as also alluded to in the forums, NUMA is a big concern on 2P/4P tin and can have a huge effect on both memory bandwidth and latency.
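
As an aside, on Linux you can eyeball the NUMA layout without any extra tools — a minimal sketch reading sysfs (the paths are standard Linux; a 1P board like the X10SRL-F will only show node0):

Code:
import glob
import os

# Minimal peek at the NUMA layout via Linux sysfs. On a 1P board you'll
# just see node0; on 2P/4P tin this is where remote-node memory access
# starts to matter for both bandwidth and latency.
for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    name = os.path.basename(node)
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    with open(os.path.join(node, "meminfo")) as f:
        memtotal = next(l for l in f if "MemTotal" in l).split()[-2]
    print(f"{name}: CPUs {cpus}, MemTotal {memtotal} kB")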

Disclaimer: I'm a Haswell-EP user myself (Xeon E5-1650 v3). I did some experiments on my own workstation when I went from 2x8GB to 4x8GB (mostly video encoding, file archiving and RAW processing; none of the tasks used more than 6GB of RAM on their own) and didn't notice any difference in performance outside of the margin of error.

TL;DR: It's complicated :)