Memory advice for E5 v4 - which rank and width for RDIMMs?


Aluminum

Active Member
Sep 7, 2012
431
46
28
Read a couple of whitepapers that have been published; altogether they are as clear as mud. This is driving me nuts, has anyone done detailed server builds or memory configuration?

The obvious/knowns:
-E5-269x v4, aka the HCC die, so dual memory controllers (apparently this complicates things, but in a good way)
-8 DIMMs/socket, populating in multiples of 4 (the only unquestionable thing)
-going with registered
-I don't want load-reduced DIMMs due to cost and higher latency/lower clocks: they are only meant for max capacity
-no unbuffered; officially UNsupported by E5-26xx according to Intel, though it might work

I want to use 16GB modules since this is the sweet spot on price. I would be populating 8 per socket, but that means 2DPC, which can limit speed to 2133? Ideally I want to run at 2400 since the v4 CPU supports it, but I'm getting conflicting info from the whitepapers and vendor PDFs (e.g. Lenovo claims 2400 at 2DPC is a 'fancy feature' their competitors can only do at 1DPC?).

Fundamentally these all cost about the same (about $90~100 a stick); I just have to buy from different sellers depending on type, and will end up with Kingston or Crucial given price/availability/support, since both of them make all four types:

1Rx4 - better ECC, low(est?) latency, less bandwidth(?)
2Rx4 - better ECC, higher latency, more bandwidth(?)
1Rx8 - basic ECC, low latency, less bandwidth(?)
2Rx8 - basic ECC, higher latency, more bandwidth(?)
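To make the four options above concrete, here is a quick back-of-envelope sketch (assuming the standard 72-bit ECC RDIMM organization, 64 data + 8 ECC bits) of how many DRAM chips each configuration needs and what die density a 16 GB module implies. It's simple arithmetic, not vendor data:

```python
# Rough geometry calculator for 72-bit (64 data + 8 ECC) ECC RDIMMs.
# Shows why rank and chip width determine chip count and die density.

def dimm_geometry(capacity_gib, ranks, chip_width):
    data_chips_per_rank = 64 // chip_width  # 64 data bits across the bus
    ecc_chips_per_rank = 8 // chip_width    # 8 ECC bits
    chips = ranks * (data_chips_per_rank + ecc_chips_per_rank)
    # Each data chip stores an equal slice of the module's data capacity.
    die_gbit = capacity_gib * 8 / (ranks * data_chips_per_rank)
    return chips, die_gbit

for label, ranks, width in [("1Rx4", 1, 4), ("2Rx4", 2, 4),
                            ("1Rx8", 1, 8), ("2Rx8", 2, 8)]:
    chips, die = dimm_geometry(16, ranks, width)
    print(f"{label} 16GB: {chips} chips, {die:.0f} Gb per die")
```

Note how a 1Rx4 16GB stick needs 8 Gb dies while a 2Rx4 gets by with 4 Gb dies, which is where the "single rank x4 costs more" claim comes from.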

1DPC 64GB/CPU or 2DPC 128GB/CPU? I really don't want to drop to 2133; can the speed be forced in the BIOS, or will the CPU do whatever Intel programmed?
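To put the 2400 vs 2133 question in numbers, here's a rough peak-bandwidth calculation (assuming the usual 8-byte channel width and the 4 channels per socket on E5-26xx v4; real sustained bandwidth will be lower):

```python
# Peak theoretical DDR4 bandwidth per channel and per socket.
def peak_bw_gbs(mt_per_s, channels=4, bus_bytes=8):
    per_channel = mt_per_s * bus_bytes / 1000  # GB/s (decimal)
    return per_channel, per_channel * channels

for speed in (2400, 2133):
    ch, sock = peak_bw_gbs(speed)
    print(f"DDR4-{speed}: {ch:.1f} GB/s per channel, {sock:.2f} GB/s per socket")
```

So dropping from 2400 to 2133 costs roughly 11% of peak bandwidth, before any latency effects.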


I know single rank should be lower latency by design, but supposedly dual rank has more bandwidth, which makes no sense to me since you can only talk to half the DIMM at a time?

I *think* I want to buy eight 1Rx4 modules, but I'm not sure. I'm 90% sure I want some flavor of x4 due to the better ECC. Oddly enough, every paper and post claims single-rank x4 costs more due to needing higher-density chips, yet right now those are the cheapest 16GB 2400 modules I can find.


References I dug up that have lots of info, though none of them has a clear and concise answer:
https://h20195.www2.hpe.com/V2/getpdf.aspx/4AA6-6217ENW.pdf?ver=1.0
https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-broadwell-ep-memory-performance-ww-en.pdf
http://www.supermicro.com/support/resources/memory/X10_memory_config_guide.pdf
http://www.jedec.org/sites/default/files/files/Reza Bacchus_Technical Session Server Forum 2014.pdf
DDR4 SDRAM - Wikipedia
Memory Deep Dive: Optimizing for Performance - frankdenneman.nl
 

Aluminum

Active Member
Sep 7, 2012
431
46
28
Not surprised that nobody had anything to say; rank comparisons are pretty obscure and often confused with the usual ECC/reg/LRDIMM topics.

According to this, though, if you run 2DPC the performance is about the same, and you should get 2133 on v3 and 2400 on v4 Xeons:

STREAM OMP benchmark compiled with ICC

So I said ****it and got eight 2Rx4 16GB 2400 modules, since there is an ECC improvement over anything x8.
 

ATS

Member
Mar 9, 2015
96
32
18
48
Aluminum said:
> Not surprised that nobody had anything to say, rank comparisons are pretty obscure and often confused with the usual ecc/reg/lrdimm topics.
>
> According to this though if you run 2DPC the performance is about the same and you should get 2133 on v3 and 2400 on v4 xeons:
>
> STREAM OMP benchmark compiled with ICC
>
> So I said ****it and got eight 2Rx4 16GB 2400 modules, since there is an ECC improvement over Nx8.
The performance improvement with 2R is due to the increased number of DRAM pages, and hence a higher probability of hitting an open page. There is also a contribution from the CPU operating the DRAM in burst chop mode. Natively the DRAMs have a burst length of 8, but with ganged channels that would result in a 128B transfer size, which is 2x the cache line size. Burst chop limits the external burst length to 4 but doesn't change the internal burst length, so with a 1R, 1-DIMM-per-channel system you are limited to 50% of max bandwidth. Going to multiple DIMMs per channel and multiple ranks basically eliminates the penalty of burst chop mode.
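The burst-chop point above can be sketched as a toy utilization model (an assumption-laden simplification: it treats the internal access as still occupying a full BL8 time slot, with chopped bursts to other ranks filling the idle beats):

```python
# Toy model of data-bus utilization under burst chop (BC4) vs full BL8.
# Assumption: a BC4 access transfers 4 beats but its internal access
# still occupies an 8-beat slot; bursts to other ranks can interleave
# into the otherwise-idle beats of that slot.

def bus_utilization(external_burst_beats, slot_beats, interleaved_ranks):
    busy = min(external_burst_beats * interleaved_ranks, slot_beats)
    return busy / slot_beats

print(bus_utilization(4, 8, 1))  # BC4, single rank: half the beats idle
print(bus_utilization(4, 8, 2))  # BC4, two ranks interleaved: bus full
print(bus_utilization(8, 8, 1))  # plain BL8, single rank: bus full
```

This matches the 50%-of-max claim for a 1R, 1-DIMM-per-channel setup and why a second rank (or second DIMM) recovers it.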

Oh, and yeah, for servers you always want x4s, as that allows chipkill.