Xeon E7 8894 v4 - what type of Memory for best performance (single core)?


nutsnax

Active Member
Nov 6, 2014
254
97
28
113
lol now it's super cost effective server for 1400EUR + 1TB DDR3 next ~1500 lol, but need cassettes for DDR3 with older memory chip
DDR3 is also +150W (for 64 dimms), ok it's late - end of "hobby" time ;)
So when you say that it needs DDR3 cassettes, you mean that you can use the ones out of the DL580 G8 that came with v2 CPUs, correct?
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
So when you say that it needs DDR3 cassettes, you mean that you can use the ones out of the DL580 G8 that came with v2 CPUs, correct?
Yes - here I used RAM cassettes from the G8. I don't know if DDR3 cassettes exist with the new chip (3200MT/s) - if they do, there would be no sense in using DDR4 at all ;)
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
Final thoughts on the memory in the HP DL580 G9.
Just bought another 512GB of DDR4 and tested it; now it's 1TB of DDR4 2133MHz @ 1600MHz (32x 32GB DIMMs), CL12:
[Attached screenshot: 1658423441338.png]

So the final result is: it doesn't matter what type of memory you use, it can even be DDR3 instead of DDR4, just use 8 identical DIMMs per CPU (the architecture is kind of "octa-channel" per CPU).

The difference between DDR3 and DDR4 is minimal (~1%); single-core performance is the same on DDR3 (1333MHz CL9) and DDR4 (1600MHz CL12).
RAM makes a bigger difference when not all slots are populated (then you see ~1.5% max between dual-rank and quad-rank DIMMs, and only in multi-core tests), but when done right any type of memory will perform almost the same, and better than any Epyc CPU ;) (not for single-core performance - also the TR 5995WX is better in almost everything ;) ).
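
To put the two timings above side by side, here is a rough sketch that converts CAS latency from clock cycles to nanoseconds. It assumes the "1333MHz" and "1600MHz" figures are data rates in MT/s (so the actual memory clock is half of that); the function name is just for illustration, and this only covers CAS latency, not the extra latency of the memory buffers.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """Convert CAS latency from clock cycles to nanoseconds.

    Assumption: data_rate_mts is the DDR data rate in MT/s, so the
    memory clock in MHz is half of it (two transfers per clock).
    """
    clock_mhz = data_rate_mts / 2.0
    return cl_cycles / clock_mhz * 1000.0


# The two configurations compared in the post above.
ddr3_ns = cas_latency_ns(9, 1333)    # DDR3-1333 CL9
ddr4_ns = cas_latency_ns(12, 1600)   # DDR4 running at 1600 CL12

print(f"DDR3-1333 CL9 : {ddr3_ns:.1f} ns")
print(f"DDR4-1600 CL12: {ddr4_ns:.1f} ns")
```

Under those assumptions the two land within about 1.5 ns of each other, which fits the observation that single-core results barely change between the two memory types.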
 

CyklonDX

Well-Known Member
Nov 8, 2022
845
279
63
Here are a few caveats:


Typically, server motherboards use a daisy-chain memory layout (the same goes for most current AMD Zen desktop boards).

The fastest configuration is 1 stick per channel. Adding more sticks adds capacity but increases data-access latency by a few ns on each refresh/tick/clock cycle; in some cases it also provides greater total bandwidth (not with all software, and not on all hardware).

The same rules apply to the sticks themselves: more ranks or banks often mean more latency (depending on the motherboard and the type of workload).
In most cases single-rank x4 (4-bit) modules provide the best latency in servers. They are faster, but not as dense.
The trade-off is lower total bandwidth in exchange for much better latency.
Fewer banks = less latency, but potentially lower total bandwidth - unless the chips are fast enough.

Load-reduced DIMMs (LRDIMMs) increase latency a lot, but also increase capacity and potential total bandwidth.
Registered memory increases latency, but takes less of a hit in daisy-chain scenarios than normal unbuffered memory.


There's no point in using DDR4 clocked lower than 2133MHz, unless you are going for capacity.
The IO bus clock, as well as the memory chip clock, is often higher with higher-clocked RAM, resulting in better overall system performance.
(DDR3 modules clocked at the same speed as DDR4 can have lower latency, but much less capacity.)
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,340
1,611
113
Registered mem increase latency
A typical registered DIMM is like two or more DIMMs in one, with only one DIMM (address) load.
One 2666 MT/s 2Rx? RDIMM runs at 2666 MT/s;
the equivalent two 2666 MT/s 1Rx? DIMMs slow down to 2400 MT/s (POR).
 

CyklonDX

Well-Known Member
Nov 8, 2022
845
279
63
Registered memory requires 2 refreshes to access the memory for a read/write/seek op, while unregistered memory takes just a single refresh (so instead of, let's say, 10ns it takes 20ns).
 

CyklonDX

Well-Known Member
Nov 8, 2022
845
279
63
That makes little sense; at that point I think the clock of the sticks would have to be lowered (if we had DDR4 2400, they'd likely have to run at 1600 to work).
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
imagine a board with 8 UDIMMs per channel.
You are super close ;)

E7 v2/v3/v4 support 24 DIMMs per CPU.
so it is "6 DIMMs" per CPU channel
or "3 DIMMs" per Jordan Creek mem buffer chip channel
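
The slot arithmetic above can be checked with a few lines. This is only a sketch using the numbers given in this thread (4 SMI channels per CPU, one buffer per SMI channel, 2 DDR channels per buffer, 3 DIMMs per DDR channel); the constant and function names are made up for illustration and are not taken from any datasheet.

```python
# Topology numbers as described in this thread (assumptions, not datasheet values).
SMI_LINKS_PER_CPU = 4          # one memory buffer assumed per SMI/SMI2 channel
DDR_CHANNELS_PER_BUFFER = 2    # each Jordan Creek buffer drives 2 DDR channels
DIMMS_PER_DDR_CHANNEL = 3      # up to 3 DIMMs on each DDR channel


def ddr_channels_per_cpu() -> int:
    """DDR channels behind all buffers of one CPU."""
    return SMI_LINKS_PER_CPU * DDR_CHANNELS_PER_BUFFER


def dimms_per_smi_link() -> int:
    """DIMM slots hanging off a single CPU-side SMI channel."""
    return DDR_CHANNELS_PER_BUFFER * DIMMS_PER_DDR_CHANNEL


def dimm_slots_per_cpu() -> int:
    """Total DIMM slots per CPU."""
    return ddr_channels_per_cpu() * DIMMS_PER_DDR_CHANNEL


print("DDR channels per CPU:", ddr_channels_per_cpu())
print("DIMMs per SMI channel:", dimms_per_smi_link())
print("DIMM slots per CPU:", dimm_slots_per_cpu())
```

With these numbers there are 8 DDR channels per CPU, which is why 8 identical DIMMs per CPU (one per DDR channel) is the layout recommended earlier in the thread.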

There's no point in using DDR4 clocked lower than 2133MHz, unless you are going for capacity.
Not possible here :/ (it is possible on E5 v2, even with DDR3 - e.g. the 1680 v2 - and also on E5 v3, I think up to 2600 or slightly more when overclocked - 1660 v3/1680 v3).

It only applies in a very limited fashion to this situation, because E7 v4 use external memory buffers (e.g. Jordan Creek)
Exactly, I spent some time tweaking it, and it's different from any other Xeon.
How it works and why 2400MHz is not possible:

E7 v2/v3/v4 chips have 4 memory channels, each connected to a memory buffer - Jordan Creek or Jordan Creek 2.

E7 v2 has 4 lanes of SMI working at 2666MHz or 1600MHz
E7 v3/v4 has 4 lanes of SMI2 working at 3200MHz or 1866MHz (or 2666/1600MHz when connected to older Jordan Creek).

Each Jordan Creek chip has 2 memory channels and can work at:
1333MHz (2666MHz to the CPU), or in single-channel mode at 1600MHz (1600MHz to the CPU)

Jordan Creek 2: 1600MHz or 1866MHz.


So fastest possible configuration for v3/v4 is:
E7 v4
*(SMI2 ch1)Jordan Creek @ 3200MHz
#(DDR ch1)DDR3 @ 1600MHz
#(DDR ch2)DDR3 @ 1600MHz
*(SMI2 ch2)Jordan Creek @ 3200MHz
#(DDR ch3)DDR3 @ 1600MHz
#(DDR ch4)DDR3 @ 1600MHz
*(SMI2 ch3)Jordan Creek @ 3200MHz
#(DDR ch5)DDR3 @ 1600MHz
#(DDR ch6)DDR3 @ 1600MHz
*(SMI2 ch4)Jordan Creek @ 3200MHz
#(DDR ch7)DDR3 @ 1600MHz
#(DDR ch8)DDR3 @ 1600MHz

it is much faster than:

E7 v4
*(SMI2 ch1)Jordan Creek @ 1866MHz
#(DDR ch1)DDR3 @ 1866MHz
*(SMI2 ch2)Jordan Creek @ 1866MHz
#(DDR ch3)DDR3 @ 1866MHz
*(SMI2 ch3)Jordan Creek @ 1866MHz
#(DDR ch5)DDR3 @ 1866MHz
*(SMI2 ch4)Jordan Creek @ 1866MHz
#(DDR ch7)DDR3 @ 1866MHz


SMI2 was working at 3200MHz at a time when DDR4 was ~2400MHz, so it's similar to 9000MHz today ;)
I was not able to find any mode or solution faster than 1866MHz, and this 1866MHz mode was not optimal: it was much slower than 2x1600MHz.
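
For a rough feel of why the 2x1600 layout wins, here is a back-of-the-envelope sketch of theoretical peak DDR-side bandwidth per CPU for the two layouts drawn above. It assumes a standard 64-bit (8-byte) data path per DDR channel and ignores SMI2 link limits, protocol overhead and refresh, so real numbers will be lower; the function name is only illustrative.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per DDR channel (ECC bits excluded)


def peak_ddr_bandwidth_gbs(ddr_channels: int, data_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes) for one CPU."""
    return ddr_channels * data_rate_mts * BYTES_PER_TRANSFER / 1000.0


# Layout 1: two DDR channels at 1600 behind each of the 4 buffers (8 channels).
two_channel_mode = peak_ddr_bandwidth_gbs(8, 1600)

# Layout 2: one DDR channel at 1866 behind each of the 4 buffers (4 channels).
single_channel_mode = peak_ddr_bandwidth_gbs(4, 1866)

print(f"8 x 1600: {two_channel_mode:.1f} GB/s per CPU")
print(f"4 x 1866: {single_channel_mode:.1f} GB/s per CPU")
print(f"ratio   : {two_channel_mode / single_channel_mode:.2f}x")
```

On paper the 8-channel layout has roughly 1.7x the peak DDR bandwidth of the 4-channel one, which is in line with the 1866MHz mode being "much slower" in practice.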
 

fp64

Member
Jun 29, 2019
71
21
8
Hello,

Revived this post because I am looking into buying a DL580 G9 with (probably) 4x E7-8880 as a GPU platform.

Is it possible to condense the chat above to simple recommendations for max fp64 performance?
 

Kernkamp

New Member
Mar 23, 2019
2
0
1
For CFD, memory bandwidth is the most important factor. The post just before yours, fp64, is key: run 8x DDR3/4 at 1600 MT/s per socket and get 4x 3200 MT/s from the SMI2 links.
 

Kernkamp

New Member
Mar 23, 2019
2
0
1
Besides the dirt cheap E7-8880 v4, you could also look at the E7-8894 v4 and E7-8890 v4. They are $100 and $60 on ebay respectively.
 

fp64

Member
Jun 29, 2019
71
21
8
Thanks.
Managed to skim through the various manuals and it seems the E7 has two memory controllers per chip. Nice feature really. The cheapest 8894 seems to be about 600 euro on eBay.
 

nutsnax

Active Member
Nov 6, 2014
254
97
28
113
I'm looking at this for a massive data mining project. I was thinking of getting a DL580 Gen9 filled with 64GB sticks of DDR3, but can't remember: does that require a special BIOS?