Xeon E7 8894 v4 - what type of Memory for best performance (single core)?


MichalPL

Active Member
Feb 10, 2019
189
25
28
Kind of ;)
Sorry, I named it wrong: it is not a multicore score but the "PassMark CPU score", so it includes a single-core factor that can weigh heavily when single-core performance is this slow.

I will paste the full results soon and make sure the test utilizes all 96 cores.
 

nutsnax

Active Member
Nov 6, 2014
260
98
28
113
PassMark - AMD Ryzen 9 5950X - Price performance comparison - the result is a bit under 46k
PassMark doesn't yet have results for the quad E7-8894, but it does for its smaller cousin, the quad E7-8880 (24 vs 22 cores and 165W vs 150W TDP)
PassMark - [Quad CPU] Intel Xeon E7-8880 v3 @ 2.30GHz - Price performance comparison - 50k


FTFY, and yes. o_O
That is insane... Are the much lower clocks the reason?

I'm looking at one of these v4 servers for a heavily multithreaded workload.

This will use as many threads as I can get, but no single thread currently utilizes 100% of its assigned core, so I'm thinking I could get away with a whole bunch of lesser cores like the Xeon v4.

Also, what impact does removing security mitigations have, I wonder?
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
OK, multicore performance is not that bad when looking at the details (the overall rank is dragged down by the single-core drama). It is like the 64-core 3995WX, similar to the fastest 64-core Epyc, and slower than the TR PRO 5995WX, but it still should be at least 15-20% faster :/

And yes, the new PassMark fully utilizes all 96 cores.


1657305161466.png

Compare to others (Integer Math and SSE):

4x 8894 v4 (not tweaked yet, problems with boost):
Integer Math: 488,554 MOps/Sec
Extended Instructions: 125,986 Million Matrices/Sec

AMD Ryzen Threadripper PRO 5995WX:
Integer Math: 628,857 MOps/Sec
Extended Instructions: 119,620 Million Matrices/Sec

[Dual CPU] Intel Xeon Platinum 8380:
Integer Math: 586,518 MOps/Sec
Extended Instructions: 153,512 Million Matrices/Sec

AMD Ryzen Threadripper 3970X:
Integer Math: 265,668 MOps/Sec
Extended Instructions: 73,828 Million Matrices/Sec

AMD Ryzen 9 5950X:
Integer Math: 192,671 MOps/Sec
Extended Instructions: 40,343 Million Matrices/Sec

Intel Core i9-12900K:
Integer Math: 138,265 MOps/Sec
Extended Instructions: 33,849 Million Matrices/Sec
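
To make the comparison easier to read, the posted scores can be expressed relative to the quad 8894 v4. This is only a rough sketch: the numbers are copied from the list above, and the dictionary keys and the `relative_to` helper are names made up for this illustration.

```python
# Integer Math (MOps/Sec) and Extended Instructions (Million Matrices/Sec),
# copied from the scores posted above.
scores = {
    "4x Xeon E7-8894 v4": (488_554, 125_986),
    "Threadripper PRO 5995WX": (628_857, 119_620),
    "2x Xeon Platinum 8380": (586_518, 153_512),
    "Threadripper 3970X": (265_668, 73_828),
    "Ryzen 9 5950X": (192_671, 40_343),
    "Core i9-12900K": (138_265, 33_849),
}


def relative_to(baseline: str, table: dict) -> dict:
    """Return each system's scores divided by the baseline system's scores."""
    base_int, base_ext = table[baseline]
    return {name: (i / base_int, e / base_ext) for name, (i, e) in table.items()}


ratios = relative_to("4x Xeon E7-8894 v4", scores)
for name, (int_ratio, ext_ratio) in ratios.items():
    print(f"{name:26s} integer x{int_ratio:.2f}  extended x{ext_ratio:.2f}")
```

A ratio above 1 means the system beats the quad 8894 v4 in that test, and a ratio below 1 means it is slower.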
 

MichalPL

Active Member
Feb 10, 2019
189
25
28


I used numbers from my own experience, from when we built a few 5950X workstations. (I will "fix" it later and copy-paste numbers such as the 5995WX from PassMark, because I don't have any.)

+$50 for a better AIO plus 3.8GHz DDR4, and the 5950X is around 50k PassMark points instead of 46k.

PassMark doesn't yet have results for the quad E7-8894, but it does for its smaller cousin, the quad E7-8880 (24 vs 22 cores and 165W vs 150W TDP)
PassMark - [Quad CPU] Intel Xeon E7-8880 v3 @ 2.30GHz - Price performance comparison - 50k
Wow, it means 4x 8895 v2 on DDR3 is "faster".


It says 57k, but I was never able to get more than 51k on the DL580 G8.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
I'm looking at one of these v4 servers for a heavily multithreaded workload.
Might be smart. They are almost free now, like the G8 was 2.5 years ago. The only problem (for me at least) is the single-core drama. Also, 2.5 years ago there was no Ryzen 5000, 12900K, or Epyc Zen 3.


This will use as many threads as I can get, but no single thread currently utilizes 100% of its assigned core, so I'm thinking I could get away with a whole bunch of lesser cores like the Xeon v4.
Confirmed, the 8894 v4 can do this. I am fighting a 2.9GHz all-core limit (it was 3.2GHz on the 8895 v2, but that is 15 vs 24 cores and also 22 vs 14nm).

Maybe even the Oracle 8x 8895 v2 DDR3 box would be better?

Also, what impact does removing security mitigations have, I wonder?
Hmmm, good idea - do you know how to do this?
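
One possible starting point, assuming the box runs Linux (the thread does not say which OS is used): the kernel reports the state of each known CPU vulnerability under `/sys/devices/system/cpu/vulnerabilities`, and booting with the `mitigations=off` kernel parameter turns most of them off. The sketch below only reads the current status and changes nothing; `read_mitigation_status` is a helper name invented for this example.

```python
from pathlib import Path

# Linux exposes one small text file per known CPU vulnerability here.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base: Path = VULN_DIR) -> dict[str, str]:
    """Return {vulnerability: status string}; empty dict if the directory is absent."""
    if not base.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    status = read_mitigation_status()
    if not status:
        print("No vulnerabilities directory found (not Linux, or an old kernel).")
    for name, state in status.items():
        print(f"{name:28s} {state}")
```

Running it before and after a reboot with `mitigations=off` would show whether the change took effect, so benchmark scores can be compared against a known state.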
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
OK, now the 8894 v4 is faster than the 8895 v2 @ 3.6GHz :)
2115 points single-core @ 2.88GHz; it still needs 3.4GHz (or more).

AMD EPYC 7773X - 2410 points :/
The AMD EPYC 7773X is still a bit faster in "Integer Math" and much slower in SSE/AVX (which I don't need in my case).
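
The arithmetic behind "still need 3.4GHz" can be sketched as follows. It assumes the single-core score scales linearly with clock speed, which is optimistic (memory latency does not speed up with the core clock), so treat the output as an upper bound. `scale_score` is a helper name made up for this example.

```python
def scale_score(score: float, clock_ghz: float, target_ghz: float) -> float:
    """Linear estimate: assume the score grows in proportion to the clock."""
    return score * target_ghz / clock_ghz


xeon_score, xeon_clock_ghz = 2115, 2.88  # 8894 v4 single-core result from the post
epyc_score = 2410                        # EPYC 7773X single-core result from the post

gap = 1 - xeon_score / epyc_score                           # fraction the Xeon is behind
est_at_3_4 = scale_score(xeon_score, xeon_clock_ghz, 3.4)   # estimate at 3.4GHz
clock_to_match = xeon_clock_ghz * epyc_score / xeon_score   # clock needed to tie

print(f"gap to EPYC: {gap:.1%}")
print(f"estimated score at 3.4GHz: {est_at_3_4:.0f}")
print(f"clock needed to match EPYC: {clock_to_match:.2f} GHz")
```

Under that assumption the Xeon is about 12% behind, would need roughly 3.3GHz to tie the EPYC, and would land near 2500 points at 3.4GHz.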
 

nutsnax

Active Member
Nov 6, 2014
260
98
28
113
Did you disable mitigations? And is this score the result after you disabled them?
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
OK, fully tested. The winner for memory is 2133MHz 4Rx4 32GB (slightly, ~0.3%, faster than 2400MHz dual-rank 32GB),
running at 1600MHz CL 12-11-11.

The only current issue: 2 DIMMs per channel improve speed a bit, so in the future I should add another 512GB.
With just 2 cartridges populated (only 1 CPU connected to RAM), the gain in multicore performance was huge (~20%); with all of them populated it is smaller but still visible.

The multicore result is now between the Threadripper PRO 5995WX and the EPYC 7773X.
Single-core is 12% lower than the EPYC 7773X, and I have no idea how to increase it (it is boosting to 3.4GHz now).

1657902801398.png
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
I just read a PDF about the "Jordan Creek 2" chips (memory buffers for E7 v3/v4), and I made a BIG mistake in the number of RAM DIMMs and the memory configuration.

The bad idea:
E7 is quad channel
So 4x 4 = 16 DIMMs
NOT TRUE

The good one:
E7 is quad channel, but it uses the SMI interface.
SMI almost doubles the single-DIMM speed.
Each "Jordan Creek 2" chip converts two DDR channels (3 DIMMs each) into a single SMI interface.

So: 4 processors x 4 SMI x 2 = 32 DIMMs optimal.

In short, there is still room for improvement; I put just 16 DIMMs in here.
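
The DIMM arithmetic above can be written out as a small sketch. The constants are taken from this post (4 sockets, 4 SMI links per socket, 2 DDR channels per buffer, up to 3 DIMMs per channel); the function and variable names are made up for illustration.

```python
SOCKETS = 4                  # quad-socket server
SMI_LINKS_PER_SOCKET = 4     # "E7 is quad channel", but each channel is an SMI link
DDR_CHANNELS_PER_BUFFER = 2  # each Jordan Creek 2 buffer fans out to two DDR channels
MAX_DIMMS_PER_CHANNEL = 3    # "3 DIMMs each"


def ddr_channels(sockets: int = SOCKETS) -> int:
    """Total DDR channels behind all memory buffers."""
    return sockets * SMI_LINKS_PER_SOCKET * DDR_CHANNELS_PER_BUFFER


def dimm_count(dimms_per_channel: int, sockets: int = SOCKETS) -> int:
    """DIMMs needed to put the given number of DIMMs on every DDR channel."""
    if not 1 <= dimms_per_channel <= MAX_DIMMS_PER_CHANNEL:
        raise ValueError("dimms_per_channel must be between 1 and 3")
    return ddr_channels(sockets) * dimms_per_channel


installed = 16
print("DDR channels in total:", ddr_channels())
print("DIMMs for one per channel:", dimm_count(1))
print("Share of channels used with 16 DIMMs:", installed / ddr_channels())
```

With 16 DIMMs only half of the 32 DDR channels are in use, which is the mistake described above.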
 

nutsnax

Active Member
Nov 6, 2014
260
98
28
113
So there is a significant improvement in having all of the memory channels populated?

Also how much power does it draw when running this benchmark?
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
Sorry, it is 64x 16GB.

But yes, it does!! It is much faster on DDR3 (64 DIMMs) than on DDR4 (16 DIMMs).

1657989852203.png
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
OK, yes: 64x DDR3 @ 1333MHz CL9 ("octa" channel) is faster than 16x DDR4 @ 1600MHz CL12 (quad channel).
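
A rough way to see why more, slower DIMMs can win is to compare theoretical peak DDR bandwidth per socket. This sketch assumes the quoted speeds are transfer rates in MT/s, a 64-bit (8-byte) data bus per channel, and 8 vs 4 populated DDR channels per CPU; it ignores the SMI link limits, latency, and rank interleaving, so it is an upper bound rather than a prediction. `peak_gb_per_s` is a name invented for this example.

```python
def peak_gb_per_s(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000


ddr3_octa = peak_gb_per_s(1333, channels=8)  # 64 DIMMs: 8 DDR channels per CPU
ddr4_quad = peak_gb_per_s(1600, channels=4)  # 16 DIMMs: 4 DDR channels per CPU

print(f"DDR3-1333 x8 channels: {ddr3_octa:.1f} GB/s per socket")
print(f"DDR4-1600 x4 channels: {ddr4_quad:.1f} GB/s per socket")
print(f"ratio: {ddr3_octa / ddr4_quad:.2f}x")
```

On paper the DDR3 setup has about two thirds more peak bandwidth per socket despite the lower clock, which is consistent with the result reported here.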
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
Just updated the video with the DDR4 cassettes again (unfortunately a driver update ran in the background and slightly affected the results). BIOS is U17 2.76.
 

MichalPL

Active Member
Feb 10, 2019
189
25
28
lol, now it's a super cost-effective server for 1400 EUR, plus 1TB of DDR3 for another ~1500, lol, but it needs the DDR3 cassettes with the older memory chip.
DDR3 also adds +150W (for 64 DIMMs). OK, it's late - end of "hobby" time ;)