Tested: DL580G8 (Gen8) and G9 (Gen9) *Lack of* Compatibility

Layla

Game Engine Developer
Jun 21, 2016
165
117
43
37
I've long wanted to run this test, and today I finally got around to it with a few spare CPU trays and a Gen8 and a Gen9 chassis right next to each other.

Unfortunately, it seems that:
1. You can't boot a V3 CPU with DDR3 memory on an HP U17 BIOS. It complains about mismatched memory vs. CPU. This is silly, since the memory boards are interchangeable and the hardware fully supports it. HP seems to just disallow this to increase market segmentation and thereby profits.
2. You can't boot a V2 CPU + DDR3 on a U17 BIOS.
3. You can't boot a V3 CPU + DDR3 on a P79 BIOS.
4. You can't flash a U17 BIOS on a DL580 G8 with a P79 BIOS.

Basically, despite the CPU socket being exactly the same for E7-V2, E7-V3 and E7-V4, and the CPU trays being swappable, and the CPU trays and motherboards being identical, HP has firmware locked out all of the neat things this hardware platform was designed to be able to do in terms of interchangeability and scalability. What a shame. :/
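For quick reference, the three boot results in the list above can be written down as a small lookup table. This is only a sketch: `TESTED` and `boots` are hypothetical names made up for illustration, and the table records nothing beyond the combinations listed in this post (the failed U17 flash on a Gen8 is a flashing result, so it is left out).

```python
# Observed boot results from the list above,
# keyed by (BIOS family, CPU generation, memory type).
# False means the combination refused to boot. Untested combinations are absent.
TESTED = {
    ("U17", "V3", "DDR3"): False,  # item 1: BIOS complains about memory vs. CPU mismatch
    ("U17", "V2", "DDR3"): False,  # item 2
    ("P79", "V3", "DDR3"): False,  # item 3
}

def boots(bios, cpu, memory):
    """Return the recorded result, or None if the combination was not tested."""
    return TESTED.get((bios, cpu, memory))

print(boots("U17", "V3", "DDR3"))  # a tested combination
print(boots("P79", "V2", "DDR4"))  # not in the list above, so None
```

Returning `None` for anything untested keeps the table from implying results that were never measured.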
 

CookiesLikeWhoa

Active Member
Sep 7, 2016
112
26
28
32
We are talking Intel E5-V3's here? If that's the case, those are DDR4 locked.

I know V1/V2 can be swapped (with BIOS support) and V3/V4 can be swapped (with BIOS support), but I never saw a V2/V3 swap because of the different memory requirements.

But I could be gravely mistaken here.
 

Layla

E7 not E5.

But V3 of both E5 and E7 supports both DDR3 and DDR4 in hardware. On E5, the motherboard has to pick which slots it provides. On E7, with memory buffer chips, you often (though not always) get server designs with replaceable memory boards, so you can choose DDR3 or DDR4 by swapping those boards.
 

MichalPL

Member
Feb 10, 2019
53
5
8
Not fully true.

E5 v1/v2: yes, you can swap them. The difference is DDR3-1600 vs. DDR3-1866 (sometimes you can overclock it to 2133MHz), and v2 is better in heavily multithreaded workloads. With 4 cores, v1 vs. v2 makes no difference, but with 6 cores it does: the 1650 v2 is much faster than the 1660 v1. The 1680 v2 is (still) the nice one. Overclocking-wise, all of them can reach ~4.5GHz (~4.7GHz single core), or 4.1-4.3GHz without a voltage increase.


E5 v3: only some of them support DDR3 (a short list, and only from the 26xx series), and none of those are overclockable. There is no sense in buying any of them, because the 1680 v2 is MUCH faster (unless you are building a 2x CPU system and accept slow single-core performance, around ~3.6GHz).

If I remember correctly, in PassMark 10 the fastest one with DDR3 scored about 20k points. The fastest dual-CPU E5 v2 scores ~28k points.

The 1660 v3 was nice: after overclocking (4.4/4.8GHz) it is like an i7 9800X, and even today it's only slightly (~20%) slower than the i9-10800X.

E7 v1 and v2 are different. E7 v1 is more like, hm... a predecessor of E5 v1: slow and ugly.

E7 v2 is good. I also installed it in an HP DL580 G8: 60 cores, 3.2GHz all-core, 3.6GHz boost (E7 8895 v2). I don't see the sense in upgrading it to v3, because the difference in core count is minimal (15 vs. 18) and you need to change the RAM to DDR4, all for, I don't know, an 18% speed increase?

E7 v4 is better. Are you sure that you can't flash it to G9 even after replacing the RAM cassettes?
E5 v4: no sense in looking at it, the 14-core Xeons/i9s for the newer LGA2066 are much better ;)

Summary:

There is no sense in thinking about it because of the AMD 5800X, 5950X and TR 3970/3990/3995 X
(unless you need 2TB of RAM; then you can buy a 60-core DL580 cheaply and fill it with 2TB of DDR3 almost for free).

Lots of PCIe slots are good too, but the QPI links are not that fast: I reach maybe 24GB/s read/write... slooooow !!!! Really, Intel was fantastic, but in 2014, not in 2021. Now AMD is the king.

PS. I am also using two (almost identical) DL580 G8s for compiling game engines (and am now switching to TR :) )

PassMark 10: AMD 5950X - 49k points (3700 single core); DL580 G8, 60 cores, 3.2GHz/3.6GHz boost - 52k points (2050 single core). Yes, in compiling it's more like the fastest Epyc (Rome), because compiling doesn't need an FPU, but the 64-core TR is still faster.
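To put the two PassMark 10 results above side by side, here is a rough per-core calculation in Python. It is only a sketch: it uses the rounded scores quoted in this post plus the 5950X's 16 cores, and the names `systems` and `per_core` are made up for illustration. Dividing a composite score by core count is a crude proxy, not a proper benchmark.

```python
# Rounded PassMark 10 figures as quoted in this post.
systems = {
    "5950X": {"cores": 16, "multi": 49_000, "single": 3700},
    "DL580 G8": {"cores": 60, "multi": 52_000, "single": 2050},
}

def per_core(name):
    """Multi-threaded score divided by the number of physical cores."""
    s = systems[name]
    return s["multi"] / s["cores"]

amd = per_core("5950X")
hp = per_core("DL580 G8")
core_ratio = amd / hp
single_ratio = systems["5950X"]["single"] / systems["DL580 G8"]["single"]

print(f"5950X score per core:    {amd:.1f}")
print(f"DL580 G8 score per core: {hp:.1f}")
print(f"per-core ratio:          {core_ratio:.2f}")
print(f"single-thread ratio:     {single_ratio:.2f}")
```

By this measure one 5950X core is worth roughly three and a half DL580 G8 cores in the multi-threaded score, while the single-thread gap is smaller.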
 

Layla

Haswell-EP and Haswell-EX both support DDR3 and DDR4 - possibly on EP they dropped it from LCC dies? Or they just pretend it isn't there. I don't pay attention to LCC or single-socket dies.

Broadwell-EX also supports DDR3 and DDR4.

HP will not let you use DDR3 on the U17 BIOS and won't let you use DDR4 on P79. And you can't vanilla flash P79 onto a G9, or U17 onto a G8.

Which isn't to say it can't be done - but to say HP doesn't want you to do it, and puts roadblocks in the way. Dell is rumored to be equally evil about this, despite also using memory boards.

What does an actual NUMA-aware benchmark look like? Obviously the Gen8 would crush the 5950X in anything NUMA-aware and heavily threaded.

I have a Dual Epyc 256T machine - It's fast, but the quad-xeon EX (V2, V3, V4) still hold their own pretty well. We only have one legacy V1 left - it's very inefficient by comparison.

But yes, 1TB+ of RAM is the main reason for the 4-socket servers.
 

MichalPL

Haswell-EP: I can say from experience that only some of them support DDR3. One of them is the E5-2678 v3, and I have fully confirmed that it does, while others (for example the 1660 v3) do not. I am almost sure this is an LCC/microcode thing, not a BIOS (UEFI) thing.

The problem with the E5-2678 v3 is that it's slower than an overclocked 1660 v3, and overall the E5-2678 v3 (even dual core) is too slow to use as a desktop PC (single-core performance is too low).

Funnily enough, the difference in speed between DDR3 and DDR4 was minimal. I don't remember exactly what it was, but I'm guessing not more than 5%.

I think this is the full list of Haswell-EP parts with DDR3 support (clock in GHz x cores):
E5-2629 v3: 3.2 x 8
E5-2649 v3: 3.0 x 10
E5-2669 v3: 3.1 x 12
E5-2673 v3: 3.1 x 12
E5-2676 v3: 3.1 x 12
E5-2678 v3: 3.3 x 12

Haswell-EX:
Yes, and it's 100% official.
What's funny is that it runs DDR3 at 1600MHz and DDR4 at fast DDR3 speeds ;)

I was thinking about the same thing: replacing the cassettes with DDR4 ones and flashing the G9 BIOS (via SPI?) :) But the programmers said we need both of the HPs, so I decided not to break one ;) But... today it's a sport: you can probably buy a used G9 without CPUs for the price of a single 5950X ;)

About the cassettes: between the RAM and the CPU there is a custom chip. I don't know exactly what the chip does (allowing 6 DIMMs per channel?), but for maximum performance the 1333MHz setting (I forgot its name) is better than the 1600MHz one, with a dual-rank 1866MHz low-CL DIMM in each white slot.

Side note: HP engineering (on the inside) is a beautiful beast :) I think it was an HP decision to "disable" it. Also, the "v1"/"v2" naming is not really comparable between the E5, E7 and i7 series. It's like with cars: experimenting with different things in different segments (PCIe 3.0, RAM speed, ...).



Ivy Bridge EX vs Haswell-EX:
In my opinion there is no reason to even look at Haswell-EX if you already have an Ivy Bridge-EX machine (unless you need more modern AVX).

Ivy Bridge EX: up to 3.6GHz and 15 cores
Haswell-EX: up to 3.3GHz and 18 cores
same 22nm process, almost same performance per clock. 8895 v2 is the winner here.
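A back-of-the-envelope version of the comparison above, as a Python sketch. It assumes equal performance per clock (the post says "almost same") and treats the "up to" clocks as if they applied to every core, which real all-core clocks will not match. `ghz_cores` is a made-up helper name.

```python
# "Up to" figures from the comparison above: (max clock in GHz, cores per CPU).
ivy_bridge_ex = (3.6, 15)
haswell_ex = (3.3, 18)

def ghz_cores(part):
    """Naive throughput proxy: clock multiplied by core count."""
    clock, cores = part
    return clock * cores

# Relative gain of Haswell-EX over Ivy Bridge-EX under these assumptions.
gain = ghz_cores(haswell_ex) / ghz_cores(ivy_bridge_ex) - 1

print(f"Ivy Bridge-EX: {ghz_cores(ivy_bridge_ex):.1f} GHz-cores")
print(f"Haswell-EX:    {ghz_cores(haswell_ex):.1f} GHz-cores")
print(f"naive gain:    {gain:.0%}")
```

Under these assumptions the three extra cores are mostly offset by the lower clock, so the theoretical gain per CPU is small.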

One sec, I will log into the server and run or find some tests.

Layla said: "What does an actual NUMA-aware benchmark look like? Obviously the Gen8 would crush the 5950X in anything NUMA-aware and heavily threaded."
No, it's not crushing it :) I was also shocked by how fast the 5950X is. I'd say that in compiling C++ it's maybe about 1.5-2x faster.

We built one with water cooling, tweaked the RAM to run at 3466MHz (sticker speed 3800MHz, but AMD still has problems with RAM speed), put in 4x Samsung 970 Evo SSDs and set up hardware RAID0 (12GB/s) on a "gaming" X570 motherboard! Performance/price is amazing here.
I think 16 cores here are more like the equivalent of ~35 E7 8895 v2 cores.

And it's a small, light and silent box (with RGB inside, of course ;) )

Layla said: "I have a Dual Epyc 256T machine - It's fast, but the quad-xeon EX (V2, V3, V4) still hold their own pretty well. We only have one legacy V1 left - it's very inefficient by comparison.

But yes, 1TB+ of RAM is the main reason for the 4-socket servers."
Yes, V1 is more like the Core 2 Quad architecture with HT :) or an i7 970 (while V1 in E5 is more like an i7 3770K, 2 generations newer).
V2 has almost the same speed (IPC) as the i9-10940X :) (and E5 v2 and E7 v2 are almost identical too).

Epyc, hmm... they are not good, mostly because their single-core performance is almost the same as E7 v2/v3 (and much slower than E5 v2/v3).
A single-CPU TR (with good water cooling designed for TR) should be faster than a dual-CPU Epyc in almost everything.
 

NablaSquaredG

Active Member
Aug 17, 2020
387
147
43
MichalPL said: "About the cassettes: between the RAM and the CPU there is a custom chip. I don't know exactly what the chip does (allowing 6 DIMMs per channel?), but for maximum performance the 1333MHz setting (I forgot its name) is better than the 1600MHz one, with a dual-rank 1866MHz low-CL DIMM in each white slot."
That is a chip called "Jordan Creek", a memory buffer developed and produced by Intel.
Xeon E7s don't use standard DDR interface (the RAM is never directly attached to an E7 CPU), but rather something called "Intel SMI" for "Intel Scalable Memory Interconnect", which is a high speed memory bus, similar to (IBM) Open Memory Interface.
 

MichalPL

promised benchmark:
1616331923304.png

*It's not the fastest result, because single-core performance is only stable with 1 CPU installed (and then it's even faster :) I think about 2100 points).

You can compare in details to:
5950X:
After fitting proper cooling (for ~$150) and tweaking the RAM correctly (just 2 channels), it's much faster and reaches 49000 points.

3970X:
3995X:

and then for example 3 page files for RAM on 3x Samsung 980 Pro - 1M ops per second ;), cheap dedicated 3TB RAM extender ;)

Problem with Epyc is here:
 

MichalPL

NablaSquaredG said: "That is a chip called 'Jordan Creek', a memory buffer developed and produced by Intel.
Xeon E7s don't use standard DDR interface (the RAM is never directly attached to an E7 CPU), but rather something called 'Intel SMI' for 'Intel Scalable Memory Interconnect', which is a high speed memory bus, similar to (IBM) Open Memory Interface."
So the DDR memory compatibility issues are shifted to the "Jordan Creek" and "Jordan Creek 2" chips?
 

Layla

I have zero faith in Passmark as a benchmark above 64T - FastBuild or UnrealEngine build times would be interesting.
 

MichalPL

Unreal Engine: unfortunately we haven't measured it precisely, but it's about 2 times faster than the 5950X (including linking, where the 5950X is much faster) :/

In my opinion PassMark 10 is good at it (PassMark 9 was not good).

BTW, I found an old photo of a single CPU installed in the DL580 G8 (E7 8895 v2), but unfortunately it's PassMark 9. 2.7x speedup in "points" on 4 CPUs.
1616352596123.png
 

Layla

MichalPL said: "Unreal Engine: unfortunately we haven't measured it precisely, but it's about 2 times faster than the 5950X (including linking, where the 5950X is much faster) :/

In my opinion PassMark 10 is good at it (PassMark 9 was not good).

BTW, I found an old photo of a single CPU installed in the DL580 G8 (E7 8895 v2), but unfortunately it's PassMark 9. 2.7x speedup in 'points' on 4 CPUs.
View attachment 18003"
That's why it's a poor benchmark for the things I care about: it should be 3.xx times faster on 4 CPUs for compiling code, for example.

Also in my use-case linking isn't serial, so I don't have the issue UE4 does with single threaded performance.
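One rough way to read the 2.7x figure is through Amdahl's law, sketched below in Python. This is an illustration, not how PassMark computes anything: the composite score mixes single- and multi-threaded sub-tests, so the "parallel fraction" it implies is only indicative. The 3.9x value is a hypothetical stand-in for "3.xx", and both function names are made up.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n units when a fraction p of the work is parallel."""
    return 1 / ((1 - p) + p / n)

def parallel_fraction(speedup, n):
    """Invert Amdahl's law to get the parallel fraction p implied by a speedup."""
    return (1 - 1 / speedup) / (1 - 1 / n)

p_passmark = parallel_fraction(2.7, 4)  # the 2.7x PassMark 9 result on 4 CPUs
p_compile = parallel_fraction(3.9, 4)   # hypothetical "3.xx" compile speedup

print(f"parallel fraction implied by 2.7x: {p_passmark:.2f}")
print(f"parallel fraction implied by 3.9x: {p_compile:.2f}")
```

Read this way, a 2.7x result behaves like a workload that is only about 84% parallel, whereas a build that scales to nearly 4x on 4 CPUs would have to be about 99% parallel.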
 

MichalPL

Layla said: "That's why it's a poor benchmark for the things I care about, it should be 3.xx faster on 4 CPUs for compiling code, for example."
Yes, where XX is close to 99. There is a single-core speed factor that is taken into the equation :) But the small sub-results should be ~4x faster.

Layla said: "Also in my use-case linking isn't serial, so I don't have the issue UE4 does with single threaded performance."
True. Single-core, it's still faster than the E5 2678 v3 (on DDR3 or even DDR4) and still at about 65% of the fastest Intel CPU after many years :)

For sure we will test it when the TR arrives, and I will note the times for all machines. Unfortunately all the 5950Xs were taken from the office to people's homes (covid) because they are fast :) I should still have a few 5800Xs, a 1680v2, 2x 2997v2, I think one 10920X, and my own old desktop 1650v2, which is unfortunately slow when there are 40k shaders to compile, but it was fast enough to rewrite part of the UE4 raytracing code in one night, the day after receiving the DXR version :)

BTW, I just found a used DL580 G9 for $1340 :) with no CPUs. Worth a try?
hmm...
hmmm - 8895v3 - $699 (18 cores)
hmmm - 8895v2 - $149 (15 cores)

Does it (E7 v3) also boost to 3.2GHz all-core?

OK, not now ;) I will compare it to the TR... but HP is so well designed... I remember the HP 620 dual-CPU, built (almost) like a Mac Pro :)
 

Patriot

Moderator
Apr 18, 2011
1,333
718
113
I worked there during that era. There were ROMs that supported DDR3 on v3 for development purposes. As far as I am aware it is not a hardware limit, though it is probably not ideal for performance, and for the sake of DDR4 uniformity on Gen9 it was not made a supported configuration.

I worked with quad Opty, quad Sandy and Ivy rigs in the E5 and E7 era. 2P Epyc and 2P Scalable put a hurt on them in power and performance.
Intel did quietly release E7 v4, though I don't know if that is a supported config in these.

I would suggest a Zen1 or Zen2 Epyc build over 18c v3 E7 chips. While I did love playing with the 580s, they were a tad cantankerous.
I am still trying to decide how to replace my Xeon workstation, a 1680v3 @ 4.2GHz with 256GB of RDIMMs. The little 5600X in my gaming rig runs laps around it, even on multithreaded performance, with 2 fewer cores.


If yall want to play with hardware performance metrics, I have on hand... a 2p 2699v4 box, 1 x99 box that can take any cpu, has 1680v3 currently.
a 5600x and 5950x box.
 

Layla

I honestly suspect it was to force planned obsolescence - they get a lot more money selling 580 Gen9 than just CPU upgrades!
 

NablaSquaredG

Yep that seems very likely, because I can confirm that 1st gen memory buffers (for v2 CPUs) work just fine with v4 CPUs... They just weren't validated by Intel afaik.

It's one thing to not officially support it, but it's a completely different thing to block it in the BIOS.
 

RussianE39

New Member
May 22, 2021
4
2
3
Funnily enough, if we talk about E7 v3 + DDR3, almost every vendor except HP and Dell supported that configuration. The Cisco UCS series supports v3 with DDR3 RAM, as does Oracle (which in my opinion almost always has the best-engineered servers); even Huawei supported DDR3 with v3 Xeons. I have had some experience with almost every quad-socket Brickland platform released, and in my experience the DL580 Gen8/Gen9 is actually one of the worst ones; only IBM with their awful X6 is worse. My favorites are the Cisco UCS C460 and Oracle X4-4/X5-4 machines. Nothing new here, though: the DL560 Gen8 was one of the worst "el-cheapo" quad-socket machines... So what else to expect?