16 Core 32 Thread HP z820 dual socket workstation build/performance upgrade


Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Except it doesn't. Modern single socket systems have considerably more even with half the CPUs. Dual socket AMD Epyc nets you ~4x the memory bandwidth.
True, there are plenty of state-of-the-art systems that still fall short of the z820's true performance potential. That's why my other system, a 9600KF overclocked to 5.0GHz with a dual-channel G.Skill B-die DDR4-4200 kit, only develops about half of the effective bandwidth of a z820. Bottom line, close to 110GB/s in memory perf is still a good number in 2020. If you think otherwise, you are not thinking clearly.

Yeah, that's kind of the point, using an effective 8 channel memory system to maximize memory bandwidth. all potential memory bandwidth :) You've got to remember, the very subject of "performance upgrades" is the central theme of this build :)


The others are also correct about TDP and multipliers. Turbo boost is not indefinite.
lol You think turbo comparisons between Ivy Bridge and Sandy Bridge are trustworthy? Let me ask you this. What CPU had a configurable TDP? Sandy Bridge? Ivy Bridge? Both?

Did you not read the twenty posts where I talked about a configurable TDP on ivy bridge and a TEMPORARY boost above stock TDP?

As I said earlier, I stand behind the 100GB/s read performance as still being competitive to this very day.
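For anyone who wants to sanity-check the bandwidth numbers in this thread, here is a rough back-of-the-envelope sketch. It assumes DDR3-1866 on all eight channels of the z820 (the memory speed is an assumption, it is not stated in this post) and a 64-bit (8 byte) bus per channel. The dual-socket total also assumes each CPU is reading from its own local memory.

```python
def peak_bandwidth_gbs(channels: int, transfer_rate_mts: float, bus_width_bytes: int = 8) -> float:
    """Theoretical peak in GB/s (decimal): channels * MT/s * bytes per transfer."""
    return channels * transfer_rate_mts * bus_width_bytes / 1000.0

# Assumption: 2 sockets x 4 channels of DDR3-1866 in the z820.
z820_peak = peak_bandwidth_gbs(channels=8, transfer_rate_mts=1866)

# Dual-channel DDR4-4200 desktop kit mentioned above.
desktop_peak = peak_bandwidth_gbs(channels=2, transfer_rate_mts=4200)

# Measured figure quoted in the thread, as a fraction of the theoretical peak.
measured_gbs = 107.0
efficiency = measured_gbs / z820_peak

print(f"z820 theoretical peak:    {z820_peak:.1f} GB/s")
print(f"desktop theoretical peak: {desktop_peak:.1f} GB/s")
print(f"measured / theoretical:   {efficiency:.0%}")
```

These are ceilings, not measurements. Real read bandwidth lands below them, and cross-socket traffic lands lower still.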

The latency equivalent is similar to some of the Ryzen chips.
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
The others are also correct about TDP and multipliers. Turbo boost is not indefinite.
Again, if you are correct about a TDP limitation on some of these lower wattage Ivy Bridge chips, that should be easy to show by running an identical battery of benchmarks on both CPUs, the high voltage one and the low voltage one, and comparing the results. Yet no disparity in benchmark results can be established.

You know what's odd? The 2673 v2 processor performs identically to its retail cousin at a lower voltage. I have the benchmarks to prove it.
 

gabe-a

New Member
Sep 10, 2020
26
6
3
Very impressive! Agreed that 107GB/s memory bandwidth is impressive even in 2020. Maybe you can find an OEM version of the Epyc Rome (there are some out there!) that posts even better numbers than a dual Epyc is currently supposed to (820GB/s for a single dual-socket board out-of-the-box):

There are a total of eight DDR4 memory controllers on this hub chip, the same number in total that were on the Naples complex; both support one DIMM per channel and have two channels per controller, but Rome memory runs slightly faster – 3.2 GHz versus 2.67 GHz – and therefore with all memory slots filled, yields a maximum of 410 GB/sec of peak memory bandwidth per socket. That’s 45 percent higher than the Cascade Lake Xeon SP processor, which has six memory controllers for a total of 282 GB/sec of memory bandwidth running at 2.93 GHz
 

gabe-a

New Member
Sep 10, 2020
26
6
3
Just a quick comment about TDPs and the like, since I did a lot of tinkering with undervolting way back in the skylake era (oh wait, we're still in the skylake era! :) ).

If all else is equal in a processor, one with lower TDP may perform less well under intense load (I'm talking HPC, prolonged, load -- maybe longer than short consumer benchmarks), if the load causes it to exceed the rated TDP longer term. The benchmarks you ran may not have sufficiently stressed out the CPU for any differences to become visible, if it indeed is just a TDP-capped version of the exact same CPU as its consumer counterpart.

But here's where things get interesting. I prefaced my earlier statement with the qualifier "if all else is equal." This is often not the case, as CPUs can be configured with different voltages per clock. Let's consider the corollary: 2 chips of identical TDP, one of which has lower stock voltages per clock, all else equal. In this case, the CPU with the lower voltages would be expected to outperform the one with higher voltages on workloads that push them to their max TDP, because power dissipated is a function of voltage applied! So what if the processor had both lower voltage and lower TDP? Then it gets hard to say without having the exact numbers. In theory, there is a voltage "break-even point": if you lower the voltage just enough to compensate for the decreased TDP, the chip would perform identically to a higher-voltage, higher-TDP CPU of the same clocks under all loads (this could be calculated, but I am too lazy and/or ignorant of the specs), and you may have found exactly this lower-voltage, lower-TDP CPU. In theory, though.... just theoretically.... you may have found a processor below that tipping point, where it can actually outperform the higher TDP CPU in certain workloads, specifically those that would push the higher TDP processor to throttle. How's that for mind-bending? ;)
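A rough sketch of that break-even calculation, under the textbook assumption that dynamic power scales as P = C * V^2 * f and ignoring static leakage entirely. The 1.00 V reference voltage is a hypothetical placeholder, and the 130 W and 110 W TDP inputs are assumed figures for the retail and OEM parts, not measured values.

```python
import math

def break_even_voltage(v_ref: float, tdp_ref_w: float, tdp_low_w: float) -> float:
    """Voltage at which the lower-TDP chip sits at its TDP under the same load and
    clock that puts the reference chip at its own TDP.

    Assumes dynamic power only: P = C * V^2 * f, with C and f identical on both chips.
    """
    return v_ref * math.sqrt(tdp_low_w / tdp_ref_w)

v_ref = 1.00  # hypothetical load voltage of the higher-TDP chip, in volts
v_even = break_even_voltage(v_ref, tdp_ref_w=130.0, tdp_low_w=110.0)

print(f"break-even voltage: {v_even:.3f} V ({(1 - v_even / v_ref):.1%} below reference)")
```

Under those assumptions, a chip running below the break-even voltage would have headroom left at the lower TDP, and one running above it would hit its power limit first.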

In reality I suspect you have a CPU that is either just TDP capped (and all else truly is equal) but you haven't really pushed it to do the high-AVX in-cache linear algebra operations needed to induce throttling, leading to the appearance that they are equal (but really only equal in the workloads tested by these benchmarks); or you have a CPU that is indeed somewhat lower voltage, helping to mitigate the difference in TDP.

I am absolutely stunned you went to China for this CPU by the way. Very impressed.

One final comment -- I really love my HP Z8 workstation's cooling, as well as my Colfax Intl Knights Landing watercooled tower. Absolutely fantastic beasts that caused me to believe, quite wrongly and to my bitter misfortune, that I could trust a third-party company to build me a dual Epyc Rome 7742 tower that had similar high-performance cooling properties. Unfortunately my machine was a kiddie-grade, absolutely horrifically built nightmare where every component would take turns overheating under moderate HPC load. So I do appreciate an industrial-grade cooling system built to-spec!
 

MBastian

Active Member
Jul 17, 2016
205
59
28
Düsseldorf, Germany
Well, no doubt it is a nice system. Personally, since I have no super special requirements at home, I plan on sticking with my pimped out dual 2690v2s system with 256GB RAM for the next few years. Or at least until something at least 2.5 times as fast(single and multi-core) is available for halfway reasonable prices. If I want to play with the big guns, I have them at work.

Don't take this the wrong way, but while reading this thread I can't help wondering if you are really interested in honest comments and critique or you just want to have confirmation.
 

Whaaat

Active Member
Jan 31, 2020
315
166
43
Did you not read the twenty posts where I talked about a configurable TDP on ivy bridge and a TEMPORARY boost above stock TDP?
Configurable TDP is about mobile CPUs, so that's definitely not your case:
TDP_conf.PNG
Temporarily exceeding the TDP value by means of PL1 and PL2 was available on Sandy Bridge too, though.
Yeah, that's kind of the point, using an effective 8 channel memory system to maximize memory bandwidth. all potential memory bandwidth :) You've got to remember, the very subject of "performance upgrades" is the central theme of this build
Simply adding up the memory bandwidth of two sockets tells very little about the "performance" of the resulting build. But you can always measure actual latency and bandwidth for the situation where a CPU in one socket uses data in memory belonging to the CPU sitting in the other socket. Feel free to share the measured latency and loaded memory bandwidth values with us.
You know what's odd? The 2673 v2 processor performs identically to its retail cousin at a lower voltage. I have the benchmarks to prove it.
I don't see anything odd in this. I launched the Passmark CPU benchmark and found that my 2687w's peak consumption is only 112w during the benchmark run. Does that mean a 115w CPU performs similarly to a 150w CPU in real-world loads if they both have identical multi-core turbo frequencies? I often use FDTD and FEM solvers on my computer, and they push power consumption to the TDP limit and would be happy to pull even more out of the CPU. The synthetic equivalent of such a load is Linx, and I'm actually very curious to see the behavior of the 2673v2 under Linx load once the PL1 turbo timer has elapsed.
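The PL1/PL2 point can be illustrated with a toy model. This is only a sketch, not Intel's actual firmware algorithm: it assumes the chip may draw up to PL2 while a running average of package power stays below PL1, and that the average is an exponentially weighted one with time constant tau. The PL2 value (1.2 x PL1), tau (10 s), idle power (30 W) and the 140 W "Linx-like" demand are all hypothetical numbers picked for illustration.

```python
def seconds_until_throttle(demand_w, pl1_w, pl2_w, tau_s, idle_w=30.0, dt=0.1, horizon_s=600.0):
    """Return the time at which the running average reaches PL1, or None if it never does.

    Toy model: while the average is below PL1 the chip draws min(demand, PL2).
    """
    avg = idle_w
    for step in range(int(horizon_s / dt)):
        if avg >= pl1_w:
            return step * dt  # from here on the chip would be held to PL1
        drawn = min(demand_w, pl2_w)
        avg += (drawn - avg) * dt / tau_s  # exponentially weighted running average
    return None

PL1 = 115.0       # the 115w figure from the post above
PL2 = 1.2 * PL1   # hypothetical short-term limit
TAU = 10.0        # hypothetical time constant in seconds

light = seconds_until_throttle(demand_w=112.0, pl1_w=PL1, pl2_w=PL2, tau_s=TAU)  # Passmark-like load
heavy = seconds_until_throttle(demand_w=140.0, pl1_w=PL1, pl2_w=PL2, tau_s=TAU)  # hypothetical Linx-like load

print("112 W load throttles after:", light)
print("140 W load throttles after:", heavy)
```

In this model a load that stays under PL1 never triggers the limit no matter how long it runs. That is why a short or light benchmark cannot tell a 115w part from a 150w part.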
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
2,091
1,507
113
Don't take this the wrong way, but while reading this thread I can't help wondering if you are really interested in honest comments and critique or you just want to have confirmation.
You're pretty much spot on and even the above post shows that. Their previous thread indicated exactly that and was a real gem of responses: https://forums.servethehome.com/ind...ge-c6145-x2-poweredge-c4130-good-combo.28264/

The OP will just become hostile if you post anything that doesn't align with their views, even if factual, since it doesn't validate their unchangeable position.
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
You guys forgetting about real world benchmarking proving my point? LOL

1600189865070.png

VERSUS THE OEM 2673:

1600189921593.png

Yes, you read that correctly. The 2673 v2 actually scores HIGHER than the 2667 v2 in some areas including overall rank at userbenchmark.com
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Reported, ignored, blissful silence. Don't feed the troll(s).
You gotta remember why I am being skeptical. Your logic is akin to someone basically telling me the 140W AMD Phenom II is potentially a faster and better pick than its later 125W revision. Think about that for a second. It's laughable.

And of course, since I liked the E5-2673 v2 so much, I had to go out and order two of them :)

1600206333237.png
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Reported, ignored, blissful silence. Don't feed the troll(s).
Blissful silence is what you are doing by ignoring real benchmark results.

Definitely NOT trolling. I actually own two E5-2673 v2 processors as well as two E5-2696 v2 processors.

Let's review some results from Passmark as well:
1600283211567.png

1600283299579.png

Nearly identical. Because they are both the same processor, except the 2673 v2 is more voltage efficient.
 

gabe-a

New Member
Sep 10, 2020
26
6
3
Hi there again,

As a preamble, I was the guy who only voiced being impressed, and also kudos for having traveled to China for your passion, which I also support!

I'm actually curious about this processor (remember my post about the voltage vs clocks?). Is there any way you can report the voltages? Or is the CPU so rare it isn't supported by reporting tools? Any chance I can get you to try Linux and dump kernel-level CPU debug info, where you may be able to get voltages if supported by some kernel variant (maybe Clear Linux, the intel distro)?

I'm a fan of the ivy bridge generation, as my friend actually built a number of dual xeon E5-2400 v2 (10 cores each, 3.2GHz max turbo) for next to nothing. Apparently ivy bridge was the last generation to support instruction throughput of 0.5 per clock of the 16-way shuffles (pshufb assembly instruction), absolutely amazing for in-register lookups, translation, all sorts of bioinformatics. I guess haswell and later chips had to sacrifice something for avx2 (integer and byte-extended avx, which is also quite amazing).
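For readers who haven't used it, here is a pure-Python emulation of what a 128-bit pshufb does, as a sketch of the "in-register lookup" idea rather than real SIMD code. The nucleotide complement table is just an illustrative example and relies on the fact that the ASCII codes for A, C, G and T have distinct low nibbles (1, 3, 7 and 4).

```python
def pshufb(table: bytes, indices: bytes) -> bytes:
    """Emulate 128-bit pshufb: each index byte selects table[index & 0x0F],
    or yields 0 if the index byte has its high bit set."""
    assert len(table) == 16 and len(indices) == 16
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in indices)

# 16-entry lookup table keyed by the low nibble of the ASCII code.
# Anything that is not A/C/G/T maps to 'N'.
complement = bytearray(b"N" * 16)
complement[ord("A") & 0x0F] = ord("T")
complement[ord("C") & 0x0F] = ord("G")
complement[ord("G") & 0x0F] = ord("C")
complement[ord("T") & 0x0F] = ord("A")

seq = b"ACGTACGTACGTACGT"  # 16 bases, one register's worth
out = pshufb(bytes(complement), seq)
print(out.decode())
```

On real hardware the whole 16-byte translation is a single instruction, which is why its per-clock throughput matters so much for this kind of work.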
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Hi there again,

As a preamble, I was the guy who only voiced being impressed, and also kudos for having traveled to China for your passion, which I also support!

I'm actually curious about this processor (remember my post about the voltage vs clocks?). Is there any way you can report the voltages? Or is the CPU so rare it isn't supported by reporting tools? Any chance I can get you to try Linux and dump kernel-level CPU debug info, where you may be able to get voltages if supported by some kernel variant (maybe Clear Linux, the intel distro)?

I'm a fan of the ivy bridge generation, as my friend actually built a number of dual xeon E5-2400 v2 (10 cores each, 3.2GHz max turbo) for next to nothing. Apparently ivy bridge was the last generation to support instruction throughput of 0.5 per clock of the 16-way shuffles (pshufb assembly instruction), absolutely amazing for in-register lookups, translation, all sorts of bioinformatics. I guess haswell and later chips had to sacrifice something for avx2 (integer and byte-extended avx, which is also quite amazing).
Sure, these are the specs on the 2673 v2 within AIDA64. Don't have anything real time in terms of voltage #s.
1600293065813.png

This is a CPUz screenshot, though the voltage is not static and jumps around quite a bit, so it's not an accurate depiction of average vcore.

1600293215925.png

As you can see here, HWInfo64 also misidentifies my 2673 v2 as the retail 2667 v2 variant. This is a good way to monitor core speeds in real time, and if you load up the CPU it will hold a 3.6GHz all core turbo, just like the 2667 v2 and the 2687w v2. Hence the benchmark results for all three of these processors are very similar.

1600293437478.png

Keep in mind this result was achieved with my dual socket 2673 v2 rig
1600293920988.png
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Just a quick comment about TDPs and the like, since I did a lot of tinkering with undervolting way back in the skylake era (oh wait, we're still in the skylake era! :) ).

If all else is equal in a processor, one with lower TDP may perform less well under intense load (I'm talking HPC, prolonged, load -- maybe longer than short consumer benchmarks), if the load causes it to exceed the rated TDP longer term. The benchmarks you ran may not have sufficiently stressed out the CPU for any differences to become visible, if it indeed is just a TDP-capped version of the exact same CPU as its consumer counterpart.
I've had lots of time to collect results with the 2673 v2 under load and it never drops below 3.6GHz all core turbo speed, even after sustained torture tests.

This indicates it is able to achieve a 3.6GHz all core turbo with 110 watts or less. If it was limited by TDP, you would see the all core turbo multiplier eventually drop when it reached the stock over boost time limit, except that doesn't happen. As I said, benchmark results are identical because both processors are identical in terms of clock speeds and turbo configuration.

The reality is that the 2673 v2 is a more voltage efficient version of the 2667 v2. It's the better choice.
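One way to put a number on the "110 watts or less" claim, at least on Linux, would be to sample the package energy counter that the kernel exposes through the powercap interface while the torture test is running. This is a sketch that assumes the intel_rapl driver is loaded and that package 0 appears as `intel-rapl:0`. Reading `energy_uj` usually needs root on recent kernels. `sample_package_watts` is a helper name made up for this example.

```python
import time
from pathlib import Path

# Assumption: package 0 of the intel_rapl powercap driver.
RAPL = Path("/sys/class/powercap/intel-rapl:0")

def average_watts(e_start_uj: int, e_end_uj: int, seconds: float, max_range_uj: int) -> float:
    """Average power between two energy readings, allowing for one counter wraparound."""
    delta = e_end_uj - e_start_uj
    if delta < 0:  # the counter wrapped between the two readings
        delta += max_range_uj
    return delta / 1e6 / seconds

def sample_package_watts(seconds: float = 10.0, domain: Path = RAPL) -> float:
    """Read the package energy counter twice and return the average power in watts."""
    max_range = int((domain / "max_energy_range_uj").read_text())
    start = int((domain / "energy_uj").read_text())
    time.sleep(seconds)
    end = int((domain / "energy_uj").read_text())
    return average_watts(start, end, seconds, max_range)

# Usage, per socket, while the load is running:
#   print(sample_package_watts(10.0))
```

A second socket would normally show up as `intel-rapl:1`, so a dual-socket box needs one reading per package.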
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
This is my CPU mini benchmark result with the twelve core E5-2696 v2 chips in my other z820... Keep in mind this is OEM and once again, one of the best choices in the 2600 series family for a balance of speed, performance and energy consumption.

1600299596188.png
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Don't take this the wrong way, but while reading this thread I can't help wondering if you are really interested in honest comments and critique or you just want to have confirmation.
I think you are forgetting this is my build :)

Well yeah, of course I would be if the comments were accurate...

Of course I am interested in honest comments. However, there is no honesty in claiming the E5-2667 v2 is faster than the oem 2673 v2 because of TDP limit. This is totally bogus information on ivy bridge and the benchmarks prove this.

Also people saying they've had difficulties with the cooling system on the z820 either had it running in a wood shop for years on end or they are getting it confused with another, cheaper system. I have it clamping down no problem on a combined 240 watts - that's 24 cores and 48 threads at 3.1GHz and it doesn't break a sweat.
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
Apparently, I've got another 1st over at HWBOT. Seems like dual E5-2696 v2s are pretty rare as well...

1600357957474.png

Another example of OEM being preferable to retail. In this case, the E5-2696 v2 actually has a higher all core turbo of 3.1GHz, 100MHz more than Intel's flagship, the E5-2697 v2.
 