16 Core 32 Thread HP z820 dual socket workstation build/performance upgrade

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
First of all, thanks for the discussion. I mean no disrespect at all. I enjoy the technical details we have both offered up here. I hope we can keep it respectful and just focus on the questions at hand, and I will do my part to facilitate that. This is interesting reading. Sorry if I come across as offensive; it's difficult to read voice inflection in situations like this, but I mean well, and I will take my lumps if I have to. I will respond to your other questions later, extensively. Going on a hot date right now.
 
Reactions: Marsh

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
LOL, that's because CPU-Z which you are using for benchmarking is unable to make more than 70W load on 2650v2. And I'm talking about Linx that pulls about 110W from 2650v2 (at 3GHz!) during first 10 seconds, then 'Power Limit 1' comes into play and CPU is not permitted to consume more than TDP hence dropping its frequency below 2.9GHz.
So in your case it appears your PL2 has a time restriction of 10 seconds, no? On my rig, an HP z820, the PL2 value is unlimited, meaning it never has to throttle back at all and never returns to base, non-turbo speeds (and this is with a 2650 exactly like yours). But I can't say for sure that I am correct because I have not run Linx yet, as both of my z820 rigs are at my parents' home in upstate NY (I'm running a 5.1GHz 9600KF in their place). Can you open AIDA64 on your 2650 v2 and snip the same details I did with my 2673 v2? It's impossible to find these numbers online without the far-reaching aspects of AIDA64/HWiNFO64. That would really help us get a comparative handle on exact turbo specifications.

I understand your point here. CPU-Z is a very short-term benchmark, not suitable for measuring how turbo performs over extended periods under heavy load. It's really only good for measuring very short bursts and fun for benchmarking competitions. That being said, PL2 is unlimited here (contrary to what you are saying, at least in my case): as you can see in the screenshot, all the cores are pretty much pegged at the advertised base turbo speed. It never once dropped below this number. And you can know for certain I am not bluffing, because if this screenshot had been taken at idle, you would see 3 or 4 of the threads floating up around the top end of the turbo multipliers. I've observed this many times before. You know the CPU is pegged when all cores drop down to the base turbo multi. This was done with a 30-minute sustained AIDA64 torture test, if memory serves me correctly.
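For readers following the PL1/PL2 argument, here is a minimal sketch of how a short-burst limit (PL2), a sustained limit (PL1), and a time constant (tau) can interact. It is a simplified model, not Intel's actual firmware logic: it tracks an exponentially weighted moving average of package power and clamps to PL1 once that average reaches PL1. The function name `simulate_turbo` is hypothetical, and the 95 W PL1 is an assumption (it is not stated in this thread); the 110 W demand and 118 W PL2 are the figures quoted in the discussion.

```python
def simulate_turbo(demand_w, pl1_w, pl2_w, tau_s, duration_s, dt_s=1.0):
    """Return a list of (time_s, package_power_w) samples.

    Simplified model (an assumption, not Intel's implementation):
    the package may draw up to PL2 while a moving average of its power
    is below PL1; once the average reaches PL1, the draw is clamped to PL1.
    """
    avg_w = 0.0                 # moving average, starting from idle
    alpha = dt_s / tau_s        # weight of each new sample; 0.0 if tau is infinite
    trace = []
    for step in range(int(duration_s / dt_s)):
        allowed_w = pl2_w if avg_w < pl1_w else pl1_w
        power_w = min(demand_w, allowed_w)
        avg_w += alpha * (power_w - avg_w)
        trace.append((step * dt_s, power_w))
    return trace


# ASSUMPTION: PL1 = 95 W. Demand and PL2 are the numbers quoted in the thread.
limited = simulate_turbo(110.0, 95.0, 118.0, tau_s=10.0, duration_s=60.0)
# An "unlimited" board is modeled here as an infinite time constant.
unlimited = simulate_turbo(110.0, 95.0, 118.0, tau_s=float("inf"), duration_s=60.0)

print("limited:  first", limited[0][1], "W, last", limited[-1][1], "W")
print("unlimited: first", unlimited[0][1], "W, last", unlimited[-1][1], "W")
```

In this model the length of the burst depends on the demand and on the starting average, not only on tau, so it should not be expected to reproduce the exact timings measured in the screenshots.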
 

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
LOL. PL2=forever is a feature of motherboard vendors that don't follow Intel recommendations, not a particular CPU feature. And if your motherboard sets tau=unlimited automatically, or allows you to set it manually, that doesn't mean this CPU will behave similarly in other motherboards such as Supermicro or Dell.
Intel most definitely created the PL1 and PL2 values (they are part of the Turbo Boost 2.0 package) and the entire framework surrounding Turbo Boost 2.0, so yes, it is a feature set of the CPU, but Intel does grant vendors the agility to modify these values. Intel merely allows OEM partners to change these parameters to better suit workloads, specific hardware configurations, or environmental challenges. Think of this comparison: turbo core overclocking. Sure, the end user (aka vendor) can go in there and change the multipliers to suit their needs, but the turbo core boosting systems were still designed, created, and implemented by Intel; only later are they tailored to fit
This time you are right. That is because I don't use Hyper Threading, most useless performance-wise feature, but, as you correctly noticed, +25W to power consumption.
I totally agree, 100%. That's why I bought a 9600KF hexacore. No need for hyperthreading.
 

Whaaat

Active Member
Jan 31, 2020
315
166
43
So in your case it appears your PL2 has a time restriction of 10 seconds, no?
No, PL1=10sec, PL2=7.81ms -> means useless
Can you open AIDA64 on your 2650 v2 and snip the same details I did with my 2673 v2? It's impossible to find these numbers online without the far-reaching aspects of AIDA64/HWiNFO64.
Here is what you ask for. Although PL1=40sec for 2650v2 it behaves exactly the same as 2670 that has PL1=10sec. 10sec of over-speeding above TDP under Linx, then reduction of frequency to fit TDP. Not sure why. Didn't have time to find out, but I didn't expect much from this CPU anyway. Only v4 never reach TDP under heavy load. But they have different multipliers for AVX, hence they don't even try to consume a lot.
2670-id.PNG
1 square duration is 10 sec
2670-power.PNG
2687w-id.PNG
PL1=128 sec is set in BIOS of Supermicro MB (doesn't allow to set higher value), this is NOT default Intel value
2650v2-id.PNG
2650v2-power.PNG
2680v4.PNG
2687wv4-id.PNG
 
Reactions: Storm-Chaser

Whaaat

Active Member
Jan 31, 2020
315
166
43
I just found that with the QuickCPU utility I can set PL1 and PL2 to whatever I want. Guess what changes apart from the AIDA64 screen? Exactly nothing shifts in the behavior of the CPU. It stays well within original Intel specs.
5 sec in one square
2650v2-power2.PNG
 
Reactions: Storm-Chaser

Storm-Chaser

Twin Turbo
Apr 16, 2020
151
25
28
Upstate NY
LOL, not Linux

No, this is simply not power-hungry load type for a CPU. High TDP versions of server CPUs were specially brewed to maintain multipliers under real power hungry computational AVX loads, while some OEM versions were designed to work under light load (Amazon, Facebook servers) but with high multipliers.
Not really a theory I would buy into. Maybe it's not like AVX instructions, but it's still going to push the CPU pretty hard. You shouldn't be measuring these chips to that standard anyway; it's a rarely used platform. There are plenty of retail CPUs with lower wattage and lower multis than some of the OEMs as well. There is no correlation there, as far as I can tell, considering Intel also offers lower-voltage chips, which are actually suited for the workloads you are referring to. As for "specially brewed" processors, which ones are you referring to from the 2600 series family here? I'm a little dubious about the "lite" load CPUs as well. Remember, the important number here is the PL2 power limit and its duration, not base TDP, because all processors from this family have a max power limitation, including the flagship, high-performance chips. So if you push one of these chips hard enough, it will throttle back once you exceed the PL2 limit. We also need to find the PL2 limit on the 2687W v2; I'm very curious how it will measure up to the 2673, because in nearly every benchmark I've looked into, they are nearly dead even. It just speaks again to the point that Intel knows what they are doing when they set turbo parameters and

Intel rates their processors using highly complex workloads that push the CPU to its limits. Sure, there are processors out there geared more towards per-core performance as opposed to multi-core power, but that's nothing new, and you must also consider that this identical scenario plays out on retail chips as well.

Again, you seem to have a total lack of regard for actual benchmarking tools that measure CPU performance. It's very easy to hop on any number of long-duration or short-duration benchmarks and compare OEM vs retail: they are identical performers at the same clock speed, it's super obvious. This is because Intel's server chips are all pretty stout and perform very well, and Intel has implemented an appropriate "max" TDP that allows the processor to run nearly all workloads without dropping down to base speed. So you can't really say the 2687W is the clear choice, because a) it performs identically to the 2673 on nearly every single benchmark I can find on the internet, and b) many people could do without cooling a 150 watt CPU, considering you can get a 110 watt variant that uses less voltage, runs cooler, and performs identically.

Keep in mind these chips are using identical, retail-equivalent cores, so I don't know how you are arriving at the conclusion that OEM CPUs are somehow weaker than their retail siblings. OEM basically just means you get the processor without a fancy box. Obviously a server OEM CPU has slightly different requirements, but the general idea remains the same. They are typically purchased in bulk. Gotta remember, a higher TDP number only means the processor is using more voltage at the same clock speed. And one thing is for certain: core voltage does not affect performance in any way whatsoever (obviously within the stability range), provided you still have headroom under the PL2 power limit, NOT rated TDP. You were saying in a number of posts that these OEM processors will throttle back once they hit max TDP, but that's not true. It's the PL2 power limit that is key to look at, which is usually specced at a much higher wattage than stock TDP. And the 2673 also wins, even in the case of the 2667 v2, because it uses identical clocks but the 2667 v2 has a higher TDP, indicating the OEM chip is more voltage-efficient yet can carry the same workload. Check the benchmarks for yourself. They are nearly identical in every regard. This is because Intel knows what they are doing, and they certainly understand that the vast majority of workloads can be handled by these Xeons with very limited throttling.

No, PL1=10sec, PL2=7.81ms -> means useless

Here is what you ask for. Although PL1=40sec for 2650v2 it behaves exactly the same as 2670 that has PL1=10sec. 10sec of over-speeding above TDP under Linx, then reduction of frequency to fit TDP. Not sure why. Didn't have time to find out, but I didn't expect much from this CPU anyway. Only v4 never reach TDP under heavy load. But they have different multipliers for AVX, hence they don't even try to consume a lot.
View attachment 19714
1 square duration is 10 sec
View attachment 19715
View attachment 19720
PL1=128 sec is set in BIOS of Supermicro MB (doesn't allow to set higher value), this is NOT default Intel value
If nothing else, this is great reference material for future builds. But we still need to locate the most crucial data for the 2667 v2 and 2673 v2, namely the PL1 and PL2 values. I'd also like to see the data on the 2696 v2 (an OEM chip as well) and the 2697 v2. And per our discussion, we can see most performance aligns with the PL2 power limit, since that is essentially acting as base clock if your thermals are good. This is where it gets a bit convoluted, in my opinion. We know that the 2687W v2 is a 150 watt processor (measured at base speed, 3.4GHz, no turbo), while the 2673 v2 is a 110 watt processor with a base speed of 3.3GHz. So that's a difference of 40 watts from just a 100MHz difference in base clock! Something must be going on here. I know their turbo characteristics are slightly different, but both still have a base turbo clock of 3.6GHz and the same top end 4.0GHz multi

View attachment 19723
Yep, under some 'lite' type of load those OEM versions are better, but they sucks in heavy computational job.
I'm curious though: how will it suck at heavy computational load? The base speeds are 2.5GHz and 2.7GHz respectively. In other words, if you could OC a 2696 v2 to 2.7GHz it would be roughly a 130 watt chip, so pound for pound with the 2697 v2 in terms of voltage and thermals (not to mention both offer identical performance data). In fact, as I already mentioned, the 2696 v2 has a higher all-core turbo speed, making it the better choice since both chips use about the same voltage at a given clock speed. I guess the only other question that needs answering is whether or not the PL2 value is different. So this OEM processor alone contradicts your statement that OEMs suck in heavy workloads. It's because they don't. This one most certainly outperforms Intel's flagship CPU in heavy workloads. A core is a core. You can pump huge voltage through it or undervolt it, but regardless of voltage, performance characteristics are identical (because OEM and retail cores are identical), provided you stay within the specified boost wattages, which is typical.
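The "roughly a 130 watt chip" estimate above can be reproduced with a rough proportional-scaling sketch. It assumes the common dynamic-power approximation (power proportional to voltage squared times frequency) and a 120 W rating for the 2696 v2 at its 2.5GHz base clock; that rating is an assumption, since it is not stated in this thread. `scale_power` is a hypothetical helper, and leakage and binning differences are ignored.

```python
def scale_power(power_w, f_old_ghz, f_new_ghz, v_old=1.0, v_new=1.0):
    """Estimate package power after a clock and/or voltage change.

    Uses the approximation P ~ C * V^2 * f: power scales linearly with
    frequency and with the square of core voltage. With the default
    voltage arguments only the frequency term applies.
    """
    return power_w * (f_new_ghz / f_old_ghz) * (v_new / v_old) ** 2


# ASSUMPTION: 120 W rated TDP for the 2696 v2 at 2.5 GHz (not from the thread).
estimate_w = scale_power(120.0, 2.5, 2.7)
print(f"2696 v2 scaled from 2.5 GHz to 2.7 GHz: ~{estimate_w:.0f} W")
```

Under these assumptions the scaled figure lands close to 130 W. The same helper can be called with different `v_old` and `v_new` values to see how a modest voltage difference at the same clock changes the estimate.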

LOL, that's because CPU-Z which you are using for benchmarking is unable to make more than 70W load on 2650v2. And I'm talking about Linx that pulls about 110W from 2650v2 (at 3GHz!) during first 10 seconds, then 'Power Limit 1' comes into play and CPU is not permitted to consume more than TDP hence dropping its frequency below 2.9GHz.
If you are only pulling 110W with your 2650 v2, there is no reason it should be throttling down below the base turbo speed of 3GHz at all. Since 118W is the max limit for PL2, there is no reason your CPU should not be able to hold that number.

View attachment 19724
Sure. AVX offset was introduced in Skylake, generations after your 2673v2 first saw the world. Your CPU cannot set different multipliers for different types of load, instead it can only throttle being limited by the TDP, while 2687w regardless of generation will never change frequency under AVX load
Not true. The 2687W v2 also has a power limit cap, just like every other CPU in that family. Push it hard enough, and it will throttle. Besides, AVX is pretty much irrelevant here since its use is scarce. AVX is aimed at improving parallelization, and most desktop applications are not suited for it. Contrary to popular belief, parallelism is not the same as multi-threading, so there are not that many applications outside specific use-case scenarios that employ AVX.

But in hindsight I would choose the 2667 v2 over the 2673 now, because they are so much cheaper. I knew this at the time, but I was still fascinated with how rare the 2673 v2 is, so I had to pull the trigger and get two of them.