Understanding modern CPU performance trends and questions


Bert

Well-Known Member
Mar 31, 2018
I just realized that 200 W E5-2679 v4 and 300 W E5-2699P v4 CPUs exist. The existence of these parts shows how much extra headroom Intel left in these chips, since most customers prefer high efficiency. It is sad that they are all locked.

I don't think these CPUs will run on a typical server board, but I assume a workstation motherboard like the Z10PE-D can run them, right?
 

Bert

Well-Known Member
Mar 31, 2018
I think @RolloZ170 may know a trick or two to override the default power limits. I still think Z10PE can do it; those boards are heavily beefed up.
 

RolloZ170

Well-Known Member
Apr 24, 2016
I think it's unlikely that the VRMs can push 300 W or even 200 W.
Some can, but there may be a TDC barrier to get past, as on the 1st and 2nd gen Scalable OEM high-TDC SKUs.
BIOS microcode would also need to support the OEM chips.
The HEDT microcode was always the same; Broadwell Core i7 is CPUID 406F1.
I think @RolloZ170 may know a trick or two to override the default power limits. I still think Z10PE can do it; those boards are heavily beefed up.
There are still some OEM Broadwells that won't run because of the 165 W TDP and/or a TDC of 199 A on some boards.
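For anyone curious what the software-visible knob looks like: on Linux the package power limit is exposed through the intel_rapl powercap interface. Below is a minimal sketch (my own illustration, not something from this thread) that reads and raises the long-term package limit. The 250 W target and domain index are placeholders, it needs root, and as noted above it does nothing about a TDC limit or missing microcode support.

```python
# Minimal sketch: inspect and raise the package RAPL long-term power limit on Linux.
# Assumes the intel_rapl powercap driver is loaded and the script runs as root.
# Note: this only touches the RAPL power limit; the TDC (current) limits and
# microcode support mentioned above are separate and are not affected here.
from pathlib import Path

PKG = Path("/sys/class/powercap/intel-rapl:0")  # package 0 RAPL domain

def read_uw(filename: str) -> int:
    """Read a microwatt value from the RAPL sysfs domain."""
    return int((PKG / filename).read_text())

print("domain:", (PKG / "name").read_text().strip())
print("long-term limit:", read_uw("constraint_0_power_limit_uw") / 1e6, "W")

# Example: raise the long-term limit to 250 W (the sysfs value is in microwatts).
(PKG / "constraint_0_power_limit_uw").write_text(str(250 * 10**6))
```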
 
  • Like
Reactions: Sean Ho

bayleyw

Active Member
Jan 8, 2014
1. Intel still focuses on increasing clock rates and IPC. I can't keep up with everything, but AFAIK Intel is still the winner in single-thread performance; I see Intel leading by more than 20% in some benchmarks. Intel is still investing heavily on that front, with Golden Cove cores hitting 6 GHz.

2. If I am not mistaken, Intel only has 2 compute dies in Sapphire Rapids; I am not sure why they are still holding onto larger dies given that most applications don't benefit from them. I cannot believe that Intel is still behind on multi-die packaging after so many years. Now Sierra Forest is going to offer 288 cores, which I assume means 8 compute dies, but why so late?

3. Why is a single large package preferred over a multi-socket solution? What is the big advantage of a 128-core AMD Epyc server with 12 memory channels over 2x 56-core Sapphire Rapids with a total of 16 memory channels?

4. IIUC, there is a lot of compute available in a modern CPU, but it is all locked behind power delivery and cooling. With the right cooling and power delivery, more performance can be extracted from modern processors; if I can power and cool it, a modern CPU can deliver 50% more compute.

5. Why do we need higher voltage for higher clocks? Modern CPUs are heavily undervolted to run cool, and they are stable at, say, 2.5 GHz at 0.8 V. Why do I need to raise the voltage to 1.2 or 1.3 V to hit 4.5 GHz? IIUC, this requires 225% more power, and cooling that package becomes nearly impossible.

6. Why don't we get multi-socket systems that can run at higher power envelopes as long as there is enough power and cooling? I still want to see systems similar to the EVGA SR-2: a large workstation board with very high power delivery when there is demand. Unlike server workloads, workstation workloads are spiky and efficiency matters much less. A dual-socket system is much easier to cool and power than a huge single socket.
1. Intel and AMD are about the same. Actual performance is going to depend on implementation and firmware settings; Zen 4 and Golden Cove have comparable IPC, and the 7950X and 13900K have comparable 1T turbo speeds.

2. Design choices. No one expected Rome to work as well as it does, but here we are. The highly disaggregated approach AMD uses lets them build datacenter parts on less mature nodes, but it consumes a huge amount of power driving the on-package fabrics, which prevents them from scaling to low-power operating points.

3. As you said, the 2S system has more bandwidth per core. CFD and HPC people will intentionally select 2S systems with lower core counts to save money and maximize bandwidth. For general-purpose applications there's no difference. (A rough per-core bandwidth comparison is sketched after this list.)

4. Sure, but is 1.5x the compute per socket worth 5x the power per socket?

5. At higher clock speeds, transistors have less time to switch, so upping the operating voltage helps the circuitry distinguish between a '0' and a '1' more clearly.

6. SR-2s were only 200W per socket overclocked, and it made a lot more sense to build a 12-core workstation out of two 6-core processors each pulling 200W than a 112-core workstation out of two 56-core processors each pulling 1500W. As in (3), you could make an argument for a 2S low-core-count system to free up power budget and increase per-core bandwidth, but you would be paying a lot in money and power for a marginal improvement in performance.
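To put rough numbers on point 3, here is a back-of-the-envelope sketch. The DDR5-4800 speed and the idea of counting theoretical channel bandwidth per core are my own assumptions for illustration; real sustained bandwidth is lower on both platforms.

```python
# Back-of-the-envelope per-core memory bandwidth comparison.
# Assumes DDR5-4800: 8 bytes/transfer * 4800 MT/s = 38.4 GB/s theoretical per channel.
GB_PER_S_PER_CHANNEL = 8 * 4.8

systems = {
    "1S 128-core Epyc, 12 channels": (128, 12),
    "2S 2x 56-core Sapphire Rapids, 16 channels": (112, 16),
}

for name, (cores, channels) in systems.items():
    total = channels * GB_PER_S_PER_CHANNEL
    print(f"{name}: {total:.0f} GB/s total, {total / cores:.1f} GB/s per core")
```

On those theoretical numbers the 2S Sapphire Rapids box lands around 5.5 GB/s per core versus roughly 3.6 GB/s per core for the single-socket Epyc, which is the bandwidth-per-core argument above.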
 
  • Like
Reactions: Bert

Bert

Well-Known Member
Mar 31, 2018
I just read that there will be no more dual-socket workstations; the Lenovo PX is the last one, and the following article confirms it. I don't see any dual-socket Sapphire Rapids workstations from Dell or HP.

 

Bert

Well-Known Member
Mar 31, 2018
1. Intel and AMD are about the same. Actual performance is going to depend on implementation and firmware settings; Zen 4 and Golden Cove have comparable IPC, and the 7950X and 13900K have comparable 1T turbo speeds.

2. Design choices. No one expected Rome to work as well as it does, but here we are. The highly disaggregated approach AMD uses lets them build datacenter parts on less mature nodes, but it consumes a huge amount of power driving the on-package fabrics, which prevents them from scaling to low-power operating points.

3. As you said, the 2S system has more bandwidth per core. CFD and HPC people will intentionally select 2S systems with lower core counts to save money and maximize bandwidth. For general-purpose applications there's no difference.

4. Sure, but is 1.5x the compute per socket worth 5x the power per socket?

5. At higher clock speeds, transistors have less time to switch, so upping the operating voltage helps the circuitry distinguish between a '0' and a '1' more clearly.

6. SR-2s were only 200W per socket overclocked, and it made a lot more sense to build a 12-core workstation out of two 6-core processors each pulling 200W than a 112-core workstation out of two 56-core processors each pulling 1500W. As in (3), you could make an argument for a 2S low-core-count system to free up power budget and increase per-core bandwidth, but you would be paying a lot in money and power for a marginal improvement in performance.

1. Yes, I read a lot of reviews, and it seems like Intel has a 10-20% edge on single thread. IIUC, Intel has more advanced power management and OC/Turbo capabilities as well.

2. I read more about that as well, and I see more evidence that complex applications whose workloads are not fully partitionable, such as video processing, do much better on Intel. Overall, there is a cost to multiple dies, but it does not show up in Cinebench or other artificial benchmarks.

3. Intel had frequency-optimized SKUs for Broadwell (e.g. the 2687W v4), where the power budget is spent on better single-thread performance. Dual sockets give you the option to take advantage of these CPUs without sacrificing too much core count, plus extra memory bandwidth.

4. Yes, for me. When I run my compiler, I don't mind spending 5x the power; it is a short burst that blocks my time, and it's better to wait 20 seconds instead of 30. It is power I need for quick bursts.

5. Yes, thank you! Now it all makes sense. Can we think of transistors like capacitors, with a similar charge/discharge curve? (There is a rough sketch of the math at the end of this post.)

6. I had that board; it was so easy to keep the chips cool, running them at max and still staying under 60C. Now I am running my 10980XE at 110C under load because I cannot keep the temps under control. More than bandwidth and power delivery, cooling is the problem. AFAICS, memory bandwidth and power delivery keep increasing at exponential rates, but cooling does not.

The EVGA SR-2 was a much cheaper board with a lifetime warranty; I think MSRP was around $500. Today, board prices are crazy.
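Rough math for point 5, treating the gates as the capacitors they effectively are: dynamic CMOS power scales roughly as P ≈ α·C·V²·f. The sketch below just plugs in the 0.8 V / 2.5 GHz and 1.2-1.3 V / 4.5 GHz operating points mentioned earlier; ignoring leakage and assuming the same capacitance and activity factor are my simplifications.

```python
# Rough dynamic-power scaling, P ~ V^2 * f. Capacitance and activity factor cancel
# when comparing two operating points of the same chip; leakage is ignored.
def relative_power(v: float, f_ghz: float, v0: float = 0.8, f0_ghz: float = 2.5) -> float:
    """Power relative to the 0.8 V / 2.5 GHz baseline from the thread."""
    return (v / v0) ** 2 * (f_ghz / f0_ghz)

for v in (1.2, 1.3):
    print(f"{v:.1f} V @ 4.5 GHz -> ~{relative_power(v, 4.5):.1f}x the 0.8 V / 2.5 GHz power")
```

The 1.8x clock increase multiplies with the 2.25x-2.64x from voltage squared, so the fast operating point costs roughly 4x to 4.75x the baseline power, somewhat more than the 225% guessed in the original question, which is exactly why cooling the package gets so hard.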
 

Tech Junky

Active Member
Oct 26, 2023
10980xe at 110C under load
Sounds like an air gap or a cooler that isn't big enough (6+ heat pipes). Then again, the 10th/11th gen models are more of a placeholder in the grand scheme of things. With a 12700K and a 6-pipe cooler I could put it under 100% load and keep it under 65C. Even with my current blazing-hot AMD 7900X I idle at 40C and load takes it to 95C, but they're designed to run hot and back off until the temp drops.
 

Bert

Well-Known Member
Mar 31, 2018
Sounds like an air gap or a cooler that isn't big enough (6+ heat pipes). Then again, the 10th/11th gen models are more of a placeholder in the grand scheme of things. With a 12700K and a 6-pipe cooler I could put it under 100% load and keep it under 65C. Even with my current blazing-hot AMD 7900X I idle at 40C and load takes it to 95C, but they're designed to run hot and back off until the temp drops.
Based on what I read, these seem like reasonable temps when running all cores at 4.6 GHz. Two cores run at 4.8 GHz, and they are the ones that usually hit the temp limits. Not sure about your target clocks for the 7900X, but that's a much more efficient generation running with 12 cores, so it is expected to run cooler and clock higher.
It might be possible to improve the temps a little with better paste application and smoothing of the surfaces, but I doubt that will make more than a 5C difference.
 

Tech Junky

Active Member
Oct 26, 2023
Not sure about your target clocks for the 7900X, but that's a much more efficient...
I don't muck with it, but it tops out around 5 GHz. It does idle at higher clocks than the Intel setup I was running, though. I added an A380 GPU for transcoding and cut CPU usage down to ~10% with QSV transcodes vs. 75-100% with CPU only. The transcode time also dropped to about 1/8th of what the CPU alone needed.
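For reference, this is roughly what a QSV-offloaded transcode looks like with ffmpeg. It's a sketch with assumed file names and bitrate, not the exact pipeline used above, and it requires an ffmpeg build with QSV support.

```python
# Minimal sketch: offload decode and encode to an Intel GPU (e.g. Arc A380) via
# ffmpeg's QuickSync (QSV) codecs instead of doing the transcode on the CPU.
import subprocess

cmd = [
    "ffmpeg",
    "-hwaccel", "qsv",       # hardware-accelerated decode
    "-c:v", "h264_qsv",      # QSV decoder for the H.264 source
    "-i", "input.mp4",
    "-c:v", "hevc_qsv",      # QSV HEVC encoder on the GPU
    "-b:v", "4M",
    "-c:a", "copy",          # pass audio through untouched
    "output.mkv",
]
subprocess.run(cmd, check=True)
```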
 
  • Like
Reactions: Bert