Finally: Overclocking EPYC Rome ES


Gordan

Member
Nov 18, 2019
39
8
8
How do I verify whether I have one of the OC-able OEM CPUs? It's a production model, came in an anti-static bag in bubble wrap, no fancy box.

Will this work with Asrock EPYCD8-2T?

Is there a Linux tool for this?
 

Epyc

Member
May 1, 2020
56
8
8
How do I verify whether I have one of the OC-able OEM CPUs? It's a production model, came in an anti-static bag in bubble wrap, no fancy box.

Will this work with Asrock EPYCD8-2T?

Is there a Linux tool for this?
They come via shady deals on second-hand platforms ;) :D
It's just a fun thing. Retail performance is so much better; there's never enough OC headroom to match it
 

Gordan

Member
Nov 18, 2019
39
8
8
So when it said "OEM", it meant just a different ES? :)

The thing is, I could do with a bump in clock speeds. I upgraded from a dual-socket X5690 (all-core boost to 3.7, single-core boost to 3.83), and compared to that, my 7402P (boost up to 3.35) has virtually identical single-thread performance, with both CPUs non-OC'ed. That is massively disappointing for 10 years of advancements. According to Passmark, that is Core i3-level single-thread performance.
 

Epyc

Member
May 1, 2020
56
8
8
So when it said "OEM", it meant just a different ES? :)

The thing is, I could do with a bump in clock speeds. I upgraded from a dual-socket X5690 (all-core boost to 3.7, single-core boost to 3.83), and compared to that, my 7402P (boost up to 3.35) has virtually identical single-thread performance, with both CPUs non-OC'ed. That is massively disappointing for 10 years of advancements. According to Passmark, that is Core i3-level single-thread performance.
Only some ES samples are unlocked, and you need a custom BIOS for that mobo. But all ES samples are clocked at 1.5-2 GHz; fully overclocked you might end up near retail. There are other downsides though, like the IF fabric running at half speed and no option to upgrade the BIOS. If you've got that nice 7402P, just make sure you buy fast DDR for it. 2933 is the sweet spot, because that will take your IF to its max of 1467 MHz.
That's a big boost if you currently have low-clocked DDR.
And at the end of the day it's a whole lot of cores in one package at only 180 W TDP. I mean, TRX40 chips are all 280 W, so it's only a good chip for you if you can leverage the multi-core performance.
Otherwise we can trade rigs: I get yours and you can tinker with 2 ES samples ;)
Or maybe buy TRX40, above 4 GHz all-core
 

Epyc

Member
May 1, 2020
56
8
8
I just ordered a Supermicro H11SSL. I hope that one is decent. I'll be using it for CPU rendering as well, but I'm slowly moving to GPU rendering (e.g. Arnold). Curious to know if there's a real world impact. For example, if Cinebench doesn't seem to care then maybe C4D renderings wouldn't be affected?

The CPU I have is a -04, I wonder if it will be as likely to have this issue?
Don't know, to be honest.
But I do know I will never buy Supermicro boards again. Utterly disappointed, and I would recommend staying far away from them. Especially for single socket there are beautiful boards available, all better than Supermicro
 

Spartus

Active Member
Mar 28, 2012
323
121
43
Toronto, Canada
Don't know, to be honest.
But I do know I will never buy Supermicro boards again. Utterly disappointed, and I would recommend staying far away from them. Especially for single socket there are beautiful boards available, all better than Supermicro
I'm testing in H11SSL-i, but I need another and now I'm curious what you'd recommend given a choice?

Do the ASROCK boards have a lot more tuning exposed?
 

c3l3x

New Member
May 1, 2020
29
8
3
Don't know, to be honest.
But I do know I will never buy Supermicro boards again. Utterly disappointed, and I would recommend staying far away from them. Especially for single socket there are beautiful boards available, all better than Supermicro
I'm going to give it a try with the Supermicro, but I would prefer a Gigabyte board. They're just hard to find in stock. I'm open to suggestions though
 

Epyc

Member
May 1, 2020
56
8
8
I'm going to give it a try with the Supermicro, but I would prefer a Gigabyte board. They're just hard to find in stock. I'm open to suggestions though
I personally love the Gigabyte board; I don't have it, but just going by layout and features. I've heard a lot of good things about the ASRock one, and I believe there is also an Asus one.
 

Epyc

Member
May 1, 2020
56
8
8
I'm testing in H11SSL-i, but I need another and now I'm curious what you'd recommend given a choice?

Do the ASROCK boards have a lot more tuning exposed?
To be honest, I have no idea. I have the Supermicro because it's literally the only dual-socket mobo available for separate purchase. Maybe other people know better. I think the Asus one also has a lot of following after first gen
 

efschu3

Active Member
Mar 11, 2019
160
61
28
At least on the consumer-grade AM4 platform, Gigabyte's UEFIs are the most buggy I have ever seen. If their UEFI programmers for enterprise hardware are different, you can give them a try; if they are the same, stay really far away from these boards.
 

Epyc

Member
May 1, 2020
56
8
8
At least on the consumer-grade AM4 platform, Gigabyte's UEFIs are the most buggy I have ever seen. If their UEFI programmers for enterprise hardware are different, you can give them a try; if they are the same, stay really far away from these boards.
I've got many AM4, X399 and TRX40 boards, and I can tell you none come even close to being as atrociously bad as Supermicro. Not even the 80-dollar wanky budget stuff.
 

bayleyw

Active Member
Jan 8, 2014
301
98
28
I feel like the impact of the fabric speed issue is being grossly exaggerated. AMD over-provisions their fabric bandwidth (48 bytes per FCLK cycle) so even running at a lowly 600MHz you're looking at 230GB/sec of aggregate bandwidth. That's enough to fill 8 channels of DDR4-2400 (153GB/sec) with bandwidth to spare for transferring to PCIe devices. At DDR4-3200 speeds that goes up to 308GB/sec, which fills 8 channels with over 100GB/sec to spare for other I/O.

The real effect the lower fabric clocks have is increasing inter-core latency, which affects workloads with a ton of communication and cache synchronization overhead. So far the only one I've seen is @Spartus 's CFD workload - CFD is an odd one out as far as HPC goes because it is almost entirely an interconnect and bandwidth benchmark. I'd expect in-memory transactional databases to be similarly, if not more, affected, but who the heck runs production databases on engineering samples?! That being said, it is unlikely we will see anomalously low performance - the platform is still fast, it will just be 20% slower than it could be.

I certainly haven't noticed any issues in day-to-day use; even gaming is serviceable, and if Spartus hadn't pointed out the IF clock issue it's unlikely I would have ever noticed.

As an aside, I would like to point out that grumbling about Supermicro's lack of tuning options is kind of unfair - we're well outside of the scope of supported operations on their boards now and the fact that anything works is a miracle.
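The figures in this post can be sanity-checked with quick arithmetic. This is just an illustrative sketch: the 48-bytes-per-FCLK-cycle width and the 8 links per 64-core package are taken from the post itself, not from AMD specifications I have verified.

```python
def fabric_bw_gbs(fclk_mhz, bytes_per_cycle=48, links=8):
    """Aggregate Infinity Fabric bandwidth in GB/s across all CCD links."""
    return fclk_mhz * 1e6 * bytes_per_cycle * links / 1e9

def mem_bw_gbs(ddr_mt_s, channels=8):
    """DDR4 bandwidth in GB/s: 8 bytes per transfer, per channel."""
    return ddr_mt_s * 1e6 * 8 * channels / 1e9

print(fabric_bw_gbs(600))   # 230.4 -> the ~230 GB/s aggregate fabric figure
print(mem_bw_gbs(2400))     # 153.6 -> the ~153 GB/s 8-channel DDR4-2400 figure
print(fabric_bw_gbs(800))   # 307.2 -> the ~308 GB/s figure (FCLK 800 at DDR4-3200)
```

So even at the "crippled" 600 MHz FCLK, aggregate fabric bandwidth still exceeds 8-channel DDR4-2400 memory bandwidth, which is the post's point.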
 

Layla

Game Engine Developer
Jun 21, 2016
215
177
43
40
I feel like the impact of the fabric speed issue is being grossly exaggerated. AMD over-provisions their fabric bandwidth (48 bytes per FCLK cycle) so even running at a lowly 600MHz you're looking at 230GB/sec of aggregate bandwidth. That's enough to fill 8 channels of DDR4-2400 (153GB/sec) with bandwidth to spare for transferring to PCIe devices. At DDR4-3200 speeds that goes up to 308GB/sec, which fills 8 channels with over 100GB/sec to spare for other I/O.
It's really late and I haven't read the architecture documents, but your math seems off by an order of magnitude...

600MHz * 48-bytes per clock = 28.8GB/sec

So either it's not 48-bytes per clock, or there's more parallelism there? Or something...

Also, for dual socket, CPU-CPU interconnect bandwidth is incredibly important. Since all accesses to memory and pci-e devices local to the other socket go through socket-to-socket IF.
 

Layla

Game Engine Developer
Jun 21, 2016
215
177
43
40
It's really late and I haven't read the architecture documents, but your math seems off by an order of magnitude...

600MHz * 48-bytes per clock = 28.8GB/sec

So either it's not 48-bytes per clock, or there's more parallelism there? Or something...

Also, for dual socket, CPU-CPU interconnect bandwidth is incredibly important. Since all accesses to memory and pci-e devices local to the other socket go through socket-to-socket IF.
Since you're quoting 230GB/sec, I'm guessing whatever FCLK is, it's 8*48b wide?
 

bayleyw

Active Member
Jan 8, 2014
301
98
28
It's really late and I haven't read the architecture documents, but your math seems off by an order of magnitude...

600MHz * 48-bytes per clock = 28.8GB/sec

So either it's not 48-bytes per clock, or there's more parallelism there? Or something...

Also, for dual socket, CPU-CPU interconnect bandwidth is incredibly important. Since all accesses to memory and pci-e devices local to the other socket go through socket-to-socket IF.
Sorry, 230GB/sec is for a 64-core package, which has 8 IF links (one per CCD). The aggregate package bandwidth is 230GB/sec for all 8 dies, and 28.8GB/sec for any single CCD (but it makes sense to assume that any single CCD will not need all eight channels of bandwidth).

The intersocket connection is 4 of the same IF links, so 115GB/sec full duplex at the crippled FCLK rates (still faster than Cascade Lake-SP's three UPI links).

Conclusion: Infinity Fabric 2 is really freaking fast, AMD did a great job on the Epyc Rome uncore.
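The corrected breakdown works out as follows (a sketch using the per-link width quoted above; the 8-CCD and 4-link counts are as stated in the post):

```python
BYTES_PER_FCLK = 48  # per IF link per FCLK cycle, as quoted in the thread

def link_bw_gbs(fclk_mhz):
    """Bandwidth of a single Infinity Fabric link in GB/s."""
    return fclk_mhz * 1e6 * BYTES_PER_FCLK / 1e9

per_ccd     = link_bw_gbs(600)  # 28.8 GB/s: one CCD's link at FCLK 600 MHz
package     = per_ccd * 8       # 230.4 GB/s: 8 CCD links on a 64-core part
intersocket = per_ccd * 4       # 115.2 GB/s: 4 links between the two sockets
print(per_ccd, package, intersocket)
```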
 
  • Like
Reactions: Layla

Epyc

Member
May 1, 2020
56
8
8
I feel like the impact of the fabric speed issue is being grossly exaggerated. AMD over-provisions their fabric bandwidth (48 bytes per FCLK cycle) so even running at a lowly 600MHz you're looking at 230GB/sec of aggregate bandwidth. That's enough to fill 8 channels of DDR4-2400 (153GB/sec) with bandwidth to spare for transferring to PCIe devices. At DDR4-3200 speeds that goes up to 308GB/sec, which fills 8 channels with over 100GB/sec to spare for other I/O.

The real effect the lower fabric clocks have is increasing inter-core latency, which affects workloads with a ton of communication and cache synchronization overhead. So far the only one I've seen is @Spartus 's CFD workload - CFD is an odd one out as far as HPC goes because it is almost entirely an interconnect and bandwidth benchmark. I'd expect in-memory transactional databases to be similarly, if not more, affected, but who the heck runs production databases on engineering samples?! That being said, it is unlikely we will see anomalously low performance - the platform is still fast, it will just be 20% slower than it could be.

I certainly haven't noticed any issues in day-to-day use; even gaming is serviceable, and if Spartus hadn't pointed out the IF clock issue it's unlikely I would have ever noticed.

As an aside, I would like to point out that grumbling about Supermicro's lack of tuning options is kind of unfair - we're well outside of the scope of supported operations on their boards now and the fact that anything works is a miracle.
Something about this IF setup feels counterintuitive.
When I do a 1:1 ratio, my memory latency drops a lot, which you would expect.
But the cache latency shoots up, like 0.5-0.9 ns on the L1 and upwards of 10 ns at the L3. This is not what I would expect; normally running 1:1 with a high IF link clock decreases latency all around.

When I do 2:1 with a much lower IF clock of 800 MHz and memory at 3200 MHz, the memory latency shoots up as expected.
But the cache latency decreases massively again.

From that it would seem that the IF setting in the BIOS is possibly only the interconnect and off-chip fabric, and the on-die fabric just scales with the memory clock, or am I mistaken?

And btw, I don't judge the board on its overclocking capabilities; it's more the crazy, stupidly inefficient layout, with a dumb wiring scheme (almost everything on CPU1) and crappy things like needing keys to update the BIOS, a very poor VRM implementation, and finally very limited socket-to-socket link performance
 

Gordan

Member
Nov 18, 2019
39
8
8
Only some es samples are unlocked, and you need a custom bios for that mobo. But all es samples are clocked at 1,5-2ghz. Fully over clocked you might end up near retail. But there are other downsides like the If fabric that runs twice as slow and no option to upgrade the bios. If you got that nice 7402p just make sure you buy fast ddr for it. 2933 is the sweet spot because that will take you're If to it's max of 1467.
Thats a big boost should you have low clocked ddr.
And at the end of the day it's a whole lot of cores in one package, only 180w tdp. I mean trx40 all have 280watts so its only a good chip for you if you can leverage the multicore performance
Otherwise we can trade rigs, I get yours and you can tinker with 2 es samples ;)
Or maybe buy trx40, above 4ghz all core
Oh, I thought 3200 was optimal for RAM. Is 2933 better for Epyc? I have 2933 DIMMs but I run them at 3200, 1T, without gear down mode. Is 2933 really going to be better (or no worse)?

7402P has cTDP of 200W, which I have set - I get close to maximum boost clocks even under full load. I have little doubt that it would handle 280W just fine with appropriate cooling, if the TDP limit could be unlocked, but it seems settings over 200W are ignored, same as the multiplier/divisor/voltage custom P-state settings are ignored.

Because I am using this machine as a triple-seat workstation (3 VMs, each with a GPU and USB controller), I figured I would probably benefit from the extra memory I/O of 8 channels compared to TRX40's 4 memory channels, and the extra PCIe lanes (no TRX40 board seems to have three of its slots wired up for x16), plus I need to run 5 cards (3 GPUs, a SAS controller, and a USB card suitable for splitting its ports up for PCI passthrough). So really it was Epyc or Xeon; Threadripper just wouldn't have come with what I need.
 

bayleyw

Active Member
Jan 8, 2014
301
98
28
Oh, I thought 3200 was optimal for RAM. Is 2933 better for Epyc? I have 2933 DIMMs but I run them at 3200, 1T, without gear down mode. Is 2933 really going to be better (or no worse)?
Depends on the workload, but for workstation type loads 2933 will probably be slightly snappier.
 

Epyc

Member
May 1, 2020
56
8
8
Oh, I thought 3200 was optimal for RAM. Is 2933 better for Epyc? I have 2933 DIMMs but I run them at 3200, 1T, without gear down mode. Is 2933 really going to be better (or no worse)?

7402P has cTDP of 200W, which I have set - I get close to maximum boost clocks even under full load. I have little doubt that it would handle 280W just fine with appropriate cooling, if the TDP limit could be unlocked, but it seems settings over 200W are ignored, same as the multiplier/divisor/voltage custom P-state settings are ignored.

Because I am using this machine as a triple-seat workstation (3 VMs, each with a GPU and USB controller), I figured I would probably benefit from the extra memory I/O of 8 channels compared to TRX40's 4 memory channels, and the extra PCIe lanes (no TRX40 board seems to have three of its slots wired up for x16), plus I need to run 5 cards (3 GPUs, a SAS controller, and a USB card suitable for splitting its ports up for PCI passthrough). So really it was Epyc or Xeon; Threadripper just wouldn't have come with what I need.
According to the documentation, retail Epyc IF runs 1:1 up to a max of 2933 MT/s DDR. After that it drops to 1:2. So going to 3200 gives more bandwidth but a latency penalty, because the fabric clock halves. To play devil's advocate, if you have a TRX40 with 4000+ MT/s memory, that's really Rome amounts of bandwidth. TRX40 mobos with triple x16 are indeed not available, but the general comfort and usability is much higher, I think. But in your case Epyc is probably best indeed
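The 1:1/1:2 behavior described here can be sketched as a tiny helper. Illustrative only: the 2933 MT/s crossover and the 1:2 fallback are as described in this thread, not verified against AMD documentation.

```python
def fclk_mhz(ddr_mt_s, crossover_mt_s=2933):
    """FCLK from the DDR transfer rate: 1:1 with MCLK (= MT/s / 2)
    up to the crossover, then the fabric clock halves (1:2)."""
    mclk = ddr_mt_s / 2
    return mclk if ddr_mt_s <= crossover_mt_s else mclk / 2

print(fclk_mhz(2933))  # 1466.5 -> the ~1467 MHz "sweet spot"
print(fclk_mhz(3200))  # 800.0  -> fabric clock halved, hence the latency penalty
```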
 

Gordan

Member
Nov 18, 2019
39
8
8
According to the documentation, retail Epyc IF runs 1:1 up to a max of 2933 MT/s DDR. After that it drops to 1:2. So going to 3200 gives more bandwidth but a latency penalty, because the fabric clock halves. To play devil's advocate, if you have a TRX40 with 4000+ MT/s memory, that's really Rome amounts of bandwidth. TRX40 mobos with triple x16 are indeed not available, but the general comfort and usability is much higher, I think. But in your case Epyc is probably best indeed
Are you saying that infinity fabric on trx is not limited to 1467MHz?
Or do you mean that I'd partly mitigate losing 50% of the memory channels by gaining 30% more memory clock speed?

I am eyeing the recently announced 7F72, but they still seem to be unavailable for purchase, and the price is double compared to a 7402P which makes it really poor value.