Finally: Overclocking EPYC Rome ES


Nabladel

Member
Jan 27, 2017
Right, everything was on auto since I just updated the BIOS. No, that's currently the fastest RAM I have :(
 

Epyc

Member
May 1, 2020
Hey guys, I fixed my DRAM issues with a remount of the two processors; from there it went very easily.
I only have RAM sticks that are 2933 or 3200, but the IF just didn't want to play ball all the time.
Now I have the Infinity Fabric rock solid at a stable 1:1 1200 MHz. :cool:

Gonna enjoy this for the day before trying to push even further.
I enjoyed it for a couple of minutes and tried to push further.
2666 seems not possible. Keep the x4 link at 10.xxx Gbps, because going to 13 makes it impossible to even hold 1200 IF.

For stability, the CLD0_VDDP voltage was really important; increasing it from 700 mV to 900 mV helped a lot.
I also gave it more SoC voltage, but everything remains very cool under load.
Added images as a guide.
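
For anyone following along, that 1200 number is just the usual DDR arithmetic: the data rate is double the real memory clock, and 1:1 means the fabric clock matches it. A minimal Python sketch of that reasoning (the helper names are mine, nothing here is board-specific):

Code:
# Hypothetical helpers, just to show the arithmetic behind "1:1 at 1200 MHz".
def memclk_mhz(ddr_rate_mt_s):
    # DDR transfers twice per clock, so the real memory clock is half the data rate.
    return ddr_rate_mt_s / 2.0

def fclk_for_1_to_1(ddr_rate_mt_s):
    # In coupled (1:1) mode the Infinity Fabric clock is set to match MEMCLK.
    return memclk_mhz(ddr_rate_mt_s)

for rate in (2400, 2666, 2933, 3200):
    print(f"DDR4-{rate}: MEMCLK {memclk_mhz(rate):.0f} MHz, 1:1 FCLK target {fclk_for_1_to_1(rate):.0f} MHz")

DDR4-2400 works out to the 1200 MHz above; DDR4-2666 would need 1333 MHz on the fabric, which is exactly the jump that refuses to stay stable here.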
 


efschu3

Active Member
Mar 11, 2019
Is this the Vega-to-Vega GPGPU link, or what? Or is the interconnect between the CPUs also called xGMI?
 

Epyc

Member
May 1, 2020
I believe it's the socket-to-socket communication bus.
Getting the bus to higher speeds improves performance and reduces latency, but it introduces instability pretty quickly.
 

Epyc

Member
May 1, 2020
After a full day of tinkering, this is my final and best result for memory latency; it sacrifices a bit of max bandwidth. Cinebench, though, does not seem to care about memory at all.
Maybe I'll do some tighter memory timings, but for now it's fully stable and on a render job.
For reference, at default I had a memory latency of around ~150 ns.
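
As a sanity check on a latency figure like that ~150 ns default, a pointer chase is the usual trick. This is only a rough sketch in plain Python (interpreter overhead inflates the absolute number, which is why it subtracts a cache-resident baseline); AIDA64 or Intel MLC will give far more trustworthy numbers:

Code:
import time
import numpy as np

def chase_ns(size_bytes, iters=2_000_000):
    # Build one big cycle so every access depends on the previous one.
    n = size_bytes // 8
    perm = np.random.permutation(n)
    nxt = np.empty(n, dtype=np.int64)
    nxt[perm] = np.roll(perm, -1)
    idx = 0
    start = time.perf_counter()
    for _ in range(iters):
        idx = nxt[idx]                 # serial dependent loads
    return (time.perf_counter() - start) / iters * 1e9

small = chase_ns(32 * 1024)            # cache-resident: mostly interpreter overhead
big = chase_ns(1024 * 1024 * 1024)     # 1 GiB working set: forces DRAM accesses
print(f"rough DRAM latency estimate: {big - small:.0f} ns")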
 

Layla

Game Engine Developer
Jun 21, 2016
Epyc said:
After a full day of tinkering, this is my final and best result for memory latency; it sacrifices a bit of max bandwidth. Cinebench, though, does not seem to care about memory at all.
Maybe I'll do some tighter memory timings, but for now it's fully stable and on a render job.
For reference, at default I had a memory latency of around ~150 ns.
Rome should be able to get up to 204 GB/s per socket, and that's 65-85 GB/s for two sockets?

That's barely better than single Xeon V1 bandwidth.

I know you're only using 2400, not 3200, but if the IF clock issue can't be sorted out and it's really going to be this bandwidth starved, it looks like I might be selling my dual EPYC ES as soon as I'm finished building it (that's a lot of cores to feed with so little bandwidth, and real-world performance will probably be awful).
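
For what it's worth, that 204 GB/s is just the textbook peak: 8 channels × 3200 MT/s × 8 bytes per transfer. A quick sketch of the arithmetic (plain Python, nothing measured):

Code:
def peak_bw_gb_s(channels, ddr_rate_mt_s, bytes_per_transfer=8):
    # Theoretical peak DRAM bandwidth per socket in decimal GB/s.
    return channels * ddr_rate_mt_s * 1e6 * bytes_per_transfer / 1e9

print(peak_bw_gb_s(8, 3200))  # 204.8 GB/s per socket at DDR4-3200
print(peak_bw_gb_s(8, 2400))  # 153.6 GB/s per socket at DDR4-2400

So even at 2400 the theoretical ceiling per socket is ~153 GB/s; measured numbers will always sit well below that.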
 

Epyc

Member
May 1, 2020
Layla said:
Rome should be able to get up to 204 GB/s per socket, and that's 65-85 GB/s for two sockets?

That's barely better than single Xeon V1 bandwidth.

I know you're only using 2400, not 3200, but if the IF clock issue can't be sorted out and it's really going to be this bandwidth starved, it looks like I might be selling my dual EPYC ES as soon as I'm finished building it (that's a lot of cores to feed with so little bandwidth, and real-world performance will probably be awful).
Well, as I said, I sacrificed bandwidth for latency optimisation. I'm running with 4 NUMA nodes and minimal interleaving.
With most settings at default and memory at 3200 I already reached above 120 GB/s bandwidth.
Also, have you ever run this cache and memory bandwidth tool? It's not very forgiving, and expecting to even come close to the theoretical bandwidth is unrealistic.
Although I've realised from the beginning that this Supermicro board is just complete garbage. It's utter trash, it can't do shit, it runs like crap. Really like a 25-year-old board; the CPUs are cool, though. Should have gone single socket with a real mobo. :( :rolleyes:
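
If anyone wants a rough feel for why measured numbers land so far under the theoretical peak, here's a crude STREAM-style sketch (plain Python/NumPy, single process, so it only exercises one NUMA node and ignores write-allocate traffic; AIDA64 or a proper multi-threaded STREAM build will do much better):

Code:
import time
import numpy as np

def stream_add_gb_s(n=100_000_000, reps=5):
    # STREAM 'add'-style kernel: a = b + c over three ~800 MB arrays.
    # Counts 24 bytes of traffic per element (read b, read c, write a).
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.add(b, c, out=a)
        best = min(best, time.perf_counter() - t0)
    return 24 * n / best / 1e9

print(f"~{stream_add_gb_s():.0f} GB/s from one single-threaded process")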
 

Epyc

Member
May 1, 2020
Layla said:
Rome should be able to get up to 204 GB/s per socket, and that's 65-85 GB/s for two sockets?

That's barely better than single Xeon V1 bandwidth.

I know you're only using 2400, not 3200, but if the IF clock issue can't be sorted out and it's really going to be this bandwidth starved, it looks like I might be selling my dual EPYC ES as soon as I'm finished building it (that's a lot of cores to feed with so little bandwidth, and real-world performance will probably be awful).
Scrolling past my previous results, maybe IF is less important than I thought.
This is at 2933, and my RAM can easily do 3200, although I don't have a screenshot of that.
 

c3l3x

New Member
May 1, 2020
Epyc said:
Well, as I said, I sacrificed bandwidth for latency optimisation. I'm running with 4 NUMA nodes and minimal interleaving.
With most settings at default and memory at 3200 I already reached above 120 GB/s bandwidth.
Also, have you ever run this cache and memory bandwidth tool? It's not very forgiving, and expecting to even come close to the theoretical bandwidth is unrealistic.
Although I've realised from the beginning that this Supermicro board is just complete garbage. It's utter trash, it can't do shit, it runs like crap. Really like a 25-year-old board; the CPUs are cool, though. Should have gone single socket with a real mobo. :( :rolleyes:
I just ordered a Supermicro H11SSL. I hope that one is decent. I'll be using it for CPU rendering as well, but I'm slowly moving to GPU rendering (e.g. Arnold). Curious to know if there's a real-world impact. For example, if Cinebench doesn't seem to care, then maybe C4D renders wouldn't be affected either?

The CPU I have is a -04; I wonder if it will be as likely to have this issue.
 

Epyc

Member
May 1, 2020
Layla said:
Rome should be able to get up to 204 GB/s per socket, and that's 65-85 GB/s for two sockets?

That's barely better than single Xeon V1 bandwidth.

I know you're only using 2400, not 3200, but if the IF clock issue can't be sorted out and it's really going to be this bandwidth starved, it looks like I might be selling my dual EPYC ES as soon as I'm finished building it (that's a lot of cores to feed with so little bandwidth, and real-world performance will probably be awful).
BTW, you really got me thinking and comparing results. With the IF clock decoupled, the memory latency is up but the cache latency is low. When running 1:1, the memclock is way down; memory latency is down as expected, but cache latency is way up. It took a lot of tweaking to get it somewhat down again. Could it be that the cache runs at memclock, and the IF is really only the socket-to-socket interconnect fabric rather than the on-die fabric? Otherwise the cache would have to behave differently o_O
 

Jon

Member
Feb 28, 2016
I probably need to play with my timings and add 2 more RAM sticks, but for those taking note on fabric speed:

[benchmark screenshots attached]
 

Epyc

Member
May 1, 2020
Jon said:
I probably need to play with my timings and add 2 more RAM sticks, but for those taking note on fabric speed:
[benchmark screenshots attached]
That's a retail TRX sample, dude, not really comparable to an EPYC engineering sample.
I can also show you my 3960X running 3400 MHz CL14-14-14-35 non-ECC memory, and it wipes the floor with everything. But that's something completely different from a dual-socket server with ECC and a handicapped BIOS.
 

Jon

Member
Feb 28, 2016
No, this is an EPYC ES chip, not TRX, but clocked higher. It does show that the IF clock speed will influence cache speeds.

AMD EPYC 64 Core ROME 7742 100-100000053-04 2.00GHz (3.2GHz Turbo) 225W Processor

on a Supermicro H11SSL-i
 

Epyc

Member
May 1, 2020
Jon said:
No, this is an EPYC ES chip, not TRX, but clocked higher.
Why does it identify as Castle Peak?
And sorry for the comment before, it's getting a bit late and my eyes are starting to blur o_O
Then the question remains: how did you do it? Maybe single socket is also more forgiving.
 