Finally: Overclocking EPYC Rome ES


ExecutableFix

Active Member
Nov 25, 2019
123
64
28
Hello @ExecutableFix,
I have a very important question:
Is there any way to block the tool's overclocking function, for example by setting the BIOS to reject it, or by asking the motherboard manufacturer for a BIOS update that solves the problem?
I have a lot of servers running Rome 32-core ES (100-000000054-04_32/24_N) CPUs on Supermicro H11DSi rev 2.0 boards, and I need to prevent anyone from damaging my CPUs.
Unless people have direct access to the OS (for example a dedicated server without a hypervisor), the tool can't execute SMU commands, so it's impossible to run this software from a virtual machine.

But otherwise I don't think there's an option to disable it if users have direct access to the machine.
 

kaixin

New Member
Apr 24, 2020
4
0
1
Unfortunately, I provide bare-metal servers to customers, and most of them will run Windows Server 2016/2019, so this problem does worry me.
If I replace the ES CPUs with retail or QS CPUs, can I avoid this problem?
 

kaixin

New Member
Apr 24, 2020
4
0
1
It won't work on retail CPUs, so that's always an option.
Fortunately, I found that it does not work on certain models.
I have two models, 100-000000054-05 and 100-000000054-04, with the same motherboard and system. The overclocking tool does not work on the 100-000000054-05 (it reports the error "error wrirting SMU"), but it works on the 100-000000054-04. Maybe the 100-000000054-05 is closest to the retail version; I don't know.
 

ExecutableFix

Active Member
Nov 25, 2019
123
64
28
Interesting. Would you be able to share a CPU-Z screenshot of the -05?
 

yesoos

Member
Mar 10, 2020
40
4
8
PL
That would be a possibility in the future for sure. It's definitely something I've noted on the to-do list
Hi, could you also lock the critical values that have a high chance of killing the CPU? Maybe add a warning: just copy and paste the one from the first post and add an "agree" button to bypass it. BR
 

efschu3

Active Member
Mar 11, 2019
160
61
28
In fact you can OC 2666 DRAM all the way to 3200 if the chips can handle it. All the server motherboards I have used can overclock DRAM up to the CPU's supported limit. But be aware that not all chips run stable at high frequency.

Also, 2133 and 2400 kits seem to be incompatible with the Rome platform; not sure if it's an isolated BIOS issue, though.
Can anyone here confirm that 2133/2400 RAM is incompatible with Rome?
 

dassiq

Member
Jul 10, 2017
36
15
8
49
I can confirm 2400 runs fine with Rome; however, 2133 is not compatible with any EPYC chip as far as I know.
 

alex_stief

Well-Known Member
May 31, 2016
884
312
63
38
You can run DDR4-2133 on Epyc Naples. And I would be really surprised if it would not run on Epyc Rome.
 

Jon

Member
Feb 28, 2016
77
18
8
42
I take back my previous statement about PC-2133 RAM.

It does work with Rome; I just tried it with a Rome ES and an H11SSL-i board.

So both 2400 and 2133 work with Rome ES.
I have retail Rome chips in an EPYCD8-2T and an H11SSL-i, and both work at 2133 and 2400 RAM speeds, but they usually have 2666 and 2933 ECC RAM in them.
 

bayleyw

Active Member
Jan 8, 2014
302
99
28
Just booted up my ZS1406E2VJUG5 on an H11SSL-NC with @ExecutableFix 's modded BIOS and some cheap DDR4-2400 ECC off eBay and it works great. Haven't gotten around to overclocking yet (I didn't have any drives handy to put Windows on).

Some observations:
  • ZS1406E2VJUG5 is a very low power engineering sample at stock. Running Prime95 from an Ubuntu 20.04 thumbdrive I see 200W total load at the wall on a RM850i, less than 60W delta over ~140W at idle. Clock speeds are really, really low running Prime, on the order of 800MHz according to the kernel. I'm not sure if the readout is correct though; it is significantly below base clock...
  • Speaking of which, Ubuntu 20.04 works great on these boards - I didn't see any strange behavior and the system was generally responsive on the desktop despite being a 1.4GHz processor.
  • The H11SSL behaves...strangely with a PCIe video card. With onboard VGA disabled and the BIOS set to default to offboard graphics it does output the BIOS to the Strix RX570-O4G I had installed, but only reliably over DVI. Two of the three (admittedly very cheap) HDMI monitors I tried failed to detect a signal.
  • This was further confounded by a regression introduced into the latest amdgpu drivers: with amdgpu.dc=1 (default), HDMI monitors outright don't work as early as the splash screen. It took quite a bit of digging to figure this out, at first I thought it was some strange Supermicro BIOS quirk that forced all output over the DVI ports.
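For anyone wanting to double-check the ~800MHz figure mentioned above, a minimal sketch follows. It assumes a Linux x86 system where /proc/cpuinfo exposes "cpu MHz" lines; the function names are hypothetical, and this only re-reads the kernel's own number, so it cannot prove whether that number is right.

```python
import re
from pathlib import Path
from statistics import mean


def parse_cpu_mhz(cpuinfo_text: str) -> list[float]:
    """Return every 'cpu MHz' value found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def summarize(mhz: list[float]) -> dict[str, float]:
    """Collapse per-core readings into min / mean / max."""
    return {"min": min(mhz), "mean": mean(mhz), "max": max(mhz)}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux, x86-style cpuinfo layout
    if cpuinfo.exists():
        readings = parse_cpu_mhz(cpuinfo.read_text())
        if readings:
            print(summarize(readings))
```

Running it a few times while Prime95 is loading all cores gives a feel for whether the low reading is stable or just a sampling artifact; comparing against power at the wall is still the more trustworthy check.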
 

bayleyw

Active Member
Jan 8, 2014
302
99
28
Further shenanigans after installing Windows 10 Enterprise 19H2 on a very old SATA SSD:
  • The H11SSL really does not like the combination of Windows and an offboard (AMD) GPU. To install Windows and the AMD drivers I had to:
    • Set VGA priority to offboard in the BIOS, disable the onboard graphics.
    • Install Windows. After going through the usual steps and installing the Windows Server 2019 SP3 chipset drivers you will find that despite getting video, no display device appears in Device Manager.
    • Reinstall the VGA jumper. Plug in a VGA monitor - you'll find that Windows now outputs over the VGA port.
    • The AMD GPU will now appear in Device Manager. Proceed with installing the drivers.
    • Restart. Both the onboard and offboard graphics will output an image, but the primary will be set to the VGA port and all your windows will show up there.
    • Use Device Manager to disable the onboard graphics in Windows. This will require some trickery (in my case, I booted with a single VGA+DVI monitor, switched to VGA, opened Device Manager, dragged the window over to the phantom 'second monitor', switched to DVI, then disabled the onboard graphics).
    • You won't get the spinning circle or Windows logo while booting (the BMC clobbers these), but once Windows starts everything will work great.
  • Performance is pretty solid: I get the best MT performance without using the EDC bug at 2.5GHz, where it matches a 7742 in Cinebench:

KeyShot 9.0 turned out a "disappointing" 673 fps - matches the Epyc but gets thrashed by the 3990X, which does almost 1K fps. Power consumption was low - about 290W at the wall (150W over idle), which probably explains the huge edge Threadripper has here - TR will turbo to exactly 280W in most scenarios.

  • Power consumption is at a pretty reasonable 360W (220W over idle) in CB20, which is conveniently exactly on par with a 7742 as well. As mentioned before, KeyShot was a lower 290W (150W over idle). Despite the CPU being stuck at 2.5GHz (the overclock disables idle frequencies), idle power with a RX570 and a bunch of RGB fans is still at a reasonable-though-high 140W.
  • The VRMs are...warm...in my desktop-style setup - this board really needs server-style front-to-back airflow to keep the VRMs cool. Bonus fans required, I guess.
  • The 'High Multi-Core' preset crashes immediately on my sample in CB20; I think I saw one other user suffer from this problem, so it may be wise to drop the preset by a couple hundred megahertz. It could also be that the H11SSL has different VRMs from @ExecutableFix 's H11DSi and the voltage droops enough to crash the system.
  • I had one strange occurrence where my CB score was stuck at 16.5K and power consumption dropped to 320W - unclear if it was a hardware issue (hot VRMs?) or a software one.
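The "over idle" numbers quoted in these observations are just wall readings minus the ~140W idle figure. A small sketch of that bookkeeping, using only the wattages from the posts (the dictionary labels and function name are made up for illustration):

```python
IDLE_WATTS = 140  # approximate idle draw at the wall, as reported in the posts

# Wall readings under load, in watts, taken from the posts above.
wall_readings = {
    "Prime95 at stock": 200,
    "KeyShot 9.0 at 2.5GHz": 290,
    "Cinebench R20 at 2.5GHz": 360,
}


def delta_over_idle(load_watts: float, idle_watts: float = IDLE_WATTS) -> float:
    """Extra power drawn at the wall compared with idle."""
    return load_watts - idle_watts


if __name__ == "__main__":
    for name, watts in wall_readings.items():
        print(f"{name}: {watts}W at the wall, {delta_over_idle(watts)}W over idle")
```

These are wall numbers, so PSU losses, the RX570 and the fans are all included; the deltas are only a rough proxy for what the CPU itself draws.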
 

Epyc

Member
May 1, 2020
56
8
8
Hey guys, I have my first real server hardware here: two EPYC 32-core ES chips.
It has been quite a road to get it all working, but as an experienced tuner I have a lot of thoughts and experiences with the setup that I wanted to share.

First off, I got the pair of 2s1705e3vivg5 BB 32-core ES samples second hand.
I bought a Supermicro H11DSI-NT V2 board because it's the only dual-socket board sold separately, and I chose the V2 version for the 10Gb LAN.
Half of the DIMM slots are populated with 2933 MHz RDIMMs; the money just wasn't sufficient to fill everything for now.
Running Windows Server 2019.

By default the samples run at 1.7 GHz, which is kind of slow, but with the help of the program I could get them up to 3 GHz / 1 V in the program settings.
In reality this meant they were running at 2.4 GHz at 1.34 V, and that's fast enough for me; it also seems to be about the maximum of what can be done with ease.

I know from TRX40, TR4 and Ryzen how important memory and IF speed are for the end result.
So I was very surprised to read that the SP3 IF tops out at 1467 MHz, because TRX40, for example, goes to 1800+.
This would mean that up to 2933 MHz the fabric is clocked 1:1, and at 3200 MHz a divider is active.
Testing with the AIDA cache and memory benchmark shows that latency is lowest at 2933; at 3200 both latency and bandwidth increase.
That would be a clear sign of the clock divider becoming active.
It's a shame we can't 'modify' the IF max clock for a 1:1 ratio; this is something I have done on Ryzen mobile, where it also should not be possible.
Maybe there's some hope. Getting the IF up to 1600 with a locked 1:1 ratio would be a huge improvement in performance and latency.

I can advise everyone to buy 2933 RAM or get your RAM to 2933, not so much for the bandwidth but for the lower latency and the overall improvement from the faster-running Infinity Fabric. To be clear, 2933 is the best speed vs. latency option.
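The 1:1 reasoning above can be sketched in a few lines. DDR memory transfers twice per clock, so the memory clock is half the MT/s rating; the 1467 MHz fabric ceiling is the figure quoted in this post and is taken as an assumption here, not something verified against AMD documentation. Function names are hypothetical.

```python
FCLK_MAX_MHZ = 1467.0  # fabric ceiling as quoted in the post (assumption, not verified)


def memclk_mhz(ddr_rate_mts: float) -> float:
    """DDR transfers twice per clock, so MEMCLK is half the MT/s rating."""
    return ddr_rate_mts / 2.0


def can_run_one_to_one(ddr_rate_mts: float, fclk_max_mhz: float = FCLK_MAX_MHZ) -> bool:
    """True if the fabric could match the memory clock without exceeding its ceiling."""
    return memclk_mhz(ddr_rate_mts) <= fclk_max_mhz


if __name__ == "__main__":
    for rate in (2133, 2400, 2666, 2933, 3200):
        mode = "1:1" if can_run_one_to_one(rate) else "decoupled"
        print(f"DDR4-{rate}: MEMCLK {memclk_mhz(rate):.1f} MHz -> {mode}")
```

Under that assumption DDR4-2933 (1466.5 MHz) just fits under the ceiling while DDR4-3200 (1600 MHz) does not, which lines up with the latency jump seen in AIDA.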

In the BIOS you can also tweak the link speed between the two sockets; by default it runs at 10.66 Gbps.
I got it up to 16 Gbps; at 18 Gbps it's very unstable. But it gave a decent improvement in CB score.
I also forced the Infinity Fabric in the BIOS to run at max speed all the time instead of going into sleep mode, to prevent jitter from the cycles needed to ramp the clock back up.

Damn, the settings on these server boards are crappy. After reading the complete AMD guideline document about Rome and all the options and functionality I should be able to adjust, I can honestly say Supermicro has not implemented even half of the functions.
Compared to a TRX40 system this feels like a 10 dollar OEM mobo BIOS.

At the moment I am close to 19K points in CB, but with only half the DIMMs per socket filled. So I think there is still a lot of headroom for improvement when more DIMMs arrive.
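To put a rough number on that headroom, here is a sketch of theoretical peak memory bandwidth per socket. It assumes one DIMM per channel, a 64-bit (8-byte) data path per DDR4 channel, and that "half the DIMMs" means 4 of Rome's 8 channels per socket are populated; real throughput will be lower than these peaks.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit data path (ECC bits not counted)


def peak_bandwidth_gbs(ddr_rate_mts: float, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for one socket."""
    return ddr_rate_mts * BYTES_PER_TRANSFER * channels / 1000


if __name__ == "__main__":
    half = peak_bandwidth_gbs(2933, channels=4)  # assumed current population
    full = peak_bandwidth_gbs(2933, channels=8)  # all eight channels filled
    print(f"4 channels: {half:.1f} GB/s, 8 channels: {full:.1f} GB/s per socket")
```

Filling the remaining channels doubles the theoretical peak, so bandwidth-bound workloads should benefit; how much of that shows up in Cinebench is a separate question.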

Things that I would really love:
1 Memory timing adjustment. One of the big gains on desktop comes from setting every DRAM timing by hand: primary, secondary and tertiary. I know it can be done, because when the same DDR4 ECC RDIMM is placed in a TRX system all the memory timings are tunable.
2 More IF control and options. I will see whether the hack that works on Ryzen mobile could also work on EPYC; if the IF could go to 1600 at 1:1, running 3200 MHz would actually give a big performance increase instead of the mixed bag it is now.
3 As with all Zen, latency is the biggest enemy; if you decrease it, performance goes up drastically.
4 Noctua coolers named TR4-SP3 are not really SP3-proof... something to do with flow direction.

Anyway, always open to feedback or suggestions about all.
Greetings.
IMG_20200501_112848_result.jpg
 


EagerToLearn

Active Member
Feb 25, 2020
115
25
28
But there seems to be something odd about this. I got my 7551 running @ 3 GHz with 1.025V.

Could it be, that yours is one of the very early ES samples?
 

Epyc

Member
May 1, 2020
56
8
8
To be honest, I'm a bit doubtful about which measurement to believe, because every tool gives different values. The 1.34 V is from HWiNFO; it does change when I change the voltage.
The Supermicro management interface gives a much lower voltage, around 1.07 V, but I don't hold it in very high regard at the moment.
So even if 1.34 V is correct, it's still acceptable. A Threadripper boosting on all cylinders draws even higher voltage at times.
But looking at the power draw and how cool the chips remain, it's probably much lower.
 


bayleyw

Active Member
Jan 8, 2014
302
99
28
I'd believe 1.07V@2.4GHz, I think HWINFO64 and IPMI were reading just under a volt on my 64C ZS1406 @2.5. 2S is one stepping earlier than ZS so a slight increase in voltages to hit similar clock speeds seems plausible.

The best way to resolve the confusion is with a wattmeter - if it's actually 1.34V you'll be seeing some incredibly high power consumption at the wall, probably 600W+.
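As a rough illustration of why the two voltage readings would look so different on a wattmeter: to a first approximation, dynamic CMOS power scales with V² × f. The sketch below applies only that textbook rule to the two readings from this thread; it ignores leakage, the I/O die and everything else on the board, so treat it as a sanity check rather than a prediction.

```python
def dynamic_power_ratio(v_new: float, v_ref: float,
                        f_new_ghz: float = 1.0, f_ref_ghz: float = 1.0) -> float:
    """First-order dynamic power scaling: P is proportional to V^2 * f."""
    return (v_new / v_ref) ** 2 * (f_new_ghz / f_ref_ghz)


if __name__ == "__main__":
    # Same clock, HWiNFO reading (1.34 V) vs management-interface reading (1.07 V).
    ratio = dynamic_power_ratio(1.34, 1.07)
    print(f"Core dynamic power at 1.34 V vs 1.07 V: about {ratio:.2f}x")
```

That works out to roughly 1.57x the core dynamic power at the same clock, which is a big enough gap that a wall measurement should make it obvious which reading is closer to the truth.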
 

Epyc

Member
May 1, 2020
56
8
8
Yeah, that sounds very logical; that would mean there is a lot of performance headroom left :D
I wanted to start off without pushing it too far; I don't know how fragile these ES chips are.