Finally: Overclocking EPYC Rome ES

Spartus

Active Member
Mar 28, 2012
192
30
28
Toronto, Canada
There is one tool everyone who mines should own, and it would also help you diagnose the power and temps. A kill-a-watt meter (aka power meter, or even just an AC clamp ammeter).

If you monitor the power draw, you will know whether the temp is due to a cooling issue, a bad IHS, or the CPU actually drawing 500W+ per socket at load.


Let me preface this by saying that I'm a noob and I only know about building desktop PCs, not servers. I got into this because I wanted to mine Monero. (I don't want people to start with, "OMGEEZ, it's unprofitable, you will never ROI." All it takes is for the crypto to shoot up and all that mining pays off.) I went with the EPYC ES over Ryzen 9s because I thought it would take up less space and need fewer power supplies. I decided to do this in the first place because there were 2S/ZS CPUs on eBay at the time, and I thought it's just cooler to delve into something like this rather than buying an overpriced GPU and sitting on a ticking time bomb. Even if Monero mining doesn't pay off, I can sell a system like this in Russia without losing money (and maybe even earning a bit), because not many people use eBay and such a system would be an exotic build either way. Nothing to lose.
So I first wanted to get a ZS, but the only guy selling them on eBay (March of 2021) doesn't ship to Russia because of customs restrictions (although it's completely fine if it's for personal use, and I only ordered 2 CPUs). I only noticed that later, in the shipping tab. Yeah, my fault: I lost >$100 to currency fluctuations and PayPal's butt-tearing exchange rates (I didn't realize I could convert at my bank's rate, because it's not so easy to notice that you actually CAN do that, FML), and because the seller waited for like 3 days and then refunded me. But my fault, whatever.
So that's why I had to settle for a 2S instead of a ZS. Luckily, I snagged one for a decent price.
And ever since I ordered it, the price of monero actually went up by 80%, so please don't roll your eyes because the whole point of getting in on mining is to do it when it's not so profitable in anticipation of the price increase.



If you don't want to read the back story, start here:
So when I first booted everything up, HWinfo64 showed the motherboard temps in the 50s/60s. Then I did a stress test and started fiddling with the Rome ES overclocking program. I checked the CPU temps, they were fine under load, so I scrolled back to the voltages (right before the temps shot up, so I didn't notice that) and just tested the presets. When I applied the "Best of both" preset (for single- and multi-core performance), I noticed the Core VID shoot up to 1.3V and I pooped my pants (before I realized that it's just what the CPU is requesting, not the actual voltage). I did all of this after setting the voltage manually to 0.9 or 0.95 (I keep it at 0.9 right now).
And so I pooped my pants and turned off the PSU switch ASAP. For a couple of minutes, it actually wasn't turning back on so I started shedding tears and digging out a grave for myself and the CPU, but then everything powered back on.

But ever since that first incident, it's been working fine BUT the motherboard temperature reporting has been weird:
View attachment 18659

This is what the temperatures are even if the CPU is at its default of 400 MHz after booting, so I was like, "WTF m8?!" and shrugged.
This is what IPMI shows BTW:
View attachment 18660
And here you can see a maximum of 58C even though max CPU Die is 119C (this is from the same logging session right now).



Then my noobish brain thought, "Who knows, maybe it's the drivers or some stuff," so I decided to update. But on Windows (yes, I was doing this on Windows 10 Pro, don't laugh) I got a BSOD, cuz I was just using a cheap old GeForce 210, and for some reason whenever I would update, the BSOD would always be related to the NVIDIA driver, so I couldn't update the drivers automatically.
Then I actually busted my balls and tried Ubuntu (which is really newcomer-unfriendly, unless you have another PC next to you that you can google stuff on - thankfully I did). I tried like a dozen different programs, but they would only show the temperature of the NIC (like for reals, I tried a few of them). Anyway, I decided to go without Linux for now because I also couldn't get it to report the CPU voltage, and I'm not gonna try overclocking if I can't even see the voltages, so I went back to Windows and switched to my GTX 1060 so that I have fewer errors.
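For anyone stuck at the same point on Linux: the kernel exposes sensor readings as plain files under /sys/class/hwmon, with temperatures in millidegrees Celsius in files named temp*_input. On AMD CPUs the die temperature normally comes from the k10temp driver, but whether that driver recognises an ES part is an assumption that would need checking on the actual box. The function name and the `root` parameter below are made up for illustration; this is only a minimal sketch of reading those files directly.

```python
from pathlib import Path

def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, label): degrees_C} for every temp*_input file found."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_file in sorted(chip.glob("temp*_input")):
            # temp1_input may have a matching temp1_label; fall back to the file name
            label_file = chip / temp_file.name.replace("_input", "_label")
            label = label_file.read_text().strip() if label_file.exists() else temp_file.name
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors refuse to be read; skip them
            readings[(chip_name, label)] = millidegrees / 1000.0
    return readings
```

Calling `read_hwmon_temps()` with no arguments reads the live system. If only the NIC shows up in the result, that would suggest no CPU temperature driver bound to the processor, which matches what the monitoring programs were showing.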
I tried installing Windows Server 2016, but the USB wouldn't boot for some reason (even though secure boot is disabled and everything), so I can only install Win10 Pro.

These are the VRMs, right? Or is this for the RAM modules?

Or is this the VRM? To me the first photo looks more fitting, but what do I know.
Anyway, these aren't too hot to the touch. Both the chokes and the heatsinks are like ~50-60 degrees if I had to guess, so something's gotta be wrong.


And the motherboard temperatures stay pretty much constant no matter if I'm at 0.4 GHz or 1.8 GHz.

Any help is appreciated.
Thank you
 

boomheadshot

New Member
Mar 20, 2021
23
1
3
There is one tool everyone who mines should own, and it would also help you diagnose the power and temps. A kill-a-watt meter (aka power meter, or even just an AC clamp ammeter).

If you monitor the power draw, you will know whether the temp is due to a cooling issue, a bad IHS, or the CPU actually drawing 500W+ per socket at load.
CPU package power in HWinfo64 was roughly 350 Watts each time. Is this inaccurate? I'll get a power meter.
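Once a wall meter is in place, a rough cross-check against the software reading could look like the sketch below. The 90% PSU efficiency and the 60 W allowance for everything that is not the CPU (board, RAM, fans, GPU, VRM losses) are placeholder assumptions, not measured values, and the function names are made up for illustration.

```python
def implied_cpu_power(wall_watts, psu_efficiency=0.90, other_load_watts=60.0):
    """Estimate the DC power left for the CPU from a wall-meter reading.

    psu_efficiency and other_load_watts are guesses; substitute your own.
    """
    dc_watts = wall_watts * psu_efficiency  # what the PSU actually delivers
    return dc_watts - other_load_watts

def reported_vs_implied(reported_package_watts, wall_watts,
                        psu_efficiency=0.90, other_load_watts=60.0):
    """Ratio of software-reported package power to the wall-implied figure."""
    implied = implied_cpu_power(wall_watts, psu_efficiency, other_load_watts)
    return reported_package_watts / implied
```

A ratio close to 1.0 would suggest the reported package power is believable; a ratio far above 1.0 would mean the software claims more power than the wall socket is delivering, so the sensor is probably off.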


I have a SuperMicro H11DSi motherboard (not 'NT' version)
2x EPYC (ES) 2S1404E2VJUG5 installed
16 DIMMs installed. (64GB (4GB x 16) RAM)
I have flashed the custom Rome BIOS (Rome_H11DSI_Rev2) posted by ExecutableFix (from page 1)

Still will not POST to BIOS. Any suggestions?
With my Asus board, downgrading the BIOS via a retail CPU or IPMI doesn't work, so you need to do what I did or Gabber did. Just a guess, I'm not a financial advisor.
 

bayleyw

Active Member
Jan 8, 2014
101
28
28
I have a SuperMicro H11DSi motherboard (not 'NT' version)
2x EPYC (ES) 2S1404E2VJUG5 installed
16 DIMMs installed. (64GB (4GB x 16) RAM)
I have flashed the custom Rome BIOS (Rome_H11DSI_Rev2) posted by ExecutableFix (from page 1)

Still will not POST to BIOS. Any suggestions?
You need the _Rev1 BIOS for a 1.0 board.
 

itworks

New Member
May 18, 2021
2
0
1
I am an XMR miner and I am trying to use a 2S1404.
When I try to apply the multi-core preset, it prints PPT 0, TDC 0, EDC 30A, 3800 MHz, 1.05 V, and soon the screen goes black and the machine reboots.

I used the same steps on my ZS1406 (apply the preset, then set 2.4 GHz and 0.9 V), and it works well.

For now I have simply set the 2S1404 to 2.3 GHz and 0.9 V; the hashrate is about 60% lower than the normal value.

Could anyone give some advice?
 

boomheadshot

New Member
Mar 20, 2021
23
1
3
I am an XMR miner and I am trying to use a 2S1404.
When I try to apply the multi-core preset, it prints PPT 0, TDC 0, EDC 30A, 3800 MHz, 1.05 V, and soon the screen goes black and the machine reboots.

I used the same steps on my ZS1406 (apply the preset, then set 2.4 GHz and 0.9 V), and it works well.

For now I have simply set the 2S1404 to 2.3 GHz and 0.9 V; the hashrate is about 60% lower than the normal value.

Could anyone give some advice?
Oh shi-
I think it might mean that the 2S is much worse than the ZS. FML...
 

kbhinkle

New Member
Jan 11, 2021
3
0
1
Hello,
I am looking for 64-core Rome ES CPUs, as many as 8-10. 'JUG5' p/n are fine. If anyone has any insight (other than eBay), a connection would be much appreciated. Thanks in advance.
 

boomheadshot

New Member
Mar 20, 2021
23
1
3
Guys, good news.

So first, I added 4 more sticks of RAM, 8 sticks in total, and I still had a low hashrate, so I was pissed off and decided to use the EDC "bug" (yes, I didn't do this previously because I was afraid as hell of killing the CPU). And I got 50 KH/s on 8 sticks of RAM with the High multicore preset, and now everything's good. But as soon as I start mining, my CPU die temp shoots up to 120 C, IDK if it's a correct reading though.

I am an XMR miner and I am trying to use a 2S1404.
When I try to apply the multi-core preset, it prints PPT 0, TDC 0, EDC 30A, 3800 MHz, 1.05 V, and soon the screen goes black and the machine reboots.

I used the same steps on my ZS1406 (apply the preset, then set 2.4 GHz and 0.9 V), and it works well.

For now I have simply set the 2S1404 to 2.3 GHz and 0.9 V; the hashrate is about 60% lower than the normal value.

Could anyone give some advice?
Welp, I've got the same problem now. A couple of times it worked fine, now it just reboots when I try to use the "High Multi Core" preset. What sometimes helped was manually setting the voltage to 1.05V and 3.6 GHz and then using the preset. But now I can't get it to work either.

At this point, after another crash my mobo went into a reboot loop and I was looking for a noose. Luckily, I noticed in the IPMI beforehand that my battery voltage was RED (in the bios IPMI readings), so I took out the battery and everything reverted back to normal settings (even though clearing the CMOS didn't help). This build is gonna give me a heart attack haha.

BTW, on the 2S with the preset enabled, HWinfo64 was reporting 840 Watts (CPU package power) holy shit, lol. That was at 3.8 GHz and 1.05V. When I bumped the freq down to 2.6, the power consumption dropped to like 420 Watts
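As a sanity check on those two readings, the usual first-order model for dynamic CMOS power is P ≈ C·V²·f, so at a fixed voltage power should fall roughly in proportion to frequency. The sketch below applies that model to the numbers above. It ignores leakage and anything the chip does internally, and the function names are made up for illustration, so treat the output as a ballpark only.

```python
import math

def scale_dynamic_power(p_ref, f_ref, v_ref, f_new, v_new):
    """Scale a reference power reading to a new frequency and voltage (P ~ V^2 * f)."""
    return p_ref * (f_new / f_ref) * (v_new / v_ref) ** 2

def voltage_for_power(p_ref, f_ref, v_ref, f_new, p_new):
    """Voltage that would explain p_new at f_new under the same P ~ V^2 * f model."""
    return v_ref * math.sqrt((p_new / p_ref) * (f_ref / f_new))

# Numbers from the post: 840 W at 3.8 GHz and 1.05 V, about 420 W at 2.6 GHz.
same_voltage_watts = scale_dynamic_power(840.0, 3.8, 1.05, 2.6, 1.05)
implied_volts = voltage_for_power(840.0, 3.8, 1.05, 2.6, 420.0)
print(f"{same_voltage_watts:.0f} W expected at 2.6 GHz if the voltage stayed at 1.05 V")
print(f"{implied_volts:.2f} V would explain a 420 W reading")
```

Under this model the frequency drop alone only gets you to about 575 W, so a 420 W reading implies the effective voltage also fell to roughly 0.9 V, or that the power sensor is not telling the whole story.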

When the "High Multi Core" worked, I got 50kh/s on XMRig and 7500 on R15, I was quite surprised. But now I can't get it to work :(
When @ stock, I get about 3300 on R15, then when I apply the "Best of both" preset, it says that it's at 3.2 GHz, but the R15 score drops down to 3100

Okay, so I manually set PPT 0, TDC 0, EDC 30A, applied, then set the freq to 3.0 GHz, and got 6447 on R15. I guess the 2S is just too unstable, or it's so energy inefficient that it just overloads the mobo VRM's or something like that.
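One way to read these R15 numbers is to ask how much of each clock increase actually shows up in the score. The sketch below is just that arithmetic; the function name is made up, and the comparison is loose because the 7500 run used the preset while the 6447 run used manual settings.

```python
def clock_scaling_efficiency(score_a, ghz_a, score_b, ghz_b):
    """Fraction of the clock increase that shows up as score increase (1.0 = perfect)."""
    score_gain = score_b / score_a - 1.0
    clock_gain = ghz_b / ghz_a - 1.0
    return score_gain / clock_gain

# 6447 points at 3.0 GHz versus 7500 points at 3.8 GHz, both from the post
efficiency = clock_scaling_efficiency(6447, 3.0, 7500, 3.8)
print(f"{efficiency:.2f} of the extra clock turned into extra score")
```

That comes out at about 0.61, meaning the last 800 MHz buys noticeably less than its share of performance, which would be consistent with the chip being throttled or stretched somewhere at 3.8 GHz.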

I've also noticed that it's slightly more stable if you gradually bump up the frequencies, but if you want any real performance, you gotta go for the EDC bug.
 

mirrormax

Active Member
Apr 10, 2020
103
40
28
Got to cool that VRM. I made a custom waterblock for mine and can pull 900+ W from the wall (including some disks), stable.
 

lixinran0809

New Member
Nov 26, 2019
6
1
3
I found this rare -04 OEM 48-core Rome chip. Since it's -04, I assume it's also overclockable. Interestingly, CPU-Z recognizes it as a Threadripper 3980X, which doesn't even exist. The seller is asking 14,000 CNY ($2,200) for it.
[Attachments: 702947173343271.jpg, 702998660488562.jpg, 703035101309903.jpg]
 

boomheadshot

New Member
Mar 20, 2021
23
1
3
I found this rare -04 OEM 48-core Rome chip. Since it's -04, I assume it's also overclockable. Interestingly, CPU-Z recognizes it as a Threadripper 3980X, which doesn't even exist. The seller is asking 14,000 CNY ($2,200) for it.
I just think the prices are way too high now, but if you really need it, then I guess you could get one.
 

itworks

New Member
May 18, 2021
2
0
1
Guys, good news.

So first, I added 4 more sticks of RAM, 8 sticks in total, and I still had a low hashrate, so I was pissed off and decided to use the EDC "bug" (yes, I didn't do this previously because I was afraid as hell of killing the CPU). And I got 50 KH/s on 8 sticks of RAM with the High multicore preset, and now everything's good. But as soon as I start mining, my CPU die temp shoots up to 120 C, IDK if it's a correct reading though.



Welp, I've got the same problem now. A couple of times it worked fine, now it just reboots when I try to use the "High Multi Core" preset. What sometimes helped was manually setting the voltage to 1.05V and 3.6 GHz and then using the preset. But now I can't get it to work either.

At this point, after another crash my mobo went into a reboot loop and I was looking for a noose. Luckily, I noticed in the IPMI beforehand that my battery voltage was RED (in the bios IPMI readings), so I took out the battery and everything reverted back to normal settings (even though clearing the CMOS didn't help). This build is gonna give me a heart attack haha.

BTW, on the 2S with the preset enabled, HWinfo64 was reporting 840 Watts (CPU package power) holy shit, lol. That was at 3.8 GHz and 1.05V. When I bumped the freq down to 2.6, the power consumption dropped to like 420 Watts

When the "High Multi Core" worked, I got 50kh/s on XMRig and 7500 on R15, I was quite surprised. But now I can't get it to work :(
When @ stock, I get about 3300 on R15, then when I apply the "Best of both" preset, it says that it's at 3.2 GHz, but the R15 score drops down to 3100

Okay, so I manually set PPT 0, TDC 0, EDC 30A, applied, then set the freq to 3.0 GHz, and got 6447 on R15. I guess the 2S is just too unstable, or it's so energy inefficient that it just overloads the mobo VRM's or something like that.

I've also noticed that it's slightly more stable if you gradually bump up the frequencies, but if you want any real performance, you gotta go for the EDC bug.

I solved that problem (the CPU rebooting when applying presets).
The reason is that 3800 is too high for the CPU. You need another version of the overclocking software (1.0.0-BETA), in which the 3800 setting won't work (give me your email and I'll send it to you), and you should enable only HIGH-MULTI.
Then you can set EDC 30 A, 3.8 and 0.9 V in the software you used before. If it reboots, raise 0.9 to 0.95.
I had to go up to 1.05 to stop the reboots.

Besides, the 2S needs a water cooler (I use a 500 W TDP one) to keep the temperature below 80℃.
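The voltage step-up described above (start at 0.9 V, go to 0.95 V if it reboots, and so on up to 1.05 V) can be written down as a small search loop. This is only a sketch of the procedure: `is_stable` is a hypothetical callback standing in for "apply the setting by hand in the tool and see whether the machine survives", and the 0.05 V step and 1.05 V ceiling are simply the values mentioned in the post, not known safe limits for these chips.

```python
def find_min_stable_voltage(is_stable, start=0.90, stop=1.05, step=0.05):
    """Walk the voltage up from `start` until `is_stable(volts)` returns True.

    Returns the first stable voltage, or None if nothing up to `stop` works.
    """
    steps = round((stop - start) / step)
    for i in range(steps + 1):
        volts = round(start + i * step, 3)  # avoid float drift such as 0.9500000001
        if is_stable(volts):
            return volts
    return None  # give up rather than go past the ceiling

# Pretend the chip only holds 3.8 GHz from 1.0 V upwards (made-up behaviour)
print(find_min_stable_voltage(lambda volts: volts >= 1.0))
```

Stopping at a fixed ceiling and returning None is deliberate: on an ES part with unknown process characteristics, the loop should fail closed rather than keep adding voltage.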
 

boomheadshot

New Member
Mar 20, 2021
23
1
3
I solved that problem (the CPU rebooting when applying presets).
The reason is that 3800 is too high for the CPU. You need another version of the overclocking software (1.0.0-BETA), in which the 3800 setting won't work (give me your email and I'll send it to you), and you should enable only HIGH-MULTI.
Then you can set EDC 30 A, 3.8 and 0.9 V in the software you used before. If it reboots, raise 0.9 to 0.95.
I had to go up to 1.05 to stop the reboots.

Besides, the 2S needs a water cooler (I use a 500 W TDP one) to keep the temperature below 80℃.
What are your benchmarks like? I remember seeing the ZS get 10k on R15, like a stock 3990X. But the 2S gets only 7500 @ 3.8. It's enough for me, but I was surprised to see that much of a difference.
Beating myself up over not getting the ZS.

I was hoping that the IceGiant ProSiphon Elite would be enough, but I guess at 500 Watts the only way is water. Is anyone getting by at a really high TDP with this air cooler, or is there no chance to make it work as-is?
 

irgen

New Member
Jan 14, 2021
24
2
3
So how far can one push a 64-core ZS on the H11SSL, realistically? With all cores active, even under water. I think the weak VRM is the limiting factor here?
 

boomheadshot

New Member
Mar 20, 2021
23
1
3
If that's through HWinfo, it sounds unlikely to hit 900 W on 1 CPU, especially at 1.05 V. My 2 are running at like 1.3 V (not 100% load though).

I am measuring at the wall.

I guess the temperature sensor/power readings are off on this CPU; it was like that even when the mobo temps weren't acting up in the very beginning. What kind of temperature probe should I use to get a more accurate temp? I've never had to use any hardware for this, that's why I'm asking. Because if the IceGiant ProSiphon Elite isn't enough, then I GUESS I need to watercool. I already got a cheap used waterblock for TR4 (Alphacool XPX Aurora Pro, got it for $28 locally). Will it be good enough? I see you guys have Heatkillers and stuff, but I just don't have that kind of cash to blow. I've seen the Alphacool in an Igor's Lab review where he tested it on a 2990WX; it did seem to have some good surface area, but the vid is in German and the YouTube auto-translate was meh.
I don't care about aesthetics at all, just want a cheap liquid system. But then I think - do I even need to? I won't be surprised if the temperatures act up again (I'm referencing what I've written on the previous page for those who haven't read it).
Is delidding a thing? I've delidded 2066 CPUs, but I understand that this is a completely different beast. What am I supposed to do if the CPU is actually heating up that much? I'll appreciate your advice, guys.
Does the 2S have overheating protection? I think there was one time when I was testing with the Deepcool Fryzen and it shut off by itself (I let it mine XMR for about 3-4 mins or something like that). It never happened on the IceGiant (only when I was toying with the overclock tool, but that's a different story).
I'm thinking of just going **** it and using it as-is, praying to the gods that protections will kick in to stop me from overheating the CPU. Or is that just wishful thinking? I wanna use it for Chia plotting, and I wanna make use of all the cores. I'm not gonna go 3.8 GHz on this thing, maybe 2.6 GHz to 3.0 GHz (with the EDC bug, of course).
I guess I'm going to go liquid anyway, because the IceGiant won't work in a horizontal orientation, and I'm tired of my mobo being on a stand box.
Will a 360mm rad be enough, or should I go for 480 mm?
 

bayleyw

Active Member
Jan 8, 2014
101
28
28
So how far can one push a 64-core ZS on the H11SSL, realistically? With all cores active, even under water. I think the weak VRM is the limiting factor here?
The limiting factor is an internal mechanism of unknown origin which starts clock stretching if the processor is pushed too far. Briefly, for intensive non-AVX-heavy workloads:
  • 64 cores, HT on: limit seems to be around 2.5GHz, performs like a 7742, about 15% slower than a 3990X.
  • 64 cores, HT off: internal limiter seems inactive, I've done 3.2GHz on 64c/64t but the power consumption goes through the roof (well over 300W). Feels sustainable with good VRM cooling (water or high-pressure air), but if your workload is HT-friendly, performance is the same as 2.5GHz with HT on, so there isn't much point.
  • 32 cores, HT on: internal limiter seems inactive, I've done 3.6GHz on 32c/64t with OK power consumption. Limit here is how much you want to push the volts on a $1K+ CPU with unknown process characteristics.
I usually leave mine at 3.2GHz unless I know the workload is particularly intensive, in which case I back down to 2.5GHz.
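The three configurations above can be compared with a very crude throughput model: cores × clock × (1 + HT uplift). The model and the function names are assumptions for illustration only; real workloads will not scale this cleanly. What it does show is the HT uplift implied by the statement that 3.2GHz with HT off performs the same as 2.5GHz with HT on.

```python
def relative_throughput(cores, ghz, smt_uplift=0.0):
    """Crude throughput estimate: cores * clock * (1 + uplift from HT)."""
    return cores * ghz * (1.0 + smt_uplift)

def implied_smt_uplift(ghz_smt_off, ghz_smt_on):
    """HT uplift at which both clock/HT combinations give equal throughput."""
    return ghz_smt_off / ghz_smt_on - 1.0

uplift = implied_smt_uplift(3.2, 2.5)
print(f"implied HT uplift: {uplift:.0%}")
print(f"64c HT on  @ 2.5GHz: {relative_throughput(64, 2.5, uplift):.1f}")
print(f"64c HT off @ 3.2GHz: {relative_throughput(64, 3.2):.1f}")
print(f"32c HT on  @ 3.6GHz: {relative_throughput(32, 3.6, uplift):.1f}")
```

Under this model the break-even corresponds to an HT uplift of about 28%, and the 32-core configuration at 3.6GHz lands at roughly 72% of the 64-core figures, so it only makes sense when per-core speed matters more than total throughput.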