For those interested in Rome overclocking: https://forums.servethehome.com/index.php?threads/overclocking-epyc-rome-es.28111/
missmah/ipmi_tools ?
However, this didn't completely solve the issue for me. In the end I grabbed a shell script that someone wrote for a Supermicro LGA2011 board (sorry, I forget who wrote it, so I can't give proper credit) and modified it slightly. It runs at boot, polls the temps and fans, and makes adjustments. Since then I haven't had any issues with the fans ramping at idle, and it has worked well at ramping the fans up under load.
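For anyone wondering what the "polls the temps and makes adjustments" part boils down to, here is a minimal sketch of the decision logic only, in Python rather than shell. It is not the script mentioned above. The curve values, the hysteresis margin and the function names are all made up for illustration, and reading the sensors and actually setting the fan duty are left out because those commands are board specific.

```python
# Hypothetical fan curve: (temperature in C, fan duty in percent).
# These thresholds are illustrative, not taken from the original script.
FAN_CURVE = [(40, 30), (55, 50), (70, 75), (80, 100)]
HYSTERESIS_C = 3  # only ramp down once the temperature has dropped this far


def duty_for_temp(temp_c, curve=FAN_CURVE):
    """Return the duty of the highest curve step at or below temp_c."""
    duty = curve[0][1]  # floor duty below the first threshold
    for threshold, step_duty in curve:
        if temp_c >= threshold:
            duty = step_duty
    return duty


def next_duty(temp_c, current_duty):
    """Ramp up immediately; ramp down only after a drop of HYSTERESIS_C."""
    target = duty_for_temp(temp_c)
    if target >= current_duty:
        return target
    # Re-evaluate as if the reading were slightly higher. This keeps the
    # fans from oscillating when the temperature hovers around a threshold.
    return min(current_duty, duty_for_temp(temp_c + HYSTERESIS_C))
```

A polling loop would call `next_duty()` every few seconds with the latest sensor reading and only issue a fan command when the returned duty differs from the current one.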
Hello everyone
It has been a while, but I am using my ES build again. I had it running stable at almost 3.8 GHz, on an ES sample with a 1.9 GHz base clock.
Motherboard: Asrock EPYCD8-2T
CPU: AMD Eng Sample: 2S1905A4VIHF4_30/19_N
RAM: 2x 16 GB 2400 MHz ECC (Hynix HMA42GR7AFR4N-UH)
PSU: Corsair 1000W (HX 1000)
But I recently added a simple Sapphire RX 580 8 GB card, and since then I have not been able to hold that clock speed. It has actually been freezing even at 3.4 GHz while I try to bring it back down to something stable. It runs for a good few hours and then suddenly freezes. The only activity is a Minecraft client, not even a full load.
Now I'm not the most experienced guy, so maybe you guys have some insights on how to reach the higher clocks again. Could it be an issue with the motherboard passing power down to the card? I doubt it is the PSU, nor do I think that powering the GPU from another PSU will help.
I just know I could achieve much higher stable CPU clock speeds without this video card in it.
I have some similar experience of system freeze-ups when the motherboard has a few GPUs installed.
I have two setups: Asrock EPYCD8-2T + Epyc 7551 + 4x AMD GPUs, and Gigabyte MZ31-AR0 + Epyc 7601 + 4x NVIDIA GPUs. On both machines I get the freeze issue with an overclocked CPU. The symptoms look the same on the two machines: the login cursor stops blinking (I run Linux), the system does not respond to ping, and it does not even respond to the NMI button. There is no error message on screen or on the SOL console. The CPU heatsink becomes warm after the freeze.
Both systems were overclocked to 3.3 GHz, stress tested using mprime without GPUs, and ran stable for some time. However, once I have the GPUs installed, the Asrock system has a high chance of freezing, sometimes after running or idling for a few hours and sometimes immediately after system startup (when the GPU kernel driver is loaded). On the Gigabyte motherboard, the freeze occurs when I run some GPU workload, especially when I attempt to increase the power limit of the GPUs (which puts more stress on power delivery).
On the other hand, I have two Supermicro boards (H11SSL and H11DSi), both running overclocked 1st-gen Epyc CPUs in multi-GPU setups. They have never had the freeze issue. It may just be a coincidence, but my guess is that the PCIe power delivery of the motherboard can affect the stability of the CPU voltage, so the system can become unstable under heavy PCIe workload. If you can obtain a Supermicro motherboard it is definitely worth trying.
Interesting. I ran this with a simple old AMD Barco MXRT 5600 video card initially, a few months ago. I ran many tests at nearly 3.8 GHz with good scores, and ran it under CPU-intensive load for days with no problem.
But I am not really putting a big load on either the CPU or the GPU now, so the freeze makes no sense really. It can go fine for hours, or just randomly quit on me when I think it has been running fine for a while.
I added an RX 580 and left the Barco in there as well, and got freeze issues like you describe: the cursor just stops, the screen freezes, and the CPU probably does get hot. It happened a few times overnight or when I was not watching, and when I came back the air coming off the water cooling was significantly warmer, though not too hot. Without sufficient cooling, the heat would probably have killed the CPU.
I have removed the Barco card now and am testing the clock. I put it back at 3.6 GHz to see how things go...
This motherboard did seem really relaxed about which clocks and voltages I set, though. The VRMs seem to stay cool, but then again my whole airflow setup is sufficient.
*Also to note: I tried an EVGA 1080 Ti in this motherboard, on the BIOS it shipped with, but it would not boot with that card in it. The RX 580 booted fine after a 60-second boot. Probably the 1080 Ti is not supported yet on my BIOS version, but I am unsure whether it is safe to upgrade because of the ES CPU. At some point I will want a better card than an RX 580, so I will have to look into that.
Edit: Trying out the RX 580 on a second power supply (1000W), but no luck reaching the previous clock of 3.8 GHz. I will try putting the video card in PCI slot 1.
Edit: Also no luck with either the additional power supply or putting the RX 580 in PCI slot 1.
Tomorrow I will probably check with just the Barco card again to see if I can get 3.8 GHz again... But this seems like a motherboard power draw issue. I can still try the two other PCI slots that have not been tried yet, but if this is a power draw issue over the motherboard I doubt the slot used will make a difference. IMHO slot 1 or 3 will not be able to hold 3.6, 3.8 or even 3.4 for that matter.
For now I am set to 3.1 GHz, as I need to use the computer and want to see if it stays stable at that. I was able to complete another test at 3.6 GHz but it was NOT stable. Without the RX card this was stable; even 3.8 was.
See attachments.
I will for sure keep trying out some combinations and different settings. The only really known stable speed I had before this was the eventual 3.8 GHz clock, which the computer ran at for a few months while not being actively used. It was very stable in the benchmarks, but that was with only the little Barco card installed. I will try lowering my voltages if I decide to eventually stay at 3.1 or so, but if I can get something higher stable that would be ideal.
The EPYCD8-2T motherboard has a PCIe 6-pin connector (at the bottom right corner in your picture). Technically it is only necessary when you use more than 3 GPUs, but I guess you can try connecting it to enhance power delivery.
Also, if you run at 31x you can potentially lower the voltage a bit more, which might help to reduce instability caused by high power consumption. My 2S1905A4VIHF4_30/19_N runs at 33x with 1.0375V (but that is on a Supermicro motherboard).
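To put rough numbers on the multiplier and voltage trade-off, here is a small sketch. It assumes a 100 MHz reference clock (which matches 31.5x being 3.150 GHz elsewhere in this thread) and uses the usual first-order rule that dynamic power scales with frequency times voltage squared. It ignores leakage and everything outside the cores, so treat the result as a ballpark. The 1.0 V figure for 31x in the usage note is a made-up example, not a tested setting.

```python
REF_CLOCK_MHZ = 100  # assumed reference clock, not read from any board


def core_clock_ghz(multiplier):
    """Core clock in GHz for a given multiplier at the assumed reference clock."""
    return multiplier * REF_CLOCK_MHZ / 1000


def relative_dynamic_power(mult_a, volt_a, mult_b, volt_b):
    """Ratio of dynamic power of setting A to setting B, using P ~ f * V^2."""
    return (mult_a * volt_a ** 2) / (mult_b * volt_b ** 2)


if __name__ == "__main__":
    # 33x at 1.0375 V is from the post above; 31x at 1.0 V is hypothetical.
    ratio = relative_dynamic_power(33, 1.0375, 31, 1.0)
    print(f"33x = {core_clock_ghz(33):.2f} GHz, 31x = {core_clock_ghz(31):.2f} GHz")
    print(f"33x @ 1.0375 V draws roughly {ratio:.2f}x the dynamic power of 31x @ 1.0 V")
```

The point is just that a small voltage drop counts for more than the clock drop itself, because voltage enters squared.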
One additional thing you can try is to reduce PCIe width from 16x to 8x, which helped me to stabilize the system sometimes.
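If you try the x8 setting on Linux, you can check that it actually took effect by reading the `current_link_width` and `max_link_width` attributes the kernel exposes under `/sys/bus/pci/devices`. A minimal sketch follows; the function name is my own and the `root` parameter exists only so it can be pointed at a test directory.

```python
from pathlib import Path


def read_link_widths(root="/sys/bus/pci/devices"):
    """Return {device_address: (current_width, max_width)} for PCIe devices."""
    widths = {}
    base = Path(root)
    if not base.is_dir():
        return widths  # not Linux, or sysfs not mounted
    for dev in sorted(base.iterdir()):
        cur = dev / "current_link_width"
        mx = dev / "max_link_width"
        if not (cur.is_file() and mx.is_file()):
            continue  # attribute not exposed for this device
        try:
            widths[dev.name] = (int(cur.read_text().strip()),
                                int(mx.read_text().strip()))
        except (OSError, ValueError):
            continue  # unreadable or non-numeric value, skip it
    return widths


if __name__ == "__main__":
    for addr, (cur, mx) in read_link_widths().items():
        note = "" if cur == mx else "  (running below maximum width)"
        print(f"{addr}: x{cur} of x{mx}{note}")
```

Match the device address against `lspci` output to find the GPU. Note that some devices report a reduced link at idle for power saving, so check under load as well.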
@epyc es001 what type of memory are you using in your krpa-u16? thanks
My motherboard is an Asus KRPA-U16. Actually, I dare not make any changes because it's an ES CPU.
I have an EPYC 7351P retail version and with this tool I can overclock to 3.4 GHz. The stock freq is 2.4 GHz, so...
Just to confirm, this works on retail (NOT ES) Naples chips too, right?
Only 700W from the wall...
VRM max temps at 73 are a bit toasty so I'm not sure I can push this further, thoughts?
It seems my voltage is not high enough: something downclocks when Cinebench is running (I can see this in the power draw from the wall). The scores are really irregular for some reason. I can increase clocks, but the score doesn't increase unless I also increase the voltage.
VRM should be good until about 95 C for non-regular (24/7) use. I got a bit over 20K score on R20 with 32@3.150GHz (7601) x2. I am not sure why yours would be more than 10% lower.
I ran 31.5x (1.05V) 24/7 for a long time and had VRM at 97 C irregularly, but usually under 90 C and it seemed fine.