Sounds like something is off. The VRMs should overheat way before the CPUs; I've never seen a CPU temp higher than the VRMs'. I'd think either the reported temp is wrong, the CPU is faulty, or it's the VRMs overheating. Are you monitoring VRM temps with IPMI?
Let me preface this by saying that I'm a noob and I only know about building desktop PCs, not servers. I got into this because I wanted to mine Monero. (I don't want people to start with, "OMGEEZ, it's unprofitable, you will never ROI you retard." All it takes is for the crypto to shoot up and all that mining pays off.) I went with the EPYC ES over Ryzen 9s because I thought it would take up less space and need fewer power supplies. I decided to do this in the first place because there were 2S/ZS CPUs on eBay at the time, and I thought it's just cooler to delve into something like this rather than buying an overpriced GPU and sitting on a ticking time bomb. Even if Monero mining doesn't pay off, I can sell a system like this in Russia without losing money (and maybe even earning a bit), because not many people use eBay and such a system would be an exotic build either way. Nothing to lose.
So I first wanted to get a ZS, but the only guy selling them on eBay (in March 2021) doesn't ship to Russia because of customs restrictions. That's even though it's completely fine if it's for personal use, and I only ordered 2 CPUs. I only noticed later, in the shipping tab, that he doesn't ship to Russia. The seller waited for like 3 days and then refunded me, and I lost >$100 to currency fluctuations and PayPal's butt-tearing exchange rates. (I didn't realize I could actually convert at my bank's rate, because it's not easy to notice that you actually CAN do that, FML.) But my fault, whatever.
So that's why I had to settle for a 2S instead of a ZS. Luckily, I snagged one for a decent price, so that is why I have a 2S in the first place.
And ever since I ordered it, the price of Monero actually went up by 80%, so please don't roll your eyes: the whole point of getting in on mining is to do it when it's not so profitable, in anticipation of a price increase.
If you don't want to read the back story, start here:
So when I first booted everything up, HWiNFO64 showed the motherboard temps in the 50s/60s. Then I ran a stress test and started fiddling with the Rome ES overclocking program. I checked the CPU temps and they were fine under load, so I scrolled back to the voltages (right before the temps shot up, so I didn't notice that) and just tested the presets. I did all of this after setting the voltage manually to 0.9 or 0.95 V (I keep it at 0.9 V right now). When I applied the "Best of both" preset (for single- and multi-core performance), I noticed the Core VID shoot up to 1.3 V and I pooped my pants (before I realized that the VID is just what the CPU is requesting, not the actual voltage).
And so I pooped my pants and turned off the PSU switch ASAP. For a couple of minutes, it actually wasn't turning back on so I started shedding tears and digging out a grave for myself and the CPU, but then everything powered back on.
Ever since that first incident, it's been working fine, BUT the motherboard temperature reporting has been weird:
These are the temperatures even when the CPU is sitting at its default 400 MHz after booting, so I was like, "WTF m8?!" and shrugged.
This is what IPMI shows BTW:
And here you can see a maximum of 58C even though the max CPU Die temp is 119C (this is from the same logging session, right now).
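To cross-check the two sources, here is a minimal Python sketch that parses the text output of `ipmitool sdr type temperature` (rows shaped like `name | id | status | entity | reading`) and flags a big gap between the hottest BMC sensor and the software-reported CPU Die temp. The sample rows and sensor names (`CPU1 Temp`, `VRMCpu1 Temp`) are hypothetical, since real names depend on the board's BMC, and the 20-degree tolerance is an arbitrary placeholder, not a vendor spec.

```python
import re

# Hypothetical sample of `ipmitool sdr type temperature` output.
# Sensor names and values are made up; real ones depend on the board's BMC.
SAMPLE = """\
CPU1 Temp        | 01h | ok  |  3.1 | 45 degrees C
CPU2 Temp        | 02h | ns  |  3.2 | No Reading
VRMCpu1 Temp     | 10h | ok  |  7.1 | 52 degrees C
"""

READING = re.compile(r"(-?\d+(?:\.\d+)?) degrees C")


def parse_temps(text):
    """Map sensor name -> temperature in C, skipping rows without a numeric reading."""
    temps = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # not a 5-column sdr row
        match = READING.fullmatch(fields[4])
        if match:
            temps[fields[0]] = float(match.group(1))
    return temps


def readings_disagree(bmc_temps, software_cpu_c, tolerance_c=20.0):
    """True if the software CPU reading is far above the hottest BMC sensor.

    tolerance_c is an arbitrary placeholder threshold (hypothetical).
    """
    hottest = max(bmc_temps.values(), default=None)
    if hottest is None:
        return False  # no BMC readings at all, nothing to compare against
    return software_cpu_c - hottest > tolerance_c


if __name__ == "__main__":
    temps = parse_temps(SAMPLE)
    print(temps)
    # 119 C from the software side vs. a 52 C hottest BMC sensor -> flagged
    print(readings_disagree(temps, 119.0))
```

In practice the text would come from something like `subprocess.run(["ipmitool", "sdr", "type", "temperature"], capture_output=True, text=True).stdout`. A large gap doesn't say which source is wrong, only that they disagree.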
Then my noobish brain thought, "Who knows, maybe it's the drivers or something," so I decided to update. But Windows (yes, I was doing this on Windows 10 Pro, don't laugh) kept giving me a BSOD, because I was just using a cheap old GeForce 210, and for some reason whenever I updated, the BSOD was always related to the NVIDIA driver, so I couldn't update the drivers automatically.
Then I actually busted my balls and tried Ubuntu (which is really newcomer-unfriendly unless you have another PC next to you that you can google stuff on; thankfully I did). I tried like a dozen different programs, but they would only show the temperature of the NIC (like for reals, I tried a few of them). Anyway, I decided to go without Linux for now because I also couldn't get it to report the CPU voltage, and I'm not gonna try overclocking if I can't even see the voltages. So I went back to Windows and switched to my GTX 1060 so that I'd get fewer errors.
I tried installing Windows Server 2016, but the USB drive wouldn't boot for some reason (even though Secure Boot is disabled and everything), so I can only install Win10 Pro.
These are the VRMs, right? Or is this for the RAM modules?
Or is this the VRM? To me the first photo looks more fitting, but what do I know.
Anyway, these aren't too hot to the touch. If I had to guess, both the chokes and the heatsinks are around 50-60 degrees, so something's gotta be wrong.
And the motherboard temperatures stay pretty much constant whether I'm at 0.4 GHz or 1.8 GHz.
Any help is appreciated.
Thank you