Dual EPYC 7B13 overheating issue

NikVince

New Member
Feb 5, 2024
5
0
1
Let me start with the background:

I ordered from the seller Tugm4770 on eBay because I have read great things about them.

My order:
Motherboard - Supermicro H12DSi-N6
CPUs - 2x EPYC Milan 7B13
RAM - 256 GB DDR4 ECC Samsung

The CPUs came preinstalled in the motherboard, and I immediately installed two BeQuiet! Dark Rock TR4s and started the machine.
I installed HiveOS to mine Monero, and right away the machine sat at 80 degrees Celsius at idle; under load it hit 105 and shut down after 3-4 minutes.
I took off the coolers and remounted them without cleaning off the MX-6 thermal paste I had already applied, and added another bit (yeah I know, stupid me). It booted straight to 100 degrees at idle.
Rinse and repeat: I took the coolers off again, cleaned off the thermal paste, remounted everything, and we were back at 80 degrees at idle.
I subsequently ordered two Arctic Freezer SP3 4U-m coolers after concluding that the BeQuiet! coolers weren't strong enough, and after installing them I immediately noticed that a lot more warm air was coming out of the back of them.
The motherboard is on an open test-bench-style setup at the moment.
The idle temperature displayed in HiveOS still stayed at 80 degrees, and I don't understand what's happening.
I contacted the seller and he suggested getting replacements for the CPUs, but given that both are sitting at pretty much the exact same temperatures, I think something else is off; it isn't very likely that both CPUs are faulty in the exact same way.
I asked him for a screwdriver and will reseat both CPUs when it arrives (odd that I didn't receive one in the first place, but OK).

Is there anything I could try in the meantime?

Big Side Note:

I cannot see the temperatures in the BMC/IPMI dashboard because the sensors show N/A, so even though HiveOS displays temperatures, I am not sure they are right.
I even tried installing Windows 10, and the CPU Package temperature there is the same 80 degrees, even though the CCDs are sitting at much lower temperatures, around 40 at idle.
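
One rough way to sanity-check a reading like this is to compare the package value against the hottest CCD. The Python sketch below is only illustrative: the function name and the 20-degree gap are assumptions made for this example, not anything defined by HiveOS, Windows, or AMD, and the CCD values are approximations of the "around 40" figure above.

```python
def package_reading_is_suspicious(package_c, ccd_temps_c, max_gap_c=20.0):
    """Return True when the package value exceeds the hottest CCD by more than max_gap_c.

    max_gap_c is an arbitrary illustrative threshold, not a vendor figure.
    """
    if not ccd_temps_c:
        raise ValueError("need at least one CCD temperature")
    return package_c - max(ccd_temps_c) > max_gap_c


if __name__ == "__main__":
    # Approximate idle values from the post: package at 80, CCDs around 40.
    print(package_reading_is_suspicious(80.0, [40.0, 40.0]))
```

A large gap does not prove the package value is wrong, but it is a hint that the sensor reporting deserves a second look before assuming a cooling or hardware fault.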

Conclusion:

I have built various consumer PCs over the last few years, but I have never worked with enterprise gear before, so if I'm missing something really obvious or you have any kind of suggestion, I would really appreciate your input.
 

twinyuki

New Member
Dec 27, 2023
13
6
3
k10temp, the temperature driver built into the kernel used by HiveOS, cannot correctly read Milan's core temperature.
This has been fixed in kernel 6.4 and later.
A beta version of HiveOS has now been released:
hiveos-0.6-224-beta@230911
It adds Zen 4 temperature display support, and the Milan issue is resolved as well.
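
To check which kernel a rig is running and what k10temp actually reports, something like the following Python sketch could be used. It assumes a Linux system exposing the standard hwmon sysfs layout (`name`, `tempN_input`, `tempN_label` under `/sys/class/hwmon`); the function names are made up for this example, and the 6.4 threshold is simply the version mentioned above.

```python
import platform
import re
from pathlib import Path


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '6.4.0-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def read_k10temp(hwmon_root="/sys/class/hwmon"):
    """Return {chip_dir_name: {label: degrees_celsius}} for every k10temp device."""
    readings = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return readings
    for chip in sorted(root.iterdir()):
        name_file = chip / "name"
        if not name_file.is_file() or name_file.read_text().strip() != "k10temp":
            continue
        temps = {}
        for input_file in sorted(chip.glob("temp*_input")):
            label_file = chip / input_file.name.replace("_input", "_label")
            # Fall back to the file name when the driver provides no label.
            label = label_file.read_text().strip() if label_file.is_file() else input_file.name
            # hwmon exposes temperatures in millidegrees Celsius.
            temps[label] = int(input_file.read_text().strip()) / 1000.0
        readings[chip.name] = temps
    return readings


if __name__ == "__main__":
    kernel = parse_kernel_version(platform.release())
    print("kernel:", kernel, "(6.4 or later)" if kernel >= (6, 4) else "(older than 6.4)")
    for chip, temps in read_k10temp().items():
        print(chip, temps)
```

On a dual-socket board there should be one k10temp entry per CPU, so comparing the two entries before and after the update shows whether the reported values changed.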
 

NikVince

This is such great news. I am incredibly happy now.
I will be trying to update later this afternoon and see what happens. Thank you so much dude!!
 

Cajunitaly

New Member
Feb 9, 2024
4
0
1
Did this work? I'm having the exact same issue with dual 7763s. I just had spine surgery, so I don't want to get up to reburn Hive unless it works. Thanks, y'all.
 

NikVince

It absolutely worked for me, and I have been mining at about 55 degrees per CPU for the last few days!!
The only thing giving me a little trouble was the VRMs, which were overheating.
I 3D printed this duct, and now they sit at about 70-75 degrees and aren't overheating anymore.
Good luck with your recovery and stay safe!
IMG_8564.jpeg
 

drdepasquale

Member
Dec 1, 2022
77
32
18
That appears to be running awfully hot! What do the temperatures show in other operating systems? I have a machine with the same CPU coolers used for Monero mining and it never runs hotter than 55C even with 225 Watt processors.
 

Cajunitaly

Awesome!! OK, I will get up and go burn an updated boot disk. I was freaking out when it overheated originally. Yes, the VRMs on those Supermicro boards have always been an issue. Looks like you've got plenty of fans! I'm working on turning one of my extra bedrooms into a mini fridge to fix the warning issues. I'm lucky if I have another month before East Texas starts its warming phase. Thanks for the reply!
 

Cajunitaly

On another note, can you make stands for these dual boards on your 3D printer?
 

NikVince

Replying to drdepasquale: these CPUs have a 280 W TDP, not 225 W.
As I said yesterday, updating the kernel fixed the incorrect temperature display, and I'm currently running at 55-60 degrees at most!
 

drdepasquale

I can see that these would run a little hotter with a higher TDP. They are still far cooler and more efficient than a consumer-grade CPU of the same or higher TDP.