Disabled temp sensors on RomeD8-2T IPMI


antbar

New Member
Jun 27, 2023
Hello everyone, this is my first post here, and I already have a problem to solve.

I just assembled a server with the following specs:
- ASRock Rack RomeD8-2T/BCM
- Epyc 7443P
- 8x64GB RDIMM Micron MTA36ASF8G72PZ-3G2R
- 2x WD SN850X M.2 SSDs (the issue also occurs with a SATA SSD installed instead of the M.2 SSDs)
- no PCIe card installed
- BIOS version P3.50
- BMC version 1.27.00 (the issue was already present with BIOS P3.40 and BMC 1.25.00)

Everything works normally, except that all temperature sensors other than the 8 RDIMM sensors show up as disabled in the IPMI and in the BIOS. I can still read the temperature sensors (CPU, motherboard and a few others) from Windows with HWiNFO and from Linux with sensors (lm-sensors).
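Since the OS-side readings work while the BMC shows the sensors as disabled, it can help to capture them in a script for side-by-side comparison. A minimal sketch, assuming a recent lm-sensors with JSON output (sensors -j) and the Linux k10temp driver reporting the EPYC temperatures; the SAMPLE values and the helper names are illustrative, not taken from this thread:

```python
import json
import subprocess

# Illustrative `sensors -j` output (made-up values); real chip names and labels may differ.
SAMPLE = """{
  "k10temp-pci-00c3": {
    "Adapter": "PCI adapter",
    "Tctl": {"temp1_input": 52.5},
    "Tccd1": {"temp3_input": 48.0}
  },
  "nvme-pci-0100": {
    "Adapter": "PCI adapter",
    "Composite": {"temp1_input": 41.85}
  }
}"""


def k10temp_readings(sensors_json: str) -> dict:
    """Return {label: degrees C} for every k10temp (AMD Zen CPU) chip in `sensors -j` output."""
    readings = {}
    for chip, features in json.loads(sensors_json).items():
        if not chip.startswith("k10temp"):
            continue  # skip NVMe, board and other hwmon chips
        for label, subfeatures in features.items():
            if not isinstance(subfeatures, dict):
                continue  # the "Adapter" entry is a plain string
            for key, value in subfeatures.items():
                # hwmon temperature inputs are exposed as tempN_input
                if key.startswith("temp") and key.endswith("_input"):
                    readings[label] = float(value)
    return readings


def live_k10temp() -> dict:
    """Hypothetical helper: run `sensors -j` on the host and parse its output."""
    out = subprocess.run(["sensors", "-j"], capture_output=True, text=True, check=True).stdout
    return k10temp_readings(out)
```

On the host, live_k10temp() returns the same kind of mapping from the real output; labels such as Tctl and Tccd1 come from the k10temp driver and can vary with the kernel version.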

After repeated reboots, cold boots and BMC resets, I noticed that the CPU and "Card side temp" sensors are sometimes displayed (but not the other missing temperature sensors). I don't know what triggers it, but once the board starts showing those sensors, it keeps doing so across reboots and shutdowns. However, after the power is cut completely and the BMC is forced to reboot, these sensors disappear again.

Looking at the IPMI sensors page during a boot process I observed the following:
1. Before the boot menu/enter BIOS screen is reached, the BCM57416, CPU, MB and Card Side temp sensors are displayed (initially with 0 values, then after a few seconds with actual temperature readings). At this stage the RDIMM sensors are not yet detected.
2. Once the boot menu/enter BIOS screen is reached, the 8 RDIMM sensors appear, and the temperature sensors that were displayed before are still visible but read 0.
3. When the OS starts booting, the temperature sensors that showed up initially (BCM57416, CPU, MB and Card Side temp) disappear.
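To reproduce the sequence above without watching the web UI, one option is to poll the BMC over the network while the host boots and log whenever the set of readable temperature sensors changes. A sketch, assuming ipmitool with the lanplus interface and that ipmitool sdr type Temperature prints pipe-separated rows like the SAMPLE below; the BMC address, user name, sensor names and sample rows are hypothetical:

```python
import subprocess
import time
from typing import Dict, List, Optional

# Hypothetical remote invocation; replace host and user. -E makes ipmitool read the
# password from the IPMI_PASSWORD environment variable.
IPMITOOL = ["ipmitool", "-I", "lanplus", "-H", "192.0.2.10", "-U", "admin", "-E"]

# Illustrative rows; the exact columns and sensor names on this BMC are assumptions.
SAMPLE = """\
CPU Temp         | 0Ch | ok  |  3.1 | 45 degrees C
MB Temp          | 0Dh | ns  |  7.1 | Disabled
Card Side Temp   | 0Eh | ns  |  7.1 | No Reading
DDR4_A1 Temp     | 20h | ok  | 32.1 | 38 degrees C
"""


def parse_sdr(text: str) -> Dict[str, Optional[str]]:
    """Map sensor name -> reading, or None when the BMC has no valid reading."""
    sensors: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # not a sensor row
        name, status, reading = fields[0], fields[2], fields[4]
        # "ns" status or a textual disabled/no-reading value means nothing usable
        unavailable = status == "ns" or reading.lower() in ("disabled", "no reading")
        sensors[name] = None if unavailable else reading
    return sensors


def available(sensors: Dict[str, Optional[str]]) -> List[str]:
    """Sorted names of sensors that currently report a value."""
    return sorted(name for name, reading in sensors.items() if reading is not None)


def watch(interval_s: float = 2.0) -> None:
    """Poll the BMC and print a timestamped line whenever the readable set changes."""
    previous: List[str] = []
    while True:
        out = subprocess.run(IPMITOOL + ["sdr", "type", "Temperature"],
                             capture_output=True, text=True).stdout
        current = available(parse_sdr(out))
        if current != previous:
            print(time.strftime("%H:%M:%S"), "readable:", ", ".join(current) or "(none)")
            previous = current
        time.sleep(interval_s)
```

Running watch() from another machine during a cold boot should print roughly one line per stage, which gives a timeline that is easy to attach to a support ticket.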

With the CPU temperature sensor disabled, I cannot use it to control the fans. When I got lucky and the board was displaying the CPU temperature, I configured the CPU fan speed based on it, and everything worked as intended. After a cold boot, the CPU temperature sensor became disabled again and the CPU fans stayed at their lowest speed.
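The risk here is that a missing CPU reading leaves the fans at their lowest duty. If fan control were moved to a host-side script, it could fail safe instead. A minimal sketch of the duty calculation only, assuming a hypothetical piecewise-linear curve and a 100% fallback when the reading is unavailable; actually applying the duty to the board's fans needs vendor-specific IPMI commands that are not covered here:

```python
from typing import Optional, Sequence, Tuple

# Hypothetical fan curve: (CPU temperature in C, fan duty in %) breakpoints, ascending.
CURVE: Sequence[Tuple[float, float]] = ((40.0, 20.0), (60.0, 40.0), (75.0, 70.0), (85.0, 100.0))
FAILSAFE_DUTY = 100.0  # assumed policy: run flat out when the CPU sensor is missing


def fan_duty(cpu_temp: Optional[float],
             curve: Sequence[Tuple[float, float]] = CURVE) -> float:
    """Linearly interpolate a duty cycle from the curve; fail safe on a missing reading."""
    if cpu_temp is None:
        return FAILSAFE_DUTY
    if cpu_temp <= curve[0][0]:
        return curve[0][1]  # clamp below the first breakpoint
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if cpu_temp <= t1:
            return d0 + (d1 - d0) * (cpu_temp - t0) / (t1 - t0)
    return curve[-1][1]  # clamp above the last breakpoint
```

The key design choice is the None branch: a disabled sensor maps to maximum cooling rather than to the lowest step of the curve.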

Any help would be greatly appreciated.
 

juma

Member
Apr 14, 2021
Had very similar issues with a ROMED8-2T/BCM and a 7443P (and, surprisingly, 2x SN850Xs) as well. I contacted William from ASRock support about this in January, but he hasn't gotten back to me since. I also had a weird issue where the BMC would use the temperature readings of the -2T version instead of the -2T/BCM version, giving false alarms.
 

antbar

Hi juma.

I sent a support request yesterday and I got a reply today. They sent me a new BIOS (L3.55) but it didn't help.

juma said: "Also had a weird issue where the BMC would be confusing temperature readings from the -2T version instead of the -2T/BCM version, giving false alarms."
Do you mean LAN_0.83v or something similar, which was reading around 1 V and issuing a warning because of that? I also get that one sporadically.
 

juma

Interesting, did they give you a changelog for the BIOS?

antbar said: "Do you mean a LAN_0.83v or something that was displaying like 1v and issuing a warning because of that? I also get that one sporadically."
Yep, that one. According to the manual, LAN_0.83v corresponds to the Intel X550 adapter. I forget what the correct sensor name should be for the Broadcom adapter.
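The false alarm is consistent with the BMC judging the Broadcom rail against thresholds meant for a 0.83 V rail: a reading around 1 V is roughly 20% high. A small sketch of that arithmetic, assuming the nominal voltage can be read from the sensor name and using a hypothetical 10% tolerance (real thresholds come from the BMC's SDR records):

```python
import re
from typing import Optional


def nominal_volts(sensor_name: str) -> Optional[float]:
    """Nominal rail voltage encoded in a sensor name such as 'LAN_0.83v', if any."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*[vV]\b", sensor_name)
    return float(match.group(1)) if match else None


def relative_deviation(sensor_name: str, reading_v: float) -> Optional[float]:
    """(reading - nominal) / nominal, or None if no nominal value is found in the name."""
    nominal = nominal_volts(sensor_name)
    if nominal is None:
        return None
    return (reading_v - nominal) / nominal


def out_of_tolerance(sensor_name: str, reading_v: float, tolerance: float = 0.10) -> bool:
    """True when the reading is more than `tolerance` (hypothetical 10%) away from nominal."""
    deviation = relative_deviation(sensor_name, reading_v)
    return deviation is not None and abs(deviation) > tolerance
```

With these assumptions, a 1.0 V reading on LAN_0.83v is flagged while 0.84 V is not.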
 

antbar

juma said: "Interesting, did they give you a changelog for the BIOS?"
No, they didn't. The only thing I noticed is that HWiNFO now reports an outrageous CPU Tctl/Tdie temperature: mid-80s C at idle and 102 C at full load, whereas before it was in the mid-50s at full load and in line with the core and CCD temperatures.
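One way to spot the kind of jump described here is to compare Tctl against the hottest CCD reading from the same sensor chip. A sketch, assuming readings labeled Tctl and Tccd1, Tccd2, ... (as the Linux k10temp driver exposes them) and a hypothetical 10 C margin; on some AMD parts Tctl is deliberately offset from the real die temperature, so a gap alone does not prove a fault:

```python
from typing import Mapping


def tctl_outlier(temps: Mapping[str, float], margin_c: float = 10.0) -> bool:
    """Flag Tctl when it exceeds the hottest Tccd reading by more than margin_c degrees."""
    ccds = [value for label, value in temps.items() if label.startswith("Tccd")]
    if "Tctl" not in temps or not ccds:
        return False  # nothing to compare against
    return temps["Tctl"] - max(ccds) > margin_c
```

A Tctl in the mid-80s next to CCD readings in the 40s would be flagged, while values that track each other closely would not.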

Meanwhile, I removed the two M.2 SSDs and connected a SATA SSD, and the temperature sensors are reading normally. The problem seems to be related either to the presence of M.2 SSDs in general or to this specific SSD model.
 

juma

antbar said: "No, they didn't. The only thing I noticed is that HWinfo now reads an outrageous CPU Tctl/Tdie temperature of mid 80C idling and 102C at full load whereas before it was mid 50 at full load and in line with the core and CCD temps.

Meanwhile I removed the two m.2 SSDs and connected a SATA SSD and the temperature sensors are reading normally. It seems that it is related either to the presence of m.2 SSDs or a specific problem with this model of SSDs."
Hopefully those temperature readings are just wrong conversions and not actual overheating.

Thanks for the cross-analysis; it looks like the M.2 SSDs are the culprit, then. I've seen similar issues with ASRock consumer boards but didn't expect them on a server board.