So we have a DL385 G10 server with a single CPU (EPYC 7281). It runs Proxmox (a Debian 9 based virtualization "distro"). The server works just fine and is stable.
Now, I compiled a kernel on it (to be used on desktop computers, very generic with thousands of modules). I was too lazy to wait for it to finish on my 4-core desktop, so I compiled it on the server with 30 threads. I took the chance to check temperatures and power usage via iLO.
To my surprise, after the whole CPU had been fully loaded for a few minutes (the compilation plus the 18 VMs running on it), iLO showed 40-something degrees for the CPU, with the fans staying at 20% (the fan profile was the "optimal" one).
I checked "sensors" and, surprise: there are 4 temps reported (one for each die?), all between 70 and 75 degrees.
I redid the check with "increased cooling" and got something similar: fans almost constant at about 25%, and "sensors" reporting about 70 C (the compilation finished around the time 70 C was reached).
Eventually, once the sensors-reported CPU temps were at about 70 C, the fans did spin up a little, up to 26.66%.
Here are my reported temps and fan speeds, first "sensors", then the iLO readings via ipmitool:
Code:
Every 5.0s: sensors;ipmitool sdr |grep 'CPU\|DutyCycle'
k10temp-pci-00db
Adapter: PCI adapter
temp1: +65.5°C (high = +70.0°C)
k10temp-pci-00cb
Adapter: PCI adapter
temp1: +64.0°C (high = +70.0°C)
k10temp-pci-00d3
Adapter: PCI adapter
temp1: +63.8°C (high = +70.0°C)
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +65.5°C (high = +70.0°C)
02-CPU 1 | 40 degrees C | ok
03-CPU 2 | disabled | ns
Fan 1 DutyCycle | 25.87 percent | ok
Fan 2 DutyCycle | 25.87 percent | ok
Fan 3 DutyCycle | 25.87 percent | ok
Fan 4 DutyCycle | 25.87 percent | ok
Fan 5 DutyCycle | 25.87 percent | ok
Fan 6 DutyCycle | 25.87 percent | ok
CPU Utilization | 0 unspecified | ok
CPU_Stat_C1 | 0x00 | ok
CPU_Stat_C2 | 0x00 | ok
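To put a number on the gap between the two sources, the readings above can be fed through a small script. This is only a sketch: it assumes the text layout shown in the paste (`temp1:` lines from `sensors`, the `02-CPU 1` row and the `Fan N DutyCycle` rows from `ipmitool sdr`), and the function name `summarize` and the shortened `SAMPLE` string are made up for illustration.

```python
import re

# Rows copied from the readings above (shortened to the relevant lines).
SAMPLE = """\
k10temp-pci-00db
temp1: +65.5°C (high = +70.0°C)
k10temp-pci-00cb
temp1: +64.0°C (high = +70.0°C)
k10temp-pci-00d3
temp1: +63.8°C (high = +70.0°C)
k10temp-pci-00c3
temp1: +65.5°C (high = +70.0°C)
02-CPU 1 | 40 degrees C | ok
03-CPU 2 | disabled | ns
Fan 1 DutyCycle | 25.87 percent | ok
Fan 2 DutyCycle | 25.87 percent | ok
"""


def summarize(text):
    """Compare the hottest k10temp reading with the iLO 'CPU 1' reading."""
    # k10temp values as printed by `sensors`, e.g. "temp1: +65.5°C"
    die_temps = [float(v) for v in re.findall(r"^temp1:\s*\+?(-?[\d.]+)", text, re.M)]
    # iLO CPU temperature as printed by `ipmitool sdr`
    ilo = re.search(r"^02-CPU 1\s*\|\s*([\d.]+) degrees C", text, re.M)
    # Fan duty cycles in percent
    fans = [float(v) for v in re.findall(r"DutyCycle\s*\|\s*([\d.]+) percent", text)]

    if not die_temps or ilo is None:
        raise ValueError("expected k10temp and iLO CPU rows in the input")

    hottest = max(die_temps)
    ilo_cpu = float(ilo.group(1))
    return {
        "hottest_die": hottest,
        "ilo_cpu": ilo_cpu,
        "gap": hottest - ilo_cpu,
        "max_fan": max(fans) if fans else None,
    }


print(summarize(SAMPLE))
```

On a live system the same function could be given the combined output of `sensors` and `ipmitool sdr` instead of `SAMPLE`, for example once per `watch` interval, to log how the gap changes under load.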
Also, I checked the CPU frequencies with "cpufreq-aperf": they were all above 2.5 GHz, up to 2.68 GHz (top boost is 2.7 GHz), so the cores don't seem to be clocking down internally.
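As a rough cross-check that does not depend on "cpufreq-aperf", the per-core clocks the kernel reports can also be read from `/proc/cpuinfo`. A minimal sketch, assuming a Linux x86 host whose `/proc/cpuinfo` contains `cpu MHz` lines; the helper names `core_mhz` and `slow_cores` and the 2500 MHz threshold are assumptions made for illustration. The `cpu MHz` value is a momentary reading, so it should be treated as less reliable than a measurement averaged over an interval.

```python
import re


def core_mhz(cpuinfo_text):
    """Return the 'cpu MHz' values in the order they appear in the text."""
    return [float(v) for v in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.M)]


def slow_cores(cpuinfo_text, threshold_mhz=2500.0):
    """List (index, MHz) pairs for logical CPUs reporting below the threshold.

    The index is the position of the entry in the text, which normally
    matches the 'processor' number in /proc/cpuinfo.
    """
    return [
        (i, mhz)
        for i, mhz in enumerate(core_mhz(cpuinfo_text))
        if mhz < threshold_mhz
    ]


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text; only works on Linux."""
    with open(path) as f:
        return f.read()
```

Running `slow_cores(read_cpuinfo())` while the compile is in progress would return an empty list if every logical CPU stays above the threshold.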
So what's the deal here?
I read up on it, and it seems this 40 C iLO reporting affected older Intel systems too. Additionally, I'm not sure about lm-sensors accuracy either, since some people reported artificially inflated temperatures on Ryzen systems, but I'm unsure whether this affects EPYC as well.
Should I be concerned?
I have the iLO, BIOS and "power management" firmware updated to the latest versions.