Hard to explain high CPU temp


blood

Member
Apr 20, 2017
42
14
8
44
I've had an X9SCM-F system for a very long time. Until recently it had a Xeon E3-1230 v1 (Sandy Bridge) in it, and I remember it idling at about 90F. I used to care a lot about that because it sat in my living room with a big passive heatsink and a gigantic (but SLOW) fan attached to a big hole I cut in the case with tin snips. I didn't want to fry it, but didn't want it too loud either.

Fast forward to now. Different house, and I have a server rack in my mechanical room and now it's in a bigger case with much better airflow and much less concern about how noisy - but it's reading 150F for all 4 cores now:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +156.2°F (high = +185.0°F, crit = +221.0°F)
Core 0: +156.2°F (high = +185.0°F, crit = +221.0°F)
Core 1: +152.6°F (high = +185.0°F, crit = +221.0°F)
Core 2: +150.8°F (high = +185.0°F, crit = +221.0°F)
Core 3: +149.0°F (high = +185.0°F, crit = +221.0°F)

which was a bit alarming when I saw it. New thermal paste and reseating the HSF (active now) didn't do anything really. I had an E3-1270 v2 (Ivy Bridge) on the way though, so I swapped it out and put on a brand new HSF and it's still sitting ~150F.

Odd thing is, I can run cpuburn on it for a very long time and that reading hardly budges. The other readings from lm-sensors make it seem like everything is nice and chill, and nothing feels hot. Even an IR thermometer reading of the HSF shows it to be right at room temperature. The BIOS reports the CPU temp as "low" too.

So multiple CPUs, new paste, multiple heatsink-fans - same high temps from the ISA sensor - but it doesn't budge when I put it under a very extreme load and other readings are fine.
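A reading that stays flat under cpuburn is itself a hint that the number is not tracking the die. A minimal sketch of that check follows, assuming you have collected a few idle samples and a few samples under load in degrees Celsius. The function name, the 5 degree threshold, and the sample values are all hypothetical and not measurements from this thread.

```python
from statistics import mean

def responds_to_load(idle_c, load_c, min_rise_c=5.0):
    """Return True if the mean temperature under load is at least
    min_rise_c degrees above the idle mean.

    min_rise_c is an arbitrary assumption; pick a value that suits
    your cooler and workload.
    """
    return mean(load_c) - mean(idle_c) >= min_rise_c

# Hypothetical samples in degrees Celsius (not taken from the thread).
idle_samples = [69.0, 68.0, 69.0]
burn_samples = [69.0, 70.0, 69.0]

if not responds_to_load(idle_samples, burn_samples):
    print("Reading barely moves under load; suspect the sensor path, not the cooling.")
```

A healthy sensor on an air-cooled CPU would normally show a clear rise within a minute or so of full load, so a flat line points toward software or sensor reporting rather than the heatsink.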

What gives?
 

alex_stief

Well-Known Member
May 31, 2016
884
312
63
38
I see two possibilities:
1) something causes high CPU load while your machine sits "idle"
2) sensors not working properly any more. I had serious problems with sensors after switching to kernel version 4.12
 

blood

Good call on #2 - I should have thought about the software I was running rather than focusing purely on the hardware. The numbers I reported above were from the latest Proxmox after having it running OpenIndiana for a few years and I hadn't bothered to mess with lm-sensors in solarish - so I didn't have a recent baseline to compare against. I just tossed another disk in it and installed Debian 8 onto it (3.16.0-4 kernel) and lo and behold, lm-sensors reports ~90F for each of the cores there.

So it's a software thing it would seem.

I guess I'll go back to Proxmox and just assume that my CPU isn't actually running hot and hope that an update fixes this. I'm not sure if this is a kernel thing, an lm-sensors thing, or some other component - maybe I'll start with a bug report to lm-sensors (after looking for open bugs). Maybe I'll just try a modern Debian Stretch install to see what it shows in case it's something to do with the pve kernel.

Thanks for kicking my brain in the right direction.
 

blood

Stock Debian 9 seems to work fine as well (kernel 4.9.0-6, lm-sensors 3.4.0-4), so it would seem to be something with the Proxmox kernel.
 

blood

Even 4.16.0-0 from stretch-backports gives me the correct temperature - so I don't think it's the newer kernel as such, but really something that Proxmox did to the pve kernel itself. Maybe I should move this over to their forum...
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,394
511
113
It seems unlikely given that it should be coming straight from the kernel, but there aren't any weird multipliers defined for coretemp in either /etc/sensors3.conf or /etc/sensors.d, are there (assuming Proxmox keeps the same Debian defaults)?
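In lm-sensors configuration files, scaling is applied with `compute` statements inside a `chip` section, so looking for "weird multipliers" mostly means looking for those lines. The following is a rough sketch of such a scan, assuming the two default paths mentioned above. The helper names are made up for illustration, and the comment stripping is naive (it would misread a `#` inside a quoted label).

```python
from pathlib import Path

def find_compute_lines(text):
    """Return (chip_pattern, statement) for each `compute` line in the text."""
    chip = None
    found = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment removal
        if not line:
            continue
        if line.startswith("chip"):
            chip = line[len("chip"):].strip()
        elif line.startswith("compute"):
            found.append((chip, line))
    return found

def scan_sensors_config(paths=("/etc/sensors3.conf", "/etc/sensors.d")):
    """Print every compute statement found under the given files or directories."""
    for p in map(Path, paths):
        files = sorted(p.iterdir()) if p.is_dir() else [p]
        for f in files:
            if f.is_file():
                for chip, stmt in find_compute_lines(f.read_text(errors="replace")):
                    print(f"{f}: chip {chip}: {stmt}")

if __name__ == "__main__":
    scan_sensors_config()
```

Any hit under a `chip "coretemp-*"` section would be worth a closer look. No output means no `compute` statements were found in those locations.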

Do the values reported by the coretemp module line up with the temperatures being reported by sensors? IIRC the sysfs values are just the temperature in degrees Celsius multiplied by 1000 (i.e. millidegrees), so it's relatively easy to see whether they're in the right ballpark at least (I've no idea how to go about having things report these in Fahrenheit though).
Code:
effrafax@wug:~$ cat /sys/bus/platform/drivers/coretemp/coretemp.0/hwmon/hwmon0/temp*_input
40000
39000
40000
38000
39000
If the numbers coming straight from sysfs are wiggedy and/or whack then you've got fairly certain proof that something's done goof with their coretemp module.
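For anyone who wants the sysfs numbers in both units, here is a small sketch that reads the `temp*_input` files and converts millidegrees Celsius to Celsius and Fahrenheit. The glob pattern uses `hwmon*` because the hwmon index varies between machines. The function names are illustrative only. (For the `sensors` tool itself, the `-f` option prints Fahrenheit.)

```python
from glob import glob

def millideg_to_c(raw):
    """Convert a sysfs millidegree-Celsius value (string or int) to Celsius."""
    return int(raw) / 1000.0

def c_to_f(celsius):
    """Convert Celsius to Fahrenheit, rounded to one decimal place."""
    return round(celsius * 9 / 5 + 32, 1)

def read_coretemp(
    pattern="/sys/bus/platform/drivers/coretemp/coretemp.0/hwmon/hwmon*/temp*_input",
):
    """Return {path: celsius} for every matching sysfs temperature file."""
    readings = {}
    for path in sorted(glob(pattern)):
        with open(path) as fh:
            readings[path] = millideg_to_c(fh.read().strip())
    return readings

if __name__ == "__main__":
    for path, c in read_coretemp().items():
        print(f"{path}: {c:.1f} C = {c_to_f(c):.1f} F")
```

With the values shown above, `40000` works out to 40.0 C or 104.0 F.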
 

blood

The numbers from sysfs correspond with what lm-sensors is telling me:

$ cat /sys/bus/platform/drivers/coretemp/coretemp.0/hwmon/hwmon1/temp*input
67000
67000
66000
66000
64000

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +106.0°C)
temp2: +29.8°C (crit = +106.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +69.0°C (high = +85.0°C, crit = +105.0°C)
Core 0: +69.0°C (high = +85.0°C, crit = +105.0°C)
Core 1: +65.0°C (high = +85.0°C, crit = +105.0°C)
Core 2: +66.0°C (high = +85.0°C, crit = +105.0°C)
Core 3: +65.0°C (high = +85.0°C, crit = +105.0°C)

So yeah, coretemp seems to be the culprit. So far no answer from the Proxmox folks, but I can at least dig into that and look for changes in their module.
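Since the first post quoted `sensors` in Fahrenheit and this one in Celsius, a quick way to compare them is to parse the output and normalise everything to Celsius. This is only a sketch: the regular expression is an assumption based on the output format pasted in this thread and may not cover every chip's layout.

```python
import re

# Matches lines such as "Core 0: +69.0°C (high = +85.0°C, ...)".
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)°(?P<unit>[CF])"
)

def parse_sensors(text):
    """Return {label: temperature in Celsius} from `sensors` output text."""
    temps = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips chip names, "Adapter:" lines and blanks
        value = float(m.group("value"))
        if m.group("unit") == "F":
            value = (value - 32) * 5 / 9
        temps[m.group("label")] = round(value, 1)
    return temps

fahrenheit_post = "Package id 0: +156.2°F (high = +185.0°F, crit = +221.0°F)"
celsius_post = "Package id 0: +69.0°C (high = +85.0°C, crit = +105.0°C)"
print(parse_sensors(fahrenheit_post), parse_sensors(celsius_post))
```

Both package readings come out as 69.0 C, so the two posts describe the same inflated value, and the sysfs figures in the mid-60s are in the same range.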
 

blood

I feel obligated to update this again.

I didn't know that Proxmox takes their kernel from Ubuntu (with some additional patches), but they do, so I started to go down the rabbit hole of looking at the differences between that kernel and what Debian shipped. The coretemp module's code did have some changes, but they looked pretty innocent...

Just for kicks, I installed Proxmox 5.0 rather than 5.2 and was happy to see that the temperature was reported correctly. I then dist-upgraded my way to a 4.15.17-3 kernel, and after a reboot it's still correct. So clearly _something_ is different than when I went straight to 5.2, but I don't know what - and it's working correctly now.

I kept the old install on a different disk, so I can flip back and forth easily enough but I'm not sure if I wanna just finish what I set out to do now, or keep digging to understand what is causing the discrepancy. For now I'm going to step away from this box...
 