Server locking up... Logs show voltage issues... help?


NOTORIOUS VR

Member
Nov 24, 2015
Recently I've been experiencing random lockups: my server (ESXi) just locks up and becomes unresponsive, except for the IPMI.

After a lockup, the sensor thresholds and logs in the IPMI show all motherboard voltages reading 0V. CPU, memory, and all the other sensors are showing 0 at that time as well.

The logs seem to show the voltages going either critical high or critical low. This time (7:31 EST this morning) the logs show high voltage, and all the CPU, system and memory temps are high as well. It seems like every sensor on the MB just goes crazy, VBAT included (I changed the CMOS battery some time last year after a VBAT error randomly showed up).

Resetting the machine doesn't clear the errors either. If I reset, the system will not boot and the MB beeper just screams at me. I have to power the server down completely and then back on, after which it boots up as if nothing happened.

The failures (three so far) are completely random. The first two were back to back on the same day. The last one took 2 weeks to show up: I was browsing the web on my phone when all of a sudden I noticed it had dropped off my Wi-Fi AP. No sounds or alarms from the server at all. No response to pings on any of the subnets related to my VMs/ESXi; I was only able to ping my hardware devices (switch, IPMI, etc.).

I certainly don't believe it is my PSU (EVGA 1300W), as in my experience PSUs either fail or work. Input voltage is stable (and running through a CyberPower UPS).

I believe it might be my Supermicro MB that is starting to fail, but I'm not sure. Hoping someone has had something similar happen to them and can point me in a direction.

Log from after failure:

Code:
86 09/22/2018 07:32:53 VBAT Voltage Upper Non-Recoverable - Going High - Asserted 
85 09/22/2018 07:32:53 VBAT Voltage Upper Critical - Going High - Asserted 
84 09/22/2018 07:32:52 VBAT Voltage Upper Non-Critical - Going High - Asserted 
83 09/22/2018 07:32:50 +3.3VSB Voltage Upper Non-Recoverable - Going High - Asserted 
82 09/22/2018 07:32:50 +3.3VSB Voltage Upper Critical - Going High - Asserted 
81 09/22/2018 07:32:49 +3.3VSB Voltage Upper Non-Critical - Going High - Asserted 
80 09/22/2018 07:32:47 +3.3V Voltage Upper Non-Recoverable - Going High - Asserted 
79 09/22/2018 07:32:47 +3.3V Voltage Upper Critical - Going High - Asserted 
78 09/22/2018 07:32:47 +3.3V Voltage Upper Non-Critical - Going High - Asserted 
77 09/22/2018 07:32:44 +12V Voltage Upper Non-Recoverable - Going High - Asserted 
76 09/22/2018 07:32:44 +12V Voltage Upper Critical - Going High - Asserted 
75 09/22/2018 07:32:44 +12V Voltage Upper Non-Critical - Going High - Asserted 
74 09/22/2018 07:32:41 +5V Voltage Upper Non-Recoverable - Going High - Asserted 
73 09/22/2018 07:32:41 +5V Voltage Upper Critical - Going High - Asserted 
72 09/22/2018 07:32:41 +5V Voltage Upper Non-Critical - Going High - Asserted 
71 09/22/2018 07:32:38 +1.5V Voltage Upper Non-Recoverable - Going High - Asserted 
70 09/22/2018 07:32:38 +1.5V Voltage Upper Critical - Going High - Asserted 
69 09/22/2018 07:32:38 +1.5V Voltage Upper Non-Critical - Going High - Asserted 
68 09/22/2018 07:32:35 CPU2 DIMM Voltage Upper Non-Recoverable - Going High - Asserted 
67 09/22/2018 07:32:35 CPU2 DIMM Voltage Upper Critical - Going High - Asserted 
66 09/22/2018 07:32:35 CPU2 DIMM Voltage Upper Non-Critical - Going High - Asserted 
65 09/22/2018 07:32:33 CPU1 DIMM Voltage Upper Non-Recoverable - Going High - Asserted 
64 09/22/2018 07:32:32 CPU1 DIMM Voltage Upper Critical - Going High - Asserted 
63 09/22/2018 07:32:32 CPU1 DIMM Voltage Upper Non-Critical - Going High - Asserted 
62 09/22/2018 07:32:30 CPU2 Vcore Voltage Upper Non-Recoverable - Going High - Asserted 
61 09/22/2018 07:32:29 CPU2 Vcore Voltage Upper Critical - Going High - Asserted 
60 09/22/2018 07:32:29 CPU2 Vcore Voltage Upper Non-Critical - Going High - Asserted 
59 09/22/2018 07:32:27 CPU1 Vcore Voltage Upper Non-Recoverable - Going High - Asserted 
58 09/22/2018 07:32:26 CPU1 Vcore Voltage Upper Critical - Going High - Asserted 
57 09/22/2018 07:32:26 CPU1 Vcore Voltage Upper Non-Critical - Going High - Asserted 
56 09/22/2018 07:32:24 System Temp Temperature Upper Non-Recoverable - Going High - Asserted 
55 09/22/2018 07:32:23 System Temp Temperature Upper Critical - Going High - Asserted 
54 09/22/2018 07:32:23 System Temp Temperature Upper Non-Critical - Going High - Asserted 
53 09/22/2018 07:32:11 P2-DIMM3B-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
52 09/22/2018 07:32:10 P2-DIMM3B-TEMP Temperature Upper Critical - Going High - Asserted 
51 09/22/2018 07:32:10 P2-DIMM3B-TEMP Temperature Upper Non-Critical - Going High - Asserted 
50 09/22/2018 07:32:08 P2-DIMM3A-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
49 09/22/2018 07:32:08 P2-DIMM3A-TEMP Temperature Upper Critical - Going High - Asserted 
48 09/22/2018 07:32:07 P2-DIMM3A-TEMP Temperature Upper Non-Critical - Going High - Asserted 
47 09/22/2018 07:32:05 P2-DIMM2B-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
46 09/22/2018 07:32:05 P2-DIMM2B-TEMP Temperature Upper Critical - Going High - Asserted 
45 09/22/2018 07:32:04 P2-DIMM2B-TEMP Temperature Upper Non-Critical - Going High - Asserted 
44 09/22/2018 07:32:02 P2-DIMM2A-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
43 09/22/2018 07:32:02 P2-DIMM2A-TEMP Temperature Upper Critical - Going High - Asserted 
42 09/22/2018 07:32:01 P2-DIMM2A-TEMP Temperature Upper Non-Critical - Going High - Asserted 
41 09/22/2018 07:31:59 P2-DIMM1B-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
40 09/22/2018 07:31:59 P2-DIMM1B-TEMP Temperature Upper Critical - Going High - Asserted 
39 09/22/2018 07:31:58 P2-DIMM1B-TEMP Temperature Upper Non-Critical - Going High - Asserted 
38 09/22/2018 07:31:56 P2-DIMM1A-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
37 09/22/2018 07:31:56 P2-DIMM1A-TEMP Temperature Upper Critical - Going High - Asserted 
36 09/22/2018 07:31:56 P2-DIMM1A-TEMP Temperature Upper Non-Critical - Going High - Asserted 
35 09/22/2018 07:31:53 P1-DIMM3B-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
34 09/22/2018 07:31:53 P1-DIMM3B-TEMP Temperature Upper Critical - Going High - Asserted 
33 09/22/2018 07:31:53 P1-DIMM3B-TEMP Temperature Upper Non-Critical - Going High - Asserted 
32 09/22/2018 07:31:50 P1-DIMM3A-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
31 09/22/2018 07:31:50 P1-DIMM3A-TEMP Temperature Upper Critical - Going High - Asserted 
30 09/22/2018 07:31:50 P1-DIMM3A-TEMP Temperature Upper Non-Critical - Going High - Asserted 
29 09/22/2018 07:31:47 P1-DIMM2B-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
28 09/22/2018 07:31:47 P1-DIMM2B-TEMP Temperature Upper Critical - Going High - Asserted 
27 09/22/2018 07:31:47 P1-DIMM2B-TEMP Temperature Upper Non-Critical - Going High - Asserted 
26 09/22/2018 07:31:44 P1-DIMM2A-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
25 09/22/2018 07:31:44 P1-DIMM2A-TEMP Temperature Upper Critical - Going High - Asserted 
24 09/22/2018 07:31:44 P1-DIMM2A-TEMP Temperature Upper Non-Critical - Going High - Asserted 
23 09/22/2018 07:31:41 P1-DIMM1B-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
22 09/22/2018 07:31:41 P1-DIMM1B-TEMP Temperature Upper Critical - Going High - Asserted 
21 09/22/2018 07:31:41 P1-DIMM1B-TEMP Temperature Upper Non-Critical - Going High - Asserted 
20 09/22/2018 07:31:39 P1-DIMM1A-TEMP Temperature Upper Non-Recoverable - Going High - Asserted 
19 09/22/2018 07:31:38 P1-DIMM1A-TEMP Temperature Upper Critical - Going High - Asserted 
18 09/22/2018 07:31:38 P1-DIMM1A-TEMP Temperature Upper Non-Critical - Going High - Asserted 
17 09/22/2018 07:30:18 Fan5 Fan Lower Non-Recoverable - Going Low - Asserted 
16 09/22/2018 07:30:17 Fan5 Fan Lower Critical - Going Low - Asserted 
15 09/22/2018 07:30:17 Fan5 Fan Lower Non-Critical - Going Low - Asserted 
14 09/22/2018 07:29:09 Fan7 Fan Lower Non-Recoverable - Going Low - Asserted 
13 09/22/2018 07:29:08 Fan7 Fan Lower Critical - Going Low - Asserted 
12 09/22/2018 07:29:08 Fan7 Fan Lower Non-Critical - Going Low - Asserted 
11 09/22/2018 07:29:06 Fan6 Fan Lower Non-Recoverable - Going Low - Asserted 
10 09/22/2018 07:29:05 Fan6 Fan Lower Critical - Going Low - Asserted
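
For anyone who wants to pick through a log like this, here is a rough Python sketch that parses the event lines and lists when each sensor first tripped. The regex and the function names (`parse_sel`, `first_assertion_per_sensor`) are my own and only assume the line layout shown above, so treat it as a starting point rather than anything tied to the IPMI firmware. On this log it makes it easy to see that the sensors trip one after another, a few seconds apart, rather than all at once.

```python
import re
from datetime import datetime

# Matches lines shaped like the log above (layout assumed from that excerpt):
#   <id> <MM/DD/YYYY HH:MM:SS> <sensor> <type> <bound> <severity> - Going <dir> - Asserted
EVENT_RE = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<sensor>.+?)\s+(?P<type>Voltage|Temperature|Fan)\s+"
    r"(?P<bound>Upper|Lower)\s+"
    r"(?P<severity>Non-Recoverable|Non-Critical|Critical)\s+-\s+"
    r"Going (?P<direction>High|Low) - Asserted$"
)

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
86 09/22/2018 07:32:53 VBAT Voltage Upper Non-Recoverable - Going High - Asserted
84 09/22/2018 07:32:52 VBAT Voltage Upper Non-Critical - Going High - Asserted
81 09/22/2018 07:32:49 +3.3VSB Voltage Upper Non-Critical - Going High - Asserted
56 09/22/2018 07:32:24 System Temp Temperature Upper Non-Recoverable - Going High - Asserted
15 09/22/2018 07:30:17 Fan5 Fan Lower Non-Critical - Going Low - Asserted
"""


def parse_sel(text):
    """Return one dict per event line that matches the expected layout."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # skip blank lines or anything in a different format
        event = m.groupdict()
        event["time"] = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S")
        events.append(event)
    return events


def first_assertion_per_sensor(events):
    """Return (sensor, first_time) pairs sorted by when each sensor first tripped."""
    first = {}
    for ev in events:
        name = ev["sensor"]
        if name not in first or ev["time"] < first[name]:
            first[name] = ev["time"]
    return sorted(first.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for sensor, when in first_assertion_per_sensor(parse_sel(SAMPLE)):
        print(when.strftime("%H:%M:%S"), sensor)
```

Pasting the full log into `SAMPLE` gives the complete sequence. A steady sweep through every sensor could point at the monitoring chip or BMC reading garbage instead of real voltage spikes, but that is only a guess on my part.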

Readings after a power off and on (and system is working as normal):

Code:
CPU1 Temp Normal Low 
CPU2 Temp Normal Low 
System Temp Normal  43 degrees C 
CPU1 Vcore Normal  0.92 Volts 
CPU2 Vcore Normal  0.952 Volts 
CPU1 DIMM Normal  1.56 Volts 
CPU2 DIMM Normal  1.56 Volts 
+1.5V Normal  1.504 Volts 
+5V Normal  5.056 Volts 
+12V Normal  12.084 Volts 
+3.3V Normal  3.24 Volts 
+3.3VSB Normal  3.216 Volts 
VBAT Normal  3.216 Volts 
Fan1 Not Available No Reading 
Fan2 Not Available No Reading 
Fan3 Not Available No Reading 
Fan4 Not Available No Reading 
Fan5 Normal  1080 RPM 
Fan6 Lower Critical  675 RPM 
Fan7 Normal  1620 RPM 
Fan8 Not Available No Reading 
Intrusion    Detected 
PS Status    OK 
P1-DIMM1A-TEMP Normal  54 degrees C 
P1-DIMM1B-TEMP Normal  50 degrees C 
P1-DIMM2A-TEMP Normal  55 degrees C 
P1-DIMM2B-TEMP Not Available No Reading 
P1-DIMM3A-TEMP Normal  52 degrees C 
P1-DIMM3B-TEMP Not Available No Reading 
P2-DIMM1A-TEMP Normal  51 degrees C 
P2-DIMM1B-TEMP Normal  53 degrees C 
P2-DIMM2A-TEMP Normal  54 degrees C 
P2-DIMM2B-TEMP Not Available No Reading 
P2-DIMM3A-TEMP Normal  52 degrees C 
P2-DIMM3B-TEMP Not Available No Reading
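
To spot anything still out of range in a dump like the one above, a small Python sketch along these lines could help. The status names in the regex are taken from this output only, and `parse_readings` and `flag_abnormal` are hypothetical helper names, so this is an illustration and not a general parser for IPMI sensor pages.

```python
import re

# Status strings as they appear in the readings above (assumed, not exhaustive).
READING_RE = re.compile(
    r"^(?P<name>.+?)\s+"
    r"(?P<status>Normal|Not Available|"
    r"(?:Lower|Upper) (?:Non-Recoverable|Non-Critical|Critical))\s+"
    r"(?P<reading>.+)$"
)

# A few lines copied from the readings above, used as sample input.
SAMPLE = """\
CPU1 Temp Normal Low
System Temp Normal  43 degrees C
+12V Normal  12.084 Volts
Fan1 Not Available No Reading
Fan5 Normal  1080 RPM
Fan6 Lower Critical  675 RPM
Intrusion    Detected
PS Status    OK
"""


def parse_readings(text):
    """Return (name, status, reading) tuples for lines that carry a known status."""
    rows = []
    for line in text.splitlines():
        m = READING_RE.match(line.strip())
        if m:
            rows.append((m["name"], m["status"], m["reading"].strip()))
    return rows


def flag_abnormal(rows):
    """Keep only sensors whose status is neither Normal nor Not Available."""
    return [row for row in rows if row[1] not in ("Normal", "Not Available")]


if __name__ == "__main__":
    for name, status, reading in flag_abnormal(parse_readings(SAMPLE)):
        print(f"{name}: {status} ({reading})")
```

On these readings the only sensor it flags is Fan6 (Lower Critical at 675 RPM), even with the system running normally. Lines without a recognised status, such as Intrusion and PS Status, are skipped.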