Supermicro X7DBE+ stopped booting, IPMI sensors all off!


lpallard

Member
Aug 17, 2013
276
11
18
Hmmmm so the enclosure I purchased from UnixSurplus a few weeks ago was working fine until this morning.

Yesterday I started a memtest that ran all night. This morning I stopped it after 10 hours straight with no errors, then went into the BIOS and changed a few options (nothing major: boot sequence, fan control set to 4 pin workstation in the hope of making it less noisy, POST tests activated, quick boot deactivated, etc.)...

Then I rebooted and it never came back online. The fans are running full speed, the screen stays black.

I went to the IPMI and went to the sensors, OUCH!! See screenshot. Almost everything is off the charts!

Both redundant PSUs' LEDs are green, and I hear no beeping when it starts. Another odd thing: the IPMI says FANS 1-5-6 are off, but I can see them spinning. All chassis fans are spinning. I will confirm later tonight when I open up the case.

What am I dealing with here?

 

lpallard

Forgot to mention: I flashed the BIOS and IPMI firmware about 3 days ago, but everything seemed fine until now.

Other thing: the first time I went to the IPMI, the sensors caught my attention (they were all green and OK), but CPU 3 & 4 had no readings. I assumed they were not reading because I thought the IPMI was referring to physical CPU sockets, and this chassis has only 2 sockets.
 

lpallard

OK some follow up for folks interested (or curious) in this issue:

I contacted Supermicro, and they recommended resetting (clearing) the CMOS by shorting the JBT1 pads on the mobo. They think an incorrect BIOS setting prevented the server from POSTing. I reset the CMOS, and the server came back online and booted up.

Then in order:

-Heard 2 short beeps
-Screen came online and displayed "Error 0251 System CMOS checksum bad"
-I entered the BIOS and changed a few settings
-Rebooted server
-Went back in BIOS to see the sensor data

This is when I started to think the IPMI card (daughter card) may be defective and was reporting erroneous sensor values.

In BIOS, under IPMI -> Realtime sensor data I saw:
*CPU temp 1=203
*CPU temp 2=203
*CPU temp 3=28
*CPU temp 4=203
*CPU Vcore 1=1.65 (upper limit is 1.61)
*CPU Vcore 1=1.65 (upper limit is 1.61)
*3.3V is 1.98 (lower limit is 2.96 and upper limit is 3.63)

All these values are seriously off the charts.
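For anyone who wants to sanity-check readings like these without eyeballing them, here is a minimal sketch in Python. It only uses the two readings above that came with limits; the helper name `out_of_range` is made up for illustration, and a missing limit is simply treated as "no limit".

```python
from typing import Optional


def out_of_range(value: float, lower: Optional[float], upper: Optional[float]) -> bool:
    """Return True when value is below lower or above upper (None means no limit)."""
    if lower is not None and value < lower:
        return True
    if upper is not None and value > upper:
        return True
    return False


# (name, reading, lower limit, upper limit), copied from the IPMI realtime data above.
# No lower limit was listed for Vcore, so it is left as None.
readings = [
    ("CPU Vcore 1", 1.65, None, 1.61),
    ("3.3V", 1.98, 2.96, 3.63),
]

flagged = [name for name, value, lo, hi in readings if out_of_range(value, lo, hi)]
print(flagged)
```

Both readings end up flagged, which matches what the IPMI page shows in red.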

Then in BIOS under hardware system health monitor:
*FAN1=5869RPM
*FAN2=6000RPM
*FAN3=N/A
*FAN4=5921RPM
*FAN5=5152RPM
*FAN6=4963RPM
*FAN7=N/A
*FAN8=N/A
*Fan speed control mode set to 3 pin workstation
*VcoreA=1.084V
*VcoreB=1.074V
*-12V=-12.214V
*P1V5=1.472V
*+3.3V=3.312V
*+12V=11.904V
*5VSB=4.920V
*5VTT=4.968V
*P_VTT=1.184V
*Vbat=3.264V

These numbers are MUCH better.

In the IPMI WebGUI, all values are pretty much off as per my screenshot of this morning. Also same numbers as reported in the BIOS -> IPMI realtime sensors.

The server has been running without problems for about 6 hours under heavy load. Nothing abnormal so far. Since everything seems to be working fine (I will do more testing, but right now I can launch the OS and everything is OK), I suspect an IPMI add-on card failure or incorrectly flashed firmware, because all the IPMI numbers are off BUT the BIOS reports proper values. Have you guys seen anything like this before? The reason it was not POSTing is still unknown, and I no longer think the sensor values are responsible, since it now works perfectly and the sensor readings are still off.
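To put the "IPMI is off but BIOS is fine" reasoning in concrete terms, here is a rough Python sketch that compares the two sources for the same rail. The 10% tolerance and the helper name `disagrees` are assumptions picked for illustration, and pairing the BIOS VcoreA reading with the IPMI "CPU Vcore 1" reading is also a guess about which sensors correspond.

```python
def disagrees(bios: float, ipmi: float, tolerance: float = 0.10) -> bool:
    """True when the IPMI reading differs from the BIOS reading by more than
    `tolerance`, expressed as a fraction of the BIOS value (assumed threshold)."""
    return abs(ipmi - bios) > tolerance * abs(bios)


# (BIOS hardware health monitor value, IPMI realtime sensor value), as posted above.
pairs = {
    "+3.3V": (3.312, 1.98),
    # Assumption: BIOS "VcoreA" and IPMI "CPU Vcore 1" measure the same rail.
    "Vcore (CPU 1)": (1.084, 1.65),
}

suspect = {name: disagrees(bios, ipmi) for name, (bios, ipmi) in pairs.items()}
print(suspect)
```

If both sources were reading the same hardware correctly, these would be False. Here both come out True, which fits a problem on the IPMI side rather than with the rails themselves.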

Should I change the IPMI card? Could it be defective? Are the IPMI sensor readings really showing a serious problem preventing the server from even posting???
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,513
5,804
113
Pretty cool you got support on a server that old :)

I have never seen IPMI do that, at least on newer platforms.
 

bwillcox

Member
Jan 20, 2013
32
0
6
Tejas
We see that a lot at work. The BMC gets out of sync with the motherboard.

Shut her down and give it a good long power pull (10-15 minutes).

If that doesn't help, then reflash the IPMI card. Don't save the settings; let it go back to defaults.

If that doesn't help then maybe time to hit up SM again and/or swap out the SIMSO card with a different one.

-b
 

lpallard

Hmmm I tried pulling power for a good hour, then restarted the server, to no avail.

Then I flashed the IPMI card again, and that didn't fix it either.

Funny because I am pretty sure when the card had the previous firmware it worked just fine.

Can I SAFELY flash the ipmi card to a previous firmware???

Other than that, it may be defective, and I asked the eBay seller to send me a replacement IPMI card.
 

bwillcox

Yes, you can flash the IPMI firmware back to an older version. Don't attempt that with the BIOS on the board though.

The SIMSOs do go duff too and sounds like that may be the case here. Hope your seller fixes you up on it.