SOLVED (OLD Servers) - Two SuperMicro servers died without reason

  • Thread starter Deleted member 36422

Deleted member 36422

Guest
I had two identical 12-drive chassis, each with an X8DT3-LN4F motherboard and 48GB of RAM. I was using them with FreeNAS along with external Fibre Channel storage units for backups. Initially, I was backing up to the first unit, which was in turn backed up to the second unit for redundancy.

System: SuperMicro (MB: X8DT3-LN4F)
CPU: Dual Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (2261.05-MHz K8-class CPU)
12GB memory - Kingston KVR1333D3D8R9S/2G PC3-10600 (upgraded to 48GB)
12 x Western Digital WDC WD1002FBYS-0 on LSILogic SAS/SATA Adapter
4 Intel(R) PRO/1000 Network Connection version - 2.4.0

Well, out of the blue, the first one simply died. It's the only word I can use, because it stumped me for the longest time. I checked everything possible, from power supplies to fans to memory, re-seating parts and even replacing them with the spares I had bought to maintain these servers. I searched high and low on the Internet and found all kinds of posts with things to try, including removing practically everything to confirm that each part was working, and never found the problem. I even replaced the power combiner board in case the server wasn't getting enough power and, again, nothing.

I ended up using the second machine as the primary and another, non-SuperMicro machine as a secondary. I always have multiple backups.

Yesterday, the second machine died too, exactly as the first did, with nothing obviously wrong. Re-seating everything doesn't help. I got the air out and blew everything clean as best I could: no change. I changed the power supplies: no change.

Just like the other, the two Eth LEDs on the front blink fast when it has power but is not turned on. Pushing the power button lights the blue LEDs for all drives, then red for a few seconds, then nothing. Nothing on the screen; it never boots, never beeps, nothing whatsoever.

To me, it's as if the mainboard simply died and stopped fully working, but I feel like maybe there is a little trick, something I've not found in the countless documents and posts I've read.

I'm thinking I will never use SuperMicro again. I'm mostly an IBM guy, though I sometimes like to use these servers, but now I'm nervous about them.
Has anyone experienced anything like this? And do you have any thoughts you could share on what I might check?
 

i386

Well-Known Member
Mar 18, 2016
Germany
Can you get into IPMI and see the health status or other information in the logs?
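
A minimal sketch of how that check could be scripted, assuming `ipmitool` is installed on another machine and the BMC is reachable over the network. The address, user name, and password in the usage comment are placeholders, not values from this thread, and `-I lanplus` assumes the BMC speaks IPMI 2.0 (an older BMC may need `-I lan` instead).

```python
import subprocess

def ipmi_command(host, user, password, *args):
    """Build an ipmitool command line for a remote BMC (IPMI over LAN)."""
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *args]

# Queries worth running against a board that will not POST.
QUERIES = {
    "power state": ("chassis", "status"),
    "event log": ("sel", "elist"),
    "sensor readings": ("sdr", "elist"),
}

def run_queries(host, user, password):
    """Run each query and return {label: output text}. Needs a reachable BMC."""
    results = {}
    for label, args in QUERIES.items():
        proc = subprocess.run(ipmi_command(host, user, password, *args),
                              capture_output=True, text=True, timeout=30)
        # Keep the error text when a query fails so it can be inspected too.
        results[label] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results

# Usage with placeholder address and credentials (hypothetical):
#   for label, text in run_queries("192.0.2.10", "ADMIN", "changeme").items():
#       print(f"== {label} ==\n{text}")
```

The BMC runs from standby power, so these queries may still answer even when the host itself never reaches the BIOS.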
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
Your systems are 12+ years old, so hardware failures are not surprising, regardless of manufacturer. Electrolytic capacitors of that era were problematic. I would inspect those, as not all of the capacitors on the motherboard are solid-state.

This might be a good opportunity to upgrade, as a more modern architecture will pay for itself fairly quickly through reduced power consumption.
 

Deleted member 36422

Guest
Yes, I know these are old :). As I mentioned, I use other hardware as well. In some cases I like to use hardware as long as I possibly can; in others, I never take chances.

Inspecting the caps would mean pulling the board, which I don't have time to do, being totally overwhelmed lately. I figured I'd post to see if there might be something I'm missing, like trying the IPMI, which I've not done yet. I can do that easily enough and report back.
 

eduncan911

The New James Dean
Jul 27, 2015
eduncan911.com
Well, if you can find the time, connect a standard ATX PSU to the motherboard 24-pin + 8-pin(s) and try to power it on. Could be the PDUs in the chassis just don't like your battery backup.
 

Deleted member 36422

Guest
>Well, if you can find the time, connect a standard ATX PSU to the motherboard 24-pin + 8-pin(s) and try to power it on.
>Could be the PDUs in the chassis just don't like your battery backup.

PDUs in the chassis? There are no PDUs in these chassis; PDUs are external power bars connected to a UPS/generator. Am I missing something, or are we just using different terminology?

>can you access SEL

I posted images above of what I can see in terms of logs. After cycling power from the IPMI, I do see this in the logs.
I doubt it but could it be as simple as replacing the CMOS battery?


32  VBAT  Voltage  Lower Non-Recoverable - Going Low - Asserted
31  VBAT  Voltage  Lower Critical - Going Low - Asserted
30  VBAT  Voltage  Lower Non-Critical - Going Low - Asserted
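
Read as event ID, sensor name, sensor type, and event description, the three entries above say the VBAT reading dropped through every lower threshold in turn. Here is a rough Python sketch for ranking such lines by severity. The function names are made up for illustration, and the sample lines are retyped from the log above.

```python
def severity_rank(line):
    """Return 1..3 for the IPMI threshold level named in an event line, 0 if none."""
    # "Critical" is a substring of "Non-Critical", so test the longer names first.
    if "Non-Recoverable" in line:
        return 3
    if "Non-Critical" in line:
        return 1
    if "Critical" in line:
        return 2
    return 0

def worst_event(lines):
    """Pick the most severe event line, or None when no threshold was crossed."""
    ranked = [(severity_rank(line), line) for line in lines]
    rank, line = max(ranked, key=lambda pair: pair[0], default=(0, None))
    return line if rank else None

events = [
    "32 VBAT Voltage Lower Non-Recoverable - Going Low - Asserted",
    "31 VBAT Voltage Lower Critical - Going Low - Asserted",
    "30 VBAT Voltage Lower Non-Critical - Going Low - Asserted",
]
print(worst_event(events))
```

All three levels asserting together only says the battery voltage read below the lowest limit. It does not by itself say why the board won't start.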
 

Deleted member 36422

Guest
Ah, yes, the PDB is what I changed on the first server that did this. I changed that board twice to be sure and no difference. I'm not sure I have any more handy but I suspect that is not the problem.

I'll try changing the CMOS battery next.
 

Deleted member 36422

Guest
Nope, no difference, other than the VBAT warnings being gone, but there's not much in the logs either.
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
USA
ioflood.com
>I doubt it but could it be as simple as replacing the CMOS battery?

Yeah, sure looks like you need to replace the CMOS battery. Not too surprising for something of that age.
 

Deleted member 36422

Guest
I replaced it with a brand new battery, no change.
 

Blinky 42

Active Member
Aug 6, 2015
PA, USA
I have had X8 and X7 boards die on me in strange ways over the years. I always figured it was the on-board voltage regulation drifting out of spec, with the caps or something else going bad after 12-15 years. It was not uncommon for one DIMM at a time to report as bad: the server would crash and restart but be missing one stick of memory on reboot. Reboot again and it would see all the populated memory for a day or a few months, then crash and repeat. If that continued for too long, most eventually ended up like yours, where they are just "dead". Sometimes the IPMI would work but wouldn't give any useful diagnostics. Swapping memory around wouldn't help, nor would pulling a CPU (all dual-socket systems).
Since they are hot and slow and noisy, they got retired and replaced.

If you really want to keep them going, I might have one or two in the basement in 4U 743/742 chassis that are yours for the cost of shipping.
 

Deleted member 36422

Guest
Yes, they were old and I'm sure simply worn out but that's kind of the challenge with old hardware too, seeing how long you can keep it going.

I had a 486 running for so many years that when I turned it off to move it, the switch actually disintegrated and it never powered up again.
I think it was on for something like 15 years, completely forgotten about, running some little-used application that we didn't even realize was on that machine until it disappeared at the same time.

I appreciate the offer but I'm more curious to know if it's actually done or if there is still a chance of firing it back up :).
 

RageBone

Active Member
Jul 11, 2017
Of course there is a way to get them working again.
BUT probably not without some effort debugging and board level repairs and workarounds.

The first thing to look for would be POST codes, because if it still progresses a bit past FF or 00, not all is lost.

Sadly, sensor data in the IPMI isn't available that early in POST. An oversight for sure.

Another useful tool is a multimeter, to probe the board and measure which voltages are present or missing, as well as resistances.
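
As a starting point for those measurements, here is a small sketch that compares readings taken at the PSU connector against the usual ATX tolerances (5% on the positive rails and +5VSB, 10% on -12 V). The readings in it are hypothetical, not measurements from these servers, and it says nothing about the on-board regulator outputs, which have their own targets.

```python
# Nominal voltage and allowed fractional deviation for each PSU rail.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}

def rail_limits(name):
    """Return (low, high) acceptable voltages for a named rail."""
    nominal, tolerance = ATX_RAILS[name]
    # sorted() keeps the pair in (low, high) order for the negative rail too.
    low, high = sorted((nominal * (1 - tolerance), nominal * (1 + tolerance)))
    return low, high

def out_of_spec(readings):
    """Return the names of rails whose measured voltage is outside tolerance."""
    bad = []
    for name, measured in readings.items():
        low, high = rail_limits(name)
        if not low <= measured <= high:
            bad.append(name)
    return bad

# Hypothetical multimeter readings for illustration only.
readings = {"+3.3V": 3.31, "+5V": 4.98, "+12V": 10.80, "+5VSB": 4.20}
print(out_of_spec(readings))  # prints ['+12V', '+5VSB']
```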
 

Deleted member 36422

Guest
There is no output whatsoever on the monitor, so nothing to see and no BIOS access, of course.
 

i386

Well-Known Member
Mar 18, 2016
Germany
You could try to reflash the BIOS through IPMI and then check whether the system can boot into the BIOS or the OS.
 

BlueFox

Legendary Member Spam Hunter Extraordinaire
Oct 26, 2015
>You could try to reflash the bios through ipmi and check then if the system can boot into bios or the os.

Pretty certain that feature wasn't available on the X8 generation and didn't arrive until later.