X11DPi-NT BMC crashing

gtech1

Member
May 27, 2019
112
8
18
I'm on my second SM server, purchased brand new from official resellers, that has issues. In both cases there appear to be signs of prior use of the components, but that can't be, right?

In any case, the BMC on this brand new X11DPi-NT is crashing. The server remains online, but the OS re-registers some USB devices and the chassis intrusion alert is retriggered.
The seller of the machine tested it for two days before shipping to us. We tested it for 4 days using Memtest86 only. Everything was clean so we put it into production.
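
One way to put timestamps on those USB re-registration events is to scan the kernel log for USB disconnect and reconnect lines. This is only a rough sketch that assumes the usual `dmesg` line format; the function names `find_usb_events` and `read_kernel_log` are made up for illustration, and whether the devices that reconnect are the BMC's virtual ones is an assumption to verify on the actual machine.

```python
import re
import subprocess

# Matches kernel lines such as:
#   [12345.678901] usb 1-7: USB disconnect, device number 3
#   [12350.123456] usb 1-7: new high-speed USB device number 4 using xhci_hcd
USB_EVENT = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+usb (?P<port>[\d.-]+): "
    r"(?P<event>USB disconnect|new [\w-]+ USB device)"
)

def find_usb_events(log_text):
    """Return (seconds_since_boot, port, kind) for each USB (dis)connect line."""
    events = []
    for line in log_text.splitlines():
        m = USB_EVENT.match(line)
        if m:
            kind = "disconnect" if m.group("event") == "USB disconnect" else "connect"
            events.append((float(m.group("ts")), m.group("port"), kind))
    return events

def read_kernel_log():
    """Read the kernel ring buffer; this may require root on some systems."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout
```

Calling `find_usb_events(read_kernel_log())` lists the events in order, which can then be compared against the times the chassis intrusion alert fired.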

One week later, bam, all signs point to the BMC crashing. I checked the IPMI maintenance logs and I do see some entries from 2023 about the BMC being reset, even though I bought this server last month.

So, does anyone know if:
- those might be the original logs from manufacture, from when they reset the board? They just say it was done via interface RMCP, with "IPMI configuration restored to default successfully" and "BMC was reset successfully".

or

- you have heard of a BMC crashing like this? It's basically the same behavior as described in this thread: "IPMI Related kernel log error messages - Is this a sensible course of action?"
 

RolloZ170

Well-Known Member
Apr 24, 2016
6,716
2,075
113
"I checked the IPMI maintenance logs and I do see some entries from 2023 about the BMC being reset, and I bought this server last month."
Could be the BIOS time settings.
A board can lie around unused for years.
What PCB revision is that?
 

gtech1

Not sure how I can get the PCB revision. I tried using dmidecode but it doesn't print it. It should be the latest, since the board was purchased one month ago.
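
For reference, the baseboard section of `dmidecode -t baseboard` can be pulled apart with a few lines of Python. This is a minimal sketch: the helper names `parse_baseboard` and `read_baseboard` are hypothetical, and whether the `Version` field reflects the PCB revision on this particular board is an assumption (as noted above, dmidecode did not show it here).

```python
import subprocess

def parse_baseboard(dmi_text):
    """Collect 'Key: value' pairs from the Base Board Information block."""
    fields = {}
    in_block = False
    for line in dmi_text.splitlines():
        if line.strip() == "Base Board Information":
            in_block = True
            continue
        if in_block:
            if not line.startswith(("\t", " ")):
                break  # an unindented or empty line ends the block
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields

def read_baseboard():
    """Run dmidecode (needs root) and parse its baseboard section."""
    out = subprocess.run(
        ["dmidecode", "-t", "baseboard"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_baseboard(out)
```

`read_baseboard().get("Version")` then returns whatever the firmware reports in that field, or `None` if it is absent.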

Also, good guess about the BIOS time, but that's not it. There's clearly a different login mechanism and IP address than mine:

1732207344586.png

1732207396532.png
 

gtech1

"It's printed in nice letters behind the X11DPi-NT on the motherboard."
A bit late for that now. I guess SM can pull it based on the serial #. But assuming it's the latest revision, is this a known bug?

That other thread describing what I'm seeing goes back to 2019.
 

RolloZ170

"But assuming it's the latest revision, is this a known bug?"
No, but you can see whether it's a new or a replacement board.
Rev. 1.21 and older = VRM bug. Boards of those revisions are replaced by Supermicro for free, even if the warranty period is over.
The BMC firmware chip can be worn out; a reflash helps, but only for a few days.
AFAIK the X11DPi-N(T) is no longer produced new today, but I could be wrong.
 

gtech1

Wow, solid info, thank you!
Do you have more details about this VRM bug? Does it affect just the BMC, or can it cause other, more serious issues? I mean, it's already pretty serious that it reset the onboard NICs and cut off communication for 30 seconds, but can it get worse than this?

Basically, I need to know how quickly I need to remove this from production.

And once again, assuming I do find out it's an older rev, why the hell would they sell me that in October 2024?
 

RolloZ170

"Do you have more details about this VRM bug? Does it affect just the BMC, or can it cause other, more serious issues?"
No, the VRM blows up, not the BMC. I meant that the board could be a replacement for an old board, not brand new.
 

gtech1

gtfo outta here. I need to remove this ASAP then. Confirming the revision with SM now.
 

gtech1

Well, SM checked the serial # and they say it's PCB 2.01A, but that it may not have the latest 'ECO' updates and needs to be sent back to be replaced? Do you know what these ECO updates are? First time I've heard of them.
 

RolloZ170

"...but that it may not have the latest 'ECO' updates and needs to be sent back to be replaced? Do you know what these ECO updates are? First time I've heard of them."
Upgrades (VRM programming) and component replacements to meet the latest rules and requirements,
e.g. DDR VRMs able to supply new, large-capacity RDIMMs.
 

gtech1

And that can't be done via firmware? It's a physical process?
Side question: do you know of a way to get the uptime of the BMC?
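
On the uptime question, one possible workaround is to probe the BMC periodically from the host and note when it starts answering again after a gap. This is a sketch under stated assumptions, not a definitive method: it assumes `ipmitool` is installed and the local IPMI interface is usable, the names `bmc_responds`, `recoveries` and `monitor` are made up for illustration, and it only estimates when the BMC came back rather than reading a real uptime counter.

```python
import subprocess
import time

def bmc_responds(timeout_s=10):
    """True if `ipmitool mc info` succeeds over the local interface."""
    try:
        result = subprocess.run(
            ["ipmitool", "mc", "info"],
            capture_output=True, timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0

def recoveries(samples):
    """Given (timestamp, responded) samples in time order, return the
    timestamps where the BMC answered again after a failed probe."""
    found = []
    previous = None
    for ts, ok in samples:
        if ok and previous is False:
            found.append(ts)
        previous = ok
    return found

def monitor(interval_s=5):
    """Poll forever and print a line each time the BMC comes back."""
    samples = []
    while True:
        samples.append((time.time(), bmc_responds()))
        samples = samples[-2:]  # only the latest transition matters
        for ts in recoveries(samples):
            print("BMC responded again at", time.ctime(ts))
        time.sleep(interval_s)
```

The polling interval is kept short because the outage described in this thread lasted only about 30 seconds; a longer interval could miss it entirely.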
 

gtech1

This is crazy. SM says they haven't seen this issue before on rev 2.01A either, but it looks just like that thread where the guy had a rev 1.21 board.
 

RolloZ170

"This is crazy. SM says they haven't seen this issue before on rev 2.01A either"
I wrote that rev 1.21 had this issue, and SM replaces those for free.
Maybe you have a replacement board (not new, as stated by the seller).
 

gtech1

But replacement boards shouldn't have the issue anyway, right?

In any case, I triple-checked with the seller (Ava-Direct) AND SM. Everyone showed me their purchase orders: this was sold and purchased as a brand new board. The only way for it to be what you say is for SM to have sold Ava a replacement board as new.