H12SSL-I Stuck at "bmc initiating"


Debtor8100

New Member
Dec 12, 2022
12
0
1
The symptoms can vary widely because they depend on the kind of damage and the number of damaged ICs.
Let's say one of the three, the one supplying 1.2V for its memory, is damaged badly enough that it produces no output voltage.
That would leave the BMC core and logic functional, but the memory for everything else, like the OS, management and GPU, would not be working.
I find it likely that the BMC would indicate such an issue with fast blinking.


It is possibly similar if only the core voltage for the ARM cores inside is missing but the rest is there.

With more damaged ICs, it gets more likely that the BMC would not even be able to blink an LED.

If the damage is more catastrophic in nature, it could permanently kill the BMC and even further impact the board.

You can measure the output voltages with a multimeter to confirm whether, and which, of these is the case.
That makes sense.

I think my testing with the board is done for the time being.

Very exaggerated. NO manufacturer mounts useless tiny SMD elements on their motherboards. Even if you break off a little resistor, a motherboard becomes fully/partially nonfunctional regardless of the manufacturer.
I understand your point, but this is starting to become a question of the viability of the design. I have broken a motherboard before, but I knew the moment it happened. A screwdriver slipped on the board and knocked off a surface-mounted resistor with its weight. My bad, off to buy another motherboard.

This one I find harder to swallow. I'm not exaggerating when I say the build was uneventful. Everything went smoothly, nothing got dropped or bumped in a way I would've been concerned about. The fact that I didn't even consider surface damage as the cause tells a lot. I did not feel that should've even been a possibility with a smooth install.

Of course the fault lies with me since I broke the damn thing, but that does not mean that the board is well designed. For crying out loud, they put three extremely fragile ICs straight under the bottom PCIe slots, where cards are going to be installed mere millimeters away from them. The margin for error during installation is ridiculously small. And to add to that, the way they handle the RMAs is just "You broke it". The fair way to treat this would be to acknowledge the vulnerable design around the BMC and treat those RMAs differently.

Other server and consumer boards I've installed have all had the same level of care during install, and this is the first one to break without my knowing when or how it happened. I even opted for Supermicro because their BIOS and IPMI implementations have been rock solid in my use. I'm just disappointed to see this sort of design. As @RageBone said, considering that you can't replace or obtain the ICs yourself, Supermicro denies RMAs, and the ICs are not protected at all and placed the way they are, this really feels like a trap from a consumer's perspective. I'm not alleging any conspiracy, just the typical way manufacturers act when their product has faults they don't recognize.
 

RageBone

Active Member
Jul 11, 2017
617
159
43
, considering that you can't replace or obtain the IC's yourself,
That is not what I said.
If I had them, I could replace them on my SSL here.
I don't know the manufacturer or the model. Searches for the markings on them have not been successful so far.
So those ICs could be available, I just don't know it yet.

I was saying that other experiences with similar ICs from phones and laptops do not bode well.
Those usually are not easily available and can even be under NDA.
The keyword being "usually".
I would not be surprised if that is the case here as well.

I also did not say that SM denies RMAs.
The one specific case that was handed to me got denied because there was strong visible scratching on the board around a screw hole close to those 3 ICs.
Pictures were exchanged in the conversation with SM that, I think, clearly showed that.

I find myself very conflicted about this matter.
Because if I were to break it, it is clearly my fault, no discussion about that.
But at what point is it too easy to break?
Is that going to become the default reasoning?
What if those ICs happen to die by themselves after a while?
Is that going to be blamed on me by default?
It is honestly unlikely that they just fail by themselves, but it is not impossible.
Manufacturing defects happen, and there is a whole supply chain in the way where someone can **** up, touch it the wrong way and somehow break it.
Could I be getting a DOA and be blamed for breaking one of those ICs?

My perspective on the repair of such damage is simple.
It is voltage regulation, one of the most common and, so to speak, easy-to-fix problems.

What can become a problem: further damage through a catastrophic failure of the voltage regulator.
In my case on the DSi, the Aspeed AST2600 is surely dead, and replacing that is still doable, but a bigger ordeal than just replacing that 6-ball IC.

And then we have the issue of replacement parts.
Luckily I can buy the BMC, just not everywhere or at low cost, so to speak. The minimum order quantity was 5 at $75 apiece from a single supplier.
If I could get those regulator ICs, that'd be great.

Another option, of course, is replacing those with something else that is compatible.
Just what? A bit of reverse engineering will be required for that in the future, I guess.
 

Debtor8100

New Member
Dec 12, 2022
12
0
1
That is not what I said.
If I had them, I could replace them on my SSL here.
I don't know the manufacturer or the model. Searches for the markings on them have not been successful so far.
So those ICs could be available, I just don't know it yet.

I was saying that other experiences with similar ICs from phones and laptops do not bode well.
Those usually are not easily available and can even be under NDA.
The keyword being "usually".
I would not be surprised if that is the case here as well.
Ok I see, sorry for the misunderstanding on my part

I find myself very conflicted about this matter.
Because if I were to break it, it is clearly my fault, no discussion about that.
But at what point is it too easy to break?
Is that going to become the default reasoning?
What if those ICs happen to die by themselves after a while?
Is that going to be blamed on me by default?
It is honestly unlikely that they just fail by themselves, but it is not impossible.
Manufacturing defects happen, and there is a whole supply chain in the way where someone can **** up, touch it the wrong way and somehow break it.
Could I be getting a DOA and be blamed for breaking one of those ICs?

My perspective on the repair of such damage is simple.
It is voltage regulation, one of the most common and, so to speak, easy-to-fix problems.
This was exactly my point here. A good comparison, in my opinion, is phone screens. They are vulnerable, but there are rarely cases where it is unclear who is at fault when a screen is broken. This is because the screens can actually take some damage, but when the threshold is exceeded the user just has to admit fault.

Here lies the problem: what is the actual threshold with these ICs? If they can be broken so easily that it isn't even noticeable during install, is the design really good enough? If the parts are this fragile, they should be easier to swap and the part numbers should be public knowledge, in my opinion. That, or SM should be more understanding about these issues.

Personally I wouldn't even go as far as to wonder if the parts could die by themselves or during shipment. Those are edge cases. I would be more worried about the installations. The boards are bought to be installed, as phone screens are to be used. Are the boards durable enough to survive a normal use case? I think this is cutting it way too close.
 

ocfguy

Active Member
Oct 25, 2022
100
50
28
Update: I requested an RMA with Supermicro and they replaced my board.

I did some research and it seems like BMC issues are not rare on the H12SSL-i. Many reports on Chinese tech forums about BMC failures on H12 series mobos, mostly H12SSL:

 

joet

New Member
May 27, 2018
15
2
3
I'm both glad and sad I found this thread - it confirmed what I'd done to my H12SSL and saved me any more wasted diagnostic time, but ouch what a bad idea to put such delicate components so close to the edge of the board where one clumsy card insertion could scrape them off.

Does anyone know whether putting Kapton tape over those components would protect them mechanically without causing overheating issues?
 

RageBone

Active Member
Jul 11, 2017
617
159
43
Does anyone know whether putting Kapton tape over those components would protect them mechanically without causing overheating issues?
I don't think Kapton tape is enough.
My preference would be something that can absorb the "impact force".
So, maybe a big blob of silicone / conformal coating?
Or have the chips encased in resin / plastic.
But in all of those cases, cooling of those ICs becomes a question.
It might not be a problem, but it could also very well become one.
Just because something is low power doesn't mean that it isn't running "hot".
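
To put rough numbers on the cooling question, here is a minimal sketch of the usual steady-state estimate: the loss inside the converter times a junction-to-ambient thermal resistance gives the temperature rise. Every number below (load current, efficiency, ambient temperature, both thermal resistances) is a made-up placeholder for illustration, not a measured or datasheet value for this board, and the function names are hypothetical.

```python
def regulator_loss_w(v_out: float, i_out: float, efficiency: float) -> float:
    """Power dissipated inside a converter delivering v_out * i_out at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    p_out = v_out * i_out
    # P_in = P_out / efficiency, so the loss is P_in - P_out.
    return p_out * (1.0 / efficiency - 1.0)


def junction_temp_c(ambient_c: float, loss_w: float, theta_ja_c_per_w: float) -> float:
    """Steady-state estimate: T_junction = T_ambient + P_loss * theta_JA."""
    return ambient_c + loss_w * theta_ja_c_per_w


# Placeholder operating point (assumptions, not board data).
loss = regulator_loss_w(v_out=1.2, i_out=1.5, efficiency=0.88)

# Hypothetical thermal resistances: bare chip vs. the same chip under a blob of coating.
bare = junction_temp_c(ambient_c=40.0, loss_w=loss, theta_ja_c_per_w=90.0)
coated = junction_temp_c(ambient_c=40.0, loss_w=loss, theta_ja_c_per_w=140.0)

print(f"loss: {loss:.2f} W, bare: {bare:.0f} C, coated: {coated:.0f} C")
```

The point is only the shape of the relationship: a chip this small dissipating a fraction of a watt can still run hot, and anything that raises the thermal resistance raises the temperature in proportion.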
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,139
1,546
113
Does anyone know whether putting Kapton tape over those components would protect them mechanically without causing overheating issues?
Now, knowing the problem, you can make a little sheet of cardboard as a temporary protector.
But I still cannot follow what the problem is:
SP3/TR4: let's tackle it, 4094 pins to bend...
 

nasbdh9

Active Member
Aug 4, 2019
164
96
28
None of the dozen or so H12SSL users I know have escaped this BMC problem, the motherboards are bound to have BMC failures after running for a while, SM should recall them...
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,139
1,546
113
the motherboards are bound to have BMC failures after running for a while, SM should recall them...
They do if they find a design flaw, as SM did with the X11DPi-N(T) VRM bug.
I was talking about mechanical damage caused by inserting a PCIe card.
The BMC failure can be a result of overheating (BMC voltage regs) or a FW issue; X12 and H12 all use RoT firmware.
 

custom90gt

Active Member
Nov 17, 2016
223
95
28
39
I wish I had read this topic before purchasing a used H12SSL-I instead of a new one. My exact thought was "I'll buy it used to save $100 since SM makes such reliable boards" lol
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,139
1,546
113
My exact thought was "I'll buy it used to save $100 since SM makes such reliable boards" lol
Half of the issues appear after a BMC FW update; before that, sensor data was missing for some reason.
Maybe the NEW ( :p ) China (eBay) H12SSL boards have other/strange firmware?
U6 runs hot, but the 10G heatsink is hot too, even in standby (no active fans).
 

custom90gt

Active Member
Nov 17, 2016
223
95
28
39
Half of the issues appear after a BMC FW update; before that, sensor data was missing for some reason.
Maybe the NEW ( :p ) China (eBay) H12SSL boards have other/strange firmware?
U6 runs hot, but the 10G heatsink is hot too, even in standby (no active fans).
I did order my used MB from Amazon. I'm contemplating returning it and getting a new one so I am at least covered under warranty for a bit. The -I variant doesn't have 10GbE, right? Dunno if that makes any difference in the BMC failures.
 

custom90gt

Active Member
Nov 17, 2016
223
95
28
39
Used: warehouse deals? I got several Supermicro boards and an ASUS C621E Sage in horrible condition, bent pins etc.
The advantage is: you can return even a damaged board to AZ.
Sadly it was through a 3rd party, VPCI. We will see how much difficulty I have to go through to return it.
 

RageBone

Active Member
Jul 11, 2017
617
159
43
I did order my used MB from Amazon. I'm contemplating returning it and getting a new one so I am at least covered under warranty for a bit. The -I variant doesn't have 10GbE, right? Dunno if that makes any difference in the BMC failures.
10GbE (T in the name) should not have any impact on the risk of you killing the voltage regulators.
To call them BMC failures is a bit misleading.
At least all cases I know of were caused by the owner.
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,139
1,546
113
That picture is sadly not good enough to really make it out.
But U4 looks uniform and fine. U6 does not look uniform, but with a bit of dust / grime on it, I think that one might still be OK.

U5 is, in my opinion, very likely dead and splintered.
Those are bare, unprotected wafer-level chips on a few solder balls, tasked with converting voltages for the BMC from 3.3VSB. They are called buck converters.
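
As a rough illustration of what such a buck converter does, here is a minimal sketch of the ideal (lossless) relationship V_out = D * V_in, where D is the duty cycle of the internal switch. The 3.3V input and the 1.2V memory rail are the values mentioned in this thread; the function names are hypothetical, and the stuck-on switch at the end is only an assumed failure mode, shown because it would pass the full input voltage through to the load.

```python
def ideal_buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Duty cycle D of an ideal buck converter: D = V_out / V_in."""
    if v_in <= 0:
        raise ValueError("v_in must be positive")
    if not 0 <= v_out <= v_in:
        raise ValueError("a buck converter only steps down: 0 <= v_out <= v_in")
    return v_out / v_in


def ideal_buck_output(v_in: float, duty_cycle: float) -> float:
    """Output voltage of an ideal buck converter: V_out = D * V_in."""
    if not 0 <= duty_cycle <= 1:
        raise ValueError("duty_cycle must be in [0, 1]")
    return v_in * duty_cycle


V_STANDBY = 3.3  # 3.3VSB input, as named in the thread
V_MEMORY = 1.2   # 1.2V memory rail, as named in the thread

duty = ideal_buck_duty_cycle(V_STANDBY, V_MEMORY)
print(f"ideal duty cycle for {V_MEMORY} V from {V_STANDBY} V: {duty:.3f}")

# Assumed failure mode: the switch is stuck on, i.e. an effective duty cycle of 1.0.
print(f"output with the switch stuck on: {ideal_buck_output(V_STANDBY, 1.0):.1f} V")
```

A real converter regulates with feedback and has losses, so this is only the first-order picture.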

If one of those voltages is missing, the BMC can't do its job, and to me, the fact that it can still blink is luck.
I have an H12DSi on which that happened, but it killed the BMC by giving it 3.3V instead.

You can verify what I say by measuring at the black inductors of the buck converters.
I have marked all three of them, and where you can measure the voltage, in this picture.
Please measure the voltages on all 3. I hope for your sake that they all have some voltage on them.
I suspect the middle one, for U5, will measure 0V.
The P/N is TPS62088YFPR.
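
The measurement check described above can be written down as a small helper, if only to make the pass/fail logic explicit. This is a sketch under loud assumptions: the thread only names a 1.2V memory rail, so the nominal values for U4 and U6, the assignment of 1.2V to U5, the 5% tolerance and the sample readings are all hypothetical placeholders to be replaced with values for your own board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RailReading:
    regulator: str     # board designator, e.g. "U5"
    nominal_v: float   # expected output voltage (placeholder unless known)
    measured_v: float  # multimeter reading taken at the inductor


def classify(reading: RailReading, tolerance: float = 0.05, dead_below_v: float = 0.1) -> str:
    """Return 'dead', 'ok', 'undervoltage' or 'overvoltage' for one reading."""
    if reading.measured_v < dead_below_v:
        return "dead"
    deviation = abs(reading.measured_v - reading.nominal_v) / reading.nominal_v
    if deviation <= tolerance:
        return "ok"
    return "overvoltage" if reading.measured_v > reading.nominal_v else "undervoltage"


# Made-up example readings; the nominal values are assumptions, not board data.
readings = [
    RailReading("U4", nominal_v=1.0, measured_v=1.01),
    RailReading("U5", nominal_v=1.2, measured_v=0.0),   # the suspected dead regulator
    RailReading("U6", nominal_v=1.8, measured_v=3.3),   # input passed straight through
]

for r in readings:
    print(f"{r.regulator}: {r.measured_v:.2f} V -> {classify(r)}")
```

A reading near 0V points at a dead regulator, while a reading near the 3.3V input is the more dangerous case, since that is what reportedly killed the BMC on the H12DSi.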
 

hmw

Active Member
Apr 29, 2019
570
226
43
Ugh - *just* got hit by this. I installed a NIC card and lo and behold - system would not boot. The LE1 LED flashes and turns off. It stays lit when power is applied. But there's no sign of life from the LEDM1 BMC LED. Booting produces nothing, system turns 'on' in the sense the fans power up but no output from the onboard VGA.

Looks like U4 might have had some scrapes and a damaged edge. For those who sent this motherboard back to Supermicro - how much did it cost to repair?


problem.jpg

Keep in mind I knew about this thread and was careful when I put the new NIC in. It is indeed amazing how fragile this board is!
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,139
1,546
113
Can someone design a 3D model of a U4-6 protector? It could mount using the MH1 hole shown in the picture and one or both of the heatsink holes of the Broadcom chip,
with some grid holes in it for air circulation.
 

hmw

Active Member
Apr 29, 2019
570
226
43
The motherboard was in a recommended Supermicro chassis because I thought it would be better protected. And it didn't make any difference.

Will try to send it off for repairs (folks on the thread have commented how Supermicro refused to repair their motherboard) but don't really have much faith in SM. Even if repaired - the fact that this can happen at any time - is deeply concerning. In a homelab, folks often swap cards and generally are more hands on with the systems. Compared to the H12SSL/CS-826 that has been treated with kid gloves - my Dell R440s have been poked, prodded and even dropped - and they survived and are doing just fine
 