Tyan S7012 BMC/IPMI stops. AST2050 area overheating.

ninebladed

New Member
Mar 13, 2018
I have a S7012GM4NR with a BMC failure problem.

I find that unless I cool the area around the AST2050 chip, the baseboard management controller (BMC) function stops.
And by stop I mean, the flashing green LED goes out, IPMI is no longer accessible via the LAN or locally, and the VGA screen graphics are torn and distorted.

It is not possible to boot the motherboard when the BMC is in the overheated/stalled state, because the boot hangs at the 'waiting on BMC' phase. If I apply power and start it quickly, before the BMC overheats, it will boot.

The BMC function will stop about 60 seconds after power is applied to the motherboard unless the AST2050 area is cooled by a fan. It's therefore not possible to shut the server down and expect to power it up again remotely via IPMI: once the fans stop after shutdown, the BMC/AST2050 overheats and stops responding.
Basically, the motherboard can't have power applied to it unless the AST2050 area is continually cooled.
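To put a firmer number on the "about 60 seconds", one option is to poll the BMC from another machine right after applying standby power and note when it drops off the network. Below is a minimal Python sketch of that idea, not a tested tool. It assumes the BMC's web interface answers on TCP port 80 (IPMI-over-LAN itself uses UDP port 623, which this plain TCP check does not exercise); the function names and the example IP address are hypothetical.

```python
import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Assumption: the BMC web interface listens on TCP port 80. IPMI-over-LAN
    uses UDP port 623, which this plain TCP check does not test.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def seconds_until_unreachable(
    probe: Callable[[], bool],
    interval: float = 5.0,
    max_wait: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` every `interval` seconds, starting when called.

    Returns the elapsed seconds at the first failed probe *after* the BMC
    has answered at least once (it needs time to boot once power is
    applied), or None if that did not happen within `max_wait` seconds.
    """
    start = clock()
    seen_up = False
    while clock() - start <= max_wait:
        if probe():
            seen_up = True  # BMC has booted and is answering
        elif seen_up:
            return clock() - start  # it was up and has now dropped off
        sleep(interval)
    return None
```

Usage would be something like `seconds_until_unreachable(lambda: tcp_probe("192.0.2.10"))`, with your BMC's address in place of the placeholder. The result is only accurate to roughly the polling interval plus the probe timeout, which is plenty to tell "about a minute" apart from "never".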

I have reflashed the AST2050 with the latest 70120103.ima release but that makes no difference.
The idle power consumption is 96 watts with 2 x L5640, 96GB RAM and 1 x SSD, in case that gives any comparative clues.

I have not owned a S7012 before but I guess what I have described above is not at all normal behaviour?
It's had this problem since I purchased it used from ebay.

Does anybody have a similar experience or advice on things to try before I go ahead and purchase a replacement?

ninebladed

The board is mounted in an NZXT S340 case with twin 140mm Noctua NF-A14 PWM chassis fans at the front.
The CPUs are cooled by Noctua NH-U9DX i4 coolers.
To keep the AST2050 area cool, I have a Noctua NF-B9 mounted directly over the AST2050 chip.
I would say there is pretty good airflow through the case with all the fans running. No component seems hot to the touch.

I guess under normal operation the BMC should be able to function without a fan running all the time. Otherwise it wouldn't be possible to power the board off remotely and power it on again later via IPMI. Or maybe my assumption is wrong, and in commercial deployments the rack chassis that houses the S7012 keeps a fan running 24/7 to cool the AST2050?

Nizmo

Member
Jan 24, 2018
Sounds faulty.

You assume correctly: you should not need 24/7 cooling across the IPMI chipset while the server is powered off. It should be warm to the touch, but not "hot".

ninebladed

Hi cliffr. Are you saying that in your experience ex-server motherboards will overheat with power applied to them but not started, unless they are constantly fan cooled?
I see quite a few whitebox builds on the internet that use ex-server motherboards such as the S7012, and I have not seen any mention of the need to cool those builds while in standby mode. But I have not been looking out for this detail, so I may have missed it.

Nizmo

None of the servers in my rack have any fans running while powered off but the IPMI chipset will be warm to the touch, providing the network with an IP even when powered off.

If overheating is killing active IPMI remote connections, it is a management controller failure. Assuming the chip already has a heatsink, you could put an active fan on it to buy more time from the board, but if it were mine, I couldn't run it like that under a critical load.

ninebladed

Hi Nizmo. Thanks for the guidance you have provided. I tend to agree that the management controller failure I'm seeing is not normal. The controller should be able to function while the motherboard is shut down but still powered, without the need for active cooling. It could be time to go shopping for a replacement LGA 1366 motherboard.

Nizmo

No kind of manufacturer warranty? Tyan is pretty good about that kinda thing.

If you're looking for replacements, I would steer you towards Gigabyte's B2B products, IMO.

I have a funky IPMI on one board. It doesn't overheat, but it stops letting me log in, read temps, power on, etc. I just can't rely on that board anymore, so it has no critical machines running on it.

You could try updating the Management Engine (ME) before writing the board off. Most manufacturers supply not only a BIOS but also the ME firmware. Many boards also have a jumper you must set before the ME can be overwritten. If you can't write it, then there is your answer :)