X99 Motherboards, risk of killing CPUs

Frobbit · Jan 5, 2019

I have had an X99 Asrock Extreme4 + i7 5820k as a desktop machine for a couple of years, and it has worked fine. The warranty ran out on the motherboard, so I managed to find a reasonably cheap Asus X99-Pro / USB 3.1 with a Xeon E5-1603 v3 as a backup board.

My plan was to use the E5-1603 v3 board as a fileserver while also acting as a backup to my desktop board. For both I had later hoped to upgrade to a more powerful Xeon when they drop in price in the future.

However, after reading up a lot on X99 and Asus boards in particular, I've become really concerned that X99 boards seem to randomly end up both dying and killing the CPU in the process. Mostly this seems to happen when powering on the computer or waking it from sleep. Unfortunately I am not able to leave the fileserver on 24/7, so it will be woken up every day, which is now quite concerning.

I'm not planning on doing any overclocking (since Xeons for this platform can't be usefully overclocked anyhow), just running at stock, or even slightly undervolted to lower heat/power etc. Still, some of the X99 failures described online seem to have happened to people running at stock.

I'm wondering if anyone here has had experiences with X99 motherboards dying or killing CPUs? Or am I just worrying too much after having read all the horror stories online?

RageBone · Jan 5, 2019

Frobbit said:
I'm not planning on doing any overclocking (since Xeons for this platform can't be usefully overclocked anyhow)

Actually, they can!
Two options: an unlocked CPU, for example the 1650V3, 1660V3 or 1680V3.
Or microcode 00, turbo bin bug ****ery. Should work on any V3 Xeon with the basic microcode.

TLDR:
If you have a good PSU and you have ample airflow over the VRMs, and you don't touch "deadly voltage settings", it is very, very unlikely that your board will fail in a way that kills the cpu.

The standard excuse is that you only hear from those with problems.
Motherboards probably ship in the millions of units, so simple statistics apply : )

And there are always those who keep their 7-year-old 550W PSU for their new and awesome hardware.
Or go with the cheaper PSU. Who knows.

But that aside, now to the horror stories and argumentation : D

There is usually no sane reason why a specific board, or boards from a specific OEM, should be prone to killing CPUs under normal circumstances.

There are a few voltages that Asus is known for: if you tweak those for overclocking, the board will very likely kill your CPU.
But that is only relevant for OC edge cases, since you don't need to change those voltages 99% of the time.
Other exceptions may follow below : )

One reason for that is that VRMs are pretty similar, even between Manufacturers.
There are only X Types of components and Y ways to put them together and Z ways to lay and route them and C ways of cooling or not cooling them.
To cause a problem in a specific product, there has to be either a design mistake in the above variables, or a problem on the component manufacturer side in terms of quality and reliability of the components.

For example, no one drives certain IR power stages at 7V, since there seems to be a reliability problem at that voltage.

Asus does like to copy-paste designs, and some are pretty old now.
If there were a critical flaw killing CPUs by the thousands, Asus would know and would do something about it in newer designs!
Since they keep reusing the same designs so consistently, I doubt that there is a major problem anywhere.

A VRM is made out of an inductor, high-side and low-side FETs and some controller chips, switching them on and off to create the wanted voltage.
This is called a buck converter.
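
As a rough illustration of the buck principle: in an idealized, lossless buck converter the output voltage is simply the input voltage multiplied by the fraction of time the high-side FET is switched on (the duty cycle). The sketch below is only that textbook relation, not how any particular board's controller is implemented. The function name is made up for the example, and the 12V input and 1.8V output are assumed values (1.8V being the figure mentioned further down for Haswell/Broadwell).

```python
def ideal_buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Return the high-side duty cycle of an ideal (lossless) buck converter.

    Uses the textbook relation V_out = D * V_in for continuous conduction.
    Real VRMs deviate from this because of switching and conduction losses.
    """
    if v_in <= 0:
        raise ValueError("input voltage must be positive")
    if not 0 <= v_out <= v_in:
        raise ValueError("a buck converter can only step the voltage down")
    return v_out / v_in


# Assumed example values: 12V rail from the PSU, 1.8V wanted at the CPU.
duty = ideal_buck_duty_cycle(12.0, 1.8)
print(f"High-side FET is on for about {duty:.1%} of each switching period")
```

So for most of each period the low-side FET is the one conducting, and the inductor smooths the switched 12V into a steady low voltage.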

In general, most OEMs use either an International Rectifier (IR) controller with the number of phases needed, an Intersil equivalent, or their own rebranded IR or Intersil part. Of course there are other chip manufacturers like Texas Instruments (TI), but that is out of scope for today.
Asus uses rebranded IR controllers.

Now to high- and low-side FETs. There are two and a half ways to do them.

Either discrete with "dumb" FETs, or with some kind of package around them, combining them into one single chip.
Those are most often called power stages.
Power stages can also include many more things other than just the FETs.
The main advantages are a smaller footprint and the additional features, for example over-current protection, over-heat protection, easier board design and many more.

Budget boards most often go for discrete FETs, whereas expensive / high-end boards go for power stages. Not always, but we are generalizing here.

Haswell and Broadwell have a so-called "Fully Integrated Voltage Regulator", in short "FIVR", or something like that.
That changes a lot, since the voltage that the board's CPU VRM has to create is around 1.8V instead of the 0.6V to about 1.4V on, for instance, X79 or AMD Ryzen.
The main advantage is that more voltage means less current at the same power, which in this case also means less power lost in the VRM while switching.
The result in design is that 6 phases are more than enough on servers, and most good X99 boards go for an 8-phase VRM.

The main reason why this is nice is that the VRMs are slightly less prone to overheating.
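
To put rough numbers on the "more voltage means less current" point, here is a small back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 140W load, the 1.0V comparison voltage (picked from the 0.6V to 1.4V range above) and the 1 milliohm path resistance are not measurements from any board. It also only models simple resistive (I²R) loss, which is a simplification of the real switching and conduction losses in a VRM, and it ignores the losses of the FIVR inside the CPU.

```python
def rail_current(power_w: float, voltage_v: float) -> float:
    """Current the VRM has to deliver for a given power and output voltage (I = P / V)."""
    return power_w / voltage_v


def conduction_loss(current_a: float, resistance_ohm: float) -> float:
    """Resistive loss in the current path (P = I^2 * R)."""
    return current_a ** 2 * resistance_ohm


POWER_W = 140.0              # assumed CPU load, for illustration only
PATH_RESISTANCE_OHM = 0.001  # hypothetical total resistance of FETs, inductors and traces

for label, volts in (("direct core rail, ~1.0V", 1.0), ("FIVR input rail, ~1.8V", 1.8)):
    amps = rail_current(POWER_W, volts)
    loss = conduction_loss(amps, PATH_RESISTANCE_OHM)
    print(f"{label}: {amps:.1f} A, about {loss:.1f} W lost in the path")
```

With these assumed values the board VRM delivers roughly 78A instead of 140A, and because resistive loss goes with the square of the current, the loss drops by a factor of about 3.2 (1.8 squared).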

Now to some stories about failures and their aftermath.

I handle already broken hardware, most often without or out of warranty.
Those are most likely Edge Cases, not common, etc.

High-end MSI X79 board. I know that it's not X99, but this had good Intersil power stages and did the classic X79 design ****up of putting parts of the VRM onto the back side of the board.
The (I think) 14-phase VRM didn't help here very much.

The failure was two stages, one on the back side and one on the top, which had gone up in "flame". Had to dremel them away.

The board worked flawlessly with only 12 phases and a Xeon E5 1620V1 for 3/4 of a year, until the next two phases blew up on Christmas Eve.
Kind of funny.

In both situations, the phases shorted 12V to ground, and the CPU survived the second one. I think the original owner's CPU survived that incident too, but I can't be sure.

I suspect long-term overheating as the main cause why those went bad.
Maybe the load-balancing over the stages favored those, making them the hottest, shortening their life.
Especially with the backside design, overheating is a likely cause, since the backside is cooled very poorly.
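
For a feel of how much harder the surviving phases have to work once some are gone, here is a minimal sketch. It assumes the total CPU current stays the same and is shared equally between the remaining phases, which is exactly the assumption that may not have held on this board if the load balancing favored some stages. The 14-phase count is the uncertain figure from above, the 10-phase case is inferred from "the next two phases blew up", and the function name is made up for the example.

```python
def per_phase_stress(original_phases: int, remaining_phases: int) -> tuple[float, float]:
    """Return (current factor, resistive loss factor) per remaining phase.

    Assumes the same total current, split equally over the remaining phases.
    Resistive loss per phase scales with the square of its current.
    """
    if not 0 < remaining_phases <= original_phases:
        raise ValueError("remaining phases must be between 1 and the original count")
    current_factor = original_phases / remaining_phases
    return current_factor, current_factor ** 2


for remaining in (14, 12, 10):
    current_factor, loss_factor = per_phase_stress(14, remaining)
    print(f"{remaining} of 14 phases: current x{current_factor:.2f}, "
          f"resistive loss x{loss_factor:.2f} per phase")
```

Under those assumptions, each surviving phase carries about 17% more current with 12 phases left and 40% more with 10 left, so the remaining stages run hotter than they did on the intact board.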

I have, and have seen, a few P9X79E-WS and Rampages with burned phases like the MSI, with discrete FETs in those cases, and again FETs / parts on the backside with little to no cooling.
What shorted was 12V to ground. This would not have killed CPUs.

Both fricken nice boards with IR3550M power stages and IR3567B controllers. Good components.
Sadly, those are the worst VRM failures I have: all stages went fully conductive from 12V to VCC, putting 12V into the CPU and probably killing it.
Both are still on the healing-bench.
I have no proof for any theory, but overheating is a bit unlikely in the case of the EWS, so:
I suspect that the PSU went bad and did bad things on the 12V rail, causing harm to all the Powerstages.
Story from the owner of the EWS was that it suddenly turned off and didn't turn on anymore.
No burn-marks etc.

If you have a good PSU, and it still has warranty, those things should be covered by the warranty and are in general very unlikely to happen.

Frobbit · Jan 5, 2019

Yes, it's a v3 E5-1603 Xeon, I changed it above.

The failures I have been reading about have been mostly X99, and generally Asus specific (I'm sure there have been issues on X79 boards as well, but for X99 there seem to be a lot more anecdotes online).

It's very hard to get a clear picture of what exactly might be happening from the various posts, but the general speculation seems to be that there is something going wrong in the firmware/software that is controlling some of the voltages going into the CPU (particularly on Asus X99 boards).

For the X99-Deluxe board Asus apparently did issue some fix to the firmware controlling VRMs: "Asus issues a critical bug fix BIOS for x99 Deluxe motherboard - ExtremeRigs.net", but what is worrying me is reading X99 stories about dead CPUs and motherboards well after that (into 2017 and 2018 even), and on other boards than the Deluxe (X99-Deluxe and Rampage V seem to be the worst offenders, but it's quite easy to find stories for most X99 boards, even non-Asus).

Because of that, I'm just wondering if X99 is worth keeping around as an E5 v3/v4 upgrade path, or if it would be more prudent to get 2011-3 workstation/server boards for that.

RageBone · Jan 5, 2019

X99 is the only platform that allows both UDIMMs and Reg ECC with Xeon CPUs, with exceptions.
And I'd say, once you are sure that you are on a stable BIOS, one newer than that bug-fix version and hopefully one with Spectre and Meltdown mitigation, I can't think of any other reasons why X99 should be prone to such problems.

Gadgetguru · Jan 5, 2019

I've been running my i7-5820k overclocked to 4.3 in a Gigabyte Ultra Durable X99 board since 2015. It is put to sleep or turned off at least once a day. PSU is a (new at the time) Seasonic Platinum 1050w. No issues but I can't speak to the Asus boards.

madbrain · Jan 5, 2019

Also running my i7-5820k overclocked to 4.3 in an MSI X99A-Raider since 2015. Sleep and resume multiple times a day. Was running a lowly Raidmax 735W PSU until more recently when I upgraded to a Corsair 1200i.

MSI does have its share of BIOS bugs, though.
