The Intel Atom C2000 Series Bug – Why it is so quiet

  • Thread starter Patrick Kennedy

Yamabushi

Member
Feb 19, 2016
33
5
8
54
My firewall is a Supermicro C2758 running pfSense. It's been running flawlessly for years now, but this definitely has my attention. :(
 

fossxplorer

Active Member
Mar 17, 2016
498
77
28
Oslo, Norway
Yeah, I wish that were the situation, with AMD having similarly spec'ed alternatives.
I'd definitely support underdogs. We don't need a monopoly on the CPU side, and competition from vendors like AMD would work to our advantage as consumers (more options, better prices).


If there were truly competition in the market, this is where companies would be substituting Atom for whatever AMD's equivalent is, but they can't because AMD doesn't have one (I guess low-end Jaguar or Brazos cores...)
 

Aluminum

Active Member
Sep 7, 2012
431
45
28
I think he's saying that everyone's trying to figure out how to deal with it without actually replacing every CPU, since that seems like overkill for an issue that (while real) still affects a small enough number of machines that, years after the 18-month accelerated failure point, it hasn't made enough of an impact for anyone to notice, apart from the press coverage of the errata.
Who is to say how "small" it is when Intel is being so heavy-handed with this NDA horseshit? I'm sure the guys that make your car would rather not do all those little recalls because only a "small" number actually have a problem; thankfully, that industry has to play by some basic rules.
 
Apr 13, 2016
56
7
8
51
Texas
Silicon errata are always fun ... and how they get handled differs case by case. In this instance, from the outside looking in, I'd guess that there is a long-term reliability degradation in a clock synthesizer/generator, probably part of the SoC. The things that influence degradation over time range all over the place, including environmental factors (e.g. air quality, though that typically causes PCB-type failures) as well as thermal conditions. Voltage levels are another area that heavily influences long-term reliability. I am concerned about thermals being the key influence, especially with my passive heat sink Rangeley-based Netgate box.
 

maxermaxer

Active Member
Oct 28, 2016
238
34
28
46
Thanks! I will change to Xeon-D for the next build. I know this is off-topic but which Xeon-D has the best C/P at the moment? Say like US$200-300 range for MB+CPU. Thanks again!
 

mstone

Active Member
Mar 11, 2015
505
118
43
43
Who is to say how "small" it is when Intel is being so heavy-handed with this NDA horseshit?
Anyone who has a bunch of C2xxx deployed for more than 18 or 36 months that aren't dead yet. It's obvious that the problem does not result in certain death after N months, it just makes the failure rate higher than expected. Did you buy the hardware with the guarantee that the failure rate would be 0%? If not, then it's unreasonable to expect total replacement because the failure rate is some non-zero value that's more than the original non-zero value. What is reasonable is probably some somewhat increased warranty time for devices that fail due to the bug, but what constitutes a reasonable time will be a big fight. And that increased warranty time is to make everyone involved look

I'm sure the guys that make your car would rather not do all those little recalls because only a "small" number actually have a problem, thankfully that industry has to play by some basic rules
I'm not aware of any basic rules for how long a computer chip is supposed to last. The manufacturers probably have contract terms on the failure rates, but the customers don't. In cars there are laws/regulations concerning things like emissions requirements (if a model doesn't meet certain requirements it has to be fixed, period), and other issues are life-safety critical. If someone has a life-safety critical system that depends on the total reliability of a single C2xxx processor, they probably have a much bigger problem than this. Think about the bad capacitor problem a few years back: if you had enterprise gear under warranty, it got fixed; if you had consumer gear not under warranty, you got new consumer gear when the caps finally died. The key here is whether you're paying someone to ensure that you get new gear when it breaks, and it's that person who can recover damages if the failure rate on the equipment is higher than specified by the manufacturer.

Of course, if you're buying direct from Intel and you have a contract specifying a failure rate, then the above doesn't apply, but most people buying for home labs aren't spending that kind of money.
 

MiniKnight

Well-Known Member
Mar 30, 2012
3,001
911
113
NYC
We've got a bunch and they've been running non-stop for years. There's an old pair of C2758 FreeBSD ZFS backup servers that a coworker set up 2 years ago. They're at 649 days uptime.

If I'm reading this right, it's a higher-than-normal failure rate after 18 or 36 months. For most of our gear we start looking at replacements around 36 months anyway.

I'm reading many of these comments elsewhere and people clearly don't get how this works. I'm also reading many comments elsewhere by kids with gaming rigs who clearly don't own this gear and don't deal with DC gear. Stuff fails all the time. We're just going to start to look for C3000 options a few months earlier and rotate them in, using retired C2000 to fill the void as short term spares.
 

Deslok

Well-Known Member
Jul 15, 2015
1,122
124
63
31
deslok.dyndns.org
Do we know if there's going to be a full C3000 lineup? There's only one chip that I've seen so far.
 

mstone

Active Member
Mar 11, 2015
505
118
43
43
Do we know if there's going to be a full C3000 lineup? There's only one chip that I've seen so far.
Depends on how much money they lose on the C2xxx and whether they're willing to try again. :)
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,182
1,655
113
CA
The real question is: when people start dumping the C2xxx, will we still be able to buy them dirt cheap and send them in for a 'free' repair ;) and build out some dirt cheap storage systems, firewalls, etc...

I'd gladly pick up a handful for dirt cheap and wait through the repair cycle ;)
I could use a couple firewalls right now for dirt!
 

Evan

Well-Known Member
Jan 6, 2016
3,252
559
113
My thoughts on component lifetime: I would expect a chip lifetime of 7 years. That's how I have seen the extended-availability CPUs marketed, at least unofficially.
7 years availability with 7 years life.

But keep in mind you don't see many extended-life mainboards with the special power delivery components, so in normal use I would say 5 years, since that's typically the longest vendor warranty support available at purchase date.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,113
5,133
113
Just added a must-read update. Got the first figure I could cite in terms of failures for a larger population. Summary: some media outlets are hyping this too much.