Bug in Intel Atom C2000 series processors?

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,263
428
83
Assuming this is a SoC-level problem as opposed to a board-level problem, until a new stepping is established, won't RMA'd motherboards just have a newer version of the same potentially vulnerable chip?
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,964
4,921
113
Assuming this is a SoC-level problem as opposed to a board-level problem, until a new stepping is established, won't RMA'd motherboards just have a newer version of the same potentially vulnerable chip?
There is a platform-level fix that all of the vendors are using now.
 

lofie

New Member
Jul 12, 2013
6
1
3
hi,

anyone heard of any C2000 systems which are not affected?
i.e. is there any way that a typical pc (bios based) board will not suffer from this bug?

Have a C2758 system, fairly standard "pc like" motherboard running linux in an "appliance".
e.g. AMI bios, usb, sata ports, boots from sata ssd.

The MAC address has an OUI of "Lanner Electronics", so I assume they are the actual makers/designers of the board.
(fields returned by dmidecode have not been set)
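
Since dmidecode gives nothing useful here, one way to at least identify the CPU stepping on a Linux box is to read /proc/cpuinfo. The snippet below is only a rough sketch: the function names are made up for illustration, and the family 6 / model 0x4d (77) / stepping 8 values are an assumption taken from the B0 signature quoted later in this thread, not an official Intel detection method. A match only identifies the stepping; it says nothing about whether the board already carries the platform-level workaround.

```python
def parse_cpuinfo(text):
    """Return family, model and stepping of the first CPU in /proc/cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # "model name" is a different key, so it is skipped by the exact match.
        if sep and key in ("cpu family", "model", "stepping") and key not in fields:
            fields[key] = int(value.strip())
    return fields


def matches_quoted_b0_signature(fields):
    # Assumption: family 6, model 0x4d (77), stepping 8 is the C2000 B0
    # signature, as quoted elsewhere in this thread.
    return fields == {"cpu family": 6, "model": 77, "stepping": 8}


def read_local_cpuinfo(path="/proc/cpuinfo"):
    # Linux only; call this on the machine you want to check.
    with open(path) as f:
        return parse_cpuinfo(f.read())
```

On the appliance itself, `matches_quoted_b0_signature(read_local_cpuinfo())` should tell you whether the CPU reports that signature.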

So far the initial response of the appliance manufacturer is that they have not seen any increased failure rates, but they are looking into it. They are continuing to sell this kit: a call to a sales rep confirmed the sales team did not know of the bug's existence, let alone whether it affected their gear.

Have a call in with the retailer of the system. Will update here if there is any response.

- L
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,964
4,921
113
@lofie - keep us posted. I can update the main site with the response.

The failure rate on most devices is not high. Any C2000 device 2016 or older is vulnerable.
 

smithse79

Active Member
Sep 17, 2014
196
33
28
40
@lofie - keep us posted. I can update the main site with the response.

The failure rate on most devices is not high. Any C2000 device 2016 or older is vulnerable.
If you're keeping a log of responses, here's how it went for me:

2/7: I contacted vendor, they had not heard anything from SM but would reach out.
2/7 (later): Vendor heard back from SM that there is a recall exchange for this motherboard (A1SRM-2758F) and they will set up an advanced replacement RMA from SM to me

2/9: New board arrives late in the day. Too late to be able to swap them out.
2/10: Swapped motherboards (took less than 30 minutes to unrack and swap)
2/10: got shipping labels from SM/vendor and sent old board back via UPS

Completely painless process for me. It was obvious that Windows knew something had changed, but it booted just fine (Server 2016)
 

mstone

Active Member
Mar 11, 2015
505
117
43
42
2/9: New board arrives late in the day. Too late to be able to swap them out.
2/10: Swapped motherboards (took less than 30 minutes to unrack and swap)
How long ago did you purchase? Were there any obvious blue wire fixes on the motherboard? Is the cpu revision different?
 

djroketboy

New Member
Sep 2, 2011
12
2
3
I contacted SM directly yesterday. They had me open a support ticket, then an RMA, and it was approved for exchange. It's an A1SRi-2558F. I purchased mine in April.
 

smithse79

Active Member
Sep 17, 2014
196
33
28
40
How long ago did you purchase? Were there any obvious blue wire fixes on the motherboard? Is the cpu revision different?
Mine was purchased August(?) 2015

Nothing obvious on the motherboard, and I didn't get the stats on the Stepping, etc. It is a fairly critical server and I needed it back up ASAP.
 
Oct 21, 2015
33
1
8
40
Italy
www.opensupport.it
I have a Supermicro A1SAi-2750F board. I contacted support by email about this, and SM Europe (I'm Italian) told me to open an RMA for the product.

How does the procedure work?

If I buy other hardware, how can I tell whether the problem has been fixed or whether it is a new revision?
Is there a public hardware revision? (Supermicro has told me nothing...)

I also don't understand whether the platform is now reliable and fixed.

(and... sorry for my English :) )
 

trumee

Member
Jan 31, 2016
194
9
18
50
I got my A1SRI-2758F back from Supermicro today. The RMA note said "Reported problem not found. Ran AC power on/off reboot test okay. PC check test okay. Boot into windows 8 and ran burn in test pro okay. Test passed". And that was it! I called the RMA centre and was told they have put the Intel fix on the board. However, there doesn't seem to be any visible change to the board. Not sure what to make of this.
 

BlueFox

Well-Known Member
Oct 26, 2015
1,054
511
113
If they fixed the board, you wouldn't really be able to tell. They should just be removing the old CPU, reballing the board, and attaching the replacement CPU.
 

GaryD9

New Member
Feb 15, 2017
28
7
3
49
Pittsburgh, PA (USA)
If they fixed the board, you wouldn't really be able to tell. They should just be removing the old CPU, reballing the board, and attaching the replacement CPU.
If the fix were made with an updated CPU, the CPU stepping would change. My replacement C2558 board has no visible wire jumpers or changes (compared side-by-side with my older board), is still REV 1, and the CPU has the same B0 stepping :

Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8
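
For anyone wondering how that Id value maps to the family/model/stepping fields, here is a small sketch that decodes a CPUID leaf 1 EAX signature using the standard Intel bit layout. The function name is just for illustration.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x406D8)
print(f"Family={family:#x} Model={model:#x} Stepping={stepping}")
# prints "Family=0x6 Model=0x4d Stepping=8"
```

The stepping is the low four bits, so a board with a respun chip would be expected to report a different last hex digit in the Id.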
 

Evan

Well-Known Member
Jan 6, 2016
3,123
522
113
Remember, not all chips are defective, only the ones in a given time range and maybe from a given fab, so maybe yours was not one that will fail early and they can check that (of course, a utility the user could run would be more sensible).
 

GaryD9

New Member
Feb 15, 2017
28
7
3
49
Pittsburgh, PA (USA)
Remember not all chips are defective , only the ones in a given time range and maybe from a given fab,
Source?

Everything I've read indicates that ALL the C2xxx chips with the B0 stepping are impacted. Here's what Intel says about it (from http://www.intel.com/content/dam/ww...ion-updates/atom-c2000-family-spec-update.pdf)
AVR54. System May Experience Inability to Boot or May Cease Operation

Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.

Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.

Status: For the steppings affected, see Table 1, “Errata Summary Table” on page 9.
The referenced table indicates stepping B0 (of course, B0 is the only stepping that exists in the wild.)
(Oddly, I thought I read something about Intel producing a new stepping to resolve the issue, but I guess I was mistaken... Based on the current document, there is (will be) no new chip/stepping to resolve the issue, and it's ONLY a platform change that works around it.)
 

Evan

Well-Known Member
Jan 6, 2016
3,123
522
113
Source was Intel about the dates. Early processors are not affected; whether that means before B0 production was not stated. The part about the fab is speculation, hence the "maybe", as I don't know which fabs are used to make them.

It does not seem clear whether the bug is really a pure design issue and/or a material or material application issue. At least that's the way I interpret it.

Remember the C2000 has been produced since 2013, and it's been hinted that it's only the chips that are reaching 18 months now, or will reach 18 months soon, that have the issue?? What about the chips produced in 2013?

I actually can't imagine right now what fix they could implement on existing product. Has anybody seen what they add or do?
 

GaryD9

New Member
Feb 15, 2017
28
7
3
49
Pittsburgh, PA (USA)
Source was Intel about the dates. Early processors are not affected; whether that means before B0 production was not stated. The part about the fab is speculation, hence the "maybe", as I don't know which fabs are used to make them.
Can you please provide a link to something where Intel says that early processors aren't impacted? I'm not doubting that it exists, but it's something I haven't read.

Here's an article with some interesting (if useless) quotes from Intel: "Intel's Atom C2000 chips are bricking products – and it's not just Cisco hit"

According to that, Intel declined to comment on when the impacted chips started and stopped shipping. However, they are also quoted in that article as stating: "Additionally, Intel will implement and validate a minor silicon fix in a new product stepping that resolves this issue." THAT statement indicates that it wasn't a case of some bad components mixed in with good, or a bad process at a single fab, etc. Rather, it seems to indicate a design flaw (either logical or physical) that they can correct with a silicon fix in a new stepping (which doesn't seem to exist yet). If it were just a subset of chips with the issue, they wouldn't need a silicon fix and a new stepping.

It does not seem clear whether the bug is really a pure design issue and/or a material or material application issue. At least that's the way I interpret it.
If the above linked/quoted material is accurate, it would be a design issue. (Keeping in mind that I'm including the choice of materials used as part of the overall design.)

Remember the C2000 has been produced since 2013, and it's been hinted that it's only the chips that are reaching 18 months now, or will reach 18 months soon, that have the issue?? What about the chips produced in 2013?
I've seen many references to "18 months." NONE of them from Intel, and none of them are willing to directly connect 18 months with Intel. However, I think it's fair to look past the smokescreen and see the relationship. In that (likely valid) case, the stuff I've seen indicates that the issue is more likely to be a concern after 18 months. It doesn't indicate that it won't occur until then, and I'm sure there are plenty of 3+ year old chips in 24/7 use that never had an issue.

As for the 2013 chips that HAVE had the issue... I'm sure the majority of those owners were told that they were out of warranty... or in some cases, they got warranty replacements (if the warranty was long enough) without anyone officially relating it to this specific issue. Even now, with everything we think we know now, it STILL hasn't been officially related. Cisco, etc, have all been VERY careful in what they aren't saying.

I actually can't imagine right now what fix they could implement on existing product. Has anybody seen what they add or do?
That's an interesting question. I'd also love to know what the "platform level fix" could be that doesn't force a new motherboard revision, doesn't force any type of BIOS (or microcode) update, doesn't change the chip, and doesn't leave any clearly visible signs on an older board that supposedly has been "repaired."

At the very least, I expected to see some sign of hand SMT re-soldering on the replacement board. I don't see anything on either side. (Of course, I don't know what I'm looking for either, and there are a LOT of solder points on a motherboard!) I have to keep in mind that the "fix" might be as trivial as cutting a trace or replacing one of those extremely tiny components on the board. I might never see that even if I did know what to look for (and where to find it.)

Edit: Here's a page that supposedly answers your "what about 2013 chips?" question: Intel Atom chips have been dying for at least 18 months – only now is truth coming to light (I say "supposedly" because none of the info in that article is directly confirmable.)
 

Evan

Well-Known Member
Jan 6, 2016
3,123
522
113
Let me see what I can find in writing that I am allowed to share.
I have re-read the emails where I got the info, and given the dates and the wording it looks like early info; the vendor really did also state that it mostly affected products in a date range.

Having said that, a different vendor (Cisco) has told us that it's a full recall. I could attach that info, but the info our company received just references documents already referenced.

Not the biggest sample size, as we have very few low-end ASAs etc., but as far as I know we have also seen no failures. And I know of no single person with first-hand experience of any failures on any platform yet, so I guess they are few and far between.

I just can't imagine any big vendor reworking product, such as soldering on new chips, and keeping the failure rate of the repair low enough compared to factory-shipped items, but I guess maybe they do and will.
 

mstone

Active Member
Mar 11, 2015
505
117
43
42
Everything I've seen points to mass hysteria caused by bad reporting on very few facts.
 
Oct 21, 2015
33
1
8
40
Italy
www.opensupport.it
I have had a long email correspondence with Supermicro support, and in one email they replied:
"....If you purchase another board that came from us it should not have the issue. Otherwise we’re told we can guarantee the problem is not present on motherboards with serial numbers xx173xxxx and xx73xxxx (produced from march)....."
They declined to give any other information or answers about the mainboard fix.

My two cents :)
 