Bug in Intel Atom C2000 series processors?

weust

Active Member
Aug 15, 2014
Buy a short Dupont jumper wire (2.54 mm pitch) with a female connector on both ends and plug it onto the pins.

If the front side is already occupied, try conductive silver paste or copper tape; you can always remove them afterwards.
What does that have to do with it?
 

digitalsawdust

New Member
Jan 2, 2021
Did you contact Supermicro to see if they will repair it for you?
Not sure on warranty (if any) after they repair it.
No, I didn't even bother. Although, from reading various forums around the internet, it sounds like Supermicro has been pretty helpful, and I saw lots of people claiming to get replacement boards even out of warranty. I think this may already be my longest running piece of hardware as it is (5 years next month). If this 5 minute fix keeps it running for another year or two, that's good enough for me.

I'm curious if the rate of failures of these boards will start going up now that it's been a couple of years since the bug was first announced. I made it considerably longer than the initial 18 month estimate that was given when the bug was announced. I'm sure many people will just assume the board died and replace it, like I was getting ready to do. It was just dumb luck that I stumbled upon an article about the bug while I was looking for a comparable replacement.
 

ullbeking

Active Member
Jul 28, 2017
London
I am very excited to have found this thread!! I would also prefer to fix any C-2000 boards myself rather than go to the trouble of sending them to SM (Netherlands) for repair. Although I will still try to send one to learn from the experience.

I have a lot of these boards, some pre-RMA, and some purchased after the issue was addressed. Others work, but I don't know when they will conk out, so they are no good for production.

I will also search other forums, blog posts, mailing lists, IRC, and other channels for more information of this nature.

Wouldn't it be great to create a canonical source of information for people who wish to DIY this fix on this very popular board? If I manage to get enough practice I would be very happy to have a crack at this.
 

ullbeking

Active Member
Jul 28, 2017
London
Anyone know how much Supermicro charges for a c2000 bug RMA?
You just write to Supermicro RMA with the invoice info from where you bought it, etc. Or get the vendor to do it. You can NOT put a board into production with a bug like this; it's a simple argument.
 

zack$

Well-Known Member
Aug 16, 2018
I did start the RMA process but stopped when Supermicro said "only when we get the board can we advise of charges". I'm supposing that they will not commit until they actually know what is wrong (could be something other than c2000).

I remember reading on the forum that a guy got stuck with an RMA because the cost to repair was just too high... he was literally offering the board on the forum.

My thing is that I'm just trying to get a ballpark figure to determine whether to even RMA (my c2750 is not in production).

Anyone have an idea?
 

RhoTrp

New Member
Apr 6, 2021
Well then, April 2021 calling :)/

Pretty great thread here. Thanks to everyone who has already added to the knowledge here.


I was just gifted a non-functioning ASRock Rack C2750D4I rev. v1.02 (including 32GB of RAM!) which wouldn't run anymore (since 2018). The owner added that the possible culprit would/could be "CPU Heat Problems". That was the last thing he saw somewhere in some logs.

I had a look. The board was indeed unresponsive. The BMC heartbeat LED was beating, though.
Found something I hadn't heard of: IPMI. That was also working.
IPMI showed a log of CPU Temperature problems <somethingsomething>"unrecoverable"... and those were the last entries in that log (dated 2018)
The IPMI interface also couldn't connect to the board.
I updated the IPMI firmware to 0.30.0, for good measure; this did clear the history log.

The story was "yaddayadda I had to reboot, but then it wouldn't come up anymore".
So, in retrospect, that sounds like he had this degraded clock-circuitry thing going.

I read that the ASRock C2750D4I (and possibly the whole C2000 line) had not 1 but 2 possible bugs (or perhaps even 3, if CPU overheating is also a thing).

I read here (and elsewhere) about people with other boards who used their manuals and successfully fixed their clock problem (and some who weren't successful). This gave me enough courage to have a go at it.

So from the last forum pages here (6, 5, and 4), and the YouTube video from EEVblog, my guess was to find the 'same' header/port/placeholder on this board.
The ASRock C2750D4I manual:
ASRock Rack > C2750D4I
https://download.asrock.com/Manual/C2750D4I.pdf

On pages 6, 7, 8, and 9 you have the layout and an index/glossary/table for the onboard locations.
Nr 18 is the TPM Header.
Page 21 has a layout of the pins.

On this board, the TPM header is actually populated with header pins, but in a smaller form factor than all the other pins. Standard Dupont female cables didn't hold at all. I had a go and soldered a 110 ohm resistor to the pins (1 and 9). It was pretty non-invasive and non-destructive, and it can be removed if needed.

But, alas, this did not help; the board is still unresponsive, and IPMI still works.

I have no oscilloscope, so I can't check if there even is any clock pulse on it.
Currently even my multimeter is misplaced.
Also, my electronics knowledge is near-zero. ("pull-up resistor", wha..?)
I do have a photo (from my phone) of my little fix.

camphoto_1804928587.JPG
this did not fix my problem

So....
  • Was my interpretation of where I should do this wrong?
  • Have I botched the 'fix'?
  • Do I have the Flash-overwrite problem?
  • Is my CPU indeed fried?
  • How do I check all of these?

  • If the CPU is fried, I'm giving up. (It's surface-mounted, right?)
  • But as for what people mentioned about the flash chip (was that this thread or another? It was an SMD chip available at Mouser), I would actually dare to have a go at replacing that one. (It seems doable with a steady hand.)
  • If I need to retry the clock fix because I misinterpreted the location on the board, I would retry that too.
  • If anyone has help or pointers on what to check, I would probably have a go at that too.
  • I don't think I can find someone with an oscilloscope here.

Well, cheerio, and I hope for the best

~R
 

RhoTrp

New Member
Apr 6, 2021
Wouldn't it be great to create a canonical source of information for people who wish to DIY this fix on this very popular board? If I manage to get enough practice I would be very happy to have a crack at this.
Have you found anything in these last few months that would give more info about the problem, or more specifically, any DIY fixes?
 

RhoTrp

New Member
Apr 6, 2021
RhoTrp, try actually soldering a 220 ohm resistor in between pin 1 and 8. That worked for me.
Ah thanks for the hint
I’ll look for a 220 Ohm resistor...

This is the first time I've heard about pin 8, though; why that one?
Do you have the same board? Did you find that solution elsewhere? (And where)
 

rommac100

New Member
Apr 23, 2021
Well you can put two 110 ohm resistors in series to get the 220 ohms. But the reason for pin 8 is that the pin you hooked it up to is the 3v line while the 3.3v line which most guides reference for other boards is on pin 8.
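
To make the arithmetic above concrete, here is a small Python sketch (not from the original post) of the series value and of the worst-case DC current through the pull-up while the line is held low. It assumes an ideal 3.3 V rail and ignores the driver's on-resistance; the function names are made up for illustration.

```python
def series_resistance(*resistors_ohm):
    """Total resistance of resistors wired in series (values simply add)."""
    return sum(resistors_ohm)


def pullup_current_ma(rail_volts, pullup_ohm):
    """Worst-case DC current in mA through a pull-up while the line sits at 0 V.

    Assumption: ideal rail, driver on-resistance and any series resistor ignored.
    """
    return rail_volts / pullup_ohm * 1000.0


if __name__ == "__main__":
    total = series_resistance(110, 110)
    print(f"two 110 ohm resistors in series: {total} ohm")
    print(f"current at 3.3 V: {pullup_current_ma(3.3, total):.1f} mA")
```

Under those assumptions, 220 ohm draws about 15 mA from the rail whenever the line is low, while a single 110 ohm resistor would draw about twice that.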
 

RhoTrp

New Member
Apr 6, 2021
Well you can put two 110 ohm resistors in series to get the 220 ohms. But the reason for pin 8 is that the pin you hooked it up to is the 3v line while the 3.3v line which most guides reference for other boards is on pin 8.
I was indeed worried about the fact that everyone talked about 3.3v while my pin 9 only mentioned 3v.
:) Wow
Pin8 is called LAD2 on this board

I can't find many mentions of pin 8 being used on other boards, though.


Also, I found this entry:
How to fix Asrock c2750d4i with C2000 bug


This'll be interesting...
Coffee first, though.
 

rommac100

New Member
Apr 23, 2021
So I have tested this fix on two 8-core boards and one 4-core board, and it worked successfully on all of them.
 

bfarnam

New Member
May 20, 2020
St Louis
So I have a spare C2750D4I which apparently had a MFG Board Level Repair done. It is actually quite different from what is described elsewhere and in this post. I have confirmed it is a B0 stepping CPU by checking the S number.

I tried to RMA it and they denied it as out of warranty with no repair option.

I am going to put my OScope on it later and see what I have. I will post more later on the results and the specifics of the MFG Board Repairs.
 

Jyrki

New Member
May 14, 2013

I applied the fix shown at the URL above to my Supermicro board. There are two LPC clocks in use.
My board's symptoms were:
- no boot beeps
- IPMI wasn't reachable via the local bus from the operating system (ipmitool, kernel ipmi_si module)
- BIOS couldn't see the BMC, and the BMC couldn't find the BIOS version
- The board booted fine even after a full AC power cycle. The BMC could stop and start the system.
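
For anyone wanting to check the second symptom on their own Linux box, here is a rough Python sketch (not from the original post). It only looks for the device nodes that the local IPMI interface normally uses and for `ipmi_si` in `/proc/modules`. The function names are invented for illustration, and a driver built into the kernel will not show up in `/proc/modules`, so a negative result there is not conclusive.

```python
import os

# Device nodes commonly used for the local (in-band) IPMI interface on Linux.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def find_ipmi_device(paths=IPMI_DEVICE_PATHS):
    """Return the first existing IPMI device node, or None if none exists."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


def module_loaded(name, proc_modules_text):
    """True if `name` is the first field of any line in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    print("IPMI device node:", find_ipmi_device() or "none found")
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            print("ipmi_si loaded as module:", module_loaded("ipmi_si", f.read()))
```

If no device node exists while the BMC still answers over the network, that would match the symptom described above, though other causes are possible.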

I used a 150 ohm resistor, the value that Cisco and a few other companies seem to use in their similar repairs. I couldn't find any information on what value Supermicro uses.

I also tested 330 ohm, and it was enough to get the clock good enough for communication to happen, but the clock signal was so horrible that I didn't trust it and went with 150 ohm. Even with 150 ohm the clock has a horrible shape, but it is at least better. Without a pull-up resistor the clock signal was a flat 0 V, so the high-side FET had died completely. The kernel lost IPMI communication about a year and a half ago.
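
A rough way to see why the smaller resistor gives a better-looking clock: with the high-side FET dead, the rising edge is produced only by the pull-up charging the line capacitance, so it behaves roughly like an RC edge. The Python sketch below (not from the original post) uses the standard first-order estimate of 2.2·R·C for the 10%-90% rise time. The 20 pF load and the 33 MHz clock are assumptions picked for illustration, not values measured on this board, and the function names are made up.

```python
def rise_time_ns(pullup_ohm, load_pf):
    """10%-90% rise time in ns of a first-order RC edge: t = 2.2 * R * C.

    1 ohm * 1 pF = 1e-12 s = 1e-3 ns, hence the 1e-3 factor.
    """
    return 2.2 * pullup_ohm * load_pf * 1e-3


def half_period_ns(clock_mhz):
    """Duration in ns of one half clock period (time available for the high phase)."""
    return 1e3 / (2.0 * clock_mhz)


if __name__ == "__main__":
    LOAD_PF = 20.0    # assumed line capacitance, hypothetical
    CLOCK_MHZ = 33.0  # assumed nominal LPC clock, not confirmed for this SoC
    budget = half_period_ns(CLOCK_MHZ)
    for r in (150, 330):
        t = rise_time_ns(r, LOAD_PF)
        print(f"{r} ohm: rise ~{t:.1f} ns, {t / budget:.0%} of a half period")
```

With these assumed numbers the 330 ohm edge eats nearly the whole half period while the 150 ohm edge uses under half of it, which is at least consistent with the difference described above. The price is a higher static current through the smaller resistor while the line is low.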


Below: yellow is the still-working clock without a pull-up resistor. Blue is the dead clock signal with a 150 ohm pull-up. Both were measured on the SoC side of the ~39 ohm series resistors that sit on the lines near the SoC.
1637005923976.png

Note: Components are 0402 size, so don't attempt without prior experience.

Edit: Seems I remembered wrong and it wasn't Cisco directly, but for example Sophos.
 

Jyrki

New Member
May 14, 2013
Also: please stop soldering random stuff onto your motherboard if you can't measure what it does and don't really understand what you are doing. I would deeply hate to see boards with DIY repairs, done or attempted, for sale on the used hardware market.
 

Jyrki

New Member
May 14, 2013
For my own interest I measured the clocks again with a better high-frequency probing technique. Signals were measured one at a time, at the same locations as earlier. Both have 150 ohm pull-ups installed, unlike in my earlier picture, which had just one. It seems one would still be alive even without the pull-up fix.
1637183343349.png
 

bfarnam

New Member
May 20, 2020
St Louis
So sorry that I haven't had time to even look at this on my "dead" C2750D4I MoBo.

What I can tell you is that the "platform" fix that was applied included one surface-mount resistor under the CPU in addition to multiple 120 ohm 1% resistors along the underside of the TPM header. I will definitely have some time Thursday/Friday to check into this more. For now, here are the pics:

120 ohm 1% from LAD0, LAD3, and PCICLK to 3.3v(?), and 120 ohm 1% from LAD1 and LAD2 to 3.3v(?)

20211112_073243m.jpg

Unknown resistor under CPU...

20211112_073308m.jpg