Stephan

Machine Check Exception (mce) workaround


Stephan

Well-Known Member
Apr 21, 2017
942
711
93
Germany
Stephan submitted a new resource:

Machine Check Exception (mce) workaround - Don't throw away that eBay Xeon just yet

You bought a cheap off-roadmap Intel Xeon CPU from somewhere, but the machine crashes and reboots, even when idle. You realize the CPU might have been thrown out of a hyperscaler's datacenter for a reason. That reason?

Luckily, your CPU has extensive diagnostics and your Linux distribution supports "pstore" crash saving. In the saved dmesg* and mce* files under /sys/fs/pstore/ you find something like this:

Code:
mce: [Hardware Error]: CPU 2: Machine Check...
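To see at a glance which logical CPU the crash records point at, the saved files can be tallied with a short script. This is only a sketch: the function name `count_mce_by_cpu` is made up for illustration, the regular expression assumes the line format shown above, and reading /sys/fs/pstore/ normally requires root.

```python
import re
from collections import Counter
from pathlib import Path

# Matches lines like "mce: [Hardware Error]: CPU 2: Machine Check..."
MCE_LINE = re.compile(r"mce: \[Hardware Error\]: CPU (\d+): Machine Check")


def count_mce_by_cpu(pstore_dir="/sys/fs/pstore"):
    """Count Machine Check lines per logical CPU in saved dmesg*/mce* files.

    count_mce_by_cpu is a hypothetical helper name, not part of any tool.
    """
    counts = Counter()
    root = Path(pstore_dir)
    for pattern in ("dmesg*", "mce*"):
        for path in sorted(root.glob(pattern)):
            if not path.is_file():
                continue
            # Crash dumps can contain stray bytes, so decode leniently.
            text = path.read_text(errors="replace")
            for match in MCE_LINE.finditer(text):
                counts[int(match.group(1))] += 1
    return counts
```

Calling `count_mce_by_cpu().most_common()` lists the CPU numbers with the most hits first. If one number dominates across several crashes, that core is the likely suspect.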
 
  • Like
Reactions: Maddox and gb00s

Stephan

On my ASRock Rack board, the BIOS only lets me choose how many cores to disable (up to N-1); it cannot selectively disable one specific core.

bios.png
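Where the BIOS only offers a core count, Linux itself can take one specific logical CPU offline at runtime by writing 0 to its `online` file in sysfs. The sketch below is an illustration under assumptions, not necessarily the method the linked resource uses: `offline_core` and `parse_cpu_list` are made-up helper names, the CPU number is assumed to be the one from the mce line, and the hyperthread siblings are offlined too because they share the same physical core. It needs root, and it cannot help if the machine crashes before the script gets to run.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "2,14" or "0-3" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def offline_core(cpu, sysfs_root="/sys/devices/system/cpu", dry_run=True):
    """Offline logical CPU `cpu` and its hyperthread siblings.

    Returns the 'online' control files that were (or, with dry_run=True,
    would be) written. Hypothetical helper, shown for illustration only.
    """
    root = Path(sysfs_root)
    siblings_file = root / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    # Read the sibling list first, while the CPU is still online.
    siblings = parse_cpu_list(siblings_file.read_text())
    targets = []
    for sibling in siblings:
        control = root / f"cpu{sibling}" / "online"
        if not control.exists():
            continue  # some CPUs (often cpu0 on x86) have no 'online' file
        targets.append(control)
        if not dry_run:
            control.write_text("0")
    return targets
```

Running `offline_core(2)` first with the default `dry_run=True` shows which files would be touched; `offline_core(2, dry_run=False)` actually writes them. The setting does not survive a reboot, so it would have to be reapplied early at every boot.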
 

RolloZ170

Well-Known Member
Apr 24, 2016
5,369
1,615
113
Yes, that's mostly the issue. I had a Platinum 8176 (retail) with defective core(s).
BSOD/PSOD under load or when booting to the OS; disabling all but the first 10 cores worked.
Same with a Xeon E5-1650 v4 (retail).
I bought all of these processors as defective/for parts for little money.
Luckily I now have a core disable bitmap in the Gigabyte MS33-AR0 BIOS, but no CPU with defective cores :rolleyes:
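For anyone who does have both a bad core and a core disable bitmap field, the value to enter can be worked out with a few lines. This is a sketch under a loud assumption: that bit i set means "disable core i". The exact semantics, and whether the BIOS core index matches the CPU number Linux reports, should be checked against the board manual. `core_disable_bitmap` is a made-up name.

```python
def core_disable_bitmap(defective_cores, total_cores):
    """Return a hex bitmap with one bit set per core to disable.

    Assumption: bit i set means "disable core i"; verify for your BIOS.
    """
    cores = set(defective_cores)
    if any(c < 0 or c >= total_cores for c in cores):
        raise ValueError("core index out of range")
    if len(cores) >= total_cores:
        raise ValueError("at least one core must stay enabled")
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)
```

For example, `core_disable_bitmap([3, 17], 28)` gives `0x20008`.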