Help me fix an X10SDV-12C-TLN4F with a "Memory signal is too marginal" error


Rand0mUser

New Member
Dec 14, 2019
10
4
3
Hello all at STH,

I have here a Supermicro X10SDV-12C-TLN4F which has some issues. I'm trying to diagnose if the board is dead, or if it can be fixed.

The main problem is that I can't use it with RAM in slot A1: I always get the error "Memory signal is too marginal DIMMA1". The RAM is not the problem. The sticks are Samsung M391A2K43BB1-CPB, approved by Supermicro for this board. They work fine, and I also tried identical sticks pulled from another working server, with the same result. I cleaned the RAM slots with an air blower, still the same. I tried lowering the RAM frequency to 1800 instead of 2133 MHz, no change.

So I removed the RAM from slot A1. With RAM in slot B1 only, I can sometimes boot, and sometimes I get the error "Memory signal is too marginal DIMMB1":

memory-signal-too-marginal.jpg

I managed to flash the BIOS to the latest version using recovery mode, but it didn't change anything:

recovery.jpg

I also flashed the BMC firmware to the latest version, still the same:

flash-bmc-firmware-2.png

I also tried another PSU. The system is now running on a brand new Seasonic 300W 80+ Gold, with no change at all.

I thought the issue might be CPU related, so I tried running the CPU with only a few cores enabled instead of the default 12. Even with a single core and CPU features like Hyper-Threading and Turbo Boost disabled, the system was pretty slow but still had the same issues.

So the issue when starting the system is that sometimes it will POST and sometimes it won't. It boots fine once in a while, and other times it gets stuck in a kind of bootloop between PEI--IPMI Initialisation and DXE-00B Data Initialisation:

ipmi-ok.jpg

I noticed that to get into the BIOS I have to put the JBR1 jumper in recovery mode, otherwise I can't get in. Once in the BIOS, I can make changes and save them, and that works. But when I reboot, the weird behavior is back: the system randomly hangs at POST or bootloops.

When I do manage to boot the system, it works fine. I installed FreeNAS on it and ran a Plex server encoding video on all cores for a whole night without any issue. As long as it stays powered on, the system keeps running flawlessly for days.

The issue is at POST or right after. Once the OS is booted there are no more problems, and from the tests I made this Xeon D-1557 is as powerful as my Xeon E3-1245 v5 for video encoding, while running cooler and drawing less power. This is why I really like this little SoC and want to save it.

I still get a lot of memory errors in the IPMI logs:

ipmi-memory-errors.png
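
For anyone who wants to count errors like these rather than read them off a screenshot, here is a minimal Python sketch that tallies memory-related entries from a text dump of the event log. It assumes the log was saved as pipe-separated lines in the style of `ipmitool sel elist` (id | date | time | sensor | event | state). The sample entries and the function name `count_memory_events` are made up for illustration and are not taken from the log above.

```python
from collections import Counter


def count_memory_events(sel_text):
    """Tally SEL entries whose sensor field mentions memory, keyed by event text."""
    counts = Counter()
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Assumed layout: id | date | time | sensor | event | state
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        sensor, event = fields[3], fields[4]
        if "memory" in sensor.lower():
            counts[event] += 1
    return counts


# Hypothetical sample entries, NOT from the board discussed in this thread.
SAMPLE_SEL = """\
   1 | 12/14/2019 | 10:02:11 | Memory #0x01 | Correctable ECC | Asserted
   2 | 12/14/2019 | 10:02:15 | Memory #0x01 | Correctable ECC | Asserted
   3 | 12/14/2019 | 10:05:40 | Power Supply #0x02 | Presence detected | Asserted
   4 | 12/15/2019 | 08:11:03 | Memory #0x02 | Uncorrectable ECC | Asserted
"""

if __name__ == "__main__":
    for event, n in count_memory_events(SAMPLE_SEL).most_common():
        print(f"{n:3d}  {event}")
```

If the real log uses different columns, only the two field indexes need adjusting. A count that keeps growing while the OS is running would suggest the problem is not limited to POST.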

My guess is that the board has one defective component, most likely one of these:

- CPU
- BIOS chip

RAM issues with working RAM sticks and clean RAM slots may come from a CPU problem. But once the system is booted the CPU performs very well, and the board behaves the same even when running on a single core. The CPU performs so well once the server is booted that it's hard to believe it could be dead, but who knows? Maybe a bad solder joint on the BGA?

The weird behavior that happens randomly at POST could also be related to a bad BIOS chip.

If the CPU is dead, well, it's BGA and Xeon D chips are not sold at retail, so the board can't be fixed. But if the issue is only the BIOS chip, it could be replaced at low cost. I just have no real idea whether I should try to swap the BIOS chip, or whether this is a known issue with the CPU. The "Memory signal is too marginal DIMM" error seems to be common among X10SDV owners. Do you have any idea what it could be?

If someone has any idea, I'd be happy to save this board. Many thanks, any help is really welcome.
 

RageBone

Active Member
Jul 11, 2017
617
159
43
I have no idea why the described behavior would be caused by the BIOS or the BIOS chip.
Please enlighten me!

And to be precise, there can be many more things wrong than just the CPU.
I mean,
- The silicon CPU chip itself
- The chip-to-substrate bond
- The substrate
- The BGA connecting the substrate to the motherboard
- The motherboard
- The RAM slots


So did you rule out dirt in the RAM slot? Sometimes that can cause very wonky behavior.
And sometimes it is plain old user error: are you sure there is nothing behind the board touching the board and the RAM slots from behind?
Or that you didn't scratch some traces?
LGA 2011 with the ILM has the possibility of "long screw" damage, where you screw the cooler into the board.
 

Rand0mUser

Hello,

I read that strange POST problems can be caused by a defective BIOS chip, so the issue could be there. But obviously, yes, it could be a lot of other things, and it's hard to be sure.

Yes, I cleaned the RAM slots (they were already clean, in fact), still the same. I checked the board with a magnifier and didn't see anything. The board is running on a test bench and still has the same problems.
 

Rand0mUser

Hello,

Good news: I managed to get the invoice for the motherboard from the company that previously owned it. I contacted Supermicro EU and I'm in the process of returning it for an RMA repair. Their tech and RMA support answered my emails quickly and seem very helpful, so I hope they can fix this. I'll post an update here on STH once I have further details.
 

Rand0mUser

Hello,

I have good news about this one, and since I said I would give an update, here it is. Supermicro EU gave me an RMA number, and I sent the board back to their EU center on 23/12/2019. The parcel was received on 30/12/2019. On 10/01/2020 I received a UPS notice about a parcel coming from Supermicro, which was delivered yesterday.

Here is the RMA report:

rma-report.JPG

So it appears that they didn't repair the old board but sent me a different one (the serial number and MAC address are not the same). I don't know whether this one is brand new or not, but it's very clean in any case. The board has the latest BIOS, the latest BMC firmware and an activated licence for the advanced IPMI features (required to flash the BIOS from the IPMI web console).

So I don't know what the issue with the old board really was, but I guess it must have been something pretty bad since they sent a replacement board.

Also, my old board was the model with the passive cooler. Here it is with a Noctua fan on it:

old-board.JPG

And the new board Supermicro sent me is the model with the active cooler. Here it is:

new-board.JPG

I'm very pleased with Supermicro EU's amazing service. I can say they're very helpful and professional. I may sell this board later and switch to a similar model with SFP+, but my next server board will be a Supermicro one, that's 200% sure!
 