Wondering if anyone’s had a similar problem they’ve managed to work through.
I purchased an (allegedly) tested, working 8048-TRFT system (without CPUs or RAM but with 8 MEM1 boards) last month on eBay and have not successfully POSTed it yet. It shipped with only one PSU, which I thought may have been contributing to the issue, but getting the seller to ship me the other 3 as advertised, then installing and powering all 4, made no difference.
I’m testing with 4x E7-4880 v2s and 8x MEM1 1.01 boards, with either 32GB 4Rx4 1333 LRDIMMs (Samsung M386B4G70DM0), 8GB 2Rx4 1333 RDIMMs (Samsung M393B1K70QB0), or 4GB 2Rx8 1600 RDIMMs (Samsung M393B5273DH0).
Until last night, I was struggling with P#M# DC Detect Failure (usually P1M1 but occasionally a few others). As noted elsewhere in this thread, that ended up being due to the “different sized” memboards, and shuffling them around made it mostly go away. Once past that stage, the system would (seemingly) get through memory training successfully, but then hang on “System Initializing… B” with a B7 POST code and without displaying the BMC IP address.
I’ve managed to get past that tonight by reseating the CPUs and memboards probably a dozen times, but now I’m getting stuck at “System Initializing… 7” with a 79 POST code and the BMC IP address displayed. No matter what configuration I try (1, 2, or 4 DIMMs per board with 4GB, 8GB, or 32GB DIMMs, 1 or 2 boards per CPU, and 2, 3, or 4 CPUs installed - every permutation of those options), I can’t get any further. I’ve also tried another BMC card, in case that was related, which made absolutely no difference.
Is 79 a memory training step, or something else? I’ve removed the CMOS battery and shorted the jumper probably 20 times over the last month, so I doubt there’s a BIOS misconfiguration (although I can’t get far enough into the boot process to check the BIOS configuration). It hangs for about 7-8 seconds on that code and message before rebooting, and I’ve let it loop for as long as 12 hours to no avail. In the most minimal configuration, I have no USB devices attached, no PCIe cards installed, and no hard drives in the SAS bays or attached to the SATA headers. The only peripherals are the BMC card and a VGA cable. I have also tried with a bootable Ubuntu USB flash drive attached, as well as a USB keyboard to try to get into BIOS configuration, but I don’t think that’s made a difference since it never gets out of POST.
I also have a set of 4 E7 v3 ES CPUs that exhibit identical behaviour. I had ordered the 4880 v2s to determine whether the issue was just with the ES CPUs, which I believe I have now ruled out. I also have a set of 4 8890 v4s on the way to test with, but I expect identical behaviour from those too. I have also tested with another set of 4 MEM1 1.01 boards, which only ever resulted in P#M# DC Detect or Coarse errors.
Is there anything else I can try, or anything I’m missing? I’m getting closer to the return-for-replacement cutoff the eBay seller provides. I’d really like to not have to make use of it (for one, I live in Canada, and have to deal with driving down to the US for shipping and receiving since this seller will not ship internationally), but it really looks like there’s something defective with the motherboard (again, unless I’m missing something). It’s also worth mentioning that the seller shipped it with heatsinks installed but no CPUs or even socket covers, so it wouldn’t shock me if the pins were damaged. I can’t see any bent pins on any socket (even with a 10x microscope, and I’ve successfully repaired bent LGA2011 pins before), but it’s not out of the realm of possibility. I also spent easily 6+ hours cleaning thermal paste out of the pins of 3 sockets after I noticed the CPUs coming out of them had paste on the pads, which I assume is related to being shipped with no covers.