X10QBi and v3/v4 CPUs (e.g. Supermicro SYS-4048B-TRFT)

angel_bee

Member
Jul 24, 2020
I assume you have tried the A1, B1, C1, D1 config with different DIMMs to rule out a faulty one?

Resetting the CMOS would've been my next idea; there are some LRDIMM-specific settings which might not have mattered when you used RDIMMs before.

Which BIOS version are you using?

The X10QBi is a nasty diva, I always pray that I don't get any crude VM memory errors when rebooting the machine.
It seems I pre-emptively answered your questions a few minutes ago! I'm using the very latest BIOS and Redfish IPMI firmware.

(screenshot attached)

P.S. Yes, I am absolutely sure none of the new 64GB sticks is faulty. I'm running on all 24 of them in Ubuntu right now; the problem is just the DIMMC/DIMMD thing.
 

NablaSquaredG

Active Member
Aug 17, 2020
Hmm... That's bad. Honestly, I'm out of ideas.

My troubleshooting approach would be
  • Start with 1 memory board in P1M1, single DIMM, check working
  • Increase DIMM count in the memory board from 1 to 4 (A1, B1, C1, D1) in that order, check working
  • If it doesn't work: Try with a different memory board
  • If it still doesn't work: Swap DIMMs
  • Perhaps swap CPUs to rule out a damaged CPU
Note: I've had some hefty issues with P1M1 and various errors ("TMSE TXC DQS DCC", "VMSE Train read failure", "VMSE Train write failure", "VMSE Read fine failure"); the P1M1 slot is a bit wonky... I tried everything, and the final shitfix, aka heating up the slot with a hair dryer, worked and still works.

BTW, I wouldn't bother contacting Supermicro support if I were you. I've had a LENGTHY argument with them regarding memory compatibility (I've also had some issues), in which the international support impressively demonstrated their gross incompetence and inability to comprehend complex questions and issues.

I can send you the email chain via PM if you're interested.
 

angel_bee

Member
Jul 24, 2020
Yes, I already increased the DIMM count gradually. With 4 memboards it works completely fine with DIMMA and DIMMB, but as soon as I put anything in DIMMC it complains. I swapped memory boards too. Here's the thing: the training/VMSE failures are ALWAYS at DIMMC of one or more boards, so it really obviously points to something that fails as soon as memory training moves on from channel B to channel C. Plus, it works fine with 64 sticks filling all channels in slots 1 and 2.

At this point I'm just trying to test whether my current configuration is stable. If it is, I'm just going to leave it be; I don't see myself needing more than 1.5 TB of RAM.
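For reference, the numbers above work out as follows, assuming the usual X10QBi layout of 4 sockets, 2 memboards per socket, 4 channels (A-D) per memboard and 3 DIMM slots per channel. This is just back-of-the-envelope arithmetic, not anything the BIOS reports:

Code:
# Rough X10QBi slot/capacity arithmetic (assumed layout, see above)
SOCKETS = 4
BOARDS_PER_SOCKET = 2        # PxM1 and PxM2
CHANNELS_PER_BOARD = 4       # DIMMA..DIMMD
SLOTS_PER_CHANNEL = 3        # DIMMA1..DIMMA3, etc.

total_slots = SOCKETS * BOARDS_PER_SOCKET * CHANNELS_PER_BOARD * SLOTS_PER_CHANNEL
print(total_slots)           # 96 DIMM slots in the whole system

# "64 sticks in all channels, slots 1 and 2" = every channel, first two slots only
print(SOCKETS * BOARDS_PER_SOCKET * CHANNELS_PER_BOARD * 2)   # 64

# 24 x 64 GB LRDIMMs (the current working set)
print(24 * 64)               # 1536 GB, i.e. the ~1.5 TB mentioned above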

Thanks for your suggestions anyway. If there are further developments I'll post again here :D

P.S. Oh, I'm so glad someone else voiced this too. Supermicro tech support is the worst. Actually, the first time I emailed them asking for the 8-pin GPU power pinouts they were so nice and gave me all the detailed blueprints. After that, I think they had a change of management maybe? These new people are so bloody incompetent, condescending and dismissive. But I already emailed them tonight anyway.

Also going to call Samsung tomorrow. Fun times. Not.
 

angel_bee

Member
Jul 24, 2020
This has since been worked through, but I'm keeping this post for archive purposes.

Update: I just spent an entire day rebooting and have barely managed to get 4 memboards with all channels populated, with random mixing of Q and non-Q RAM.

However, I am still concerned about the stability.

What happened was that I noticed "VMSE DC detect failure" is actually relatively benign: if you only get these messages, you just have to wait for the system to reboot and have another crack at training the RAM (maybe with looser timings, I don't know). However, "DDR training failure" is bad. VERY bad. If you get this, it means something is incompatible between the memboard and the slot showing the DDR training failure, and you must remove the memboard from the complaining slot before you do anything else.

I noticed that:
a) PxM2 slots cannot be filled using my RAM. I'm no longer sure whether it's due to the presence of Q-version RAM (I do not have enough non-Q sticks to test). However, I do know that using purely Q-version RAM will not work in PxM2 slots. All memboards must be placed in P(1-4)M1 or you get the stupid DDR training failure.

b) I have conclusively shown that it is impossible to use all 8 memboards with all 4 channels at this stage. I get the DDR training failure regardless of the combination of memboards. It seems to be a stability issue: the more boards I put into the system, the less stable everything gets and the more "DC detect failures" I have to cycle through before it POSTs.

c) With my current configuration, with only the M1 slots filled, I can boot into the OS. However, my Geekbench 5 scores (Xeon E7-8880 v3 here) are SO LOW. I used to get ~32k and now I'm only getting 15k on the multi-core test. Is this normal going from 8GB RDIMMs to 64GB LRDIMMs? Did this happen to anyone else as well?

At this stage I don't even know whether my performance is severely degraded due to the Q-version RAM or because 8Rx4 LRDIMMs are inherently much slower... I'm seriously considering buying actual non-Q versions to see if they indeed work seamlessly.

I don't know.
 

angel_bee

Member
Jul 24, 2020
UPDATE:

It's the CPUs.
Somehow.

Scratch my previous posts about Q vs. non-Q variants. @NablaSquaredG you were right, you actually can mix them. I put in 4x E7-4820 v2s and it instantly worked.

Prior to this, I was running 4x 8880 v3 with MEM1 Rev. 1.01 boards, and it was working fine with 64 x 8 GB RDIMMs, so I took the compatibility for granted... sigh. There must be something about LRDIMMs that makes the system increasingly unstable as more channels are populated.

The best fix would probably be to update to a BIOS that supports this configuration or to buy some MEM1 Rev. 2.00 boards, but no new BIOS has been released since 2019 and MEM1 Rev. 2.00 boards are impossible to find.

UPDATE #2, 4/4/2021 (it's midnight now): wrapping up my experience with v3 CPUs and 64 GB LRDIMMs - a guide for using 64 GB LRDIMMs with MEM1 rev. 1.01 boards when you're getting VMSE DC detect failure / VMSE DDR training failure.

After another day of frustration, I've got my system working in pretty much the best way I feasibly can. Hopefully the lessons I've learnt can help others in the future.

As others have previously mentioned, the Jordan Creek memory buffer differs between MEM1 rev. 1.01, MEM1 rev. 2.00 and MEM2. Others in this forum have only described usage with up to 32 GB DIMMs, never with 64 GB LRDIMMs. As a general observation, I think the issues I ran into may be specific to 8Rx4 LRDIMMs running on MEM1 rev. 1.01 boards; the issue does not show up with RDIMMs of lower rank or capacity.

I found that in lockstep 1:1 mode, everything seems to work perfectly. It is only in 2:1 performance mode that the issue emerges. This led me to think that maybe the memory buffer on MEM1 rev. 1.01 does not have enough bandwidth to support these massive LRDIMMs, so when it tries to double the data rate in 2:1 performance mode, it chokes.

So you CAN use all 4 channels on all memboards with MEM1 rev. 1.01, provided you use lockstep mode. However, because lockstep mode literally halves DRAM performance, it is highly undesirable, since you'd be using LRDIMMs for large in-memory computes anyway. It results in a maximum of 4 sockets x 2 memboards per socket x 2 effective channels per memboard = 16 effective RAM channels because, as I understand it, DIMMA is tied to DIMMC and DIMMB is tied to DIMMD on each board in lockstep. So is there a way to get more than 16 effective channels?
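To make the channel counting explicit, here is the same arithmetic as a small sketch (again assuming 4 sockets, 2 memboards per socket and 4 physical channels per memboard, with the lockstep pairing as described above):

Code:
# Effective channels in lockstep 1:1 vs. 2:1 performance mode (assumed layout)
SOCKETS = 4
BOARDS_PER_SOCKET = 2
CHANNELS_PER_BOARD = 4

# 2:1 performance mode: every physical channel is independent
# (the ideal case, which fails DDR training here when fully populated)
performance_channels = SOCKETS * BOARDS_PER_SOCKET * CHANNELS_PER_BOARD
print(performance_channels)   # 32

# Lockstep 1:1 mode: DIMMA is tied to DIMMC and DIMMB to DIMMD on each board,
# so each memboard only contributes 2 effective channels
lockstep_channels = SOCKETS * BOARDS_PER_SOCKET * (CHANNELS_PER_BOARD // 2)
print(lockstep_channels)      # 16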

It turns out you can fill a maximum of 6 of the 8 available channels per socket in 2:1 performance mode. Beyond this, I get the DDR training failure and the show has to stop. Technically, I think it means that MEM1 rev. 1.01 with v3/v4 CPUs can only handle the bandwidth equivalent of at most 6 "doubled" physical channels per socket (or, equivalently, 6 independent logical channels per socket). If anyone has a better explanation, I'd like to hear it.

SO... this means for each socket, e.g. processor #1, you fill DIMMA1, DIMMB1, DIMMC1 and DIMMD1 on P1M1 (all 4 channels occupied), then on P1M2 you only fill DIMMA1 and DIMMB1 (channels 5 and 6). I'm sure you can fill more slots per channel, e.g. DIMMA2, DIMMA3... but I don't have enough LRDIMMs to find out.

Populating all 8 memboards in this 4/2/4/2/4/2/4/2 configuration in 2:1 performance mode actually POSTs. Time to celebrate. This means the total in this configuration is 4 sockets x 6 independent channels per socket = 24 effective channels.

A test on Geekbench confirmed the RAM speed indeed improved, but as expected, the score is consistent with a 25% RAM performance drop from missing the last 2 physical channels on every socket.
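Putting the 4/2 population scheme into numbers (same kind of rough sketch as above, and assuming RAM bandwidth scales roughly with the number of populated channels):

Code:
# 2:1 performance mode workaround: 6 of 8 physical channels populated per socket
SOCKETS = 4
CHANNELS_PER_SOCKET = 8                 # 2 memboards x 4 channels
POPULATED_PER_SOCKET = 4 + 2            # all of PxM1 + DIMMA1/DIMMB1 on PxM2

total_channels = SOCKETS * POPULATED_PER_SOCKET
print(total_channels)                   # 24 independent channels

fraction_of_peak = POPULATED_PER_SOCKET / CHANNELS_PER_SOCKET
print(fraction_of_peak)                 # 0.75 -> roughly the 25% drop seen above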

In summary, as a general recommendation: using 64GB 8Rx4 LRDIMMs brought out the underlying limitations of running v3/v4 CPUs in the unsupported configuration (i.e. on MEM1 rev. 1.01), despite fully healthy memboards and fully working LRDIMMs taken straight from the Supermicro compatibility list. Since MEM1 rev. 2.00 boards are practically impossible to find, this is a workaround that enables using cheap DDR3 LRDIMMs, but it sacrifices 25% of the peak RAM performance, and when it POSTs you'll have to wait longer while it gradually works through the VMSE DC detect failures (which are relatively benign).
 

synchrocats

New Member
Oct 20, 2020
Just to note: if you're stuck somewhere in the night with a f*cked-up BIOS and cannot recover it via IPMI, use the Supermicro Update Tool (SUM):
./sum -i 192.168.0.107 -u ADMIN -p ADMIN -c UpdateBios --file <PATH TO BIOS FILE> --force_update --reboot
Also, the BIOS chip on the X10QBi is an MX25L12835FMI-10G.
 

synchrocats

New Member
Oct 20, 2020
Also, has anybody got PCIe bifurcation working? I saw it in the manual and in the BIOS firmware (along with other interesting features like TDP control, via AMIBCP 5.02), but I can't see it in the BIOS setup.
 

NablaSquaredG

Active Member
Aug 17, 2020
My X10QBi is probably dead.

After I was able to temporarily fix the issue with P1M1, I'm now getting complete system hang-ups after at most 10 minutes, on both Windows and Linux, even in the Linux setup (so no heavy load or anything). Once I got a "Timeout: Not all CPUs entered broadcast exception handler" kernel panic.

I'll test with P1M1 out asap and see what happens.

Honestly I'm underwhelmed by the X10QBi. That board should NEVER freeze. This whole platform is built around reliability (with the possibility to hot swap memory on other servers). Random freezes are definitely something I don't wanna see....
 

angel_bee

Member
Jul 24, 2020
NablaSquaredG said:
My X10QBi is probably dead.
[...]
That sounds horrible :( I hope it's fixable.

Would it be a good idea to run with memory mirroring and rank sparing to see whether it's a RAM issue?
 

NablaSquaredG

Active Member
Aug 17, 2020
angel_bee said:
I hope it's fixable.

Probably not.
I have the suspicion that I damaged the board when I disassembled the server to add GPU power cables...

angel_bee said:
Would it be a good idea to run with memory mirroring and rank sparing to see whether it's a RAM issue?

Thanks for the suggestion!
I'll first check without P1M1. If that doesn't work, I'll try memory mirroring and rank sparing.