X10QBI and v3/v4 cpus (e.g. supermicro sys-4048b-trft)

gmaxwheel

New Member
Dec 24, 2019
22
11
3
High core count v3/v4 E7 CPUs (as well as v2 for that matter) are often available at relatively low prices. A big reason the prices are low is that compatible boards are scarce, as they're quad socket parts that were only found in relatively exotic and high end systems.

One of the most widely available boards is the Supermicro X10QBI. The X10QBI is an unusual board: it has four 2011-1 (E7 compatible) sockets and 96 dimm slots. The 96 dimm slots are accomplished by the ram being placed on daughter boards, with part-number X10QBI-MEM. The daughterboard based ram creates a lot of confusion because there are multiple versions of the memory boards: MEM1 rev 1.01, MEM1 rev 2.0, and MEM2 rev 1.01.

Mem1 rev 1.01 uses the Intel Jordan Creek 1 chip; the other two boards use Jordan Creek 2. Mem2 is a DDR4 board (the JC2 chip can do either), while both of the Mem1 boards use DDR3.


X10QBI also has the BMC, nics, and VGA on a daughterboard. According to the manual the board will not boot without the BMC board. For some reason electronics recyclers remove and sell the BMC board separately.

According to the supermicro docs, v3 and v4 cpus are supported in X10QBI but only with Mem1 rev 2 and Mem2. So, contrary to common belief in 2011-1 boards it is possible to run v3/v4 CPUs with DDR3 ram, and it's even a supported configuration.

However, Mem1 rev 2 boards don't appear to be common. I have some, but have seen fairly few show up in the surplus market.

There are also at least two revisions of the x10qbi itself. I have rev 1.01 and rev 1.2a. Both seem equivalent so far.

Getting a v3/v4 cpu to boot in x10qbi requires a new-ish bios. None of the ones I've obtained except for a sys-4048b-tr4ft that had mem2 boards have had a recent enough bios. The bios can be updated from the BMC, though it does the typical supermicro licensed feature thing. There is a one-liner shell keygen that uses openssl for sha1, if you find yourself stuck in the middle of the night trying to fix one of these things.

Now as I mentioned before, the common mem1 rev 1.01 does not officially support v3/v4 cpus. But I have personally found that they work. Some support matrix documents I received indicate that for the JC1 chip Intel performed only limited electrical and operational tests and did not validate with the full matrix of supported memory configurations. So YMMV.

One challenge, then, is that systems based on X10QBI are somewhat finicky. There are a LOT of parts. If any memboard is even somewhat mis-seated, the boot process will fail with inscrutable post codes which only sometimes indicate which board is the issue. The 2011 sockets are fragile-- no more so than other modern sockets-- but it's easy to get bent pins when you're working with a *bunch* of CPUs in a crowded system. I would highly recommend that anyone who wants to experiment here obtain a set of v2 cpus to use for testing or as a fallback in case v3/v4 do not work. (I have a bunch of 2.8GHz E7-4890 v2s that I'd be happy to sell for the ~$50/ea I paid for them, if anyone is interested)

These systems boot fairly slowly. Making it all the way to bios takes about 5 minutes, and when you change hardware the system will sometimes reboot and go through the self-tests again-- so 10 minutes to try a configuration is pretty common. If you're inserting memboards one at a time it can easily take two hours (including the time it takes to install the ram itself).

Ram compatibility is also at least somewhat complicated: I never got these systems to post with 4GB ECC rdimms. Full performance from these systems requires 32x 1333+ MHz (for MEM1) or 32x 1600 MHz ram. The JC controllers do an interesting trick where the 4-way memory interface of the e7 cpu effectively becomes 8-way per socket by running it at the full 3200 MHz while running the ram at half that speed and pairing up two dimms at a time. Supermicro has memory compatibility lists, and a number of the modules on them are widely available.

The systems can be booted with dimms on any memboard, for testing, however. CPUs without ram will be inactive. Some of the JC docs also suggest that 64x dimms can have better performance depending on the chips ranks due to rank interleave.

X10QBI uses a somewhat unorthodox power supply which has more than the usual number of cpu power connectors; otherwise it's an atx psu. Be aware of this if you're thinking of getting a bare motherboard and using it directly. I'd recommend against that-- esp with the memboards being particular about how well they're seated.

I currently have 6 x10qbi systems, each running four E7-8880 2.2GHz ES cpus. Three of the systems are Mem1 rev 1.01, two of them are mem1 rev 2, and one of them uses mem2 (w/ DDR4). After much tweaking and cursing, dealing with a dead power distribution board, a bad cmos battery, some bent 2011 pins, and @#$@ heatsink compound getting on the cpu pins... all of them are up and appear stable under high load.

If anyone else wants to try out v3/v4 cpus in an unsupported configuration I'd be glad to lend what knowledge I've gained. Understand that I might not have the complete picture and it might not work with some hardware, some ram, or might not actually be stable (though it appears to be for me). Supermicro appears to know nothing about v3/v4 cpus working in these older systems, and certainly doesn't support it themselves. Because these systems are heavy and a pain to ship, you might be able to obtain them much less expensively than the going rate on ebay (I did).

One challenge I had with the BMC board is that 33% of the boards I got from ebay had a password set, and if a password is set and you need the BMC to upgrade the bios before you boot-- you're out of luck. I got around this by using an unpassworded BMC to bring a system up, then after it was working swapped BMCs and used ipmitool to reset the password. I tried using a network exploit against older BMC firmware, but none of mine were vulnerable. If someone here runs into this problem I might be able to help you out by doing the same trick for you in exchange for snack money. :)
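For anyone repeating the BMC swap, here is a rough Python sketch of the ipmitool step. It is not the exact procedure used above. It assumes you are running on the host OS with the IPMI kernel drivers loaded, so ipmitool talks to the BMC over the local interface and does not need the BMC password. The channel number (1) and user ID (2) are assumptions; check the list output before setting anything. The helper names are made up for illustration.

```python
import shlex
import subprocess


def list_users_cmd(channel=1):
    """Build the ipmitool command that lists BMC user slots (channel 1 is an assumption)."""
    return ["ipmitool", "user", "list", str(channel)]


def set_password_cmd(user_id, new_password):
    """Build the ipmitool command that sets a new password for one user slot."""
    return ["ipmitool", "user", "set", "password", str(user_id), new_password]


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Only print the command lines here; nothing is executed against a BMC.
# User ID 2 is a guess, so confirm it from the list output first.
print(shlex.join(list_users_cmd()))
print(shlex.join(set_password_cmd(2, "new-password-here")))
```

Passing one of the command lists to `run()` on the booted system would actually execute it.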

I only discovered the apparent v3/v4 compatibility myself because I obtained several systems, some of which had the newer mem boards and were supported, so I had v4 cpus for them and tried those cpus in the unsupported systems.
 

sno.cn

Active Member
Sep 23, 2016
206
73
28
36
I was looking at those 15 core ivy bridge E7-4879 V2 CPUs on eBay for a while but couldn't find anything to run them in that matched the $30-40 per CPU price tag.

It was only the X10QBI (with a bunch of add-on parts like memory risers) or from memory, Dell R920, HP DL580 G8, and some cisco blade stuff. CPU price was dirt cheap, but getting them running wasn't worth the total build cost.

So the supermicro board works with V3/V4 too?
 

gmaxwheel

sno.cn said: "So the supermicro board works with V3/V4 too?"
Yes, with MEM2 or MEM1r2 officially, and with MEM1r1 in my experience (with 3 systems, and two different sets of v4 engineering sample cpus) but not officially supported.
 

agentpatience

New Member
Mar 3, 2020
18
1
3
Ottawa
Hi, I jumped on board and purchased one of these boards with the M1 Rev 1 Memory Cards and BMC/Vid card. I am in the process of building a custom system. Do you know if I use 4 memory cards what the optimal ram configuration is? 32 pcs 1600 MHz ECC ram? I don't have 8 of the MEM1 cards, only 4. Will it be a performance problem?

Thanks for the excellent write-up on this system.
 

gmaxwheel

If you have four mem boards, you can get all the perf available to you with 16 dimms. You'll get a memory bandwidth reduction from that config; whether it'll be noticeable depends on your application. Make sure you put 1 mem board per cpu (every other memboard slot on the motherboard).
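To put rough numbers on that, here is a small Python sketch of the population arithmetic. It assumes the layout described later in this thread: two memboard slots per socket, each memboard carrying two of a CPU's four memory interfaces, and four dimms per memboard for 2:1 mode. The function name and the `bandwidth_fraction` field are made up for illustration, and the fraction is a theoretical peak, not a measurement.

```python
SOCKETS = 4
MEMBOARD_SLOTS_PER_SOCKET = 2
INTERFACES_PER_MEMBOARD = 2  # each memboard carries 2 of a CPU's 4 memory interfaces
DIMMS_PER_MEMBOARD = 4       # the count needed for 2:1 mode


def summarize(memboards_per_socket):
    """Return memboard count, dimm count and peak bandwidth fraction for a symmetric config."""
    if not 0 <= memboards_per_socket <= MEMBOARD_SLOTS_PER_SOCKET:
        raise ValueError("each socket has only two memboard slots")
    interfaces_total = MEMBOARD_SLOTS_PER_SOCKET * INTERFACES_PER_MEMBOARD
    interfaces_used = memboards_per_socket * INTERFACES_PER_MEMBOARD
    memboards = SOCKETS * memboards_per_socket
    return {
        "memboards": memboards,
        "dimms": memboards * DIMMS_PER_MEMBOARD,
        "bandwidth_fraction": interfaces_used / interfaces_total,
    }


for per_socket in (1, 2):
    print(per_socket, summarize(per_socket))
```

With one memboard per socket this gives 4 memboards, 16 dimms, and half of the peak memory bandwidth; with two per socket it gives 8 memboards and 32 dimms.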
 

agentpatience

Hi Gmax -- I want to build an open rig miner for randomx/argon based coins with this setup. I am trying to determine if I should buy 4 more mem1 v1 cards, which are the only ones I can seem to find. Would it be a worthwhile improvement? I know this algo is memory intensive, but I am not sure if going from 4 to 8 memcards and from 32 pcs of ram to 64 pcs would be justifiable. Thanks for your input.
 

gmaxwheel

You can max out the memory speed (pretty much) with 32 pcs of ram, 4 pcs per mem board and all 8 memboards. Some of the docs indicated that it could be slightly faster with 64 pcs of ram (8 per memboard) depending on rank and only with memory configs where the additional dimms don't force a lower clock but that difference is small, a couple percent at most.

I have one system that is down a memboard on one socket, so I could run the xmrig benchmark on it and the neighboring socket tonight and let you know what the speed is.

The baseline power draw of these systems (esp kitted out with a full complement of ram) is pretty high, so I'm not sure that it would make for the best monero miner. :) I can give numbers on that too.
 

gmaxwheel

Per your question: I benched randomx on a quad socket 8880v4 (22core/2.2ghz) with xmrig in a system with mem1 boards and 1600MT/s memory. The result was 35213 H/s, or 400 H/s per core. On the system with one socket having only one memboard, it was 387.74 H/s on those cores, or about 3% slower so not a huge loss for that application. Power usage for the system running on 240vac according to dcmi was 1276 watts-- though my power usage might be quite a bit higher than yours since mine have 1TB ram, 100gbe and a phi card in them. (I'm not super eager to go mucking with the hardware just to answer this question... though I am a little curious what the idle phi draws)
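The arithmetic behind those figures can be checked with a short Python sketch. All inputs are the numbers quoted in this post; the hashes-per-watt figure is derived here and was not stated in the post, and it uses the power draw of this particular, heavily loaded system.

```python
SOCKETS = 4
CORES_PER_SOCKET = 22          # E7-8880 v4

total_hs = 35213               # xmrig RandomX result for the whole system
per_core_one_board = 387.74    # H/s per core on the socket with a single memboard
system_watts = 1276            # DCMI power reading at 240 VAC

cores = SOCKETS * CORES_PER_SOCKET
per_core_full = total_hs / cores

# Relative slowdown of the cores fed by only one memboard.
slowdown = 1 - per_core_one_board / per_core_full

# Derived figure, not quoted in the post.
hs_per_watt = total_hs / system_watts

print(f"{cores} cores, {per_core_full:.1f} H/s per core")
print(f"one-memboard socket is {slowdown:.1%} slower per core")
print(f"{hs_per_watt:.1f} H/s per watt")
```

This lands at roughly 400 H/s per core and a slowdown of about 3%, matching the figures above.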

I would say that it's not worth using more mem boards unless you happen to come across some for free.

I didn't try too hard to tune anything beyond 1gb hugepages, it looks like xmrig is NUMA aware but perhaps it's making some dumb decisions.

For this application the performance per watt doesn't compare particularly favorably to my dual 7742 systems (with 2666MHz DDR4): they get 87304 H/s and draw a fair bit less power (even if nixing the big ram/phi/100gbe cut the power usage a lot, it still wouldn't be close). ... but then again, those cost a *lot* more per core than a thriftily constructed multisocket broadwell system. :) With what I paid for parts quad socket broadwell is like 3.5x better perf/$.

It seems to me that that particular application *really* likes zen2. Presumably it was more or less designed with that processor (or similar) in mind. For most of my software I find the 2x 7742 systems maybe 30-40% faster than the 4x 8880v4 systems.
 

agentpatience

Thanks very much for doing the benchmark. Power usage isn't a concern for me but I am trying to keep a high performance/dollar in the build. I don't have the cash to shell out for a Zen2 system till some price drops. I'd really like to know what the performance per core is with 1 memory board and just 4 sticks of ram run in independent/optimizer mode. I noticed that a fully loaded quad-processor Opteron 6386 SE system was actually slower running 32 DIMMs than 16 DIMMs. On that system RandomX performance peaks with 16 DIMMs installed.
 

gmaxwheel

The system I tested above has 7 memory boards (mem1 rev2), in 2:1 mode running at 1600MT/s. 6 with 4 dimms, and 1 with 8 dimms. The cpus in the socket with only one memboard were 3% slower. This isn't exactly the test you wanted, but I think it's close.

If you google up the fujitsu performance optimization guides for their quad socket systems using jordan creek there is a LOT of discussion in them about the performance of various memory configurations: https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-broadwell-ex-memory-performance-ww-en.pdf (broadwell, presumably using JC2) and https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-ivy-bridge-ex-memory-performance-ww-en.pdf (ivy-bridge, presumably using JC1).

(and looking again at those docs the second seems to be saying that for jc1 independent mode has a max clock speed of 1333 MHz-- looking at my systems, the mem1 rev1 systems are at 1333, while the mem1 rev2 are at 1600).
 

agentpatience

WOW - Thanks for all the insight with this system. I will definitely go over that white paper. Looks to be some interesting information in there! MEM1 V1 seems to support 1600MHz as well but it appears only in lockstep mode.
 

gmaxwheel

According to that paper only in lockstep (1:1) mode. 2:1 mode basically pairs up dimms to get double the speed (effectively 1333 becomes 2666 and 1600 becomes 3200), so it's faster even if it has to run at a lower clock. I'll check my systems later today, but it appears that the ones I have with mem1 rev2 are running at 1600 and the ones with rev1 are running at 1333. All should be set to 2:1 mode in the bios. FWIW, xmrig was only a couple percent slower on the 1333 MHz hosts.
 

agentpatience

All of this is great information for the birds and seems to correlate with what I read in the JC white paper. I need to understand better what they mean by 2:1 Performance Mode... is that 8 DIMMs per memcard? A blue and black slot populated as a pair? I am specifically interested in performance mode as no redundancy is needed. It would be great if I could get the memory to run at double speed.
 

gmaxwheel

2:1 mode needs 4 dimms per memboard.

Each CPU has 4 memory interfaces each of which can be run at up to 3200MHz.

There are two memboard slots per socket.

2 memory interfaces are wired to one memboard, 2 memory interfaces are wired to the other memboard which is why full memory bandwidth requires both memboards per socket.

(Though apparently losing half the memory bandwidth didn't hurt your randomx workload that much... maybe it would matter more for a lower core count part, having 22 cores sharing the same memory interface might be doing a pretty good job of hiding slow memory.)

In 2:1 mode, the CPU's memory interface is run at 2x the memory speed and a pair of dimms is combined (via bit-striping?) to work as one. The down side of this is that it limits memory speed to a maximum of 1600MHz due to the CPU's 3200MHz limit (it also, I suppose doesn't improve latency like actually going to 3200MHz would), or apparently 1333MHz for JC1. I think that's a pretty cool hack considering that these systems came from a time long before actual 3200MHz ECC ram was widely available.

In 1:1 mode the cpu memory interface and ram run at the same speed. This sometimes lets the ram run at somewhat higher clock rates (though not as high as 3200MHz, due to electrical limits e.g. because there are three dimm slots wired up).

1333x2 or 1600x2 end up being faster than whatever speed it can run at in 1:1 mode (I think 1600MHz for mem1 rev1, 1866 for mem1 rev2).
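As a rough illustration of why 2:1 wins, here is a Python sketch of the effective rate seen by the CPU's memory interface in each mode. The 3200 limit and the 2:1 speeds come from this post; the 1:1 speeds are the "I think" figures above, so treat them as unverified. The function name is made up.

```python
CPU_INTERFACE_LIMIT = 3200  # maximum rate of each CPU memory interface


def interface_rate(dimm_rate, mode):
    """Rate on the CPU side of the memory buffer for a given dimm speed and mode."""
    if mode == "2:1":
        rate = 2 * dimm_rate   # a pair of dimms is combined and run as one
    elif mode == "1:1":
        rate = dimm_rate       # interface and ram run at the same speed
    else:
        raise ValueError(f"unknown mode: {mode}")
    if rate > CPU_INTERFACE_LIMIT:
        raise ValueError(f"{dimm_rate} in {mode} mode exceeds the CPU interface limit")
    return rate


# (memboard, mode, dimm speed); the 1:1 entries are unverified guesses from the post.
configs = [
    ("mem1 rev1", "2:1", 1333),
    ("mem1 rev1", "1:1", 1600),
    ("mem1 rev2", "2:1", 1600),
    ("mem1 rev2", "1:1", 1866),
]

for board, mode, dimm_rate in configs:
    print(board, mode, dimm_rate, "->", interface_rate(dimm_rate, mode))
```

Under these assumptions 2:1 mode gives 2666 or 3200 on the CPU side, versus at most 1600 or 1866 in 1:1 mode. Asking for 1866 in 2:1 mode raises an error, which is the 3200 cap described above.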

You'll see that the fujitsu docs for the JC2-based systems mention rank interleave, which is the config where I mentioned 64 dimms/system might be slightly faster. I don't really understand rank interleave, but they didn't claim it was a big difference. It might also only apply to mem2/ddr4.
 

agentpatience

Ok, I tried interleaving memory on other systems with a full set of memory but it did not improve anything for my applications. I noticed that when I change memory population configs that without a recompile the changes are not always applied... did you recompile between tests?

I see what you are saying about the dual memcards now and the full bandwidth speed as a result of using them in pairs. 3% doesn't seem like a good enough improvement as you say.
 

Micha

New Member
Apr 3, 2020
5
0
1
Hi, I recently got an X10QBI barebone system with MEM1 Rev. 1.01 boards. My previous experience is with X9 quad-CPU boards, and from skimming Table 2-17 in the SM manual I had hoped that my 16GB 2Rx4 1333 RDIMMs (Samsung M39382G70BH0-YH9) from the X9 system would work. For testing, I installed a single E7-4880 v2 CPU and loaded the MEM1 board in P1M1 successively with one DIMM in A1, then A1 & B1, then all four blue sockets (A1, B1, C1, D1). The POST always hung at 15, even after removing all other MEM1 cards. The LED control on the MEM1 card also never showed any LED lit besides "ON", so I assume the system is just not recognizing the DIMMs.

My primary objective is to get to the bios and find out whether the board is working. I have up to four CPUs that I could install, but I'm a bit confused about the minimal memory configuration for one or multiple CPUs / MEM1 cards. Let's start with 1 CPU, in case that is supported. I understand that I can put a memory card in P1M1 and (optionally) another in P1M2. With a card with (correct) DIMMs in P1M1, would any additional empty memory cards in P1M2 or in other slots disturb the POST? What is the minimum DIMM configuration for a MEM1 card: can I install a single DIMM (A1), do they have to be paired (A1/B1), or do I need at least the four blue sockets (A1, B1, C1, D1) populated?

My next doubt is about the DIMMs themselves. Table 2-17 in Supermicro's X10QBI manual says that 16GB DRx4 modules are OK in most SPC/DPC setups at 1.35V, and also in some 1.5V configurations. Apart from it not being listed explicitly as a tested module, what would be the most likely reason that the Samsung M39382G70BH0-YH9 is not recognized? As an alternative, I'm currently looking into Samsung M393B2G70DB0-YK0 16GB 1.35V 1600 1-Rank memory, which is listed by SM for the X10QBI and has a price that would let me buy multiple DIMMs in case I have to populate more slots to make the system boot. If I need fewer slots for that, I could also consider 32GB modules. So my basic question is what I am missing about the memory configuration requirements, so that I know how many modules I need for testing with 1-4 CPUs and can guess which modules that are not on Supermicro's tested list might work.
All the best
 

agentpatience

Hi Micha - I can tell you that this board is picky over memory modules. Post 15 seems to be a memory error. I've seen that on many modules. Try modules on the memory support list if you don't want to waste money. I found through trial and error that Samsung 16GB M393B2G70BH0-ck0 work and other similar modules may work too.
 

Micha

Hi AP, and thanks for the fast reply. That's relevant info; OK, so I'd better stick to DIMMs confirmed by SM. Could someone please give me a hand with the minimal number of DIMMs that could/should work in a MEM1 board? Supermicro's X10QBI manual seems quite silent on that; the X9 boards' manuals had a table showing in which order to populate the onboard RAM banks with an increasing number of CPUs installed. Currently I'm really puzzled whether I have to order 1, 2, 4 or maybe even more DIMMs in order to get to the BIOS. All the best
 

agentpatience

Ok, no problem. Sorry I did not answer all your questions; I was in a hurry. You can post 1 CPU with 1, 2, or 4 DIMMs per mem1 rev1.x card, but not 3. That goes for all the other slots too, in my experience. I tested with the A1 and B1 slots filled and started with 1 CPU and 1 memcard at a time to find this out. I suggest you do that too so that you don't run into problems with a misconfigured memory boot (error 15). Also -- the board supports 1, 2, 3 or 4 CPU configs.
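As a quick sanity check before powering on, the rule of thumb above can be written as a tiny Python sketch. It only encodes what was reported here for mem1 rev1.x cards (1, 2, or 4 DIMMs per card posted, 3 did not); it is not taken from the Supermicro manual. The function name is made up, the slot names follow the P1M1/P1M2 naming used earlier in the thread, and an empty card is flagged simply because nobody in the thread reported testing one.

```python
# DIMM counts per memcard reported to POST on mem1 rev1.x (from this thread, not the manual).
REPORTED_WORKING_COUNTS = {1, 2, 4}


def suspicious_memcards(dimms_per_card):
    """Return the memcard slots whose DIMM count is not one of the reported-working counts."""
    return [
        slot
        for slot, count in dimms_per_card.items()
        if count not in REPORTED_WORKING_COUNTS
    ]


# Hypothetical population plan: slot name -> number of DIMMs installed on that card.
plan = {"P1M1": 2, "P1M2": 3}
print(suspicious_memcards(plan))
```

For the hypothetical plan above, only the card with three DIMMs is flagged.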