Supermicro H11DSi only sees 768GB RAM


Psynapsx

New Member
Feb 21, 2021
29
4
3
Hi guys,

I have a Supermicro H11DSi board (rev 1.0) with 2x Epyc 7551. I installed 16x Hynix ECC 128GB DDR4 LRDIMM 2666 modules. According to the manual, rev 1.0 should handle 2TB of memory and rev 2.0 4TB.
But the IPMI "hardware information" menu only sees 6 DIMMs = 768GB. When I go to the sensor readings, it correctly displays temperatures for all 16 modules.

What's wrong with my system? Thank you.
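For anyone wanting to cross-check the IPMI view from the OS side, here is a minimal Python sketch that counts the populated slots in the text output of `dmidecode -t memory`. The function names (`parse_dmidecode_memory`, `summarize`) are made up for illustration, and the parsing assumes the usual "Memory Device" records with `Size:` and `Locator:` fields; locator names vary by board and BIOS.

```python
import re

# Conversion factors to GB for the units dmidecode prints in "Size:" lines.
UNIT_TO_GB = {"MB": 1 / 1024, "GB": 1, "TB": 1024}


def parse_dmidecode_memory(text):
    """Map slot locator -> size in GB for every populated slot."""
    slots = {}
    # Each DIMM slot is reported as its own "Memory Device" record.
    for block in text.split("Memory Device")[1:]:
        size = re.search(r"^\s*Size:\s*(\d+)\s*(MB|GB|TB)", block, re.M)
        locator = re.search(r"^\s*Locator:\s*(\S+)", block, re.M)
        # Empty slots print "Size: No Module Installed" and are skipped.
        if size and locator:
            slots[locator.group(1)] = int(size.group(1)) * UNIT_TO_GB[size.group(2)]
    return slots


def summarize(text, expected_slots=16):
    """Return (populated slots seen, total GB seen, slots not reported)."""
    slots = parse_dmidecode_memory(text)
    return len(slots), sum(slots.values()), expected_slots - len(slots)
```

To use it, save the output of `sudo dmidecode -t memory` to a file and pass the file contents to `summarize`. On a fully working 16x 128GB setup the first two values should come out as 16 and 2048.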
 

Psynapsx

I think I found the issue.
This garbage probably does not support 2S4R modules, only 8R. I have the Hynix 2S4Rx4 128GB 2666 LRDIMM modules.
Probably because 1st gen AMD EPYC does not support 2S4R 128GB.
On memory.net none of the 2S4R modules are listed as supported for this board.

Now I have 18k USD worth of memory I cannot use.
 

Psynapsx

Did you contact Supermicro support?
Yes. As always, their response was very "useful" :)
"Hello,

We did not validate any 128GB modules.

When you only populate P1-DIMMBA1 or B1, will the memory be detected?
Did you swap the memory in the non-detected slots with the detected slots to rule out memory failures?"
 

Psynapsx

If the system doesn't support this RAM, why does it boot? And why does it show 768GB in IPMI and Ubuntu?
Good question. But more importantly, why do they list 128GB DIMMs as a supported size both on their website and in the user manual if they did not validate any modules?
They also claim the max. memory support is 2TB with this board. That's only possible with 128GB modules, as it has 16 DIMM slots.
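The capacity argument is simple arithmetic, spelled out here as a small Python sketch. The slot count and module sizes come from this thread; the helper name `total_gb` is just for illustration.

```python
SLOTS = 16          # DIMM slots on the H11DSi, as stated above
DIMM_SIZE_GB = 128  # size of each installed module


def total_gb(dimm_count, dimm_size_gb=DIMM_SIZE_GB):
    """Total capacity in GB for a number of identical DIMMs."""
    return dimm_count * dimm_size_gb


full_board = total_gb(SLOTS)      # all 16 slots with 128GB modules
detected = total_gb(6)            # what IPMI currently reports
with_64gb = total_gb(SLOTS, 64)   # best case with 64GB modules

print(full_board, detected, with_64gb)
```

With 64GB modules the board tops out at 1024GB, so the advertised 2TB (2048GB) can only be reached with 128GB DIMMs, and the 768GB that shows up corresponds to exactly 6 of them.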
 

Psynapsx

I can confirm the Epyc 7001 series can handle 2S4R 128GB modules. I tested the DIMMs in an ASRock Rack ROMED8-2T with the same CPU, and it handles 8x Hynix 2S4Rx4 128GB 2666 LRDIMM modules without issue (1TB). So it's the Supermicro motherboard, maybe due to it being rev 1.0.
I will now test this in a Supermicro H11SSL rev2.0.
 

alex_stief

Well-Known Member
May 31, 2016
884
312
63
38
Maybe a stupid question, but since nobody else dared to ask: are you already on the latest BIOS version?
 

alex_stief

Have you tried reseating the CPUs and the memory modules? I can't speak from personal experience, but other people have reported issues with memory banks not being recognized on SP3, which could often be fixed by mounting the CPUs again. Maybe even using the mythical torque wrench. Apparently, the socket can be a bit finicky for making proper contact with all pins.
It's just odd that the system boots at all, and some of the DIMMs are recognized and obviously functional. Which would fit that kind of error.
 

Psynapsx

I haven't tried reseating the CPUs yet. Thanks for the tip, I will try this.
I have tried reseating the modules however. Mixed them, installed only a few of them, etc.
I also have 8 Hynix 64GB LRDIMM modules, maybe it is worth a try to see if those are detected properly before reseating the CPUs. Only problem is that I only have 8 so I cannot fully populate all 16 DIMM slots.
 

i386

Well-Known Member
Mar 18, 2016
4,220
1,540
113
34
Germany
Poor contact between the CPU and the socket, as alex_stief suggested, could explain why the RAM temperatures of all DIMMs are reported in IPMI (the RAM SPD is connected to the BMC) while the OS and the system overview show only 6 DIMMs.
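To make that mismatch concrete, one could list which slots the BMC has a temperature sensor reading for but the OS does not report as memory. A minimal Python sketch follows; the slot names are an assumption following the P1-DIMMC1 style naming mentioned in this thread, and the "detected" list is a made-up placeholder, not the actual six slots from the affected system.

```python
def missing_from_os(bmc_sensor_slots, os_detected_slots):
    """Slots with a BMC temperature reading that the OS does not list."""
    return sorted(set(bmc_sensor_slots) - set(os_detected_slots))


# Assumed naming: two sockets (P1, P2), channels A..H, one DIMM per channel.
all_slots = [f"P{cpu}-DIMM{ch}1" for cpu in (1, 2) for ch in "ABCDEFGH"]

# Placeholder only: pretend these six slots are the ones the OS sees.
example_detected = all_slots[:6]

for slot in missing_from_os(all_slots, example_detected):
    print("sensor present, memory not detected:", slot)
```

If the missing slots cluster on particular channels or on one socket, that would point toward a contact or channel problem rather than at the modules themselves.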
 

MoMeanMugs

Member
Apr 16, 2018
60
19
8
74
I think everyone is on the right track. I don't have any Supermicro Epyc boards, but I have had issues with Supermicro Intel boards that needed the CPU reinstalled. I had similar behavior on my Supermicro boards where it would give a memory training error, but still boot. Swapping known good RAM into the same slot gave the same error, but reseating the CPU resolved it.
 

RageBone

Active Member
Jul 11, 2017
617
159
43
Have a look at the event log, maybe it logged issues with specific memory sticks or slots?

Temp sensors are a separate thing from the stick actually working as memory, hence the 16 temp readings don't say anything about the sticks actually working.

CPUs can have dead channels, boards can have bent pins that make channels not work, and sticks can be bad; any of these, alone or together, could cause sticks not to appear. So how about you swap a few sticks through all slots to make sure all slots are working correctly with known good sticks?
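The swap test can be tracked with a small table of observations. Below is a rough Python sketch of that bookkeeping; the function name `classify` and the stick IDs are made up, and the rule used (suspect only after at least two different partners all failed) is just one reasonable heuristic.

```python
from collections import defaultdict


def classify(observations):
    """observations: iterable of (stick_id, slot, detected) from manual tests.

    Returns (suspect_slots, suspect_sticks): slots that failed with at least
    two different sticks, and sticks that failed in at least two different slots.
    """
    by_slot = defaultdict(dict)   # slot  -> {stick: detected}
    by_stick = defaultdict(dict)  # stick -> {slot: detected}
    for stick, slot, detected in observations:
        by_slot[slot][stick] = detected
        by_stick[stick][slot] = detected

    suspect_slots = sorted(
        slot for slot, res in by_slot.items()
        if len(res) >= 2 and not any(res.values())
    )
    suspect_sticks = sorted(
        stick for stick, res in by_stick.items()
        if len(res) >= 2 and not any(res.values())
    )
    return suspect_slots, suspect_sticks


# Made-up example: sticks "A" and "B" both work in one slot and fail in another.
tests = [
    ("A", "P1-DIMMA1", True),
    ("A", "P1-DIMMD1", False),
    ("B", "P1-DIMMD1", False),
    ("B", "P1-DIMMA1", True),
]
print(classify(tests))
```

A slot that fails with several known good sticks points at the board, the socket contact, or the CPU channel; a stick that fails in several otherwise working slots points at the stick.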
 

Psynapsx

No issues in the event log. I tested all modules in an ASRock Rack board without issues, so I think the sticks are fine.
I tried testing with only a few sticks, but it won't boot unless at least P1-DIMMC1 is populated.
Maybe that's because, according to the table, population should always begin with C1?

[Attached image: 12.png]