Lots of trouble with Z11PA-U12-10G-2S + 4210R

Tinkerer

Member
Sep 5, 2020
45
14
8
So I built myself a home server for running VMs and storing backups.

I considered going for a Dell T440, but it got too expensive quickly once I started adding things. So I decided to build my own (something I've done plenty of times before).

Asus Z11PA-U12-10G-2S
Intel Xeon Silver 4210R
6x Kingston KSM29RD8/16HDR (16GB each, 96GB total, ECC registered, dual rank)
PSU is an Antec CP-850

The server is running Arch Linux.

First problem: the memory won't run at 2933. The board supports it and so does the memory, but it won't go over 2400.

Second issue: the board supports triple channel memory when the DIMMs are installed in the correct slots. It won't work that way and throws DIMM errors at POST. After hours of fiddling I found a combination that actually has all six DIMMs working, but I have no way of confirming whether triple channel is working or not.

Third issue: when running a CPU stress test, the CPU frequency won't always go up, depending on the test. I wanted to test the cooling and check that the CPU isn't making mistakes by running Prime95. The frequency of all cores drops to 1700 MHz the moment the test starts. With another stress test (simply called 'stress' for Linux), the cores do jump to 2900 MHz and stay there.

I've been rebooting the whole day, going into the BIOS to try different CPU and performance settings, but nothing changes the behavior with Prime95. I'm not sure, but I don't think this is right. It should at least stick to its base 2.4 GHz, right? Why would it drop to 1.7?

I watched the power draw at the wall outlet, and it's actually lower with Prime95 (~230 watts) than with the other test that does stick to 2900 MHz (~255 watts). The CPU gets hotter too, but I haven't seen temps over 50C, so that's not an issue.

Any ideas about these? I have logged a case with both Kingston for the memory and Asus for the memory and cpu issue.

Thanks!
 

nago

New Member
Feb 20, 2019
6
5
3
First issue: Have a look at the specs of your CPU: WikiChip. Unfortunately Xeon Silver only supports up to DDR4-2400.
Third issue: Prime95 is most likely using AVX-512 instructions, in which case the all-core frequency drops to 1.7 GHz.

For your second issue I'm not sure what you mean. Your CPU supports six memory channels.
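
To watch the AVX-512 downclock mentioned above from Linux, a small script along these lines might help. It is only a sketch: it assumes an x86 Linux system where /proc/cpuinfo reports a "cpu MHz" line per logical core, and the function names are made up for illustration.

```python
import re
from pathlib import Path


def parse_cpu_mhz(cpuinfo_text):
    """Return the clock speeds (in MHz) from the 'cpu MHz' lines of /proc/cpuinfo text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


def summarize(mhz):
    """Return (min, mean, max) of the given clock speeds, or None for an empty list."""
    if not mhz:
        return None
    return min(mhz), sum(mhz) / len(mhz), max(mhz)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # The file only exists on Linux; skip quietly elsewhere.
    if cpuinfo.exists():
        stats = summarize(parse_cpu_mhz(cpuinfo.read_text()))
        if stats is not None:
            low, mean, high = stats
            print(f"min {low:.0f} MHz, mean {mean:.0f} MHz, max {high:.0f} MHz")
```

Running it once while Prime95 is active and once under 'stress' should show the two different all-core clocks side by side.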
 

alex_stief

Well-Known Member
May 31, 2016
825
279
63
37
For issue number 2, check the manual. It clearly states which slots to use for 6 DIMMs. Hint: A1, B1, C1, D1, E1, F1.
And yeah, 6 memory channels, not 3.
 

Tinkerer

Thanks for the quick responses, appreciate it!

nago said: "First issue: Have a look into the specs of your CPU: WikiChip. Unfortunately Xeon Silver only supports up to DDR4-2400."
Thanks! I don't have experience building with Xeons. Because the motherboard supports 2933, I assumed it worked like on desktops, where the memory clock can be set independently of the CPU clocks so that memory can run at different speeds. The memory speed is adjustable, it just never goes over 2400, for the reason you mentioned.

OK, not a real problem, I just made the wrong assumptions.

nago said: "Third issue: Prime95 is most likely using AVX512 instructions in which case the frequency drops to 1.7 GHz all core."
Oh wow! Thanks for that, I never knew about it. The WikiChip link has specs I've never seen before.

Is there any reason for this? I mean, the power consumption goes down and the cpu doesn't get hot at all, so why would it do that?
Edit: just reading Frequency Behavior - Intel - WikiChip, it explains why that is.
 

Tinkerer

alex_stief said: "For issue number 2, check the manual. It clearly states which slots to use for 6 DIMMs. Hint: A1, B1, C1, D1, E1, F1. And yeah, 6 memory channels, not 3."
Thanks! I actually meant six channels. I typed it up last night when I was tired and frustrated. Six channels is the reason I got six DIMMs.

And yes, those are the slot configurations I meant when I said "when installed in the correct slots."
The moment I put DIMMs in A1 and D1 (if you look at the sequence from 2 DIMMs to 6 DIMMs, you'll notice it goes up in pairs A1/D1, B1/E1 and C1/F1), it throws a DIMM error on D1 (train error) and the entire DIMM goes unrecognized and unused. It does this with each of the slot pairs.

So I found a configuration that uses all 6 DIMMs, but if I ever want to expand, I'm afraid it's going to come back with these errors again, and I don't know if it's using six channels right now.

Thanks again!
 

Tinkerer

Forgot to mention that yesterday during my troubleshooting and searches, I found several topics that deal with the memory_train error. More than once it was a CPU socket fault with a bent contact or something similar.

I looked at this, using close-up pictures from my phone to zoom in, and I think this isn't the issue. I reseated the CPU, making sure it's lowered straight into the socket (it's attached to the cooler with the bracket).

I understand it can sometimes take up to 20 minutes for the memory to work (learn?), so I've put it all back into the paired slots and will just wait it out for now. We'll see.

Obviously any other tips are welcome and appreciated!
 

nago

Hm, which slots exactly are you currently using with your six DIMMs? According to the manual you should be using all the blue memory slots.
When you tried only two DIMMs, which slots did you try? Again according to the manual, it should be the innermost blue slots above and below the CPU socket.

I'm sorry for asking this as you already stated you used A1/D1, but it confuses me that you claim you get an error for each pair of DIMMs you add.

To verify that your system actually uses a six-channel configuration, you can check your BIOS. Sometimes it is stated there. Maybe Socket Configuration -> Memory Configuration shows something. If you are using Windows you can try CPU-Z: its Memory tab lists 'Channel #'.
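
Since the server runs Linux, the output of `dmidecode -t memory` (run as root) is another place to look. The sketch below is one possible way to pull the populated slots out of that output. The sample text is a made-up, trimmed illustration of the usual field layout, not real output from this board, and the locator names are an assumption borrowed from the POST message style; the function names are hypothetical too.

```python
import re

# Made-up, trimmed sample in the general layout of `dmidecode -t memory`.
# Real output has many more fields, and locator names vary by board.
SAMPLE = """\
Handle 0x0020, DMI type 17, 84 bytes
Memory Device
    Size: 16 GB
    Locator: DIMM_A1
    Bank Locator: NODE 1
    Configured Memory Speed: 2400 MT/s

Handle 0x0021, DMI type 17, 84 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM_A2
    Bank Locator: NODE 1
"""


def parse_memory_devices(dmidecode_text):
    """Return one dict of 'Key: value' fields per 'Memory Device' block."""
    devices = []
    # Blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue
        fields = {}
        for line in lines:
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value
        devices.append(fields)
    return devices


def populated_slots(devices):
    """Return the sorted locators of slots that report an installed module."""
    empty = "No Module Installed"
    return sorted(
        d.get("Locator", "?") for d in devices if d.get("Size", empty) != empty
    )


if __name__ == "__main__":
    print(populated_slots(parse_memory_devices(SAMPLE)))
```

This only shows which slots hold a recognized module and at what configured speed. It does not prove how the memory controller interleaves the channels, so the BIOS screen or CPU-Z is still the more direct check.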
 

Tinkerer

Yes, correct. Blue slots first starting with A1/D1. When I do that, D1 won't work (with just 2 dimms installed).

Installing them all as A1/D1, B1/E1, C1/F1, results in C1, E1, F1 not working. One time I had E1 working too, I don't know why or how.

I've been tinkering a bit: I removed the CPU, checked the socket again and reseated it, and then I only saw 16GB, 1 DIMM in A1. The rest was unknown. Sometimes I see "unknown DIMM", sometimes I see the entire DIMM brand, type, etc. but as 0GB.

I reseated it again and gently shuffled it back and forth while seated in its socket and fastened it again. Then it saw the 4 dimms again from above. Reseating the dimms or swapping them around doesn't change the behavior, the problem follows the slots, not the dimms.

The configuration that works is A1, A2, B1, E1, E2, F1. At least it's got all DIMMs working, but probably not with six channels.
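
A quick way to reason about that layout, as a sketch only: it assumes the letter in each slot name is the memory channel and the digit is the position on that channel (this matches the A1 to F1 advice earlier in the thread, but it is still an assumption about this board's naming). The function names are made up.

```python
from collections import defaultdict


def dimms_per_channel(slots):
    """Group slot names such as 'A1' or 'E2' by their channel letter."""
    channels = defaultdict(list)
    for slot in slots:
        channels[slot[0].upper()].append(slot)
    return dict(channels)


def describe(slots):
    """Return (channels in use, most DIMMs on any single channel)."""
    per_channel = dimms_per_channel(slots)
    if not per_channel:
        return 0, 0
    return len(per_channel), max(len(v) for v in per_channel.values())


if __name__ == "__main__":
    working = ["A1", "A2", "B1", "E1", "E2", "F1"]
    recommended = ["A1", "B1", "C1", "D1", "E1", "F1"]
    for name, slots in (("working", working), ("recommended", recommended)):
        used, deepest = describe(slots)
        print(f"{name}: {used} channels in use, up to {deepest} DIMM(s) per channel")
```

Under that naming assumption, the working layout would populate only four channels (A, B, E, F), with two of them carrying two DIMMs each, while the layout from the manual puts one DIMM on each of the six.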

I believe the motherboard (socket) is a dud. I don't have another cpu to test but I guess that could be the issue too?
 

Tinkerer

OK, I think I got it working. All DIMMs are in the correct slots and the system sees them all.

What I did? Took it all out, removed all components, put it back in, reseated all components, reset cmos and powered it back on. First boot took some time, configured the bios, rebooted and it took some more time, it rebooted once or twice by itself. And then it worked.

The only thing I would really like to verify now is that it's actually working in a six-channel configuration. The sysbench memory benchmark doesn't really help: it ran at around 18GB/s before, and now it runs at 21GB/s. Not the difference I was looking for. Raw bandwidth should be around 100GB/s, I believe, but I doubt that's going to show in a benchmark.
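
For reference, the theoretical peak can be worked out from the channel count and the transfer rate. A rough sketch, assuming the standard 64-bit (8-byte) data bus per DDR4 channel and ignoring ECC bits and real-world efficiency; the function name is just for illustration:

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (decimal) for the given memory setup."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for channels in (1, 6):
        bandwidth = peak_bandwidth_gb_s(channels, 2400)
        print(f"{channels} channel(s) at DDR4-2400: {bandwidth:.1f} GB/s")
```

Six channels at DDR4-2400 come out at 115.2 GB/s on paper, and a single channel at 19.2 GB/s. If the benchmark runs on one thread, it will generally land far below the six-channel figure, so a result near 20 GB/s does not by itself say how many channels are active.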

Any ideas?

Thanks!
 

Tinkerer

Never mind, I booted Windows and checked with CPU-Z, and apparently that shows it. It's hexa-channel, so I'm happy.
 

luflo

New Member
Dec 4, 2020
3
2
3
Tinkerer said: "What I did? Took it all out, removed all components, put it back in, reseated all components, reset cmos and powered it back on. First boot took some time, configured the bios, rebooted and it took some more time, it rebooted once or twice by itself. And then it worked."
Looks like I'm not the only one with this issue. I have tried resetting the CMOS, different CPUs, different DIMMs, etc.

Would you mind sharing the settings you tweaked in BIOS to get the dimms working in the proper slots?

I currently have 4 dimms in slots A1/D1 and B1/E1 and get POST message: 'DIMM Train Error: DIMM_D1' which shows up as 0GB.


Thanks
 

Tinkerer

luflo said: "Looks like I'm not the only one with this issue. I have tried CMOS, different CPUs and dimms etc.. Would you mind sharing the settings you tweaked in BIOS to get the dimms working in the proper slots? I currently have 4 dimms in slots A1/D1 and B1/E1 and get POST message: 'DIMM Train Error: DIMM_D1' which shows up as 0GB. Thanks"

Yeah, sorry for not getting back to the thread to post the real solution.

It turned out that taking everything out and putting it back in wasn't enough, and the trouble returned hours after I posted. But since it worked for a short while, I figured it must have had to do with the physical installation.

Look at the attached picture. Make sure there are no metal standoffs (the threaded posts with mounting holes that the screws go into) at the spots marked with orange circles. The top one shorted a few pins below the memory banks, which was the real reason for my troubles.

Mine were fixed to the backplate in the case. I needed pliers to bend them back and forth until they broke off.

I was extremely lucky: no damage whatsoever, and it's been stable 24/7 under high loads since.
 

Attachments


luflo

Thanks chief for the quick reply!

I had 2 motherboard standoffs in the exact orange locations shown in your diagram. Both were making contact with DIMM pins on the underside of the motherboard. The top one was touching the C1 DIMM slot and the bottom one was touching the D1 DIMM slot, hence the error. I took out the standoffs and the issue is resolved now.

Cheers
 

Tinkerer

Glad to hear you got that sorted!

I'd run some tests if I were you: memtest86+ and a few hours of Prime95.
 

luflo

Yes, didn’t mention it but after booting I ran dmidecode first to verify channel interleaving and it seemed correct.

I then booted into memtest86 that is available in unraid boot menu. Couple passes so far so it looks promising.

Thanks
 

Rand__

Well-Known Member
Mar 6, 2014
6,330
1,597
113
Did you ever have trouble with the onboard SATA devices?
I upgraded to the latest BIOS recently, and now my boot devices on the iSATA ports fail to show up more often than they show up :(

Never mind, it seems I played too much with the BIOS settings. After Reset to Defaults, everything showed up again.
 

Tinkerer

Rand__ said: "Did you ever have trouble with the onboard SATA devices? Upgraded to the latest Bios recently and now my boot devices don't show more often than they do on the iSATA ports :( Nevermind, it seems i played to much with the Bios settings, after Reset to Defaults stuff showed up again."
Good to hear it's working again.

The behavior sounds a bit too erratic for me to believe a bios setting to be the real cause. Do you remember which settings caused it?
 

Rand__

No, played around with quite a few (cpu overclocking and such).
I assume it must have had something to do with timing, since the CD-ROM drive I had attached to the SATA port was detected all the time, but a SATA drive on the same port failed most of the time (similar to the 8643 ports). Only after a cold boot (and not always) did it show the 8643-based SATA drives (and NVMe).

Was not in the mood for playing around more, glad it was working again...
 

Tinkerer

Cool. One of the settings also raises the clock of the bus that iSATA is on, and that setting comes with a warning about it. I guess you may have increased it too much then. Makes sense.
 

Rand__

Ah, that would indeed explain a lot. I didn't realize it would increase the base clock, but it makes sense. Thanks for pointing that out :)
 