LGA 1700 Alder Lake "Servers"


TheDragon44

New Member
Mar 16, 2023
I'm not sure if this is going to help with all the confusion around DDR5 ECC memory, but I'll share what I know...

I'm using the ASUS W680 Pro IPMI with Kingston KSM48E40BD8KM-32 modules.

I've attached the output from dmidecode.

I asked about the "single bit ECC" in that output, since I bought the memory directly from Kingston, and this was the response I got from their technical department:

"Currently Kingston DDR5 ECC DRAM modules are built using a chip-organisation of "x8". "x8" organised modules only offer single bit correction. It's likely if you see "multi bit" correction, then this is likely from modules installed that have a "x4" chip-organisation. So, the single bit error correction displayed in the screenshots is unrelated to ECC onDie."

Does this help anyone better understand the DDR5 ECC specs/products?

Since I bought directly from Kingston, I'm happy to put any questions to them that might help.
 


RolloZ170

Well-Known Member
Apr 24, 2016
> if you see "multi bit" correction, then this is likely from modules installed that have a "x4" chip-organisation
That cannot be true either: x8 and x4 modules have the same number of parity bits.
If you look at DDR4 ECC, we have 8 parity bits for 64 data bits.
A DDR5 ECC module with 2x 40 bits has double the parity bits and can't do more than single-bit correction?
Edit: the DDR5 UDIMM pinout supports only 4 ECC bits per DIMM sub-channel, 2x (32+4) = 72 bits.
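For comparison, here is a small Python sketch of the ECC bit budget per 64-byte burst. It assumes the standard bus layouts (DDR4 ECC: one 72-bit channel, 64 data + 8 ECC bits, burst length 8; DDR5: two independent 32-bit sub-channels, burst length 16, with 4 ECC bits per sub-channel for EC4 or 8 for EC8). The class and layout names are illustrative, not from any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EccLayout:
    name: str
    data_bits_per_beat: int  # data width of one channel (DDR4) or sub-channel (DDR5)
    ecc_bits_per_beat: int   # ECC width of that channel or sub-channel
    burst_length: int        # beats per burst

    def data_bits_per_burst(self) -> int:
        return self.data_bits_per_beat * self.burst_length

    def ecc_bits_per_burst(self) -> int:
        return self.ecc_bits_per_beat * self.burst_length

    def overhead(self) -> float:
        """ECC bits as a fraction of data bits."""
        return self.ecc_bits_per_beat / self.data_bits_per_beat


# Assumed layouts: per channel for DDR4, per sub-channel for DDR5.
LAYOUTS = [
    EccLayout("DDR4 ECC (72-bit)", 64, 8, 8),
    EccLayout("DDR5 EC4 (2x 36-bit)", 32, 4, 16),
    EccLayout("DDR5 EC8 (2x 40-bit)", 32, 8, 16),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        print(f"{layout.name}: {layout.data_bits_per_burst()} data + "
              f"{layout.ecc_bits_per_burst()} ECC bits per burst, "
              f"{layout.overhead():.1%} overhead")
```

Under these assumptions, an EC4 UDIMM spends the same 64 ECC bits per 64-byte burst as DDR4 ECC, while the 2x 40-bit EC8 layout spends twice as many. How those bits are turned into a code is up to the memory controller, so the bit budget alone does not say how many bit errors can be corrected.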
 

twin_savage

Member
Jan 26, 2018
> That cannot be true either: x8 and x4 modules have the same number of parity bits.
> If you look at DDR4 ECC, we have 8 parity bits for 64 data bits.
> A DDR5 ECC module with 2x 40 bits has double the parity bits and can't do more than single-bit correction?
I think this paper sheds some light on the different types of "dedicated" ECC that can be implemented on DDR5. Basically, DDR5 requires more storage overhead for its ECC to work than DDR4, but there is a chance it can correct more than a single-bit error:

[Attached images: 1680801091387.png, 1680801376084.png]

 

Alex15326

New Member
Apr 5, 2023
I would like to add some thoughts about the DDR5 ECC parity bits discussion.

If I understand correctly, per the standard the on-die ECC is opaque to the system, so we can assume it is up to the manufacturer how to handle it: it can correct whatever number of bits the manufacturer chooses, and it provides data integrity on the RAM itself (so, unlike DDR4 and earlier generations, we can assume that data leaving the RAM is always correct).

The advertised parity bits therefore protect data that leaves the RAM. Even if we don't know the algorithm used for error checking, SECDED Hamming codes show that 8 parity bits are enough to protect 64 data bits (7 parity bits suffice for up to 120 data bits; the code is shortened to 64 data bits, and 1 extra parity bit is added for double-bit error detection). This means 72 bit lines should offer at least single-bit correction during data transfers.
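The Hamming arithmetic above can be checked with a short Python sketch. `sec_check_bits(k)` finds the smallest r with 2^r >= k + r + 1 (single-error correction), and SECDED adds one overall parity bit. The `encode`/`decode` pair is a textbook extended Hamming code written for illustration; real memory controllers may use different codes, so treat this as a demonstration of SECDED in general, not of any specific DIMM or controller. All function names are hypothetical.

```python
def sec_check_bits(k: int) -> int:
    """Smallest r such that a Hamming code can correct 1 error in k data bits."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def secded_check_bits(k: int) -> int:
    """SEC bits plus one overall parity bit for double-error detection."""
    return sec_check_bits(k) + 1


def encode(data: list[int]) -> list[int]:
    """Extended Hamming encode. Index 0 = overall parity, 1..n = Hamming codeword."""
    r = sec_check_bits(len(data))
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two -> data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Parity bit at position p covers every position whose index has bit i set.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p and pos != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over the Hamming codeword
    return code


def decode(code: list[int]) -> tuple[list[int], str]:
    """Return (data bits, status), status being 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of all positions holding a 1
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= n:
        # Odd number of flips: assume exactly one, located at `syndrome`.
        # syndrome == 0 means only the overall parity bit (index 0) flipped.
        if syndrome:
            code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome (or an impossible syndrome).
        status = "uncorrectable"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status


if __name__ == "__main__":
    for k in (32, 64, 120):
        print(k, "data bits:", sec_check_bits(k), "SEC /", secded_check_bits(k), "SECDED")
    data = [(i // 3) % 2 for i in range(64)]
    word = encode(data)
    one = list(word)
    one[10] ^= 1
    two = list(word)
    two[10] ^= 1
    two[40] ^= 1
    print(len(word), "bits per codeword:", decode(one)[1], decode(two)[1])
```

For 64 data bits this gives 7 SEC bits plus 1 overall parity bit, a 72-bit codeword, and 7 SEC bits are still enough at 120 data bits, matching the numbers in the post. Note that a single 32-bit DDR5 beat would need 7 SECDED bits, more than the 4 an EC4 UDIMM carries per beat, so under these assumptions a controller has to compute its code over more than one beat of the burst.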

If bit flips correlate with time (the longer a system is on, the higher the chance of a bit flip), data at rest in RAM should have the highest chance of bit flips (but is already covered by on-die ECC), while data in transit should have a very low chance, because it occupies the data lines only for a tiny fraction of that time. So the chance of a multi-bit flip during a transfer should be even lower, and 72-bit modules should offer the same or even more protection than DDR4 ECC modules.
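The claim that multi-bit flips in one transfer are far rarer than single flips can be illustrated with a simple binomial model. This Python sketch assumes every bit of a 72-bit word flips independently with the same hypothetical probability p; real DRAM and bus errors are often correlated (for example a failing chip or pin), so the numbers only illustrate how the two cases scale. The function name is hypothetical.

```python
from math import comb


def flip_probabilities(width: int, p: float) -> tuple[float, float]:
    """Return (P(exactly one flip), P(two or more flips)) for a width-bit word,
    assuming each bit flips independently with probability p (hypothetical model)."""
    def exactly(k: int) -> float:
        return comb(width, k) * p**k * (1 - p) ** (width - k)

    p_one = exactly(1)
    # Sum the k >= 2 terms directly instead of computing 1 - P(0) - P(1),
    # which would lose precision to cancellation when p is tiny.
    p_multi = sum(exactly(k) for k in range(2, width + 1))
    return p_one, p_multi


if __name__ == "__main__":
    for p in (1e-6, 1e-9):
        one, multi = flip_probabilities(72, p)
        print(f"p={p:g}: single={one:.3e}, multi={multi:.3e}, ratio={multi / one:.1e}")
```

In this model the two-or-more case scales with p squared while the single-flip case scales with p, so the smaller p is, the more lopsided the ratio becomes.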
 

RolloZ170

Well-Known Member
Apr 24, 2016
> If I understand correctly, per the standard the on-die ECC is opaque to the system, so we can assume it is up to the manufacturer how to handle it: it can correct whatever number of bits the manufacturer chooses, and it provides data integrity on the RAM itself
Yes, but this is specified by JEDEC too: besides the automatic on-die ECC (the default), there is an additional way for the memory controller to control it through host commands.
 

TheDragon44

New Member
Mar 16, 2023
> Have you used stock settings or manually enabled ECC polling?
How do you enable ECC polling?

I'm guessing this could be why I can't get MemTest86 to show anything ECC-related: when I select ECC polling in MemTest86, it just says ECC is disabled, even though dmidecode suggests otherwise in Linux.
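One thing worth keeping in mind: dmidecode only reports what the firmware wrote into the SMBIOS tables (for example the "Error Correction Type" field of the Physical Memory Array, DMI type 16), so it does not prove that ECC is actually active. On Linux, a loaded EDAC driver exposes per-controller error counters under /sys/devices/system/edac/mc/. The Python sketch below reads both; the helper names are hypothetical, running dmidecode needs root, and the EDAC directory only exists if an EDAC driver for your memory controller is loaded.

```python
import re
import subprocess
from pathlib import Path


def ecc_types_from_dmidecode(text: str) -> list[str]:
    """Collect every 'Error Correction Type' value from dmidecode output text."""
    return re.findall(r"^[ \t]*Error Correction Type:[ \t]*(.+?)[ \t]*$", text, re.MULTILINE)


def edac_counts(root: Path = Path("/sys/devices/system/edac/mc")) -> dict[str, tuple[int, int]]:
    """Map each EDAC memory controller (mc0, mc1, ...) to (corrected, uncorrected)
    error counts. Returns an empty dict if no EDAC driver has registered one."""
    counts: dict[str, tuple[int, int]] = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    try:
        out = subprocess.run(["dmidecode", "-t", "16"], capture_output=True,
                             text=True, check=True).stdout
        print("SMBIOS reports:", ecc_types_from_dmidecode(out))
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run dmidecode (missing, or not running as root):", exc)
    print("EDAC counters:", edac_counts() or "no EDAC memory controller found")
```

If the EDAC directory is missing while dmidecode still says "Single-bit ECC", that only tells you what the BIOS advertises, not that errors are being corrected and reported.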
 

saf1

Member
Nov 27, 2022
Has anyone here with the Supermicro board updated to the latest 2.0a BIOS? I tried today and received an error. I need to look into it a bit further, as I've not seen this error before.

[Attached screenshot: 1680883268059.png]
 

saf1

Member
Nov 27, 2022
Good point. Checked just now in case I forgot. I'll download it again and use a different USB stick in case something failed and I missed it. The install reverted, and the board is still running 2.0, so nothing major.

stargate:~$ sudo dmidecode -s baseboard-product-name

X13SAE-F
 

reasonsandreasons

Active Member
May 16, 2022
This is unrelated to most of this thread since it doesn't involve DDR5, but I'm currently setting up a new TrueNAS Scale box with an ASRock Industrial IMX-X1314. It's running an i5-12600K and two sticks of Kingston KSM32ED8/32HC with no problems. I think this is the only W680 DDR4 board in circulation, though it's only available on eBay so far. My board shipped with BIOS 1.10, so no Raptor Lake support out of the box; the 1.20 update adds support for those processors and resizable BAR.

A downside is that this board doesn't have IPMI, but it does support AMT. I'm running my board with MeshCommander loaded into firmware for web KVM. Note that I don't think this works if you have a discrete GPU as the primary display output. If anyone needs it, the correct download link for the firmware loader is here; Intel laid off the engineers responsible, and the link on the site has been broken for a few months. All in all it's close enough for me, even if the software won't be maintained in the future.

Happy to answer any questions folks have about this board. If you'd like to skip the hassle of DDR5 and are okay with a less robust IPMI solution and PCIe 4.0, I think this is a good option with an excellent PCIe layout that makes the most of the chipset.
 
Jan 3, 2023
> Good point. Checked just now in case I forgot. I'll download it again and use a different USB stick in case something failed and I missed it. The install reverted, and the board is still running 2.0, so nothing major.
>
> stargate:~$ sudo dmidecode -s baseboard-product-name
>
> X13SAE-F
Make sure you are using the BIOS for the correct board. The X13SAE and the X13SAE-F (with the onboard BMC) are different. I had a problem when I originally built my workstation, and it turned out I was trying to apply the X13SAE-F BIOS version 2.0 to my non-F board.
 

steve0

New Member
Jan 19, 2023
> Did the Kingston memory you ordered work OK in the ASUS Pro W680-ACE IPMI, and show up as ECC enabled?
>
> I tried to get the Hynix modules but couldn't find them to buy anywhere in my country; the Kingston modules seem to be widely available though.
Hi, sorry for the late reply. Like "ddr5ecc", I could not boot at all at 4800 MHz on the W680-ACE. To boot with four 32GB sticks at 4800 MHz, I had to raise the memory controller voltage to 1.15 V. I also set each stick to 1.10 V using the "By per PMIC" option, which I've heard can help stability when running four sticks at 4800 MHz or more. After those changes the system posted, and it has been running rock solid for about 40 days.

I have not even had a chance to test whether ECC is working, but I will report back.
 

infuriatedream

New Member
Feb 6, 2023
> Hi, sorry for the late reply. Like "ddr5ecc", I could not boot at all at 4800 MHz on the W680-ACE. To boot with four 32GB sticks at 4800 MHz, I had to raise the memory controller voltage to 1.15 V. I also set each stick to 1.10 V using the "By per PMIC" option, which I've heard can help stability when running four sticks at 4800 MHz or more. After those changes the system posted, and it has been running rock solid for about 40 days.
Are you talking about the KSM48E40BD8KM-32HM modules? Are you using the first-release BIOS 0203, or have you updated to 2305? What CPU are you using?

Today I upgraded from 2x32GB Kingston to 4x32GB Kingston and expected the memory speed to drop to 4400MHz (the official 2DPC speed for Alder Lake), but was surprised when the BIOS stayed at 4800MHz (at no point did I change any memory speed settings, so it's not overclocked or manually set to 4800MHz). The initial boot with 4 modules took some time (memory training!), but cold boots afterwards were back to normal speed.

The system survived ~50 minutes of MemTest86 Free v10.4b1000, after which I aborted the test and started the server again. Keeping my fingers crossed it won't crash or misbehave... I continue to be VERY happy with the ASUS W680-ACE.
 

RolloZ170

Well-Known Member
Apr 24, 2016
> Today I upgraded from 2x32GB Kingston to 4x32GB Kingston and expected the memory speed to drop to 4400MHz (the official 2DPC speed for Alder Lake), but was surprised when the BIOS stayed at 4800MHz
Probably the "Enforce POR" setting is disabled by default.
 
Jan 3, 2023
You mean the X13SAE-F, right?

I have the non-F model, but hopefully this will be useful.

This is with OpenSUSE 15.4 (5.14.21-150400.24.60-default)

The non verbose version:
00:00.0 Host bridge: Intel Corporation Device a700 (rev 01)
00:01.0 PCI bridge: Intel Corporation Device a70d (rev 01)
00:06.0 PCI bridge: Intel Corporation Device a74d (rev 01)
00:0a.0 Signal processing controller: Intel Corporation Device a77d (rev 01)
00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
00:15.0 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 (rev 11)
00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
00:16.3 Serial controller: Intel Corporation Device 7aeb (rev 11)
00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
00:1a.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Device 7ac0 (rev 11)
00:1b.4 PCI bridge: Intel Corporation Device 7ac4 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 (rev 11)
00:1c.1 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #2 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 7a88 (rev 11)
00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (17) I219-LM (rev 11)
01:00.0 VGA compatible controller: NVIDIA Corporation AD103 [GeForce RTX 4080] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22bb (rev a1)
02:00.0 Non-Volatile memory controller: Sandisk Corp Device 5030 (rev 01)
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
06:00.0 PCI bridge: Integrated Technology Express, Inc. IT8893E PCIe to PCI Bridge (rev 41)
08:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)

I have attached the very verbose version as a file.
 



niekbergboer

Active Member
Jun 21, 2016
Switzerland
> You mean the X13SAE-F, right?
>
> I have the non-F model, but hopefully this will be useful.
Thank you. I actually do mean the X13SAZ-F, though, specifically a board with IPMI. What I was trying to find out is whether one can use the Intel IGP (for, say, Plex) while also using the IPMI for remote management, which is one of the use cases I would like to have.
 

Hubi

New Member
Aug 28, 2015
> Currently Kingston DDR5 ECC DRAM modules are built using a chip-organisation of "x8". "x8" organised modules only offer single bit correction. It's likely if you see "multi bit" correction, then this is likely from modules installed that have a "x4" chip-organisation
For ECC UDIMMs, the JEDEC DDR5 DIMM label guide JESD401-5A (dated March 2023) defines only one type:
EC4 Unbuffered DIMM ("EC4 UDIMM"), dual sub-channel with 32-bit data and 4-bit ECC per sub-channel -> Code E
The others are:
EC8 Load Reduced DIMM ("EC8 LRDIMM"), dual sub-channel with 32-bit data and 8-bit ECC per sub-channel -> Code L
EC4 Registered DIMM ("EC4 RDIMM"), dual sub-channel with 32-bit data and 4-bit ECC per sub-channel -> Code P
EC8 Registered DIMM ("EC8 RDIMM"), dual sub-channel with 32-bit data and 8-bit ECC per sub-channel -> Code R

The key letter should appear directly after the PC rating and speed bin, e.g. PC5-5600B-Exxx-

There seems to be no code for an EC8 UDIMM, so I guess it does not exist.
However, all x72 EC4 UDIMMs carry 10 or 20 x8 chips.
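The key-letter scheme above can be expressed as a small lookup. This Python sketch only knows the four codes listed in the post (E, L, P, R), assumes the label is dash-separated with the key letter as the first character of the field after the speed bin, and derives the module width as two sub-channels of 32 data bits plus 4 or 8 ECC bits. The function name is hypothetical.

```python
# Key letters as listed from JESD401-5A in the post above (not exhaustive).
MODULE_TYPES = {
    "E": ("EC4 UDIMM", 4),
    "L": ("EC8 LRDIMM", 8),
    "P": ("EC4 RDIMM", 4),
    "R": ("EC8 RDIMM", 8),
}


def decode_label(label: str) -> tuple[str, int]:
    """Return (module type, total bus width in bits) for a PC5 label such as
    'PC5-5600B-Exxx-'. Raises ValueError for unknown or malformed labels."""
    fields = label.split("-")
    if len(fields) < 3 or not fields[0].startswith("PC5") or not fields[2]:
        raise ValueError(f"unexpected label format: {label!r}")
    key = fields[2][0]  # key letter follows the PC rating and speed bin
    if key not in MODULE_TYPES:
        raise ValueError(f"key letter {key!r} is not one of the listed codes")
    name, ecc_bits = MODULE_TYPES[key]
    sub_channels, data_bits = 2, 32  # dual sub-channel, 32 data bits each
    return name, sub_channels * (data_bits + ecc_bits)


if __name__ == "__main__":
    print(decode_label("PC5-5600B-Exxx-"))  # ('EC4 UDIMM', 72)
```

The width column makes the relationship explicit: an EC4 module is 2x (32+4) = 72 bits and an EC8 module is 2x (32+8) = 80 bits.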
 

infuriatedream

New Member
Feb 6, 2023
> I continue to be VERY happy with the ASUS W680-ACE.
Well, I guess I have to take this back. The W680-ACE died completely after less than 5 weeks of use.

I had one Proxmox crash on April 26th while copying a large amount of data from a USB 3.0 hard disk that I had passed through directly to a guest.
Code:
pcieport 0000:00:1c.0: AER: Uncorrected (Non-Fatal) error received: 0000:05:00.0
igc 0000:05:00.0: PCIe Bus Error: serverity=Uncorrected (Non-Fatal), type=Transaction Layer
igc 0000:05:00.0:   device [8086:125b] error status/mask=00004000/00000000
igc 0000:05:00.0:    [14] CmpltTO
usb 2-6: Device not responding to setup address.
usb 2-6: Device not responding to setup address.
usb 2-6: device not accepting address 2, error -71
That might have been a Proxmox bug or an i225/i226 Ethernet bug; this Ethernet chipset is known to be extremely buggy. I reduced the memory speed from 4800 to 4400 at that point to be safe, but this was probably not necessary.
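The `error status/mask=00004000/00000000` line in the log can be decoded bit by bit: bit 14 of the PCIe AER Uncorrectable Error Status register is Completion Timeout, which is what the kernel's `[14] CmpltTO` annotation refers to. This Python sketch uses a partial table (only a subset of the register's defined bits), and the function name is hypothetical.

```python
import re

# Subset of the PCIe AER Uncorrectable Error Status register bits.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_aer_line(line: str) -> list[str]:
    """Decode the status half of an 'error status/mask=SSSSSSSS/MMMMMMMM' log line
    into bit names, lowest bit first. Returns [] if the line has no such field."""
    m = re.search(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)", line)
    if not m:
        return []
    status = int(m.group(1), 16)
    return [UNCORRECTABLE_BITS.get(bit, f"bit {bit} (not in table)")
            for bit in range(32) if status >> bit & 1]


if __name__ == "__main__":
    log = "igc 0000:05:00.0:   device [8086:125b] error status/mask=00004000/00000000"
    print(decode_aer_line(log))  # ['Completion Timeout']
```

Here only bit 14 is set and nothing is masked, consistent with the `CmpltTO` line: a request involving the NIC at 05:00.0 did not receive its completion within the timeout.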

Last night, however, the system shut off for no apparent reason. When I came to the office early this morning to see what was up and troubleshoot the mainboard, it refused to turn on and stay on. When I press the power button, it turns on for 2-3 seconds, the diagnostic LED displays 00, and the system turns off again. I removed 75% of my memory and tried different modules and different memory slots. I've tried two other power supplies. I've removed and reseated the CPU and removed everything from the case. I have tried the CMOS clear jumper and have removed the CMOS battery for some time. I have closely inspected the socket for damage (it looks fine).

Nothing helped; the system continues to shut off almost immediately after turning on.

The only thing left to swap was the CPU, which I swapped for a 12th-gen Celeron; with it, the mainboard behaves almost the same but turns off even quicker. Without a CPU the system stays on, but of course the diagnostic code stays at 00. I would say it's dead. It's probably not the CPU, because it's never the CPU unless it is the CPU. I did try the Celeron, but unfortunately I noticed that the CPU support list for the W680 Pro lists only i3/i5/i7/i9 and no Celerons, so I can't be 100% sure of that.

Very frustrating. I've moved the Proxmox system to my personal PC (that was easier than expected, very nice) to keep the office running, and ordered another ASUS W680 Pro to reward ASUS financially for delivering a board that lasted only 5 weeks.
 

infuriatedream

New Member
Feb 6, 2023
> It's probably not the CPU because it's never the CPU unless it is the CPU.
Update: it was a dead 13700K CPU. I was led astray by the board not booting with the Celeron G6900, but the replacement board also did not boot (i.e., it quickly turned off after powering on) with either the G6900 or the 13700K, while a replacement 13700K booted in both the original board and the replacement board.

So did the CPU just die, or did the board kill the CPU? ASUS is currently in hot water for killing Ryzen 7000 X3D chips through excessive overvolting... I guess we'll see whether the replacement CPU lives longer than the 5 weeks the original one survived.