ELI5 - Why would CXL "replace" Optane/3D XPoint?


BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
I see this in a few places (mostly Chris Evans's excellent blog), and I assume neither Micron's nor Intel's engineers/managers are idiots for discontinuing the tech, but why would CXL "replace" Optane?
CXL is an extension of the PCIe architecture that, even in its early revisions, allows adding more memory, including DRAM, over the PCIe bus.
Optane (aka 3D XPoint) is a specific persistent memory type that, unlike traditional NAND flash storage, offers very low latency.
Optane, specifically PMem, was meant to supplement more expensive DRAM and allow near-DRAM-speed in-memory processing.

CXL would address the architectural niche PMem was meant to fill, but it doesn't make DRAM any cheaper.

Am I missing something here?
 

i386

Well-Known Member
Mar 18, 2016
4,221
1,540
113
34
Germany
(I can't access the link, it wants a subscription or a login)

I think Intel killed NVDIMM (un)intentionally; Optane was proprietary for the longest time (NVDIMM-P became a JEDEC standard in 2021, when Optane was already 4+ years old), and other NVDIMM standards were barely supported while Intel had the biggest market share (certain LGA 2011-3 boards supported NVDIMMs from Viking Technology but removed that support in later BIOS versions).
The "market" evolved around these problems and developed CXL :D
 

Stephan

Well-Known Member
Apr 21, 2017
920
698
93
Germany
I had dealings with Intel at the manager level, and after that I would not want to rule out 'stupid'. Maybe Intel saw that big industry and the two handfuls of hyperscalers didn't bite, and that they would rather continue using the known tech: DRAM, flash, and some (the horror) ARM CPUs. Optane is the better tech: tiny latencies and basically unkillable by writes. I guess once the patents expire we will see a Cambrian explosion out of China, like with 3D printers out of the Republic of Prusa. So sometime around 2040. I guess that's an argument for halving patent protection terms.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
@i386 - it does ask for free registration (at least for now). Chris's stuff isn't bad, but I can't see myself paying for it.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
Bump. Does anyone have any idea about this? @Patrick maybe - you seem to know a hell of a lot about CXL and where it's going.
 

acquacow

Well-Known Member
Feb 15, 2017
784
439
63
42
I wonder if any of these decisions were made to avoid all of the Netlist patent disputes/etc...
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
Sure. I think we have discussed this in a few articles in passing. Maybe in the glorious complexity piece.

Optane was not designed as a storage medium to compete with NAND on pricing. Instead, it was designed to compete with RAM.

The small dollar problem was to make faster storage. Some devices need low QD performance to Optane levels (disclaimer I love Optane), but it is a very small portion of the NAND SSD market. Not enough to support Optane as a storage medium, especially as capacities vastly diverge.

Many of the write performance use cases are easily served by CXL-enabled devices that can ingest to DRAM that has PLP, then data is dumped to slower storage.

As a memory, CXL fixes a bigger issue. With Optane PMem DIMMs, DRAM is used as a cache, and cold memory pages are pushed to the slower Optane media. As a result, there was a natural floor on how inexpensive you could make an Optane setup, since you need more DRAM to increase performance.

Once memory can be added via cards, put behind switches, and with CXL 3.0 hot swapped, not only can you build modules to do something similar (with cheaper NAND for bigger capacity), but you get other features as well. One really good example on "making RAM cheaper" is re-use. Behind CXL, one can have different media types. Today we are talking about DDR5 behind a CXL 1.1/2.0 controller. In the future, there will be DDR4 CXL cards. For a hyper-scaler, imagine the cost savings of re-using DDR4 as a slower memory tier behind CXL. Then layer on top of that the pooling/ sharing capabilities.

CXL makes Optane compete not just against current-gen DDR5, but also against previous generations. It also allows the concept of having a shelf of memory that spans many generations, so one does not need to upgrade DRAM at the same time as servers.
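
As a rough illustration of the "re-use DDR4 behind CXL" economics, here is a minimal back-of-the-envelope sketch in Python. The per-GiB prices are invented placeholders (not real market quotes), and controller/card costs are ignored; the point is only how a cheap cold tier pulls down the blended cost.

Code:
# Hypothetical cost comparison: all-DDR5 vs. a small DDR5 hot tier plus
# re-used DDR4 behind a CXL expander. Prices are made-up placeholders.

def blended(tiers):
    """tiers: list of (capacity_gib, usd_per_gib). Returns (GiB, USD, USD/GiB)."""
    gib = sum(c for c, _ in tiers)
    usd = sum(c * p for c, p in tiers)
    return gib, usd, usd / gib

all_ddr5 = [(1024, 4.00)]              # 1 TiB of native DDR5
tiered   = [(256, 4.00), (768, 1.50)]  # hot DDR5 + cold re-used DDR4 behind CXL

for name, cfg in (("all DDR5", all_ddr5), ("DDR5 + CXL DDR4", tiered)):
    gib, usd, per = blended(cfg)
    print(f"{name:16s}: {gib} GiB, ~${usd:,.0f} total, ~${per:.2f}/GiB")

Pooling and sharing across hosts (CXL 2.0+) push the same math further, since the cold tier is shared rather than stranded in each server.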

Hopefully, that helps. I just made it home after several canceled flights this week and am pretty tired.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
That makes sense. I do now see the point about future CXL versions and support for older memory types - still very fast but cheaper than mainline DDR5/6. Thanks for the detailed response, Patrick.
I am looking forward to seeing CXL 2.0+ in servers very soon.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
I've found this very interesting video from SNIA talking about CXL, memory, and a candidate for Optane replacement - NRAM:
 

111alan

Active Member
Mar 11, 2019
290
107
43
Haerbing Institution of Technology
Sure. I think we have discussed this in a few articles in passing. Maybe in the glorious complexity piece.

Optane was not designed as a storage medium to compete with NAND on pricing. Instead, it was designed to compete with RAM.

The small dollar problem was to make faster storage. Some devices need low QD performance to Optane levels (disclaimer I love Optane), but it is a very small portion of the NAND SSD market. Not enough to support Optane as a storage medium, especially as capacities vastly diverge.

Many of the write performance use cases are easily served by CXL-enabled devices that can ingest to DRAM that has PLP, then data is dumped to slower storage.

As a memory, CXL fixes a bigger issue. With Optane PMem DIMMs, DRAM is used as a cache, and cold memory pages are pushed to the slower Optane media. As a result, there was a natural floor on how inexpensive you could make an Optane setup, since you need more DRAM to increase performance.

Once memory can be added via cards, put behind switches, and with CXL 3.0 hot swapped, not only can you build modules to do something similar (with cheaper NAND for bigger capacity), but you get other features as well. One really good example on "making RAM cheaper" is re-use. Behind CXL, one can have different media types. Today we are talking about DDR5 behind a CXL 1.1/2.0 controller. In the future, there will be DDR4 CXL cards. For a hyper-scaler, imagine the cost savings of re-using DDR4 as a slower memory tier behind CXL. Then layer on top of that the pooling/ sharing capabilities.

CXL makes Optane compete not just against current-gen DDR5, but also against previous generations. It also allows the concept of having a shelf of memory that spans many generations, so one does not need to upgrade DRAM at the same time as servers.

Hopefully, that helps. I just made it home after several canceled flights this week and am pretty tired.
I think there is a problem here: CXL (DRAM) is more expensive than normal DIMM DRAM modules in the long run, while being much slower than them.

On the front half: aside from the DRAM chips, you need an extra controller, and that controller isn't going to be cheap. We know Micron and Intel took a long time to develop a worthy controller for 3D XPoint, and those controllers still underperform the real capability seen on DCPMMs, in sequential speed as well as random latency and IOPS, even in block mode. For a normal DRAM stick you don't need to design and manufacture such a delicate low-latency controller, which saves a lot of money. There are also other components, such as supercapacitors, that need to be added, which both cost money and require attention and maintenance.

They may use their stock of older-generation DRAM chips, but eventually the stock will run out and they will have to make new ones, which obviously will not cost less just because they are not used on memory sticks. You may argue that using older lithography is cheaper, but if they really cared about that, they could just enlarge the die size or give the stick more ranks; a 4Rx4 DRAM stick can already hold 64 (72 with ECC) chips, with 3D packaging if needed. By the way, this cannot give it any advantage over 3D XPoint, because 3D XPoint can do the same.
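
For reference, the 64 (72) chip count follows directly from the bus width; a quick check, assuming a standard 64-bit data bus (72-bit with ECC) and x4 devices:

Code:
# 4Rx4 DIMM chip count: ranks x (bus width / bits per device)
ranks, device_width = 4, 4      # "4R" and "x4"
data_bus, ecc_bus   = 64, 8     # bits

print(ranks * data_bus // device_width)              # 64 data chips
print(ranks * (data_bus + ecc_bus) // device_width)  # 72 chips including ECC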

They may use NAND, but that means it's basically a normal SSD with a better protocol, which still has an FTL, high latency, a steady-state performance drop, and a write endurance limit.

On the back half: not only does the bridge controller hinder performance, the PCIe controllers in the CPU are also slower than the memory controllers. We can see the difference when comparing the two versions of Optane devices: sequential read speed is about 30 GB/s per stick in my testing, and latency in Memory Mode is 200-300 ns. The PCIe version is not even close. If they made a CXL DRAM drive, you would expect it to be no better than an Optane drive except in sequential write performance.

The only disadvantages of the DIMM form factor compared to the PCIe form factor (CXL) are system compatibility, the CPU's capacity support, the number of DIMM slots, and some other logical designs, none of which are really a problem if vendors are willing to change.

I think the reason Intel decided to "wind down" Optane is the currently low price of DRAM chips. This severely eroded 3D XPoint's price advantage, making its profit margin far less than Intel planned. They would also have to pour a lot of development resources into scaling the capacity if they continued. Actually, if the market weren't as bad as it is now, 3D XPoint would be an excellent storage medium for a CXL drive.
 
Last edited:

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
I have to think that Micron and Intel both cancelling Optane must have been a combination of factors:
a) On the higher end (PMem), Optane is running into CXL.mem territory, and its margins are too small due to falling DRAM prices.
b) On the NVM side, NVMe NAND is getting faster and lower latency with each generation, pushing the already small Optane margin lower.

Some of the CXL/Memory complexities, latency issues, and traditional memory connectivity designs were nicely covered in the SNIA video, which I again highly recommend watching.
 

111alan

Active Member
Mar 11, 2019
290
107
43
Haerbing Institution of Technology
I have to think that Micron and Intel both cancelling Optane must have been a combination of factors:
a) On the higher end (PMem), Optane is running into CXL.mem territory, and its margins are too small due to falling DRAM prices.
b) On the NVM side, NVMe NAND is getting faster and lower latency with each generation, pushing the already small Optane margin lower.

Some of the CXL/Memory complexities, latency issues, and traditional memory connectivity designs were nicely covered in the SNIA video, which I again highly recommend watching.
I don't think NAND SSDs will get even close to the performance of 3D XPoint, though, unless it is severely bottlenecked by the controller or the application simply doesn't care about the performance uplift.

Actually, I think NAND development is hitting a bottleneck. Speed and capacity per die have been stalling for some time now, and write endurance and stability can only get worse as the cell geometry shrinks with each update. The controller is getting harder to make as well. Right now, as far as I know, only Samsung has made a genuinely worthy PCIe 5.0 SSD controller. Kioxia's CD8 won't bring any notable performance uplift compared to the previous generation, according to the data we have now.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
I don't think NAND SSDs will get even close to the performance of 3D XPoint, though, unless it is severely bottlenecked by the controller or the application simply doesn't care about the performance uplift.

Actually, I think NAND development is hitting a bottleneck. Speed and capacity per die have been stalling for some time now, and write endurance and stability can only get worse as the cell geometry shrinks with each update. The controller is getting harder to make as well. Right now, as far as I know, only Samsung has made a genuinely worthy PCIe 5.0 SSD controller. Kioxia's CD8 won't bring any notable performance uplift compared to the previous generation, according to the data we have now.
What "speed"? Speed is a bit vague. Memory speed is mainly throughput and latency. On throughput, NAND with PCIe v5 already surpassed existing Optane implementations. It still falls behind on latency and low queue performance, but Imho the gap is shrinking, and again, as it was earlier, the application where this write latency on NV storage would possibly matter is a fairly small market.
Not sure where you get your info on controllers, but Silicon motion already has the PCIe v5 product Adata XPG series. Phison, while showing some improvements already, it's not quite a massive upgrade, but it's typical for Phison to deliver generational performance improvements very slowly. In a few months, I expect a much improved PCIe v5 controller from them.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
NAND is slow by design; SSDs get faster by "raiding" NAND via the controller (and they get faster with newer generations)
You're correct, but again, details matter. TLC NAND is indeed slower by design than SLC, and controllers compensate with both a RAM cache and "raiding", i.e. striping data across NAND chips. A faster NAND-raiding processor => lower latency.
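
A toy model of the striping point, with made-up but plausible per-die numbers (assumptions, not vendor specs): parallelism across dies multiplies throughput, while a faster controller mostly trims its own overhead; a single QD1 read still pays roughly one die's read time.

Code:
# Illustrative only: assumed TLC page-read time and page size, not vendor specs.
die_read_us = 60.0   # assumed tR for one TLC page read
page_kib    = 16.0   # assumed NAND page size
dies        = 32     # dies the controller can keep busy in parallel

per_die_mib_s = (page_kib / 1024) / (die_read_us / 1e6)
print(f"1 die : ~{per_die_mib_s:,.0f} MiB/s, ~{die_read_us:.0f} us per read")
print(f"{dies} dies: ~{per_die_mib_s * dies:,.0f} MiB/s, still ~{die_read_us:.0f} us per QD1 read")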
 

111alan

Active Member
Mar 11, 2019
290
107
43
Haerbing Institution of Technology
What "speed"? Speed is a bit vague. Memory speed is mainly throughput and latency. On throughput, NAND with PCIe v5 already surpassed existing Optane implementations. It still falls behind on latency and low queue performance, but Imho the gap is shrinking, and again, as it was earlier, the application where this write latency on NV storage would possibly matter is a fairly small market.
Not sure where you get your info on controllers, but Silicon motion already has the PCIe v5 product Adata XPG series. Phison, while showing some improvements already, it's not quite a massive upgrade, but it's typical for Phison to deliver generational performance improvements very slowly. In a few months, I expect a much improved PCIe v5 controller from them.
By "performance" I mean both latency and bandwidth. For bandwidth, I have done some testing with DCPMMs in block mode. Iometer tests may not be very intuitive, so here is a CDM result: a single 512 GB DCPMM provides nearly 30 GB/s of sequential read. 256 GB sticks are said to be even better due to design limitations. Write bandwidth is more modest (1.7-1.8 GB/s), but considering that the module only has 11 chips, and that it's only the first generation, it can definitely get better.
DCPM_CDM2.jpg

For random latency, a friend of mine tested it. In most cases, the performance was bottlenecked by the block-device emulation layer, the filesystem, and software, rather than the hardware.
DCPM_latency.jpg

In fact, the first generation of DCPMM has already outclassed any PCIe 5.0 SSD to come in almost all aspects except sequential write, and we are already looking at the third generation (DDR5-4400 CPS DCPMM). The PCIe version falls behind only because there is no PCIe 5.0 Optane controller for now, but as you can see from the DC P5800X, NAND can only compete when there is an outside limitation, like PCIe bandwidth, that prevents the 3D XPoint chips from unleashing their full potential. NAND is chasing the speed of the PCIe bus, while Optane is limited by it.

As for SMI and Phison, they are mostly solution providers for desktop drives. What they care most about is using an SLC cache along with PCIe 5.0 to boost benchmark scores and sell drives to unassuming computer fanboys. They have never cared about latency, consistency, performance scaling, or anything else an enterprise drive needs. They are simply not the same kind of manufacturer as those that make real enterprise drives.
 

_Robert

New Member
Dec 2, 2019
3
0
1
For bandwidth, I have done some testing with DCPMMs in block mode. Iometer tests may not be very intuitive, so here is a CDM result: a single 512 GB DCPMM provides nearly 30 GB/s of sequential read. 256 GB sticks are said to be even better due to design limitations. Write bandwidth is more modest (1.7-1.8 GB/s), but considering that the module only has 11 chips, and that it's only the first generation, it can definitely get better.
I do not understand how a sequential read of 30 GB/s from one Optane persistent memory module is possible.
The Pmem 300 data sheet lists a bandwidth at 100% read of 8.96 GB/s.

reference: https://www.servethehome.com/intel-...-pmem-100-pmem-200-and-pmem-300-optane-dimms/
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,320
800
113
I doubt that 30 GB/s is possible, because one DDR4-2666 memory channel peaks at a 21,333.33 MB/s transfer rate and DDR4-3200 at 25,600 MB/s... And as far as I know, Optane DCPMM is based on the DDR4 interface with extensions, so it can't overcome that limitation.
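
Those per-channel figures are just the transfer rate times the 64-bit (8-byte) bus width:

Code:
# Peak theoretical bandwidth of one DDR4 channel = transfer rate (MT/s) x 8 bytes
bus_bytes = 8  # 64-bit data bus per channel

for label, mt_s in (("DDR4-2666", 8000 / 3), ("DDR4-3200", 3200)):
    print(f"{label}: {mt_s * bus_bytes:,.2f} MB/s peak per channel")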
 
  • Like
Reactions: BoredSysadmin

111alan

Active Member
Mar 11, 2019
290
107
43
Haerbing Institution of Technology
I do not understand how a sequential read of 30 GB/s from one Optane persistent memory module is possible.
The Pmem 300 data sheet lists a bandwidth at 100% read of 8.96 GB/s.

reference: https://www.servethehome.com/intel-...-pmem-100-pmem-200-and-pmem-300-optane-dimms/
It only lists bandwidth for 256 B reads/writes, which is an even smaller block size than 4 KB. With some simple calculation, 8.96 GB/s at 256 B per read equals more than 37 million IOPS, at which point the CPU itself may be the bottleneck.

Although PMem is based on the DDR4 protocol, there may be a reason why it can't run on any platform other than Xeon Scalable. More specifically, first-gen DCPMM only runs on 2nd-gen Xeon Scalable and 2nd-gen DCPMM only runs on 3rd-gen, with no cross-compatibility at all, even for engineering samples. Intel's newer IMCs may have some proprietary designs specifically for DCPMM.

Edit: the max speed of gen-3 DCPMM is 10.5 GB/s; at 128 B blocks that is about 88 million IOPS, which equals roughly 58 PM1735 SSDs.
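
A quick sanity check of the IOPS arithmetic in this post: the quoted figures line up if the bandwidth is read as binary GiB/s, and the ~1.5M random-read IOPS rating assumed for the PM1735 here is an approximation for the comparison, not a quoted spec.

Code:
# Back-of-the-envelope IOPS from bandwidth and block size
GIB = 1024 ** 3

for bw_gib_s, block_bytes in ((8.96, 256), (10.5, 128)):
    iops = bw_gib_s * GIB / block_bytes
    print(f"{bw_gib_s} GiB/s at {block_bytes} B -> ~{iops / 1e6:.1f} million IOPS")

# vs. an enterprise NVMe SSD assumed to deliver ~1.5M random-read IOPS
print(f"~{10.5 * GIB / 128 / 1.5e6:.1f} such SSDs to match the gen-3 figure")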