Intel Optane PMEM 200 256GB DDR4 3200MHz $199


iraqigeek

It's perfect for the LLM use case. I'm using it right now. It's intended for any software that supports AppDirect mode, which inference engines like llamacpp do. Model loading is super fast, and cache stored on it has better latency, without the stutter you get when offloading to NVMe.
Do you mind sharing some details about how you're using it? I have four 256GB sticks sitting doing nothing and a dual Cascade Lake ES system (with six Mi50s) for LLM inference.
 

foureight84

I have the Optane modules set to AppDirect. You can do this either in the BIOS or with ipmctl. Depending on your distro, you might need to compile ipmctl from source, in which case I'd recommend getting an AI agent to help, since there's some complexity that isn't mentioned in the repo. It was difficult with my Supermicro motherboard because of an external dependency, a package called edk2.

If you're using ipmctl, then it's something like: sudo ipmctl create -goal PersistentMemoryType=AppDirect

You want to set your Optane goal in the BIOS to use 100% of the Optane memory for AppDirect. (There's an AppDirect maximum on no-suffix, M, and L suffix CPUs; this cap was dropped in later generations.)
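As a rough sketch of the ipmctl route, the sequence might look like the following. The `run` helper and the `DRY_RUN` switch are my own additions, not part of ipmctl; by default the script only prints the commands so nothing changes by accident. The `show` subcommands are only there to inspect the modules before requesting the goal and the regions after the reboot that applies it.

```shell
#!/bin/sh
# Sketch only: request an AppDirect goal with ipmctl.
# DRY_RUN and run() are illustrative helpers, not part of ipmctl.
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

request_appdirect_goal() {
  run sudo ipmctl show -dimm              # list the installed PMem modules
  run sudo ipmctl show -memoryresources   # show the current capacity split
  run sudo ipmctl create -goal PersistentMemoryType=AppDirect
}

show_regions() {
  # Run this after rebooting; the goal is applied during the next boot.
  run sudo ipmctl show -region
}

request_appdirect_goal
```

With `DRY_RUN=0`, run `request_appdirect_goal`, reboot, and then call `show_regions` to confirm the AppDirect regions exist.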

Then use ndctl (you should be able to install this through your package manager): sudo ndctl create-namespace --mode=fsdax. This turns the AppDirect capacity into a DAX-capable block device, which should show up as /dev/pmem0 if you run lsblk. Then format it with a filesystem that supports DAX: ext4, xfs, and a few others. xfs will probably offer the best performance, but I'm currently just using ext4.
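The namespace, format, and mount steps could be sketched like this. `DRY_RUN`, `run`, `PMEM_DEV`, and `MOUNT_POINT` are illustrative names I've made up, and the script defaults to printing commands because mkfs destroys whatever is on the device. One detail not spelled out above: on ext4 the filesystem is typically mounted with the dax option so that file access actually bypasses the page cache.

```shell
#!/bin/sh
# Sketch only: AppDirect region -> fsdax namespace -> ext4 mounted with DAX.
# DRY_RUN, run(), PMEM_DEV and MOUNT_POINT are illustrative names.
DRY_RUN="${DRY_RUN:-1}"                  # set DRY_RUN=0 to actually execute
PMEM_DEV="${PMEM_DEV:-/dev/pmem0}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/pmem0}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_dax_fs() {
  run sudo ndctl create-namespace --mode=fsdax
  run lsblk "$PMEM_DEV"                        # confirm the device showed up
  run sudo mkfs.ext4 "$PMEM_DEV"               # WARNING: erases the device
  run sudo mkdir -p "$MOUNT_POINT"
  run sudo mount -o dax "$PMEM_DEV" "$MOUNT_POINT"
  run findmnt -o TARGET,FSTYPE,OPTIONS "$MOUNT_POINT"   # look for "dax" in the options
}

setup_dax_fs
```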

After this you can mount it and add it to your fstab so it mounts automatically. Run ls -l /dev/disk/by-id and use the device ID for the fstab entry.
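For the fstab part, a small helper like the one below can format the entry. The function name and the device ID `EXAMPLE-PMEM-ID` are placeholders; substitute whatever ls -l /dev/disk/by-id shows for your namespace. The ext4 type, the /mnt/pmem0 mount point, and the `defaults,dax` options are assumptions matching the setup described above.

```shell
#!/bin/sh
# Sketch only: build an fstab line for the DAX filesystem.
# fstab_line and EXAMPLE-PMEM-ID are placeholders, not real identifiers.
fstab_line() {
  # $1 = device path, $2 = mount point, $3 = filesystem type
  printf '%s %s %s defaults,dax 0 2\n' "$1" "$2" "$3"
}

fstab_line /dev/disk/by-id/EXAMPLE-PMEM-ID /mnt/pmem0 ext4
# prints: /dev/disk/by-id/EXAMPLE-PMEM-ID /mnt/pmem0 ext4 defaults,dax 0 2
```

To use it for real, pipe the output through sudo tee -a /etc/fstab and then run sudo mount -a to check the entry before rebooting.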

The next step is to put your GGUF files on the DAX drive and use them with llamacpp. In my case I'm using ik_llamacpp, and I'm seeing models load within just a few seconds. You'll also want to put your llamacpp cache there. This is super useful with llama-swap: you can switch models for different usage scenarios from your LLM agent and only have to wait a few seconds.
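A minimal sketch of pointing llama.cpp at the DAX mount, under a few assumptions: the model path and cache directory are made-up examples, and I'm guessing that "cache" here means the download cache that upstream llama.cpp locates through the LLAMA_CACHE environment variable (ik_llamacpp may behave differently). The script only launches the server if both the binary and the model file are present.

```shell
#!/bin/sh
# Sketch only: serve a GGUF file that lives on the DAX filesystem.
# MODEL and the cache directory are hypothetical example paths.
MODEL="${MODEL:-/mnt/pmem0/models/model.gguf}"
LLAMA_CACHE="${LLAMA_CACHE:-/mnt/pmem0/llama-cache}"
export LLAMA_CACHE    # assumption: this is the cache the post refers to

if command -v llama-server >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  llama-server --model "$MODEL"
else
  echo "skipping: llama-server or $MODEL not found"
fi
```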

If you're running llamacpp or ik_llamacpp in Docker, make sure to also pass through the /dev/pmem0 device (I don't think it's necessary, but I do it anyway).
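For the Docker case, the invocation might look roughly like this. The image name is a placeholder, and I'm assuming its entrypoint is the llama.cpp server so that arguments after the image are passed to it; the models directory is also an example path. GPU device flags are left out. The function only prints the command so you can review it first.

```shell
#!/bin/sh
# Sketch only: print a docker run command that exposes the DAX mount.
# IMAGE is a hypothetical image whose entrypoint is assumed to be llama-server.
IMAGE="${IMAGE:-example/llamacpp-server:latest}"

docker_cmd() {
  # --device passes /dev/pmem0 through; the bind mount is what actually
  # makes the GGUF files visible inside the container.
  echo docker run --rm \
    --device /dev/pmem0 \
    -v /mnt/pmem0/models:/models:ro \
    "$IMAGE" --model /models/model.gguf
}

docker_cmd
```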

You can also move your Docker data to the DAX drive:
sudo systemctl stop docker
Create a folder, e.g. /mnt/pmem0/docker-data
Edit /etc/docker/daemon.json and add:

{
  "data-root": "/mnt/pmem0/docker-data"
}

Then copy the existing data over and start Docker again:
sudo rsync -aP /var/lib/docker/ /mnt/pmem0/docker-data
sudo systemctl start docker

If you're running a database in Docker, use a bind mount instead of a Docker volume, so the database storage path is bound to a folder on the DAX drive (e.g. /mnt/pmem0/postgres/data with docker run ... -v /mnt/pmem0/postgres/data:/data). A named volume lives under Docker's data root (/var/lib/docker/volumes by default), so it only ends up on pmem if you've moved the data root there, and anything written to the container's own overlayfs layer loses the benefit. Lastly, don't forget the database-specific configuration flags for running on pmem.
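As an illustration of the bind-mount approach, here is a sketch for Postgres. The container name and password are placeholders, and I'm assuming the official postgres:16 image, which keeps its data in /var/lib/postgresql/data inside the container. The function prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: bind-mount a pmem-backed folder as the Postgres data directory.
# PGDATA_HOST, the container name and the password are placeholders.
PGDATA_HOST="${PGDATA_HOST:-/mnt/pmem0/postgres/data}"

postgres_cmd() {
  echo docker run -d --name pmem-postgres \
    -e POSTGRES_PASSWORD=change-me \
    -v "$PGDATA_HOST:/var/lib/postgresql/data" \
    postgres:16
}

postgres_cmd
# After starting the container, this shows whether the mount is a bind mount:
#   docker inspect --format '{{ json .Mounts }}' pmem-postgres
```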
 

richardm

Before I retired, we had a bunch of Optane RDIMMs for VMware on HP Gen10. ESXi can leverage these things quite nicely even if you don't have workloads designed for PMEM.

In a nutshell, ESXi would scan VMs for the most idle memory pages and migrate them into Optane. It would pull them back into normal DRAM if they became active again. There are a bunch of new-ish vCenter stats that illustrate the effectiveness and show what's going on behind the scenes.

It worked well for us. I think we had 512GB DRAM paired with 2TB Optane in each host. In our DC a random ESXi host might have 80-85% of the total memory pages marked as idle at any given time.

After Intel pulled the Optane plug, ESXi started doing this same thing with NVMe as a tech preview. I suspect VMware was looking to salvage/recycle their engineering investment.