I've been wrestling with insanity on this one.
I'm trying to find the optimal configuration for raw performance for 3x M.2 NVMe SSDs on my Proxmox server, to be used for VM images and containers. For the life of me, I cannot make sense of the performance numbers I'm getting from the various ZFS and mdadm configs I have tried.
Perf testing:
I'm using fio for testing, with the individual tests configured to roughly match the CrystalDiskMark default tests. In retrospect I wish I had added 64k read/write, but I'm not restarting my testing from scratch. I have also run CrystalDiskMark from inside a Windows VM in a few instances. All perf tests were done on the host, except for a couple purposely done in-VM.
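The jobs aren't exact CrystalDiskMark clones, but to give a sense of the shape of them, they were along these lines (paths, sizes, and runtimes here are placeholders, not my actual job files):

# SEQ1M Q8-style sequential read against a test file on the pool/array mount
fio --name=seq-read --filename=/mnt/testpool/fiotest --size=8G \
    --rw=read --bs=1M --iodepth=8 --numjobs=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting

# RND4K Q32-style random read
fio --name=rand-read --filename=/mnt/testpool/fiotest --size=8G \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting

# note: older OpenZFS rejects --direct=1 on a dataset, in which case I drop that flag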
Issue 1 (ZFS)
Putting all 3 NVMe drives into a striped pool (raid0, i.e. three single-disk vdevs), a raidz1 pool, or a 3-way mirror pool yields perf numbers at or below the single-disk performance of the NVMe drives. I tested with many ZFS recordsize values, and the performance of the pool is always on par with or below a single drive, with two exceptions: the 4k random read/write tests, which I don't really care about, and the magical 512k read/write test, which for some reason tends to perform better than the others but is still nowhere near the perf I would expect. In every variation, performance tests of the NVMe pool come out on par with or worse than my 4-drive SATA raidz1 ZFS pool running on the same machine.
WTF?
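For reference, the three pool layouts were created roughly like this (pool and device names are placeholders, and ashift=12 is an assumption about my usual settings rather than an exact transcript):

# 3-wide stripe (three single-disk vdevs)
zpool create -o ashift=12 nvmepool /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# raidz1 across all three
zpool create -o ashift=12 nvmepool raidz1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# 3-way mirror
zpool create -o ashift=12 nvmepool mirror /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# recordsize sweep on the dataset used for testing, e.g.
zfs set recordsize=128k nvmepool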
Issue 2 (mdadm)
Putting all 3 NVMe drives in an mdadm raid0 array gets the perf numbers I would expect, roughly 2.5x single-drive perf. A raid10 array also yields the perf I would expect for that configuration. But when I move a VM's qcow2 onto the md array, the numbers from running CrystalDiskMark inside the VM are not very different from running the VM on my spinning-disk array. Boot is faster and apps start faster, so IO latency seems much better, but I'm not getting the overall bandwidth I'm expecting: the NVMe array as seen by the VM is not much better than spinning-disk throughput. Using PCIe passthrough and running CrystalDiskMark inside the VM against a "native" NVMe gets numbers roughly on par with the single-disk numbers obtained on the host with fio.
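The arrays themselves were nothing exotic, roughly like this (device names are placeholders):

# raid0 across the three NVMe drives
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# the raid10 variant I also tried (md raid10 works with 3 devices, default near-2 layout)
mdadm --create /dev/md0 --level=10 --raid-devices=3 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1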
I'm mostly concerned with "Issue 1 (ZFS)". I want to run my NVMe array without redundancy at max perf, but take regular backups of it to my spinning raidz1, so I'd like to use ZFS for block-level snapshot/backup, just not with the bizarre performance I'm seeing right now.
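The backup workflow I have in mind is just periodic snapshot plus send/receive to the SATA pool, something like this (dataset and snapshot names are made up for illustration):

zfs snapshot nvmepool/vmdata@nightly
zfs send nvmepool/vmdata@nightly | zfs recv spinnypool/backup/vmdata
# later runs would be incremental: zfs send -i @prev-nightly nvmepool/vmdata@nightly | zfs recv ...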
For "Issue 2 Mdadm", I'm scratching my head due to the massive performance drop seen inside the VM compared to host perf tests. Obviously it will be less, much less, but on the host I get about 5 GB/s and only seeing about 500MB/s inside the VM when that VM is the only thing touching that disk array, the qcow2 literally the only image file on the drive.
Can anyone shed some light on what is going on with these numbers, or should I see if Amazon can overnight a straitjacket? Thanks!
[EDIT]
Solved Issue #2 ("poor perf seen from inside the VM") by making sure the guest had the VirtIO drivers installed and changing the disk emulation settings in Proxmox to virtio. I'm now seeing inside-VM IO perf on par with host-side perf: ~8 GB/s sequential read with the virtual drive on the host's mdadm raid0 NVMe array.
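For anyone else hitting this, the Proxmox-side change was roughly the following (VM ID, storage name, and disk path are examples, not my exact config):

# use the VirtIO SCSI controller for the VM
qm set 100 --scsihw virtio-scsi-pci
# or attach the existing qcow2 as a virtio-blk disk instead of IDE/SATA emulation
qm set 100 --virtio0 nvme-md0:100/vm-100-disk-0.qcow2
# the Windows guest also needs the VirtIO drivers (virtio-win ISO) installed first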