ZFS NVMe performance questions


T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
I don't know which consumer drives you tested with, but I can assure you that after hours of testing, the 990 Pro doesn't fall flat at all. I ran a test of more than an hour with TBs of data and never saw a single drop in performance on these drives.
If you're referring to write performance, I already know I get around 1.4 GB/s once outside their internal cache, so that's no surprise.
I tested each drive alone and got a steady ~6 GB/s read and ~1.4 GB/s write across the whole drive.
In a mirror pool, I got ~12 GB/s read and ~2.3 GB/s write.
In case you're wondering, I got these results with compression off.
I wasn't measuring sequential writes, but random 4K writes.
"Slow" / "fall on their face" is relative; my testing was years ago, so your recent results are likely better than mine, but still below enterprise-rated drives.
Sequential transfers were never a concern though, only 4K writes and/or mixed 4K workloads really...

How's 4K write and/or mixed workload performance once steady state is reached on the 990 Pro?
I'm still considering consumer NVMe for a home capacity NAS due to their LOW idle power vs. enterprise, but enterprise SATA drives are slotting in here already :D :D which is NICE, because SATA was always soooo much more $$ before
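
For reference, something along these lines is the kind of steady-state test I mean -- a minimal sketch, assuming fio is installed and that the device path (a placeholder here) points at a scratch drive whose contents you're happy to destroy:

```python
import subprocess

# Hedged sketch of a steady-state 4K random-write test.
# WARNING: writing to the raw device is DESTRUCTIVE -- /dev/nvme1n1 is a placeholder.
cmd = [
    "fio",
    "--name=steady-state-4k",
    "--filename=/dev/nvme1n1",   # whole-device target so the SLC cache can't hide
    "--rw=randwrite",
    "--bs=4k",
    "--ioengine=libaio",
    "--iodepth=32",
    "--numjobs=4",
    "--direct=1",                # bypass the page cache
    "--time_based",
    "--runtime=3600",            # run long enough to reach steady state
    "--group_reporting",
]
subprocess.run(cmd, check=True)
```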
 

jgswinkels

New Member
Oct 14, 2023
1
0
1
Be aware of the limitations of the PCIe components on your motherboard.
See the specs of the board:
Hard disk bus (internal): M.2 (PCIe 4.0 x4 + SATA), 2x M.2 (PCIe 4.0 x4), 8x SATA-600
The M.2 PCIe slots are only x4!!
Also check the specifications / usage of any PCIe expansion cards.


What is the maximum bandwidth of PCIe Gen 4?
To understand the maximum bandwidth of a PCIe Gen 4 device, you must know the number of PCIe lanes that it supports. PCIe devices use “lanes” for transmitting and receiving data, so the more lanes a PCIe device can use, the greater the bandwidth can be. The number of lanes that a PCIe device supports is typically expressed like “x4” for 4 lanes, “x8” for 8 lanes, and so on.

                      x1       x2       x4       x8        x16
PCIe Gen 3 bandwidth  1 GB/s   2 GB/s   4 GB/s   8 GB/s    16 GB/s
PCIe Gen 4 bandwidth  2 GB/s   4 GB/s   8 GB/s   16 GB/s   32 GB/s

2x M.2 PCIe 4.0 x4 disks in a stripe = max 2 × 8 GB/s = 16 GB/s max
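
To put rough numbers on it, a back-of-the-envelope sketch (Gen 3 runs at 8 GT/s per lane, Gen 4 at 16 GT/s, both with 128b/130b encoding; this ignores protocol overhead, so treat the results as upper bounds):

```python
# Approximate PCIe bandwidth per generation and lane count, matching the table above.
TRANSFER_RATE_GT_S = {"gen3": 8, "gen4": 16}

def pcie_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s, before protocol overhead."""
    return TRANSFER_RATE_GT_S[gen] * (128 / 130) / 8 * lanes

print(pcie_bandwidth_gb_s("gen4", 4))       # ~7.9 GB/s for one M.2 Gen 4 x4 slot
print(2 * pcie_bandwidth_gb_s("gen4", 4))   # ~15.8 GB/s for two x4 drives striped
```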


Some PCIe connectors share lanes with the internal M.2 sockets, which reduces the lanes in use... be aware of that.
PCIe Gen 4 x16 to single NVMe adapters (what kind of PCIe card is this?)
In which PCIe slot is it installed?
 

RonanR

Member
Jul 27, 2018
47
2
8
Be aware of the limitations of the PCIe components on your motherboard.
See the specs of the board:
In which PCIe slot is it installed?
I'm well aware of that, no worries.
Each M.2 slot is PCIe 4.0 x4, so 8 GB/s per slot, which is more than enough. With 3 slots that's 24 GB/s (in theory).
If you add an additional NVMe plugged into a PCIe 4.0 x16 slot, that's now 4 NVMe drives, each working at PCIe x4, so 4 × 8 = 32 GB/s max. These NVMe drives are rated at 7 GB/s, but real testing gives me a steady 6 GB/s per drive, as stated in my initial post. With 2 NVMe drives in a stripe or mirror, I got 12 GB/s.
If you read my first post, you will see that I can create 2 pools of 2 NVMe drives each (one with both on M.2 slots, and one with one on an M.2 slot and one on a PCIe card), and each pool gets 12 GB/s. So it's not a PCIe link degradation problem.
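
For what it's worth, this is roughly how I'd double-check the negotiated link on each drive -- a small sketch that reads the standard Linux sysfs attributes (current_link_speed / current_link_width); adjust the paths if your layout differs:

```python
import glob
import os

def read_attr(pci_dev: str, attr: str) -> str:
    """Read one sysfs attribute from a PCI device directory."""
    with open(os.path.join(pci_dev, attr)) as f:
        return f.read().strip()

# Print the negotiated and maximum PCIe link for every NVMe controller.
for ctrl in sorted(glob.glob("/sys/class/nvme/nvme[0-9]*")):
    pci_dev = os.path.realpath(os.path.join(ctrl, "device"))
    print(
        os.path.basename(ctrl),
        "current:", read_attr(pci_dev, "current_link_speed"),
        "x" + read_attr(pci_dev, "current_link_width"),
        "| max:", read_attr(pci_dev, "max_link_speed"),
        "x" + read_attr(pci_dev, "max_link_width"),
    )
```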
 

CyklonDX

Well-Known Member
Nov 8, 2022
848
279
63
The read numbers are likely coming from your memory (the ARC). To troubleshoot this correctly and see what's happening, I recommend setting up InfluxDB with Grafana and visualizing all the components: disk throughput, ZFS stats, memory, and so on.
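
If you don't want to stand up a full Influx/Grafana stack just to answer the memory question, a quick-and-dirty sketch like this (reading the standard OpenZFS-on-Linux kstat file /proc/spl/kstat/zfs/arcstats) will show whether the reads are being served from the ARC:

```python
# Quick ARC check while the benchmark is running: if the hit rate is near
# 100%, the "disk" reads are really coming out of RAM.
def arcstats() -> dict:
    stats = {}
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f.readlines()[2:]:        # skip the two kstat header lines
            name, _kind, value = line.split()
            stats[name] = int(value)
    return stats

s = arcstats()
total = s["hits"] + s["misses"]
hit_rate = 100.0 * s["hits"] / total if total else 0.0
print(f"ARC size: {s['size'] / (1 << 30):.1f} GiB, hit rate: {hit_rate:.1f}%")
```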

More vdevs give you more load spreading.
A vdev has to wait until each drive has finished, so the limiting factor is (in the worst case) always the slowest drive. *This is a great oversimplification of things; there are a lot of caveats in there.


This is a good read
 

bvansomeren

New Member
Jul 19, 2016
1
0
1
Sorry if I'm raising an old thread, but I've run into this problem with ZFS on Linux before as well.
You may be interested in this merge request for ZFS on Linux: Direct IO Support by bwatkinson · Pull Request #10018 · openzfs/zfs
It exists because of this issue: NVMe Read Performance Issues with ZFS (submit_bio to io_schedule) · Issue #8381 · openzfs/zfs

In short, the data is being copied back and forth through the ARC, and that becomes the bottleneck. If you run the same test on the same hardware with XFS, you'll probably see numbers closer to what you'd expect.

Eventually they will merge this fix and you'll be able to bypass some of the ARC copying on these very fast setups.
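
To illustrate what that direct I/O path looks like from userspace, here's a rough Linux-only sketch of an O_DIRECT sequential read. The file path is a placeholder, and on ZFS builds without the Direct IO feature the flag is typically accepted but the read still flows through the ARC:

```python
import mmap
import os

# O_DIRECT requires aligned buffers and transfer sizes; an anonymous mmap
# gives a page-aligned buffer, and 1 MiB is a multiple of any sane block size.
PATH = "/tank/testfile"          # placeholder: a large file on the pool under test
CHUNK = 1 << 20                  # 1 MiB per read

buf = mmap.mmap(-1, CHUNK)       # anonymous mapping -> page-aligned buffer
fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)   # Linux-only flag
try:
    total = 0
    while True:
        n = os.preadv(fd, [buf], total)   # positional read into the aligned buffer
        if n == 0:
            break
        total += n
finally:
    os.close(fd)

print(f"read {total / (1 << 30):.2f} GiB with O_DIRECT")
```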
 

Psmitty88

New Member
Nov 8, 2023
1
0
1
I'm well aware of that, no worries.
Each M.2 slot is PCIe 4.0 x4, so 8 GB/s per slot, which is more than enough. With 3 slots that's 24 GB/s (in theory).
If you add an additional NVMe plugged into a PCIe 4.0 x16 slot, that's now 4 NVMe drives, each working at PCIe x4, so 4 × 8 = 32 GB/s max. These NVMe drives are rated at 7 GB/s, but real testing gives me a steady 6 GB/s per drive, as stated in my initial post. With 2 NVMe drives in a stripe or mirror, I got 12 GB/s.
If you read my first post, you will see that I can create 2 pools of 2 NVMe drives each (one with both on M.2 slots, and one with one on an M.2 slot and one on a PCIe card), and each pool gets 12 GB/s. So it's not a PCIe link degradation problem.
Welcome to "Combined PCIe lanes" you can make the x16 go slower but you can never make an x4 go faster. Let me simplify it...You're on the wrong (PCIe) BUS...

TL;DR solution: there aren't enough PCIe lanes on the chipset for the number of storage drives attached directly to the motherboard. Move all pool drives to a PCIe x16 card of some type.


The TRX40 board you have only has 16 PCIe lanes on the chipset, and that's what the on-board M.2 slots use exclusively.
The M.2 slots are all capable of PCIe 4.0 x4, but only under specific conditions.

That motherboard only has 16 total PCIe lanes on the board: 8 reserved for the chipset and 8 lanes dedicated to the M.2 and SATA III ports.
You get 4 lanes on the bottom slot only if nothing is detected on the SATA controller during POST.
And/Or
You get 4 lanes in either the top or the middle M.2 slot when only one or the other is populated; if both are populated, then it's 2 lanes each.

The long version... links, sources, etc. at the bottom. If you think I'm full of crap, watch the LTT video.

The M.2 slots on your motherboard only have 8 lanes shared between themselves and the SATA III controller.
The allocation is as follows:

'2 x Hyper M.2 Sockets (M2_1 and M2_2), support M Key type 2260/2280 M.2 PCI Express module up to Gen4x4 (64 Gb/s)*'
That's gigabits, not bytes; someone will inevitably bring that up. 8 GB/s max per M.2 slot, but not in every slot all at once.

These two slots get 2 lanes each; it may be 4 lanes if only one is populated. Most likely unaffected by adding other devices, though your mileage may vary.


'1 x Hyper M.2 Socket (M2_3), supports M Key type 2230/2242/2260/2280/22110 M.2 SATA3 6.0 Gb/s module and M.2 PCI Express module up to Gen4x4 (64 Gb/s)'


This slot, oddly enough, is the bottom one (as with many, many boards, the bottom is generally the "lowest priority" M.2 slot), and it shares its PCIe 4.0 x4 with the motherboard SATA III ports. Once the BIOS detects something on the SATA bus, it splits the bottom slot into two PCIe 4.0 x2 links: one for the SATA controller and one for the NVMe drive.

All 3 M.2 slots will support SATA M.2 drives, but only the top two will boot from an NVMe M.2.

I'm not 100% sure why or how, but from what I can gather, splitting the drives between a PCIe card and the M.2 slots on the motherboard causes the PCIe bus to pseudo-hand that slot off to the chipset for management, or maybe it just matches the chipset's generation and link width; I haven't been able to get a straight answer on this one. It can't work the other way around, though, due to the physical limitations of the M.2 key standard only having a 4-lane pinout.

The only way I know to get past the 12 GB/s is to put all of the drives in the pool on a single PCIe 4.0 x16 slot and boot from a motherboard M.2 slot with an NVMe drive that isn't in the pool. You can still boot from an M.2 NVMe, just don't pool it with the other drives. This puts the entire pool on the CPU's PCIe lanes.

The other option... boot from a PCIe card and run ONLY the M2_1 and M2_3 slots. Leave M2_2 empty, as well as the motherboard SATA ports. Then you get 2x M.2 slots at PCIe 4.0 x4, totaling 16 GB/s in theory. Populating the top two slots is how you end up back at 8 GB/s (6 GB/s in your test, from the looks of it).

I would use slot 1 (PCIe 4.0 x16) for the NVMe M.2s in a carrier/RAID card (unless a GPU needs the x16 link) and slot 3 for the next most important item. Leave the board's M.2 spots for a boot drive and maybe a cache if it'll let you. I know TrueNAS doesn't like mixing CPU and chipset lanes in the same pool. It looks fine and doesn't throw errors, but it straight up ignored the 2 TB NVMe cache drive on the motherboard and wrote directly to the HDDs in the pool when I tried a few weeks back.
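
If you want to sanity-check which drives are actually hanging off the chipset before building the pool, a rough heuristic is to count how many PCI bridges sit between each NVMe and the root -- a sketch, not a definitive test:

```python
import glob
import os

# Heuristic sketch: an NVMe wired to CPU lanes usually sits one bridge below
# the PCI root, while a chipset-attached drive passes through extra bridges.
# Counting PCI addresses in the sysfs path is a hint, not proof.
for ctrl in sorted(glob.glob("/sys/class/nvme/nvme[0-9]*")):
    pci_path = os.path.realpath(os.path.join(ctrl, "device"))
    # e.g. /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0
    hops = [p for p in pci_path.split("/") if p.count(":") == 2]
    guess = "likely CPU-attached" if len(hops) <= 2 else "likely behind the chipset"
    print(os.path.basename(ctrl), "->", " / ".join(hops), f"({guess})")
```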

Additionally, your 4 PCIe 4.0 x16 slots are technically two 4.0 x16s and two 4.0 x8s, because slot 2 and slot 4 split an x16 set when all 4 slots are populated. So, in theory, you could run 4 GPUs, but two would be cut down to 8 lanes each while the other two were free to run at x16.


'- 4 x PCI Express 4.0 x16 Slots (PCIE1/PCIE2/PCIE3/PCIE4:

single at x16 (PCIE1);

dual at x16 (PCIE1) / x16 (PCIE3);

triple at x16 (PCIE1) / x16 (PCIE3) / x8 (PCIE4);

quad at x16 (PCIE1) / x8 (PCIE2) / x16 (PCIE3) / x8 (PCIE4))* '



Sources & Tools used to reach these conclusions.



LinusTechTips has a great video about chipset vs CPU lanes here >>

Me, who added an M.2 drive to a system with 1 M.2 and 8 SATA SSDs plugged directly into the motherboard and couldn't figure out why the BIOS would only see the M.2 drive in the top M.2 slot.
 

zachj

Active Member
Apr 17, 2019
159
104
43
PCIe lane sharing is also what I suspected as the root cause…

I wouldn’t run any zfs components through chipset lanes if I didn’t have to, and if I did it would NOT be ZIL or metadata devices.