Unexpectedly slow SSD pools using Proxmox 8.3 > TrueNAS SCALE 23.10


Prophes0r

New Member
System
  • Host - Proxmox 8.3.1
    • EPYC 7302p
    • Supermicro H11SSL-i
    • 128GB ECC DDR4 3200
    • 118GB Optane system drive
  • VM - TrueNAS SCALE 23.10
    • 8 cores
    • 64GB RAM
    • Passthrough - EPYC SATA #1 - HDDs
    • Passthrough - NVMe drives
  • Drives
    • HDD
      • 10TB HGST 7200RPM SATA - HUH721010ALE601
    • SSD
      • 1TB Samsung 970 Evo Plus
      • 118GB Optane (Bonus)
Problem
Unexpectedly slow SSD pool.​
I initially created the following pools in TrueNAS with the default settings.​
  • z1 pool with the SATA HDDs (7+1).
  • z1 pool with the flash SSDs (3+1).
Then I ran Tom Lawrence's fio script directly on the TrueNAS VM with the following settings (a rough equivalent of the resulting invocation is sketched after this list).
  • 128 KB block size (the default ZFS recordsize in TrueNAS)
  • IODEPTH = 1 (most of my use will be queue depth 1)
  • DIRECT = 1 (shouldn't buffer in RAM)
  • NUMJOBS = 4
  • FSYNC = 0 (let Linux control flushing)
  • NUMFILES = 4
  • FILESIZE = 1G
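Roughly, those settings map onto an fio invocation like this for the random-write pass (a sketch built from the parameters above, not the exact script; the target directory is a placeholder for a dataset on the pool under test, and the script repeats the same thing for the randread, sequential, and mixed modes):

```
fio --name=randwrite-test \
    --directory=/mnt/testpool/fio \
    --rw=randwrite \
    --bs=128k \
    --iodepth=1 \
    --direct=1 \
    --numjobs=4 \
    --fsync=0 \
    --nrfiles=4 \
    --filesize=1G \
    --group_reporting
```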
I was surprised when the HDD pool beat the SSD pool in every test.​
I then decided to get more data, so I recreated the HDD z1 pool with 3+1 drives to compare apples to apples.
I also created a z1 (3+1) pool of 118GB Optane SSDs since I had them on hand.
Then I destroyed the pools and created a 2x2 mirror with each drive type.
Then I did it again with a single drive of each type.
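For anyone wanting to reproduce the layouts, the CLI equivalents are roughly the following (I built everything through the TrueNAS UI, so the pool and device names here are just placeholders):

```
# raidz1 "3+1" pool (shown for the flash drives)
zpool create flash raidz1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# 2x2 striped mirror of the same four drives
zpool create flash mirror /dev/nvme0n1 /dev/nvme1n1 mirror /dev/nvme2n1 /dev/nvme3n1

# single-drive pool
zpool create flash /dev/nvme0n1
```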
Test Results

(fio averages; two test runs per pool layout)

HDD pool

| Metric | Single #1 | Single #2 | Mirror (2x2) #1 | Mirror (2x2) #2 | Z1 (3+1) #1 | Z1 (3+1) #2 |
|---|---|---|---|---|---|---|
| RandWrite - Avg Write IOPS | 30,304 | 31,379 | 54,817 | 54,991 | 54,859 | 53,378 |
| RandWrite - Avg Write MB/s | 3,788 | 3,922 | 6,852 | 6,874 | 6,857 | 6,672 |
| RandRead - Avg Read IOPS | 189,586 | 187,055 | 129,109 | 125,094 | 132,208 | 130,306 |
| RandRead - Avg Read MB/s | 23,698 | 23,382 | 16,139 | 15,637 | 16,526 | 16,288 |
| SeqWrite - Avg Write IOPS | 10,581 | 9,996 | 38,740 | 41,158 | 45,501 | 51,877 |
| SeqWrite - Avg Write MB/s | 1,323 | 1,250 | 4,843 | 5,145 | 5,688 | 6,485 |
| SeqRead - Avg Read IOPS | 105,757 | 108,675 | 129,736 | 132,685 | 135,717 | 125,864 |
| SeqRead - Avg Read MB/s | 13,220 | 13,584 | 16,217 | 16,586 | 16,965 | 15,733 |
| ReadWrite - Avg Read IOPS | 16,262 | 17,511 | 55,241 | 51,071 | 67,637 | 62,805 |
| ReadWrite - Avg Write IOPS | 16,727 | 18,012 | 56,821 | 52,531 | 69,571 | 64,601 |
| ReadWrite - Avg Read MB/s | 2,033 | 2,189 | 6,905 | 6,384 | 8,455 | 7,851 |
| ReadWrite - Avg Write MB/s | 2,091 | 2,251 | 7,103 | 6,566 | 8,696 | 8,075 |

Flash (970 Evo Plus) pool

| Metric | Single #1 | Single #2 | Mirror (2x2) #1 | Mirror (2x2) #2 | Z1 (3+1) #1 | Z1 (3+1) #2 |
|---|---|---|---|---|---|---|
| RandWrite - Avg Write IOPS | 57,790 | 55,293 | 52,530 | 46,468 | 45,399 | 53,605 |
| RandWrite - Avg Write MB/s | 7,224 | 6,912 | 6,566 | 5,809 | 5,675 | 6,701 |
| RandRead - Avg Read IOPS | 130,085 | 114,845 | 97,178 | 116,528 | 123,406 | 108,696 |
| RandRead - Avg Read MB/s | 16,261 | 14,356 | 12,147 | 14,566 | 15,426 | 13,587 |
| SeqWrite - Avg Write IOPS | 68,529 | 63,708 | 59,413 | 56,299 | 69,756 | 61,142 |
| SeqWrite - Avg Write MB/s | 8,566 | 7,963 | 7,427 | 7,037 | 8,720 | 7,643 |
| SeqRead - Avg Read IOPS | 91,569 | 86,329 | 90,618 | 85,153 | 95,559 | 86,721 |
| SeqRead - Avg Read MB/s | 11,446 | 10,791 | 11,327 | 10,644 | 11,945 | 10,840 |
| ReadWrite - Avg Read IOPS | 57,773 | 59,744 | 56,200 | 55,578 | 52,114 | 61,671 |
| ReadWrite - Avg Write IOPS | 59,426 | 61,453 | 57,807 | 57,168 | 53,605 | 63,434 |
| ReadWrite - Avg Read MB/s | 7,222 | 7,468 | 7,025 | 6,947 | 6,514 | 7,709 |
| ReadWrite - Avg Write MB/s | 7,428 | 7,682 | 7,226 | 7,146 | 6,701 | 7,929 |

Optane pool

| Metric | Single #1 | Single #2 | Mirror (2x2) #1 | Mirror (2x2) #2 | Z1 (3+1) #1 | Z1 (3+1) #2 |
|---|---|---|---|---|---|---|
| RandWrite - Avg Write IOPS | 66,150 | 55,342 | 91,207 | 98,435 | 49,305 | 60,700 |
| RandWrite - Avg Write MB/s | 8,269 | 6,918 | 11,401 | 12,304 | 6,163 | 7,588 |
| RandRead - Avg Read IOPS | 125,343 | 133,103 | 120,084 | 118,757 | 116,429 | 114,938 |
| RandRead - Avg Read MB/s | 15,668 | 16,638 | 15,011 | 14,845 | 14,554 | 14,367 |
| SeqWrite - Avg Write IOPS | 72,704 | 82,168 | 66,495 | 73,178 | 57,941 | 59,824 |
| SeqWrite - Avg Write MB/s | 9,088 | 10,271 | 8,312 | 9,147 | 7,243 | 7,478 |
| SeqRead - Avg Read IOPS | 85,194 | 86,805 | 84,785 | 84,331 | 93,551 | 93,820 |
| SeqRead - Avg Read MB/s | 10,649 | 10,851 | 10,598 | 10,541 | 11,694 | 11,727 |
| ReadWrite - Avg Read IOPS | 57,690 | 59,669 | 54,201 | 54,797 | 46,795 | 50,756 |
| ReadWrite - Avg Write IOPS | 59,340 | 61,376 | 55,751 | 56,365 | 48,134 | 52,208 |
| ReadWrite - Avg Read MB/s | 7,211 | 7,459 | 6,775 | 6,850 | 5,849 | 6,345 |
| ReadWrite - Avg Write MB/s | 7,418 | 7,672 | 6,969 | 7,046 | 6,017 | 6,526 |


There is definitely something wrong with the write numbers coming out of the script.
But whatever the problem is, it should scale consistently across each pool, and it doesn't.
The single-drive pool numbers are closer to what I expect, but even these are off.
A single Samsung drive managing only about twice the RandWrite speed of a single HDD?
RandRead substantially lower for both types of NVMe drive than for the HDD?

Anyone got any ideas?

I guess I'll recreate the pools, connect them to a Windows VM, and try CrystalDiskMark to see whether those numbers make sense?
Probably disable the ARC as well, to make sure reads aren't just being served from RAM?
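If I go that route, my understanding is that read caching can be ruled out per dataset instead of killing the whole ARC, with something like the following (dataset name is a placeholder):

```
# Stop caching user data for the test dataset in the ARC
zfs set primarycache=metadata tank/fio-test   # or primarycache=none to skip metadata too

# Only relevant if an L2ARC device is attached
zfs set secondarycache=none tank/fio-test
```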
 

zack$

Well-Known Member
Have you followed this: Yes, You Can (Still) Virtualize TrueNAS?

Many moons ago, FreeNAS only reliably supported virtualisation on ESXi, because other hypervisors (e.g. Proxmox) did not fully pass drives through to the VM. I personally tested this with a failed drive that showed up as failed on bare metal but, when passed through Proxmox, showed no sign of failure. That spooked me enough to never want to run FreeNAS (now TrueNAS) on anything other than ESXi.

However, based on the blog post cited above, virtualising TrueNAS SCALE under Proxmox now seems to be officially supported. That being the case, following their latest virtualisation guide should help alleviate any problems; alternatively, Proxmox 8.1 may have broken something that previously worked well.
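One quick way to sanity-check whether the disks are genuinely passed through at the hardware level is to see whether full SMART data and the passed-through controllers are visible inside the TrueNAS VM, e.g. (device names are placeholders):

```
# A properly passed-through disk should report full SMART attributes inside the VM
smartctl -a /dev/sda

# The passed-through SATA controller / NVMe drives should show up as PCI devices
lspci | grep -i -e sata -e nvme
```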

Hope this helps.
 

SnJ9MX

Active Member
You believe that your HDD pool can do 30k IOPS at 3.7 GB/s? HDDs generally don't do more than a few hundred IOPS. Call it 300 for the sake of the math: does your pool contain 30,000 IOPS / 300 IOPS per HDD = 100 HDDs?

And geez, 189k read IOPS @ 24 GB/s - you're dreaming if you think an HDD pool of any reasonable size can deliver that.

I am 99% sure you're benchmarking your memory.

The title of your post should be "unexpectedly fast HDD pools"
 

Prophes0r

New Member
Update:
There was clearly an issue with write caching muddying the numbers.
So I tried a few things to get rid of that problem (rough commands for these are sketched below the list).
  • Restricted the VM RAM to try to eliminate write caching. (The drives still have their own cache, but I WANT those numbers.)
    • 4GB still seemed to cache a LOT, and it crashed eventually.
    • 3GB gave more believable write numbers for the HDDs but didn't change writes for the SSDs. Crashed much sooner.
    • 2GB crashed immediately for the HDDs. SSDs still WAY too slow.
  • Set zfs_arc_max to 1 MB to try to eliminate read caching.
  • Disabled swap.
  • Forced sync writes.
Doing the tests with sync writes gave MUCH more believable numbers...for writes...on single drives.
It seems wrong to me that a stripe with a single drive writes faster than a 2x2 striped mirror, but the mirror is about 20% slower.
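For reference, the knobs I mean look roughly like this (a sketch; the module parameter path is the standard OpenZFS one, and the dataset name is a placeholder):

```
# Cap the ARC at ~1 MB (OpenZFS runtime-tunable module parameter)
echo 1048576 > /sys/module/zfs/parameters/zfs_arc_max

# Disable swap for the duration of the tests
swapoff -a

# Force every write to the test dataset to be a sync write
zfs set sync=always tank/fio-test
```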
I spun up a Windows VM and set up SMB shares for the three pools.
Then I ran CrystalDiskMark with the Real World Performance profile and the Read&Write option.
Annoyingly, the Windows VirtIO network driver reports a 10Gbit link and seems to actually obey that limit.
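A way to check whether that 10Gbit figure is a real cap or just the advertised link speed would be an iperf3 run between the two VMs (assuming iperf3 is available on both ends; the address below is a placeholder):

```
# On the TrueNAS VM (server side)
iperf3 -s

# On the Windows VM (client side), pointed at the TrueNAS VM
iperf3 -c 192.168.1.50 -P 4 -t 30
```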
Even with the networking limit, the numbers are still weird.
1MB Q1T1 sequential writes of 400 MB/s for the 3+1 z1 Optane pool?
14,000 µs random 4K write latency for the z1 HDDs?
Speaking of latency, I'm flabbergasted that the Optane drives are being beaten by the NAND ones.
When I tested the drives on their own, I was getting ~25 µs for NAND vs ~4 µs for Optane.
I expected all the overhead from ZFS and networking to make this worse.
But I didn't expect to be seeing 250 µs NAND vs 300 µs Optane.
I'm sure this is all down to configuration.
I'm equally sure that I don't have enough experience to understand where the issues are.
Does anyone know any specific tests or settings to try to identify the issues?
This is a fresh system that isn't being used yet, so I can create/destroy as needed, but I would eventually like to nail this down so I can move everything off my old NAS/VM host and retire it.
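To be concrete about what I mean by "specific tests": something like a sync 4K QD1 random write against a dataset on each pool, with latency percentiles enabled, is the sort of thing I can run (a sketch, with the directory as a placeholder):

```
fio --name=qd1-sync-randwrite \
    --directory=/mnt/testpool/fio \
    --ioengine=psync \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --direct=1 --fsync=1 \
    --size=1G --runtime=60 --time_based \
    --lat_percentiles=1
```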