New TrueNAS build - Slow Performance?

Kartright

New Member
Feb 16, 2022
Hi all,

I recently created a new NAS using TrueNAS to replace my old self-built hardware RAID-based NAS. The NAS is part of my homelab and consists of two datasets:
1) The first dataset contains media files for Plex
2) The second dataset contains application data for various applications. An example would be Nextcloud, so all user data managed by Nextcloud will be saved to this dataset.

The hardware I picked for the system is the following:

HP DL380 Gen9 12xLFF storage server
2x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (56 Threads)
128GB DDR4 ECC
LSI 9305-16i (Firmware: 16.00.12.00, IT mode)
1x Samsung Datacenter SSD PM893 240GB SATA 6G as boot device
12x Seagate Exos 16TB HDD SAS 12G for storage pool
2x Intel Optane SSD 900P as SLOG in storage pool

After building the system I installed TrueNAS-12.0-U8 and configured all 12 HDDs in a RAIDz3 to create the main NAS pool, using both Intel Optane SSDs as SLOG.

I configured the ZFS pool in the following way:
ashift=12
sync=standard
compression=lz4
recordsize=128KiB
atime=off
exec=off
The rest is left default in TrueNAS.
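For reference, roughly the equivalent CLI for these settings (the TrueNAS UI does all of this for you; the pool name `tank` and device names are placeholders, and whether the two Optanes were mirrored or striped as SLOG isn't stated here, so the mirror below is an assumption):

```shell
# ashift is fixed at pool creation time (vdev level):
zpool create -o ashift=12 tank \
    raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 \
    log mirror nvd0 nvd1    # SLOG mirror is an assumption

# The remaining settings are dataset properties:
zfs set sync=standard tank
zfs set compression=lz4 tank
zfs set recordsize=128K tank
zfs set atime=off tank
zfs set exec=off tank
```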

I don't know if the performance I measured is to be expected, so I'm looking for advice: is the NAS really slow?
Here are my results for various block sizes using async and sync IO:

Async:
Code:
fio --filename=test --ioengine=posixaio --rw=randread --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --ioengine=posixaio --rw=read --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --ioengine=posixaio --rw=write --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
Test              | IOPS   | MB/s
------------------|--------|------
4K QD4 rnd read   | 39800  | 163
4K QD4 rnd write  | 10400  | 42.5
4K QD4 seq read   | 87200  | 357
4K QD4 seq write  | 57400  | 235
64K QD4 rnd read  | 34200  | 2242
64K QD4 rnd write | 9303   | 610
64K QD4 seq read  | 30200  | 1979
64K QD4 seq write | 12700  | 831
1M QD4 rnd read   | 5484   | 5751
1M QD4 rnd write  | 741    | 778
1M QD4 seq read   | 5723   | 6002
1M QD4 seq write  | 855    | 897
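The commands listed above only show bs=4k; the 64K and 1M rows were presumably produced by varying --bs. A loop like the following would generate all twelve async rows (a sketch, not the poster's actual script):

```shell
#!/bin/sh
# Run each IO pattern at each block size against a 10G test file,
# removing the file after every run so each test starts fresh.
for bs in 4k 64k 1M; do
    for rw in randread randwrite read write; do
        fio --filename=test --ioengine=posixaio --rw="$rw" --bs="$bs" \
            --numjobs=1 --iodepth=4 --group_reporting --name=test \
            --filesize=10G --runtime=300
        rm -f test
    done
done
```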

Sync:
Code:
fio --filename=test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --sync=1 --rw=read --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
Test              | IOPS   | MB/s
------------------|--------|------
4K QD4 rnd read   | 20100  | 82.5
4K QD4 rnd write  | 4885   | 20.0
4K QD4 seq read   | 265000 | 1087
4K QD4 seq write  | 9959   | 40.8
64K QD4 rnd read  | 17000  | 1113
64K QD4 rnd write | 3549   | 233
64K QD4 seq read  | 29900  | 1962
64K QD4 seq write | 4373   | 287
1M QD4 rnd read   | 1959   | 2055
1M QD4 rnd write  | 634    | 665
1M QD4 seq read   | 1889   | 1981
1M QD4 seq write  | 651    | 683

I'm aware that these tests may not bypass any caches, so if you have further tests I could run, please tell me. I don't have performance results from a similar system, which is why I'm quite unsure whether the measured performance is "good".

Thanks in advance!

Rand__

Well-Known Member
Mar 6, 2014
Depending on how you want to use it, I wouldn't say it's slow. 5751 MB/s does not sound slow for a Z3. O/c that's mostly cache, but if your file sizes are only 10G then that's fine (and a realistic test).

What do you think is slow?

O/c I don't see any use case where you'd need a SLOG, unless Nextcloud works more safely with a sync-enabled pool?

Kartright

As I don't have a reference from other users with similar configurations, or any experience with ZFS pools, I'm quite unsure whether the performance is good. I've never had a ZFS pool like this; I tried to pick the best hardware, but I'm still not convinced the performance is what should be expected, or what I could do to improve it (e.g. a different layout). Maybe I'm wrong, but what you said points me in the right direction. Anything to improve or tweak on the ZFS pool?

The main use case of my storage pool is Plex media and application data. A movie, for example, has a file size of 4 to 16 GB, so as far as I know it is written sequentially, isn't it?

Initially, I didn't include a SLOG and used an HP 840ar RAID controller in HBA mode. I was told to swap it for the LSI 9305-16i and flash it to IT mode, to get a true HBA and much better performance. Still, I could not verify where the better performance comes from, as the LSI and the HP seem almost equal to me (very interesting, because everyone says you need a true HBA).
I added the SLOG after that, to make things safer in case of failure, and I thought it could improve my sync write speeds, but it has little or no effect (which sounds reasonable, as a SLOG is not a write cache). So that is why I have a SLOG now.

Although I may or may not need the SLOG, it's good to have for e.g. backups of application data, as backups should be written sync. So when I back up my Nextcloud instance or save a backup of GitLab, I can be sure the file was written correctly. If I mount the pool for Nextcloud itself, sync is not a requirement as far as I know.
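To get that behaviour only where it matters, sync semantics can be forced per dataset, so the rest of the pool keeps sync=standard. A minimal sketch (dataset names are placeholders):

```shell
# Force every write to the backup dataset through the ZIL (and hence the SLOG);
# other datasets are unaffected.
zfs set sync=always tank/backups

# Verify the per-dataset settings:
zfs get sync tank/backups tank/media
```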

Rand__

Box performance is good enough if performance is good enough for you ;)
So just run what you want to run and try to find out whether it's fast enough for you, or whether something is much slower than you would have expected.

O/c there are tweaks (buffer sizes etc.), but usually they don't do much unless something is really out of line (like the 1G-optimized buffers that used to perform badly on 10G cards, but that's been a while).

O/c it can be discussed whether you need a Z3, or whether a 13-drive Z3 is good and 2x 6-drive Z2s + a hot spare might be better, but then we come back to the first point :)

Kartright

I understand. So I should leave it as it is now and be happy? :D

Unfortunately I don't have any additional slot free, so I cannot go with a hot spare. My hot spare is currently in a safe place, ready when needed :)

Rand__

If you have 13 drives in a Z3 and you switch to 2x 6 drives, you only use 12 slots, so drive #13 is unused. You lose 2 disks' worth of space and trade it for a second vdev, improving write performance.

If you need that, or how much the impact for your particular use case is - is something you could find out if you want to experiment more.

Else it sounds fast enough, doesn't it? :)
Again - if you have something where it feels slow, that can be investigated, but just asking "is it slow?" or "why is it slow?" is not going to move you forward.
Finding someone with a similar setup might help of course, but that's not me ;)

Kartright

Many thanks for your response.

I took your advice and experimented. I swapped the LSI back for the HP 840ar controller and could verify that the LSI 9305-16i HBA is definitely faster. I understand now why everyone recommends it.

I also tested different layouts using my 12 disks and my SLOG devices:
1) 12 disks in a single RAIDz2 vdev
2) 6 disks in two RAIDz2 vdevs in same pool

The results for the single RAIDz2 and the original RAIDz3 are essentially equal; the only difference is that I lose just 2 disks instead of 3 (more usable capacity). Two vdevs of 6 disks each in RAIDz2 resulted in double the IOPS (as expected), but slightly slower sequential speeds, and 4 drives lost in total.
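For reference, the two layouts compared here would be created roughly like this (pool and device names are placeholders):

```shell
# Layout 1: one 12-disk RAIDz2 vdev (2 parity disks total)
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11

# Layout 2: two 6-disk RAIDz2 vdevs striped in one pool
# (4 parity disks total, but two vdevs, hence roughly double the IOPS)
zpool create tank \
    raidz2 da0 da1 da2  da3 da4  da5 \
    raidz2 da6 da7 da8  da9 da10 da11
```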

Additionally I changed the place where the LSI HBA is located. Before testing the PCI device order in the server was the following:
1) Intel SSD 900P
2) Intel SSD 900P
3) LSI HBA

I changed to the following order:
1) LSI HBA
2) Intel SSD 900P
3) Intel SSD 900P

This change definitely improved performance. The system configuration did not change, so the only thing I can think of is that the LSI runs much cooler now due to improved airflow (as it is at the top now).

These are the "new" results I got:
Async:
Code:
fio --filename=test --ioengine=posixaio --rw=randread --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --ioengine=posixaio --rw=read --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --ioengine=posixaio --rw=write --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
Test              | IOPS (old) | MB/s (old) | IOPS (new) | MB/s (new)
------------------|------------|------------|------------|------------
4K QD4 rnd read   | 39800      | 163        | 39800      | 163
4K QD4 rnd write  | 10400      | 42.5       | 17600      | 72.2
4K QD4 seq read   | 87200      | 357        | 87000      | 356
4K QD4 seq write  | 57400      | 235        | 57400      | 235
64K QD4 rnd read  | 34200      | 2242       | 33300      | 2181
64K QD4 rnd write | 9303       | 610        | 15900      | 1041
64K QD4 seq read  | 30200      | 1979       | 30100      | 1974
64K QD4 seq write | 12700      | 831        | 20400      | 1337
1M QD4 rnd read   | 5484       | 5751       | 7377       | 7736
1M QD4 rnd write  | 741        | 778        | 1715       | 1799
1M QD4 seq read   | 5723       | 6002       | 7150       | 7498
1M QD4 seq write  | 855        | 897        | 1899       | 1991

As you can see, the performance increased. Sync IO speeds also improved.

I think I can leave it now :)

Rand__

Airflow usually does not impact performance much (unless we are talking about M.2s and throttling); more likely the old slot was somehow bandwidth-limited, or the new one is closer to the CPU and thus lower latency...
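One way to check that is to compare the negotiated PCIe link against the card's maximum. A sketch (the driver instance `mpr0` and the Linux device address are placeholders for the HBA):

```shell
# FreeBSD / TrueNAS CORE: the PCI-Express capability line shows
# negotiated vs. maximum link width and speed, e.g. "link x8(x8) speed 8.0(8.0)"
pciconf -lc mpr0

# Linux equivalent: compare LnkCap (maximum) with LnkSta (negotiated)
lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'
```

If LnkSta reports fewer lanes or a lower speed than LnkCap, the slot is the bottleneck.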

But glad you got a satisfactory setup now , enjoy:)