Disk layout vs performance vs record size

Railgun

Member
Jul 28, 2018
I hope I'm not conflating too many things here, but since they're all interrelated, perhaps I'm not.

I'm moving from a 15x2 VDEV mirror with 3TB Hitachi disks (bare metal, E5-2687W with 64GB RAM, multiple 9207-8i HBAs) to either an 11x2 VDEV mirror or a 7x3 RAIDZ1 of WD60EFZX drives on a VM behind an EPYC 7282 (8 cores) with 128GB RAM and a 9305-24i passed through.

The capacity advantage of the latter is nice, but I'm interested primarily in two things.

Here's the current performance. (TrueNAS Core)

Code:
Write speed

dd if=/dev/zero of=/mnt/Test/tmp.zero bs=2048k count=50k
107374182400 bytes transferred in 81.991915 secs (1,309,570,362 bytes/sec)


Read speed

dd if=/mnt/Test/tmp.zero of=/dev/null bs=2048k count=50k
107374182400 bytes transferred in 81.675046 secs (1,314,651,018 bytes/sec)

And here's how the new disks perform under a few different topologies... (TrueNAS Scale)

Code:
11x2 VDEV Mirrors
59.78TiB capacity

Write
dd if=/dev/zero of=/mnt/Test/tmp.zero bs=2048k count=50k
107374182400 bytes (107 GB, 100 GiB) copied, 64.7347 s, 1.7 GB/s

Read
dd if=/mnt/Test/tmp.zero of=/dev/null bs=2048k count=50k
107374182400 bytes (107 GB, 100 GiB) copied, 104.433 s, 1.0 GB/s

----------------------------------------------------------------

7x3 VDEV RaidZ1
76.13TiB capacity

Write
107374182400 bytes (107 GB, 100 GiB) copied, 62.4074 s, 1.7 GB/s

Read
107374182400 bytes (107 GB, 100 GiB) copied, 104.742 s, 1.0 GB/s

----------------------------------------------------------------

22 VDEV Stripe
119.56TiB capacity

Write
107374182400 bytes (107 GB, 100 GiB) copied, 35.03 s, 3.1 GB/s

Read
107374182400 bytes (107 GB, 100 GiB) copied, 98.1277 s, 1.1 GB/s

While I appreciate there's a difference between bare metal and a VM, I wouldn't expect virtually identical performance between the set of mirrors and the Z1 setup. Although the disks are fairly similar in performance, the WDs should outrun the Hitachis a bit. I'm also somewhat surprised at the read performance of the 22-disk stripe; I'm guessing this is a VM limitation. Unfortunately I haven't tried a bare-metal setup, but I can't imagine virtualization in and of itself is the bottleneck. Which is to say: I think there's a bottleneck somewhere.
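(One thing I realize may be skewing these numbers: /dev/zero compresses to almost nothing under ZFS's default lz4, so dd may partly be measuring compression and ARC rather than the disks. For the next run I'm planning something like the following on a scratch dataset; the dataset name here is just an example.)

```shell
# Create a scratch dataset with compression off and data caching disabled,
# so dd exercises the disks rather than lz4 and the ARC.
zfs create -o compression=off -o primarycache=metadata Test/bench

# Same write/read tests as above, against the scratch dataset.
dd if=/dev/zero of=/mnt/Test/bench/tmp.zero bs=2048k count=50k
dd if=/mnt/Test/bench/tmp.zero of=/dev/null bs=2048k count=50k

# Clean up afterwards.
zfs destroy Test/bench
```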

Both pools have a 128k record size. And that leads me into the next question.

After the testing above, I created my media dataset with a 1M record size. Most of it is lossless music and various movies, all well beyond 1M in file size. There are a few raw disk backups which will undoubtedly contain smaller files, but this is a temporary location until they go to cold storage. I've been playing with the idea of leveraging SSDs for metadata to possibly speed things up a bit. Looking at the histogram, I have no idea what it really tells me in terms of what I should set the size to be...

Code:
Block Size Histogram

  block   psize                lsize                asize
   size   Count   Size   Cum.  Count   Size   Cum.  Count   Size   Cum.
    512:  10.7K  5.36M  5.36M  10.7K  5.36M  5.36M      0      0      0
     1K:  36.7K  40.5M  45.9M  36.7K  40.5M  45.9M      0      0      0
     2K:  14.9K  39.5M  85.4M  14.9K  39.5M  85.4M      0      0      0
     4K:   277K  1.09G  1.17G  10.6K  59.0M   144M      0      0      0
     8K:   171K  1.62G  2.79G  15.7K   178M   322M   177K  1.38G  1.38G
    16K:   186K  4.03G  6.82G  28.8K   555M   878M   320K  5.81G  7.20G
    32K:   330K  14.7G  21.5G  18.1K   812M  1.65G   249K  10.4G  17.6G
    64K:  1.59M   156G   177G  12.4K  1.04G  2.69G   453K  41.8G  59.5G
   128K:  36.4M  4.55T  4.72T  38.8M  4.85T  4.85T  37.8M  7.03T  7.09T
   256K:  5.71K  2.14G  4.72T    131  44.1M  4.85T  3.99K  1.46G  7.09T
   512K:   347K   287G  5.00T    109  74.0M  4.85T  10.8K  8.56G  7.10T
     1M:  22.9M  22.9T  27.9T  23.3M  23.3T  28.1T  23.2M  34.8T  41.9T
     2M:      0      0  27.9T      0      0  28.1T      0      0  41.9T
     4M:      0      0  27.9T      0      0  28.1T      0      0  41.9T
     8M:      0      0  27.9T      0      0  28.1T      0      0  41.9T
    16M:      0      0  27.9T      0      0  28.1T      0      0  41.9T

And lastly, after a reboot of the VM, my read dropped considerably:

(107 GB, 100 GiB) copied, 145.16 s, 740 MB/s

So to sum up...

1) Regarding the similar reads across the different topologies, is this expected in a VM? Something seems off here.
2) Any guidance for a metadata disk would be appreciated.
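For context on (2), this is the sort of thing I had in mind; the device names are purely placeholders and Test/media is my media dataset:

```shell
# Add a mirrored pair of SSDs as a special vdev for metadata
# (placeholder device names), then route small data blocks
# on the media dataset to it as well.
zpool add Test special mirror /dev/disk/by-id/ssd0 /dev/disk/by-id/ssd1
zfs set special_small_blocks=64K Test/media
```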