Performance troubleshooting.

modder man · Aug 19, 2018

Hello all,

Looking for some thoughts on some odd array performance I have seen while building out a new box. I am currently running ZoL as a VM in ESXi with the array controller passed through. The VM has 16 CPUs and 24GB of memory. I recognize 24GB is not enough for an optimal setup, but I was simply testing different array configs and how they affect performance. The first test was with an 8-disk Z2 configuration; I then added two more Z2 stripes to that to make an array with 24 disks in 3 Z2s.

8 disks in a Z2

virtadmin@ubuntu:/storage$ sudo fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test2 --filename=test --bs=128k --iodepth=1 --size=384G --readwrite=write --rwmixread=2
test2: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K, ioengine=libaio, iodepth=1
fio-2.2.10
Starting 1 process
Jobs: 1 (f=0): [W(1)] [100.0% done] [0KB/448.0MB/0KB /s] [0/3584/0 iops] [eta 00m:00s]
test2: (groupid=0, jobs=1): err= 0: pid=6784: Sun Aug 12 11:57:58 2018
write: io=393216MB, bw=433290KB/s, iops=3385, runt=929292msec
cpu : usr=1.61%, sys=21.90%, ctx=82079, majf=0, minf=491
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=3145728/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: io=393216MB, aggrb=433290KB/s, minb=433290KB/s, maxb=433290KB/s, mint=929292msec, maxt=929292msec

And 24 disks in a striped Z2 were only able to manage 300MB/s write with the same benchmark.
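As a quick sanity check on the fio summary above, the reported bandwidth and iops can be recomputed from the io total, the runtime and the block size. This is only a sketch: it assumes that fio 2.2.10 counts 1 MB as 1024 KB, and the variable names are illustrative.

```python
# Figures copied from the fio summary above.
io_mb = 393216        # io=393216MB
runtime_ms = 929292   # runt=929292msec
block_kb = 128        # --bs=128k

# Assumption: fio 2.2.10 reports binary units (1 MB = 1024 KB).
runtime_s = runtime_ms / 1000
bw_kb_s = io_mb * 1024 / runtime_s   # average bandwidth in KB/s
iops = bw_kb_s / block_kb            # one 128 KB write per I/O
bw_mb_s = bw_kb_s / 1024             # same bandwidth in MB/s

print(f"bandwidth: {bw_kb_s:.0f} KB/s ({bw_mb_s:.0f} MB/s)")
print(f"iops: {iops:.0f}")
```

Under that assumption the numbers agree with the bw=433290KB/s and iops=3385 that fio printed, which works out to roughly 423 MB/s for the 8-disk Z2.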

gea · Aug 20, 2018

I would try to list the possible causes and rule them out in order of probability, measured against conservative performance expectations, for example:

Expectation
ZFS performance scales with the number of data disks, their quality, RAM, fill rate and the iops performance of the Raid setup. Calculate 100 MB/s per data disk as a lower limit. This means that a raid-Z2 pool of, for example, 8 disks (6 data disks, empty pool, RAM > 8GB) should deliver at least 600 MB/s read and write, up to the limits defined by other hardware, e.g. 1-3 GB/s.
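The rule of thumb above can be written out as a small calculation and compared with the figures reported earlier in the thread. This is a rough sketch, not a benchmark model: the function name is made up for illustration, the 3 x Z2 figure assumes the floor simply adds up across vdevs, and the measured 8-disk value assumes fio's KB means 1024 bytes.

```python
def expected_floor_mb_s(vdevs, disks_per_vdev, parity_disks=2, per_data_disk_mb_s=100):
    """Conservative sequential floor: 100 MB/s per data disk (rule of thumb)."""
    data_disks = vdevs * (disks_per_vdev - parity_disks)
    return data_disks * per_data_disk_mb_s

single_z2 = expected_floor_mb_s(vdevs=1, disks_per_vdev=8)  # 6 data disks
three_z2 = expected_floor_mb_s(vdevs=3, disks_per_vdev=8)   # 18 data disks

# Measured values from the thread.
measured_single = 433290 / 1024  # fio aggrb in KB/s, assuming 1 MB = 1024 KB
measured_three = 300             # "300MB/s write" as reported

print(f"1 x Z2: expected >= {single_z2} MB/s, measured ~{measured_single:.0f} MB/s")
print(f"3 x Z2: expected >= {three_z2} MB/s, measured ~{measured_three} MB/s")
```

Both measurements fall below the conservative floor, which is why the checks listed below are worth working through.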

Disk related
A single bad disk can kill the pool performance.
- compare the load on all disks, or compare two or three pools with 8 disks each

ESXi related
- compare with a bare-metal setup (no ESXi)
- reduce the vcpu count to less than the physical cpu count, and compare with one or two vcpus

HBA related
I assume you have a high-quality HBA with current firmware.
There were problems, for example, with LSI 2008 based ones and firmware 20.0-20.004.

Settings related
Disable sync, and check that your RAM (e.g. 24GB) is available to ZFS for read/write caching.

OS related
There are differences between the Unix options: Solaris with native ZFS, Illumos (Solaris fork with Open-ZFS), Free-BSD (Open-ZFS) and Linux (Open-ZFS). This can result in a huge difference, mostly when bad drivers are involved. While I have seen Oracle Solaris with native ZFS run 50% faster in some benchmarks, the differences between the Open-ZFS platforms are much smaller, mostly < 10%. A check of ZoL vs Solaris vs OmniOS vs Free-BSD would show the difference, but you would need the same benchmark sequence on each.

Others
bad RAM or other bad hardware like HBA, power, cables, backplane
- If you have another server that works, move the disks and/or HBA

modder man · Aug 20, 2018

First I want to say thanks for the detailed response.

I have two questions:

Do you expect striped vdevs to scale linearly in performance?
How would you check the load/performance on individual disks during a benchmark?

gea · Aug 20, 2018

Basically, iops and sequential performance scale linearly over vdevs, but you quickly reach the region where you saturate other parts of the server. You can check this by adding vdev after vdev, each made from a single basic disk (a pure raid-0 stripe). With a few vdevs it scales linearly; beyond a certain number, the performance gains become quite small.

To test single disks, build a pool from one vdev with a basic disk.
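The scaling behaviour described above can be pictured with a toy model in which throughput grows per vdev until some other part of the server becomes the limit. This is a deliberately simplified sketch: the function name, the 150 MB/s per single-disk vdev and the 800 MB/s server limit are all hypothetical, and a real system flattens out gradually rather than at a hard cap.

```python
def pool_throughput_mb_s(vdevs, per_vdev_mb_s, server_limit_mb_s):
    """Linear scaling over vdevs, capped by the slowest other component."""
    return min(vdevs * per_vdev_mb_s, server_limit_mb_s)

# Hypothetical numbers, for illustration only.
PER_VDEV_MB_S = 150      # one basic-disk vdev
SERVER_LIMIT_MB_S = 800  # HBA, CPU, hypervisor or similar

for n in range(1, 9):
    total = pool_throughput_mb_s(n, PER_VDEV_MB_S, SERVER_LIMIT_MB_S)
    print(f"{n} vdevs: {total} MB/s")
```

Adding vdevs one at a time and noting where the measured numbers stop following the straight line gives a hint about where that limit sits.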

modder man · Aug 20, 2018

It looks like the "zpool iostat" command should be good for tracking down a weak disk in the array. Based on the output it doesn't look like I am chasing a weak disk though.

sudo fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test2 --filename=test1 --bs=128k --iodepth=1 --size=384G --readwrite=write

The disks ranged from 53-55MB/s in write performance during the FIO test above.
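One way to make the "no weak disk" reading less of an eyeball judgment is to compare each disk's write rate against the median of the group. The sketch below uses made-up disk names and sample values inside the reported 53-55MB/s range, not the actual per-disk output, and the 80% threshold is an arbitrary assumption.

```python
from statistics import median

def weak_disks(write_mb_s, tolerance=0.8):
    """Return disks writing slower than tolerance * median of all disks."""
    mid = median(write_mb_s.values())
    return {disk: rate for disk, rate in write_mb_s.items() if rate < tolerance * mid}

# Hypothetical per-disk write rates within the reported 53-55 MB/s range.
sample = {"disk1": 53, "disk2": 55, "disk3": 54, "disk4": 54}
print(weak_disks(sample))    # nothing stands out in this sample

# Same sample with one disk artificially slowed, to show what a hit looks like.
degraded = dict(sample, disk3=20)
print(weak_disks(degraded))
```

With a spread as tight as 53-55MB/s nothing is flagged, which matches the conclusion that a single slow disk is unlikely to be the cause here.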

I do notice interesting CPU behavior during the benchmark
