ZFS napp-it ZVOL: pool 100% busy, disks only 50% - bottleneck?


freeman

I have a strange issue on my ZFS napp-it backup storage.
When my backup jobs are running, the pool on the storage is busy at or near 100% most of the time, but the disks are only around 50% busy.
I am using SATA spinning disks, so they are already slow, and I don't want the disks to be the bottleneck.

Can someone please advise me on what is wrong with my setup?
Or how to troubleshoot? (I had a single LUN and changed it to 2 LUNs, but that didn't help.)

Specs:
Supermicro 4U chassis 846E16-R1200B
X8DTN+-F board, 24 GB RAM, Xeon CPU
LSI 2008 HBA (IT mode)
2x 1 TB rpool
16x 8 TB in Tank1: 2x 8-disk RAIDZ2 vdevs striped
Dual-port 10 GbE NIC, single link in use

Pool settings:
recordsize 128K
compression lz4
dedup off
zfs_arc_max 17 GB

2x ZFS volumes (128K volblocksize), each exported as an iSCSI LUN

Connected to:
ESXi server with a single VM, Windows Server 2016 (Veeam backup), with 8 cores, 88 GB RAM and 10 GbE VMXNET3 for iSCSI

# zpool status
pool: Tank1
config:
NAME STATE READ WRITE CKSUM
Tank1 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
c0t5000C500A45A4389d0 ONLINE 0 0 0
c0t5000C500A45A73EAd0 ONLINE 0 0 0
c0t5000C500A45A75A2d0 ONLINE 0 0 0
c0t5000C500A45B6EB5d0 ONLINE 0 0 0
c0t5000C500A45D5DFFd0 ONLINE 0 0 0
c0t5000C500A45E6E92d0 ONLINE 0 0 0
c0t5000C500A45E8218d0 ONLINE 0 0 0
c0t5000C500A45EB962d0 ONLINE 0 0 0
raidz2-1 ONLINE 0 0 0
c0t5000C500A52BBA32d0 ONLINE 0 0 0
c0t5000C500A52BD754d0 ONLINE 0 0 0
c0t5000C500A52C33E7d0 ONLINE 0 0 0
c0t5000C500A52C80CAd0 ONLINE 0 0 0
c0t5000C500A52C9BD5d0 ONLINE 0 0 0
c0t5000C500A52F1CF1d0 ONLINE 0 0 0
c0t5000C500A5D84CFDd0 ONLINE 0 0 0
c0t5000C500A60E9679d0 ONLINE 0 0 0
errors: No known data errors

# iostat -x 1
device r/s w/s kr/s kw/s wait actv svc_t %w %b
rpool 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
Tank1 1220.0 1557.0 18759.8 83255.0 36.1 9.6 16.5 15 96
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd3 79.0 107.0 1180.0 6587.9 0.0 0.6 3.0 0 36
sd4 80.0 109.0 1316.0 6567.9 0.0 0.5 2.6 0 32
sd5 79.0 107.0 1108.0 6563.9 0.0 0.8 4.2 0 42
sd6 88.0 114.0 1452.0 6567.9 0.0 0.7 3.3 0 42
sd7 33.0 107.0 320.0 6579.9 0.0 0.5 3.2 0 30
sd8 80.0 109.0 1368.0 6543.9 0.0 0.6 3.1 0 36
sd9 87.0 110.0 1468.0 6567.9 0.0 0.5 2.6 0 36
sd10 34.0 120.0 332.0 6547.9 0.0 0.3 2.1 0 23
sd11 74.0 87.0 828.0 3836.0 0.0 0.6 3.9 0 39
sd12 81.0 86.0 1396.0 3880.0 0.0 0.5 3.1 0 39
sd13 78.0 86.0 1296.0 3852.0 0.0 0.7 4.0 0 38
sd14 79.0 86.0 1308.0 3872.0 0.0 0.5 3.1 0 36
sd15 92.0 90.0 1560.0 3800.0 0.0 0.8 4.6 0 47
sd16 79.0 84.0 812.0 3844.0 0.0 0.7 4.4 0 43
sd18 94.0 83.0 1612.0 3808.0 0.0 0.6 3.6 0 37
sd19 83.0 88.0 1404.0 3836.0 0.0 0.6 3.5 0 38

# sar -d 1 10
device %busy avque r+w/s blks/s avwait avserv
stmf_lu_ 91 2.3 643 74852 0.2 3.4
stmf_lu_ 10 1.1 211 36170 0.0 5.3
stmf_tgt 92 3.5 854 111021 0.2 3.9
Tank1 89 47.6 3935 328682 8.7 3.3
busiest disk at that moment:
sd10 43 1.2 293 20429 0.0 4.1

# mpstat 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 102 0 40 512 107 3926 5 504 1057 21 82 0 8 0 92
1 183 0 46 157 11 3862 6 566 1296 24 178 1 8 0 92
2 271 0 35 141 8 2749 5 420 1161 24 236 1 7 0 93
3 278 0 43 147 9 2780 6 508 1110 24 222 1 7 0 92
4 293 0 38 3926 3803 2749 29 459 1339 20 251 1 9 0 90
5 363 0 33 1906 1766 2448 6 380 1081 23 316 1 8 0 91
6 355 0 45 901 761 3638 6 505 1250 23 273 1 8 0 92
7 207 0 43 188 10 3079 4 346 1754 17 123 0 9 0 90

# smartctl -a /dev/rdsk/c0t5000C500A60E9679d0
smartctl 6.6 2017-11-05 r4594 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: ST8000NM0055-1RM112
Serial Number: ZA19TZPY
LU WWN Device Id: 5 000c50 0a60e9679
Firmware Version: SN04
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)

# zfs get all Tank1
NAME PROPERTY VALUE SOURCE
Tank1 type filesystem -
Tank1 creation Wed Nov 22 21:41 2017 -
Tank1 used 33.3T -
Tank1 available 46.6T -
Tank1 referenced 188K -
Tank1 compressratio 1.35x -
Tank1 mounted yes -
Tank1 quota none default
Tank1 reservation none default
Tank1 recordsize 128K local
Tank1 mountpoint /Tank1 default
Tank1 sharenfs off default
Tank1 checksum on default
Tank1 compression lz4 local
Tank1 atime off local
Tank1 devices on default
Tank1 exec on default
Tank1 setuid on default
Tank1 readonly off default
Tank1 zoned off default
Tank1 snapdir hidden default
Tank1 aclmode discard default
Tank1 aclinherit restricted default
Tank1 canmount on default
Tank1 xattr on default
Tank1 copies 1 default
Tank1 version 5 -
Tank1 utf8only off -
Tank1 normalization none -
Tank1 casesensitivity sensitive -
Tank1 vscan off default
Tank1 nbmand off default
Tank1 sharesmb off default
Tank1 refquota none default
Tank1 refreservation none local
Tank1 primarycache all local
Tank1 secondarycache all local
Tank1 usedbysnapshots 119K -
Tank1 usedbydataset 188K -
Tank1 usedbychildren 33.3T -
Tank1 usedbyrefreservation 0 -
Tank1 logbias latency default
Tank1 dedup off default
Tank1 mlslabel none default
Tank1 sync standard local
Tank1 refcompressratio 1.00x -
Tank1 written 0 -
Tank1 logicalused 41.4T -
Tank1 logicalreferenced 36.5K -
Tank1 filesystem_limit none default
Tank1 snapshot_limit none default
Tank1 filesystem_count none default
Tank1 snapshot_count none default
Tank1 redundant_metadata all default

# zfs get all Tank1/ZFS2_Veeam2
NAME PROPERTY VALUE SOURCE
Tank1/ZFS2_Veeam2 type volume -
Tank1/ZFS2_Veeam2 creation Tue Apr 21 9:10 2020 -
Tank1/ZFS2_Veeam2 used 1.10T -
Tank1/ZFS2_Veeam2 available 46.6T -
Tank1/ZFS2_Veeam2 referenced 1.10T -
Tank1/ZFS2_Veeam2 compressratio 1.05x -
Tank1/ZFS2_Veeam2 reservation none default
Tank1/ZFS2_Veeam2 volsize 20T local
Tank1/ZFS2_Veeam2 volblocksize 128K -
Tank1/ZFS2_Veeam2 checksum on default
Tank1/ZFS2_Veeam2 compression lz4 inherited from Tank1
Tank1/ZFS2_Veeam2 readonly off default
Tank1/ZFS2_Veeam2 copies 1 default
Tank1/ZFS2_Veeam2 refreservation none default
Tank1/ZFS2_Veeam2 primarycache all inherited from Tank1
Tank1/ZFS2_Veeam2 secondarycache all inherited from Tank1
Tank1/ZFS2_Veeam2 usedbysnapshots 0 -
Tank1/ZFS2_Veeam2 usedbydataset 1.10T -
Tank1/ZFS2_Veeam2 usedbychildren 0 -
Tank1/ZFS2_Veeam2 usedbyrefreservation 0 -
Tank1/ZFS2_Veeam2 logbias latency default
Tank1/ZFS2_Veeam2 dedup off default
Tank1/ZFS2_Veeam2 mlslabel none default
Tank1/ZFS2_Veeam2 sync standard inherited from Tank1
Tank1/ZFS2_Veeam2 refcompressratio 1.05x -
Tank1/ZFS2_Veeam2 written 1.10T -
Tank1/ZFS2_Veeam2 logicalused 1.15T -
Tank1/ZFS2_Veeam2 logicalreferenced 1.15T -
Tank1/ZFS2_Veeam2 snapshot_limit none default
Tank1/ZFS2_Veeam2 snapshot_count none default
Tank1/ZFS2_Veeam2 redundant_metadata all default
 

gea

A single disk with an unexpectedly high wait or busy value would indicate a problem, as the load on the disks in a pool should be similar. A high pool load indicates that the computer is fast enough to saturate the pool, as it works as fast as possible.

Can you add the output of the napp-it menu Pools > Benchmark (insert it here as code to keep the table formatting)? This is a series of random and sequential read/write tests with sync disabled/enabled to determine whether the results are as expected. Set lz4 to off and caches to on in the test, as you want to test regular disk performance, not compression quality.
 

freeman

You wrote: "A high pool load indicates that the computer is fast enough to saturate the pool."
I don't understand: how can the pool be saturated while the disks are only around 50% busy, if the computer is fast enough?

Here is the test; I set compression to off:

Code:
begin test 1 .
begin test 2 .
begin test 3 .
begin test 4 .

begin dd write test 5 time dd if=/dev/zero of=/Tank1/_SimpleWritetest/syncwrite.tst bs=500000000 count=10
5000000000 bytes transferred in 11.818508 secs (423065259 bytes/sec)

begin dd write test 6 time dd if=/dev/zero of=/Tank1/_SimpleWritetest/syncwrite.tst bs=500000000 count=10
5000000000 bytes transferred in 5.249479 secs (952475494 bytes/sec)

hostname                           zfs2
pool                               Tank1
write loop                         10 s
remark                             

test 1/2 results                   writes via cache + sync logging    writes via cache only             
data per write                     8KB                                8KB                               
sync setting                       always                             disabled                           
compress setting                   off                                off                               
recordsize setting                 128K                               128K                               
write actions                      380                                2212                               
write actions/s                    38                                 221                               
throughput                         304 KBytes/s                       1.8 MBytes/s                       

test 3/4 results                   writes via cache + sync logging    writes via cache only             
data per commit                    256K                               256K                               
sync setting                       always                             disabled                           
compress setting                   off                                off                               
recordsize setting                 128K                               128K                               
write actions                      29                                 1092                               
write actions/s                    2                                  109                               
throughput                         512 KBytes/s                       27.9 MBytes/s                     

test 5/6 results                   writes via cache + sync logging    writes via cache only             
data                               5GB                                5GB                               
sync setting                       always                             disabled                           
dd sequential                      423.1 MB/s                         952.5 MB/s
 

gea

Your pool is at 96% busy. That is the view over all disks together, each of them at around 40% busy and with no %w (wait).

About your values: I would prefer filebench tests, as they are closer to a real-world workload.
The 8 KB values show the behaviour with small data blocks; this is where disks are weak.

The dd values are strictly sequential, but when writing to a raid you get a mixed workload (sequential/random).

Your pool consists of 2 x 8-disk Z2 vdevs, which means 12 data disks. Your sequential performance is 950 MB/s, an average of 80 MB/s per data disk: low, but within expectations (80-120 MB/s per disk in a mixed workload/raid). The random I/O values, especially with sync enabled, are bad, as expected for a disk-based Z2 pool whose IOPS equal those of two disks, around 200 raw disk IOPS. I suppose you have sync disabled, so the sync values do not matter.
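
As a rough sanity check of that number (rule-of-thumb figures, not measurements):
Code:
# each raidz vdev delivers roughly the random IOPS of a single member disk,
# and a 7200 rpm disk manages on the order of 100 random IOPS
#   2 vdevs x ~100 IOPS  =  ~200 raw random IOPS for the whole pool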

Can you increase the ARC RAM, e.g. to 40 GB, to check the effect of more RAM?
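
For example (a sketch, assuming an illumos/OmniOS based napp-it install; the 40 GB value is only an example and needs the extra RAM installed first):
Code:
# add the line to /etc/system and reboot; 40 GB = 42949672960 bytes
echo "set zfs:zfs_arc_max=42949672960" >> /etc/system
# after the reboot, confirm the new ARC cap
kstat -p zfs:0:arcstats:c_max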
 

freeman

filebench fileserver.f, compression and sync disabled:
Code:
start filebench..
Filebench Version 1.4.9.1
  459: 0.000: Allocated 126MB of shared memory
  459: 0.004: File-server Version 3.0 personality successfully loaded
  459: 0.004: Creating/pre-allocating files and filesets
  459: 0.254: Fileset bigfileset: 10000 files, 0 leafdirs, avg dir width = 20, avg dir depth = 3.1, 1254.784MB
  459: 0.260: Removed any existing fileset bigfileset in 1 seconds
  459: 0.260: making tree for filset /Tank1/filebench.tst/bigfileset
  459: 0.292: Creating fileset bigfileset...
  459: 1.537: Preallocated 8015 of 10000 of fileset bigfileset in 2 seconds
  459: 1.537: waiting for fileset pre-allocation to finish
  459: 1.548: Starting 1 filereader instances
  477: 1.629: Starting 50 filereaderthread threads
  459: 2.674: Running...
  459: 32.732: Run took 30 seconds...
  459: 32.738: Per-Operation Breakdown
statfile1            92867ops     3090ops/s   0.0mb/s      0.1ms/op       19us/op-cpu [0ms - 89ms]
deletefile1          92873ops     3090ops/s   0.0mb/s      2.8ms/op      104us/op-cpu [0ms - 260ms]
closefile3           92878ops     3090ops/s   0.0mb/s      0.0ms/op        5us/op-cpu [0ms - 33ms]
readfile1            92878ops     3090ops/s 409.2mb/s      0.1ms/op       83us/op-cpu [0ms - 177ms]
openfile2            92878ops     3090ops/s   0.0mb/s      0.2ms/op       28us/op-cpu [0ms - 159ms]
closefile2           92879ops     3090ops/s   0.0mb/s      0.0ms/op        6us/op-cpu [0ms - 23ms]
appendfilerand1      92892ops     3090ops/s  24.1mb/s      2.9ms/op       78us/op-cpu [0ms - 230ms]
openfile1            92895ops     3091ops/s   0.0mb/s      0.2ms/op       29us/op-cpu [0ms - 134ms]
closefile1           92895ops     3091ops/s   0.0mb/s      0.0ms/op        6us/op-cpu [0ms - 103ms]
wrtfile1             92904ops     3091ops/s 388.4mb/s      4.5ms/op      114us/op-cpu [0ms - 249ms]
createfile1          92916ops     3091ops/s   0.0mb/s      3.3ms/op      100us/op-cpu [0ms - 266ms]
  459: 32.738:

IO Summary:
1021755 ops, 33993.342 ops/s, (3090/6181 r/w), 821.7mb/s,    482us cpu/op,   4.7ms latency
  459: 32.738: Shutting down processes

ok.
filebench randomread.f, compression disabled and sync disabled:
Code:
start filebench..
Filebench Version 1.4.9.1
21437: 0.000: Allocated 126MB of shared memory
21437: 0.003: Random Read Version 3.0 personality successfully loaded
21437: 0.003: Creating/pre-allocating files and filesets
21437: 0.003: File largefile1: 5120.000MB
21437: 0.007: Removed any existing file largefile1 in 1 seconds
21437: 0.007: making tree for filset /Tank1/filebench.tst/largefile1
21437: 0.007: Creating file largefile1...
21437: 5.414: Preallocated 1 of 1 of file largefile1 in 6 seconds
21437: 5.414: waiting for fileset pre-allocation to finish
21437: 5.414: Starting 1 rand-read instances
21488: 5.420: Starting 1 rand-thread threads
21437: 6.421: Running...
21437: 36.421: Run took 30 seconds...
21437: 36.421: Per-Operation Breakdown
rand-read1           2574053ops    85800ops/s 670.3mb/s      0.0ms/op        8us/op-cpu [0ms - 0ms]
21437: 36.421:

IO Summary:
2574053 ops, 85799.857 ops/s, (85800/0 r/w), 670.3mb/s,     12us cpu/op,   0.0ms latency
21437: 36.421: Shutting down processes

ok.
iostat output while running randomread.f:
Code:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  tin tout  us sy dt id
rpool     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0    0 1511   1 39  0 60
Tank1     0.0 17778.7    0.0 1118581.8 256.8 90.0   19.5 100 100
sd0       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd3       0.0 1299.5    0.0 85637.1  0.0  6.1    4.7   1  74
sd4       0.0 1055.6    0.0 84341.5  0.0  7.7    7.3   1  87
sd5       0.0 1237.6    0.0 86020.9  0.0  6.6    5.4   1  81
sd6       0.0 1356.5    0.0 85844.9  0.0  6.1    4.5   1  73
sd7       0.0 1254.5    0.0 85085.2  0.0  5.8    4.7   1  72
sd8       0.0 1229.6    0.0 85952.8  0.0  6.9    5.6   1  79
sd9       0.0 1265.5    0.0 85105.1  0.0  6.3    5.0   1  76
sd10      0.0 1449.5    0.0 88175.9  0.0  5.2    3.6   1  68
sd11      0.0  971.6    0.0 53800.4  0.0  4.0    4.1   1  46
sd12      0.0  924.7    0.0 53788.4  0.0  4.3    4.7   1  48
sd13      0.0  908.7    0.0 51593.2  0.0  5.3    5.8   1  59
sd14      0.0 1045.6    0.0 55803.6  0.0  4.0    3.9   1  69
sd15      0.0 1037.6    0.0 53788.3  0.0  3.7    3.6   1  44
sd16      0.0  935.7    0.0 53940.2  0.0  6.0    6.4   1  71
sd18      0.0  777.7    0.0 56019.5  0.0  8.3   10.7   1  87
sd19      0.0 1029.6    0.0 53788.2  0.0  3.2    3.2   1  38
filebench randomread.f, compression disabled and sync=always:
Code:
start filebench..
Filebench Version 1.4.9.1
16330: 0.000: Allocated 126MB of shared memory
16330: 0.003: Random Read Version 3.0 personality successfully loaded
16330: 0.003: Creating/pre-allocating files and filesets
16330: 0.003: File largefile1: 5120.000MB
16330: 0.007: Removed any existing file largefile1 in 1 seconds
16330: 0.007: making tree for filset /Tank1/filebench.tst/largefile1
16330: 0.108: Creating file largefile1...
16330: 147.441: Preallocated 1 of 1 of file largefile1 in 148 seconds
16330: 147.441: waiting for fileset pre-allocation to finish
16330: 147.441: Starting 1 rand-read instances
17764: 147.607: Starting 1 rand-thread threads
16330: 148.607: Running...
16330: 178.609: Run took 30 seconds...
16330: 178.609: Per-Operation Breakdown
rand-read1           2538379ops    84611ops/s 661.0mb/s      0.0ms/op        8us/op-cpu [0ms - 0ms]
16330: 178.609:

IO Summary:
2538379 ops, 84610.926 ops/s, (84611/0 r/w), 661.0mb/s,     11us cpu/op,   0.0ms latency
16330: 178.609: Shutting down processes

ok.
While running randomread.f, why is %w 23 on the pool, and how can I see what it is waiting for? And why is the pool not at 100% busy while randomread.f is running?
Code:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  tin tout  us sy dt id
rpool     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0    0 1495   1  7  0 92
Tank1     0.0 1415.6    0.0 23930.3  9.8  3.5    9.4  23  31
sd0       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd3       0.0   89.0    0.0 1172.5  0.0  0.2    1.8   0   9
sd4       0.0   92.0    0.0 1184.5  0.0  0.1    1.5   0   8
sd5       0.0   90.0    0.0 1156.5  0.0  0.2    1.7   0   9
sd6       0.0   87.0    0.0 1152.5  0.0  0.2    1.8   0   9
sd7       0.0   88.0    0.0 1160.5  0.0  0.2    1.8   0   9
sd8       0.0   90.0    0.0 1192.5  0.0  0.2    2.1   0  13
sd9       0.0   88.0    0.0 1148.5  0.0  0.1    1.6   0   7
sd10      0.0   86.0    0.0 1164.5  0.0  0.1    1.7   0   9
sd11      0.0  114.0    0.0 1800.8  0.0  0.2    2.0   0  13
sd12      0.0  116.1    0.0 1816.8  0.0  0.2    2.0   0  12
sd13      0.0  114.0    0.0 1832.8  0.0  0.3    2.3   0  16
sd14      0.0  113.0    0.0 1824.8  0.0  0.2    2.1   0  13
sd15      0.0  118.1    0.0 1828.8  0.0  0.3    2.5   0  19
sd16      0.0  114.0    0.0 1820.8  0.0  0.4    3.2   0  26
sd18      0.0  121.1    0.0 1848.8  0.0  0.3    2.1   0  15
sd19      0.0  120.1    0.0 1824.8  0.0  0.3    2.8   0  22

The server is at the datacenter, so I can't add RAM easily.
Isn't it strange that we need more than 24 GB of RAM for a 5 GB test file?
Can you please explain how this works?

With sync=always it is very slow. We have sync=standard and the iSCSI volume with write-back enabled; is this the way to go?
 

gea

About the RAM

By default, 10% of free RAM (max. 4 GB) is used as write cache. When it is half filled, its content is written to the pool as a fast sequential write while the other half continues caching. If you look at benchmarks over blocksize, you see increasing performance with larger blocksizes; sizes below, say, 100 MB become slow, so your ZFS write cache should not be below 200 MB as an absolute minimum, which is what you get with 2 GB of RAM. Add the 2 GB for the OS and you arrive at the 4 GB RAM minimum for a 64-bit ZFS system (without much help from read caching). Non-Solaris systems require slightly more RAM for ZFS and other services or management tools; some appliances expect 8 or even 16 GB as a minimum.
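
If you want to look at this on the box itself, a sketch (assuming an illumos/OmniOS system, where I believe the write cache size is the zfs_dirty_data_max tunable):
Code:
# print the current write cache limit (bytes) from the live kernel
echo "zfs_dirty_data_max/E" | mdb -k
# raise it to e.g. 3 GB (3221225472 bytes) via /etc/system, then reboot
echo "set zfs:zfs_dirty_data_max=3221225472" >> /etc/system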

By default, around 80% of the remaining free RAM is used for read caching, unless another process wants the RAM. This read cache is called the ARC. It works on a read-most/read-last basis, and it is not file-based but ZFS block-based. This is why it does not help much with sequential reads/whole-file reads.

This intensive RAM usage is the reason why ZFS always consumes most of the RAM regardless of the load situation: it is the source of its performance. On random access, more than 80% of reads can be delivered from RAM without disk load. This is also the reason why more RAM improves even write performance, as it reduces concurrent reads, and even for writes you must read metadata. There are workloads, e.g. a mail server, where more than 100 GB of RAM can make sense. In most cases, SSDs instead of HDs plus a lot of RAM are the more efficient option.

Overall I would say your performance is quite OK: not really fast, but no visible problem in the values. More RAM can improve performance. For sync writes, an Slog can massively improve performance. A slight boost may be achievable with a special vdev mirror for metadata and small I/O (take care that special vdevs use the same ashift as the pool). A multi-mirror pool instead of the dual Z2 would also improve performance.
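
A special vdev would be added roughly like this (a sketch; the device names are placeholders, and this assumes your OmniOS/OpenZFS version supports allocation classes):
Code:
# special vdev for metadata as a mirror of two SSDs (check the ashift matches the pool)
zpool add Tank1 special mirror c0tSSD0000000001d0 c0tSSD0000000002d0
# optionally also send small blocks (here up to 32K) to the special vdev
zfs set special_small_blocks=32K Tank1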

Your filebench results are not too different from the dd values; they are lower, as these are more real-world values (dd is a bad benchmark). Nothing really special.

If you enable or disable sync for a filesystem, you force that setting. With sync=standard the writing application can decide; in the case of iSCSI, this is the write-back setting. Enabling write-back means sync=off.
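
You can check the write-back setting of the LUs on the COMSTAR side (a sketch; the LU GUID below is a placeholder):
Code:
# list the logical units and look for the "Writeback Cache" line
stmfadm list-lu -v
# disable write-back on a LU -> the initiator then effectively gets sync writes
stmfadm modify-lu -p wcd=true 600144f0xxxxxxxxxxxxxxxxxxxxxxxx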

btw.
A Pools > Benchmark run with default settings gives a quick overview with several filebench runs and sync=always vs. sync=disabled.
 

freeman

Thanks a lot for these answers; they clarified a lot for me, and I now know my storage is working reasonably well.
But I still have a few questions:

sync=standard

I use iSCSI and have sync=standard.
In Windows I see two options:
  • 1. Enable write caching on the device
    Improves system performance by enabling write caching on the device, but a power outage or equipment failure might result in data loss or corruption.
    • 1.1 Turn off Windows write-cache buffer flushing on the device
      To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure.
I have 1. enabled and 1.1 not; is this the same as sync=off?
Or do I need to enable both 1. and 1.1? Can you explain the difference?

Slog
You wrote: "For sync writes, an Slog can massively improve performance."
Do you mean that an Slog only improves performance if the application requests sync, or if I hard-set sync=always? (So in my case it probably won't help?)

More Memory
My backup files are around 4 TB, so when making a new backup I guess the ARC doesn't help that much.
Would it be smart in my case to lower the ARC to a maximum of 10 GB and keep the remaining 14 GB for the OS (2 GB), write cache (2 GB) and iSCSI,
instead of adding additional RAM to get to 40 GB?

Single disk busy
Sorry, but I still don't get why a single disk won't report 100% busy when the pool is working at 100% busy.
Can you try to explain this to me?
 

gea

write cache
There are write caches on several layers, starting with the OS cache (e.g. ZFS or Windows), the driver/controller cache (e.g. on a raid adapter) and the disk write cache. Each of these caches can commit a write even though the write has not physically landed on disk, and each cache setting addresses a different layer. In the end, the only important point is that every write cache must be disabled unless there is crash protection; otherwise writes are not safe.

If you use ZFS, the important thing is that you use sync write, where the ZFS cache is protected by an Slog or the on-pool ZIL. Do not allow any other write cache. Usually you force sync, and most other cache settings become irrelevant; ZFS handles the disk cache automatically. If you set sync=standard, you must take care of the client setting, e.g. write-back: if you disable the write-back cache, you enable sync. The last layer is a write cache at the client OS level, like your Windows cache setting here. This must be disabled, as there is no protection.
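
In your case that would mean, for example (using the zvol name from your output):
Code:
# force sync writes on the Veeam zvol, regardless of client/initiator caching
zfs set sync=always Tank1/ZFS2_Veeam2
zfs get sync Tank1/ZFS2_Veeam2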

Slog
If you need sync security (write-cache protection), add an Slog. If you do not need sync, disable it for better performance.
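
Adding an Slog is a one-liner (a sketch; the device names are placeholders, and you should use a power-loss-protected SSD/NVMe, ideally mirrored):
Code:
zpool add Tank1 log mirror c0tSLOG000000001d0 c0tSLOG000000002d0
zpool status Tank1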

Arc
The ARC is a read cache. It also helps for writes, as it caches metadata, but the biggest effect is on random reads.
The ZFS read cache is more efficient than OS or iSCSI read caches, and the ZFS write cache is also more efficient (and can be protected) compared to OS or iSCSI caching. If you can add RAM for storage, add it to the storage VM; ZFS does its best to use it for both read and write caching.
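
You can watch how well the ARC works with the arcstats kstat, for example:
Code:
# current ARC size and cap (bytes)
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
# hit/miss counters; sample twice to estimate the hit rate of your workload
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses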

Busy
Your server/disk subsystem is capable of writing around 950 MB/s under a purely sequential dd load at near 100% busy. The load is limited not only by the server and disks but also by raid processing, compression, metadata reads and other factors. These writes are distributed quite evenly over all data disks in the raid. In the end you see 950 MB/s distributed over 12 data disks (I do not count redundancy), which is around 80 MB/s of load per disk.

If you look at the disk specs, you will see that a single fast 8 TB disk can deliver up to 250 MB/s sequentially. Even in a mixed workload, a single disk is much faster than 80 MB/s; it should manage more than twice that value. So when you look at the individual disks, you see this per-disk load as only around 40% busy, because a single disk could handle more.
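
As a back-of-the-envelope check (the ~200 MB/s mixed-workload figure per disk is an assumption; the spec sheet value is ~250 MB/s sequential):
Code:
#   950 MB/s pool write / 12 data disks      =  ~80 MB/s per disk
#   80 MB/s / ~200 MB/s per-disk capability  =  ~40% busy per disk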
 