ZFS benchmark

Discussion in 'Linux Admins, Storage and Virtualization' started by Albert Yang, Jul 26, 2019.

  1. Albert Yang

    Albert Yang Member

    Joined:
    Oct 26, 2017
    Messages:
    46
    Likes Received:
    0
    Hi,
    I'm currently trying to benchmark my ZFS setup before getting a SLOG. Here are a few questions, if someone could shed some light:

    1) What is the rule of thumb when sizing a SLOG? Does it depend on the RAM, the size of the disks, or the pool? If I have 4 disks of 4 TB each, what size SSD do I need?
    2) Would this 58 GB SSD be sufficient? https://www.amazon.com/Intel-Optane-800P-58GB-XPoint/dp/B078ZJSD6F
    3) I'm currently benchmarking with fio, and yes, I'm getting pretty bad results, but I read it also depends on the physical disk's sector size (512 here) and on the VM storage volblocksize (8k by default). Not sure if changing it would help?
    Code:
    cat /sys/block/sda/queue/hw_sector_size
    512
    
    4) Currently I have arc max set to 2 GB, which I think might be too low. I have 32 GB of RAM with 26 GB used by the VMs (I probably need to add more RAM), but what is the rule of thumb for the ARC max?
    5) Would turning compression off help?
    6) Would setting atime=off also help on writes, given the VMs are RAW disks inside Proxmox?
    Code:
    zfs set atime=off rpool
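    On question 3, it may help to first check whether the drive is native 512 or 512e, and which ashift the pool was created with (a sketch to run on the host; `sda` and `rpool` are assumptions, substitute your own device and pool):

```shell
# Logical vs physical sector size; 512e drives report 512 logical but 4096 physical
# cat /sys/block/sda/queue/logical_block_size
# cat /sys/block/sda/queue/physical_block_size

# ashift the pool was created with: ashift=9 means 512B sectors, ashift=12 means 4K
# zdb -C rpool | grep ashift
```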
    Thank you

    Well, here are the stats (pretty bad, I know).
    Proxmox Host
    Code:
    command: fio --filename=test --sync=1 --rw=randwrite --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
    
    Random write
    
    test: (g=0): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=4
    fio-3.15-4-g029b
    Starting 1 process
    test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [f(1)][100.0%][w=475MiB/s][w=475 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=18230: Fri Jul 26 16:36:21 2019
      write: IOPS=167, BW=168MiB/s (176MB/s)(10.0GiB/61028msec)
        clat (usec): min=99, max=7805.3k, avg=5948.35, stdev=158625.94
         lat (usec): min=104, max=7805.3k, avg=5958.56, stdev=158625.92
        clat percentiles (usec):
         |  1.00th=[    108],  5.00th=[    113], 10.00th=[    120],
         | 20.00th=[    141], 30.00th=[    151], 40.00th=[    157],
         | 50.00th=[    165], 60.00th=[    178], 70.00th=[    198],
         | 80.00th=[    219], 90.00th=[    277], 95.00th=[    347],
         | 99.00th=[    652], 99.50th=[   1500], 99.90th=[2533360],
         | 99.95th=[3137340], 99.99th=[6945768]
       bw (  KiB/s): min=147456, max=2461696, per=100.00%, avg=1037687.90, stdev=663809.16, samples=20
       iops        : min=  144, max= 2404, avg=1013.30, stdev=648.27, samples=20
      lat (usec)   : 100=0.04%, 250=86.66%, 500=11.78%, 750=0.66%, 1000=0.17%
      lat (msec)   : 2=0.26%, 4=0.07%, 10=0.03%, 20=0.05%, 50=0.04%
      lat (msec)   : 250=0.07%, 500=0.01%, 2000=0.04%, >=2000=0.13%
      cpu          : usr=0.14%, sys=2.64%, ctx=7440, majf=8, minf=10
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,10240,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
      WRITE: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s), io=10.0GiB (10.7GB), run=61028-61028msec
    
    
    
    command: fio --filename=test --sync=1 --rw=randread --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
    
    Random read
    
    test: (g=0): rw=randread, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=4
    fio-3.15-4-g029b
    Starting 1 process
    test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [r(1)][100.0%][r=76.0MiB/s][r=76 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=337: Fri Jul 26 16:48:21 2019
      read: IOPS=73, BW=73.4MiB/s (76.0MB/s)(10.0GiB/139469msec)
        clat (usec): min=168, max=326933, avg=13616.37, stdev=8044.79
         lat (usec): min=168, max=326934, avg=13616.73, stdev=8044.80
        clat percentiles (usec):
         |  1.00th=[   260],  5.00th=[  7439], 10.00th=[  8717], 20.00th=[ 10159],
         | 30.00th=[ 11076], 40.00th=[ 11994], 50.00th=[ 12649], 60.00th=[ 13435],
         | 70.00th=[ 14222], 80.00th=[ 15270], 90.00th=[ 17695], 95.00th=[ 23462],
         | 99.00th=[ 42730], 99.50th=[ 59507], 99.90th=[ 99091], 99.95th=[111674],
         | 99.99th=[135267]
       bw (  KiB/s): min=16384, max=102400, per=99.88%, avg=75091.18, stdev=12184.43, samples=278
       iops        : min=   16, max=  100, avg=73.25, stdev=11.91, samples=278
      lat (usec)   : 250=0.82%, 500=0.85%
      lat (msec)   : 2=0.90%, 4=0.19%, 10=16.28%, 20=73.71%, 50=6.58%
      lat (msec)   : 100=0.58%, 250=0.09%, 500=0.01%
      cpu          : usr=0.03%, sys=1.70%, ctx=10205, majf=0, minf=268
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
       READ: bw=73.4MiB/s (76.0MB/s), 73.4MiB/s-73.4MiB/s (76.0MB/s-76.0MB/s), io=10.0GiB (10.7GB), run=139469-139469msec
    
    
    command: fio --filename=test --sync=1 --rw=read --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
    
    Sequential read   
    
    test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=4
    fio-3.15-4-g029b
    Starting 1 process
    test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [R(1)][97.6%][r=335MiB/s][r=335 IOPS][eta 00m:01s]
    test: (groupid=0, jobs=1): err= 0: pid=16042: Fri Jul 26 16:50:29 2019
      read: IOPS=256, BW=257MiB/s (269MB/s)(10.0GiB/39919msec)
        clat (usec): min=173, max=1988.9k, avg=3896.22, stdev=27967.08
         lat (usec): min=174, max=1988.9k, avg=3896.47, stdev=27967.15
        clat percentiles (usec):
         |  1.00th=[    204],  5.00th=[    217], 10.00th=[    229],
         | 20.00th=[    260], 30.00th=[    318], 40.00th=[   1156],
         | 50.00th=[   2147], 60.00th=[   2933], 70.00th=[   3720],
         | 80.00th=[   4817], 90.00th=[   7308], 95.00th=[  10159],
         | 99.00th=[  26870], 99.50th=[  36439], 99.90th=[  74974],
         | 99.95th=[ 274727], 99.99th=[1384121]
       bw (  KiB/s): min= 2048, max=401408, per=100.00%, avg=283533.05, stdev=99968.47, samples=73
       iops        : min=    2, max=  392, avg=276.88, stdev=97.62, samples=73
      lat (usec)   : 250=17.56%, 500=16.98%, 750=2.29%, 1000=1.92%
      lat (msec)   : 2=9.26%, 4=25.09%, 10=21.77%, 20=3.32%, 50=1.53%
      lat (msec)   : 100=0.21%, 250=0.01%, 500=0.02%, 750=0.01%, 2000=0.03%
      cpu          : usr=0.07%, sys=6.32%, ctx=7006, majf=0, minf=266
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
       READ: bw=257MiB/s (269MB/s), 257MiB/s-257MiB/s (269MB/s-269MB/s), io=10.0GiB (10.7GB), run=39919-39919msec
      
      
     
     Command: fio --filename=test --sync=1 --rw=write --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
     
     Sequential write
     
     test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [W(1)][90.2%][eta 00m:05s]                         
    test: (groupid=0, jobs=1): err= 0: pid=23005: Fri Jul 26 16:52:19 2019
      write: IOPS=219, BW=219MiB/s (230MB/s)(10.0GiB/46746msec)
        clat (usec): min=91, max=9317.8k, avg=4554.20, stdev=160615.65
         lat (usec): min=97, max=9317.8k, avg=4564.12, stdev=160615.78
        clat percentiles (usec):
         |  1.00th=[     97],  5.00th=[    108], 10.00th=[    122],
         | 20.00th=[    130], 30.00th=[    137], 40.00th=[    143],
         | 50.00th=[    151], 60.00th=[    161], 70.00th=[    176],
         | 80.00th=[    188], 90.00th=[    212], 95.00th=[    330],
         | 99.00th=[   1532], 99.50th=[   1762], 99.90th=[ 105382],
         | 99.95th=[4462740], 99.99th=[6207570]
       bw (  MiB/s): min=    2, max= 2500, per=100.00%, avg=1418.39, stdev=863.69, samples=13
       iops        : min=    2, max= 2500, avg=1418.31, stdev=863.68, samples=13
      lat (usec)   : 100=2.60%, 250=90.42%, 500=3.31%, 750=0.24%, 1000=0.35%
      lat (msec)   : 2=2.62%, 4=0.12%, 10=0.14%, 20=0.02%, 100=0.01%
      lat (msec)   : 250=0.09%, 500=0.01%, 2000=0.01%, >=2000=0.07%
      cpu          : usr=0.20%, sys=2.89%, ctx=11012, majf=0, minf=10
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,10240,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
      WRITE: bw=219MiB/s (230MB/s), 219MiB/s-219MiB/s (230MB/s-230MB/s), io=10.0GiB (10.7GB), run=46746-46746msec
    

    and the VM

    Code:
    command: fio --filename=test --sync=1 --rw=randwrite --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
        
        randwrite
        
        fio-3.15-4-g029b
    Starting 1 process
    test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [w(1)][100.0%][w=117MiB/s][w=117 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=30330: Fri Jul 26 16:56:09 2019
      write: IOPS=89, BW=89.5MiB/s (93.9MB/s)(10.0GiB/114364msec)
        clat (usec): min=918, max=595797, avg=11132.76, stdev=26966.94
         lat (usec): min=932, max=595843, avg=11161.68, stdev=26969.05
        clat percentiles (usec):
         |  1.00th=[  1074],  5.00th=[  1156], 10.00th=[  1221], 20.00th=[  1336],
         | 30.00th=[  1500], 40.00th=[  1713], 50.00th=[  1942], 60.00th=[  2343],
         | 70.00th=[  4883], 80.00th=[ 14222], 90.00th=[ 28181], 95.00th=[ 49021],
         | 99.00th=[131597], 99.50th=[170918], 99.90th=[295699], 99.95th=[375391],
         | 99.99th=[492831]
       bw (  KiB/s): min= 2048, max=286720, per=99.79%, avg=91493.11, stdev=49831.22, samples=228
       iops        : min=    2, max=  280, avg=89.28, stdev=48.66, samples=228
      lat (usec)   : 1000=0.07%
      lat (msec)   : 2=51.91%, 4=17.09%, 10=6.06%, 20=9.51%, 50=10.50%
      lat (msec)   : 100=3.17%, 250=1.49%, 500=0.18%, 750=0.01%
      cpu          : usr=0.32%, sys=5.31%, ctx=36214, majf=0, minf=9
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,10240,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
      WRITE: bw=89.5MiB/s (93.9MB/s), 89.5MiB/s-89.5MiB/s (93.9MB/s-93.9MB/s), io=10.0GiB (10.7GB), run=114364-114364msec
    
    Disk stats (read/write):
      vda: ios=359/36257, merge=110/26936, ticks=260/128020, in_queue=128308, util=92.90%
     
    
    command: fio --filename=test --sync=1 --rw=randread --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
    
    Random read
    
    test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [r(1)][100.0%][r=49.0MiB/s][r=49 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=30377: Fri Jul 26 17:03:11 2019
      read: IOPS=27, BW=27.7MiB/s (29.0MB/s)(8300MiB/300017msec)
        clat (usec): min=873, max=3333.3k, avg=36138.39, stdev=91739.19
         lat (usec): min=873, max=3333.3k, avg=36139.11, stdev=91739.19
        clat percentiles (usec):
         |  1.00th=[   1012],  5.00th=[   1106], 10.00th=[   1254],
         | 20.00th=[  13042], 30.00th=[  18220], 40.00th=[  21890],
         | 50.00th=[  25560], 60.00th=[  29230], 70.00th=[  33817],
         | 80.00th=[  42206], 90.00th=[  63177], 95.00th=[  88605],
         | 99.00th=[ 202376], 99.50th=[ 375391], 99.90th=[1317012],
         | 99.95th=[2122318], 99.99th=[3338666]
       bw (  KiB/s): min= 2043, max=67584, per=100.00%, avg=30456.64, stdev=15658.90, samples=558
       iops        : min=    1, max=   66, avg=29.71, stdev=15.30, samples=558
      lat (usec)   : 1000=0.78%
      lat (msec)   : 2=16.05%, 4=0.51%, 10=1.07%, 20=16.02%, 50=50.54%
      lat (msec)   : 100=11.08%, 250=3.19%, 500=0.37%, 750=0.11%, 1000=0.06%
      lat (msec)   : 2000=0.13%, >=2000=0.07%
      cpu          : usr=0.04%, sys=1.04%, ctx=69453, majf=0, minf=204
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=8300,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
       READ: bw=27.7MiB/s (29.0MB/s), 27.7MiB/s-27.7MiB/s (29.0MB/s-29.0MB/s), io=8300MiB (8703MB), run=300017-300017msec
    
    Disk stats (read/write):
      vda: ios=72206/557, merge=5772/489, ticks=354680/1624, in_queue=356132, util=98.63%
    
    
    
    command: fio --filename=test --sync=1 --rw=read --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
    
    Sequential read   
    
    
    test: (groupid=0, jobs=1): err= 0: pid=30435: Fri Jul 26 17:06:41 2019
      read: IOPS=183, BW=184MiB/s (193MB/s)(10.0GiB/55699msec)
        clat (usec): min=394, max=3498.9k, avg=5432.27, stdev=39175.67
         lat (usec): min=395, max=3498.9k, avg=5432.98, stdev=39175.68
        clat percentiles (usec):
         |  1.00th=[   685],  5.00th=[   799], 10.00th=[   848], 20.00th=[   955],
         | 30.00th=[  1139], 40.00th=[  1418], 50.00th=[  2089], 60.00th=[  2933],
         | 70.00th=[  4015], 80.00th=[  5669], 90.00th=[  9110], 95.00th=[ 15795],
         | 99.00th=[ 43254], 99.50th=[ 64226], 99.90th=[206570], 99.95th=[526386],
         | 99.99th=[775947]
       bw (  KiB/s): min= 2048, max=348160, per=100.00%, avg=204690.43, stdev=96757.40, samples=102
       iops        : min=    2, max=  340, avg=199.86, stdev=94.50, samples=102
      lat (usec)   : 500=0.12%, 750=2.07%, 1000=21.33%
      lat (msec)   : 2=25.58%, 4=20.82%, 10=21.22%, 20=5.19%, 50=2.93%
      lat (msec)   : 100=0.46%, 250=0.21%, 500=0.03%, 750=0.04%, 1000=0.01%
      lat (msec)   : >=2000=0.01%
      cpu          : usr=0.29%, sys=5.94%, ctx=35793, majf=0, minf=269
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
       READ: bw=184MiB/s (193MB/s), 184MiB/s-184MiB/s (193MB/s-193MB/s), io=10.0GiB (10.7GB), run=55699-55699msec
    
    Disk stats (read/write):
      vda: ios=41292/1655, merge=387/4867, ticks=106828/1292, in_queue=108056, util=98.08%
    
    
     Command: fio --filename=test --sync=1 --rw=write --bs=1m --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
     
     Sequential write
     
     test: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [W(1)][100.0%][w=62.1MiB/s][w=62 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=30539: Fri Jul 26 17:10:32 2019
      write: IOPS=127, BW=128MiB/s (134MB/s)(10.0GiB/80072msec)
        clat (usec): min=851, max=1115.1k, avg=7792.02, stdev=32943.88
         lat (usec): min=862, max=1115.2k, avg=7814.53, stdev=32945.37
        clat percentiles (usec):
         |  1.00th=[   947],  5.00th=[  1004], 10.00th=[  1045], 20.00th=[  1123],
         | 30.00th=[  1205], 40.00th=[  1303], 50.00th=[  1467], 60.00th=[  1631],
         | 70.00th=[  1827], 80.00th=[  2311], 90.00th=[ 14222], 95.00th=[ 29492],
         | 99.00th=[135267], 99.50th=[223347], 99.90th=[429917], 99.95th=[566232],
         | 99.99th=[624952]
       bw (  KiB/s): min= 2048, max=362496, per=100.00%, avg=131168.47, stdev=80230.28, samples=159
       iops        : min=    2, max=  354, avg=128.04, stdev=78.35, samples=159
      lat (usec)   : 1000=4.51%
      lat (msec)   : 2=70.53%, 4=10.39%, 10=2.52%, 20=4.36%, 50=4.70%
      lat (msec)   : 100=1.55%, 250=1.05%, 500=0.33%, 750=0.05%, 2000=0.01%
      cpu          : usr=0.45%, sys=5.85%, ctx=35313, majf=0, minf=11
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,10240,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
      WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=10.0GiB (10.7GB), run=80072-80072msec
    
    Disk stats (read/write):
      vda: ios=309/33730, merge=202/17093, ticks=13188/81368, in_queue=94600, util=92.71%
    
     
    #1
  2. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,273
    Likes Received:
    752
    1.
    The SLOG must be at least 2x the size of the RAM write cache;
    10 GB is the lower limit.

    2.
    Optane from the 800P-58 upwards is a very good SLOG for home or lab use.
    The upper-class/datacenter Optane like the 4801X is only a little faster,
    but offers better write endurance and guaranteed power-loss protection,
    which makes it a perfect SLOG for a production server.

    3.
    I have run a series of filebench benchmarks on Solarish with different types
    of pools, sync enabled/disabled, RAM sizes, and SLOGs (from plain disks, SSDs,
    ZeusRAM, NVMe, Optane). If you have filebench you can compare directly;
    otherwise you can use the results to decide between pool layouts.

    https://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

    4.
    The read cache makes random I/O and access to metadata fast.
    2 GB is stable but way too small if you want performance. Think 8-32 GB or more.

    Multi-VM performance is closely tied to IOPS and small random reads/writes.
    Without enough RAM you are limited by raw disk performance. That is no problem
    with a pool of Optanes, but bad with spinning disks.

    see ESXi VM performance vs RAM
    https://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

    5.
    Keep compression on; use LZ4.

    6.
    Yes, atime=off reduces write operations; disable it.
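    For point 4: on a Proxmox/Linux host the ARC cap is the zfs_arc_max module parameter, given in bytes. A minimal sketch for raising it to 8 GiB (the paths are the usual OpenZFS-on-Linux ones; pick a size your VMs leave free):

```shell
# zfs_arc_max takes bytes; 8 GiB:
ARC_MAX=$(( 8 * 1024 * 1024 * 1024 ))
echo "$ARC_MAX"

# Apply immediately, no reboot:
# echo "$ARC_MAX" > /sys/module/zfs/parameters/zfs_arc_max

# Persist across reboots:
# echo "options zfs zfs_arc_max=$ARC_MAX" > /etc/modprobe.d/zfs.conf
# update-initramfs -u
```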
     
    #2
    Last edited: Jul 27, 2019
  3. Albert Yang

    Albert Yang Member

    Joined:
    Oct 26, 2017
    Messages:
    46
    Likes Received:
    0
    Thanks for the reply.
    1) As for the SLOG size, I didn't really get what you mean by 2x the write RAM cache size?
    2) Yes, I remember you mentioned it; I want to try this one first, before buying the other Intel Optane, to see the difference.
    3) Thank you, I will check the benchmarks. But what about the volblocksize? It's different between the VM and the host; is it recommended to change it?

    Thank you
     
    #3
  4. gea

    gea Well-Known Member

    The only task of a SLOG is to protect the content of the RAM-based write cache.
    On OpenZFS the default size of the write cache is 10% of RAM, max 4 GB (the SLOG is not a write cache).

    When the write cache is full, it is flushed as one large, fast sequential write to the pool,
    while a second area of the RAM cache takes the new writes.

    This is why a SLOG must be at least 2x the write cache size (2 x 4 GB).

    The ZFS default recordsize is 128k, which is a good compromise.
    For VM storage I recommend settings between 32k and 64k.
    For a single-user media filer, 1M may be better.
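    That sizing rule can be written out as arithmetic, followed by the command that actually attaches the log vdev (a sketch; the device path is hypothetical, look yours up under /dev/disk/by-id/):

```shell
# OpenZFS write cache (dirty data) cap: 10% of RAM, at most 4 GiB
RAM_GIB=32
TENTH=$(( RAM_GIB / 10 ))
WRITECACHE_GIB=$(( TENTH < 4 ? TENTH : 4 ))

# The slog must hold two write cache generations
SLOG_MIN_GIB=$(( 2 * WRITECACHE_GIB ))
echo "minimum slog: ${SLOG_MIN_GIB} GiB"   # 6 GiB here; ~10 GiB is a safe floor

# Attach it to the pool as a log vdev (hypothetical device path):
# zpool add rpool log /dev/disk/by-id/nvme-INTEL_OPTANE_800P-part1
```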
     
    #4
  5. Albert Yang

    Albert Yang Member

    Thanks for the reply. So what would that look like, for example, with 32 GB of RAM and a 7 TB pool?

    As for the VM storage: it's currently 8k, so would changing it to 32k or 64k help on reads, even though the disk's physical sector size is 512?

    Thank you
     
    #5
  6. gea

    gea Well-Known Member

    There is no direct relation between RAM size and pool size, but there is a relation between RAM size and performance (or the active amount of pool data), especially with slow disks. On a performant ZFS system, more than 80% of random reads can be served from the RAM read cache.

    Do not confuse the physical sector size of disks (512B or 4k), or the blocksize of, say, an iSCSI LUN or a vdisk (8k), with the ZFS recordsize (default 128k). The first two are a given. Only the ZFS recordsize (how many physical blocks are read/written in one I/O; dedup, compression, and checksums all work on this unit) is a tunable performance parameter.
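    A sketch of how each one is set (dataset and zvol names are made up; note that volblocksize is fixed at zvol creation, while recordsize can be changed at any time but only applies to newly written data):

```shell
# Filesystem dataset: recordsize can be changed live
# zfs set recordsize=64k rpool/data

# Zvol (a Proxmox RAW vdisk): volblocksize is set once, at creation
# zfs create -V 32G -o volblocksize=32k rpool/data/vm-100-disk-1

# Inspect both properties
# zfs get recordsize,volblocksize -r rpool/data
```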
     
    #6
    T_Minus likes this.
  7. Albert Yang

    Albert Yang Member

    Thank you for the reply. As you mentioned in our previous conversation: so as long as I have one pool, the minimum for a SLOG is 10 GB per pool? I was reading this document, which I think is why I'm somewhat confused:
    https://martin.heiland.io/2018/02/23/zfs-tuning/

    And about the SLOG: I read that to improve read performance I would add a cache device, while to improve writes I would add a log device. Couldn't I just add both on the 58 GB device?

    Thank you
     
    #7
    Last edited: Jul 28, 2019
  8. gea

    gea Well-Known Member

    A SLOG is a vdev that you add to the pool as a log device,
    so you need one SLOG per pool.

    Normally I would not dual-use a disk for SLOG and L2ARC.
    Intel Optane is fast enough that you can do it without serious disadvantages (especially as your RAM is low).

    The recommended size for an L2ARC is 5x RAM (never more than 10x).
    With 2 GB RAM you could add a 10-20 GB L2ARC (the L2ARC itself requires RAM to manage), but seriously, use more RAM. Even on Solarish, which has the lowest RAM needs for ZFS, I would prefer 8 GB+ and would not go below 4 GB.
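    If you do go the dual-use route, the layout looks roughly like this (a sketch only; device path and partition sizes are assumptions, double-check the target disk with lsblk before partitioning):

```shell
# Split one Optane into a small slog partition plus the rest for L2ARC
# sgdisk -n 1:0:+16G -t 1:bf01 /dev/nvme0n1   # partition 1: 16G slog
# sgdisk -n 2:0:0    -t 2:bf01 /dev/nvme0n1   # partition 2: remainder for L2ARC
# zpool add rpool log   /dev/nvme0n1p1
# zpool add rpool cache /dev/nvme0n1p2
# zpool status rpool
```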
     
    #8
    Albert Yang likes this.
  9. BackupProphet

    BackupProphet Well-Known Member

    Joined:
    Jul 2, 2014
    Messages:
    786
    Likes Received:
    278
    A SLOG is not a write cache.

    A SLOG takes only synchronous writes, i.e. writes with the following behavior:
    write 4kb (2us latency) ->
    flush to disk (120us latency) ->
    write another 8kb (3us latency) ->
    flush to disk (160us latency) ->
    write 4kb (2us latency) ->
    flush to disk (120 us latency)

    Total time spent
    407 us

    Without a fast dedicated SLOG it would look more like:
    write 4kb (2us latency) ->
    flush to disk (4500us latency) ->
    write another 8kb (3us latency) ->
    flush to disk (4500us latency) ->
    write 4kb (2us latency) ->
    flush to disk (4500 us latency)

    Total 13507 us spent.

    Most writes are asynchronous and behave like this:
    write 4kb (2us latency) ->
    write another 8kb (3us latency) ->
    write 4kb (2us latency)

    Total time spent 7 us.

    Your operating system will handle the disk flushes automatically.
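    The totals above are simply the per-step latencies summed:

```shell
SYNC_SLOG=$(( 2 + 120 + 3 + 160 + 2 + 120 ))    # sync writes with a fast slog
SYNC_DISK=$(( 2 + 4500 + 3 + 4500 + 2 + 4500 )) # sync writes straight to spinning disks
ASYNC=$(( 2 + 3 + 2 ))                          # async writes, no per-write flush
echo "$SYNC_SLOG $SYNC_DISK $ASYNC"             # 407 13507 7
```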

    The Optane 800P is not something I would recommend either; it has low endurance and is pricey. Either get a 900p 280GB, or get the 16GB model, which is available cheap.

    To have more RAM left for the disk cache, consider increasing your swap size. Too many configurations have no swap at all. I highly recommend a few GB, and the Intel Optane is perfect for swap :)
     
    #9
    Albert Yang likes this.
  10. Albert Yang

    Albert Yang Member

    Thanks for the reply. About "with 2 GB RAM you may add a 10-20 GB L2ARC (as the L2ARC requires RAM to manage)": and 10 GB of SLOG? To be clear, this is the lab before putting it into production on a server with 220 GB of RAM and 15k RPM disks of 600 GB each in RAID 10. Before that I need to understand how to distribute the SLOG. The mirror is just so that if a disk dies, I have time to replace the SSD. So the rule of thumb is 10 GB of SLOG per pool? And one SSD can only be configured as either cache or log, not both, right?
    Sorry for my ignorance.

    Thank you
     
    #10
  11. Albert Yang

    Albert Yang Member

    Thank you for the reply. A bit more about the SLOG: there are two roles, LOG and CACHE, that a device can be added to the pool as. Is something like this what you recommend?
    https://www.amazon.com/Intel-Optane...ane+900p+16gb&qid=1564614425&s=gateway&sr=8-1

    The idea is that my test lab has 32 GB of RAM with 1 TB 7200 RPM disks in RAID 10 on ZFS, while the real environment has 220 GB of RAM and 600 GB 15k RPM disks in RAID 10, so I want to test and benchmark with fio before and after, ahead of going to production. Will 16 GB be enough for the SLOG? And since we're running MSSQL, would a LOG or a CACHE device be better?
    Thank you
     
    #11
  12. BackupProphet

    BackupProphet Well-Known Member

    SQL servers benefit a lot from a dedicated SLOG; they do a lot of sync writes. Optane is so fast that you can create two partitions, one for the SLOG and one for the L2ARC, without risking degraded performance.
     
    #12
    Albert Yang likes this.
  13. Albert Yang

    Albert Yang Member

    Thank you so much for the help. So the device from that Amazon link works for the SLOG I need?
    Thank you again
     
    #13
  14. gea

    gea Well-Known Member

    The 16/32 GB Optane is not really bad, but it is more like a very good SATA SSD. The Optanes from the 800P upwards are around 3x as fast; see https://napp-it.org/doc/downloads/optane_slog_pool_performane.pdf, chapter 2.9 vs 2.10.

    Optane is fast enough that you can partition it for a dual-use SLOG + L2ARC. With 32 GB RAM, an L2ARC may not really be helpful.

    For a cheap home/lab server the 16/32 GB Optane may be OK (although I have had trouble with them in some setups).

    For a production system, use datacenter disks (DC 4801X), probably in an Optane-only pool for ultrafast databases without the need for an extra SLOG.
     
    #14
    Albert Yang likes this.
  15. Albert Yang

    Albert Yang Member

    Thanks for the reply. I'm going to try out the SLOG and post back the results.
    Thank you again
     
    #15
  16. Davewolfs

    Davewolfs Active Member

    Joined:
    Aug 6, 2015
    Messages:
    329
    Likes Received:
    31
    @gea Between the 100GB 4801x and 900p. Which one would you go for?
     
    #16
  17. gea

    gea Well-Known Member

    Only a matter of price.
    The higher-capacity 480x models are as fast as the 900P and guarantee PLP, which the 900P does not, but the 4801-100 is not as fast as the 900P.

    In my lab I use a 900P and see no reason to replace it. If I were suggesting a production setup and only the 4801-100 or the 900P were affordable, I would suggest the 4801.
     
    #17
    Patrick likes this.