Slow read speed for large files

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by RonanR, Nov 23, 2018.

  1. RonanR

    RonanR Member

    Hi,
    I'm wondering why I get very slow read speeds for large files (basically files larger than my RAM), compared to top-notch async write speeds.
    Here is my setup:
    Up-to-date OmniOS 151028
    Supermicro X10SRL-F board with a Xeon E5-1620 v3
    32GB of DDR4 ECC
    LSI 3008 controller
    12 HGST DC HC510 SATA drives (10TB) configured in one Z2 pool with the record size set to 256K

    Basically, as soon as I read a file which cannot fit in the ARC cache, I get a very slow read speed, whereas my async write speed is very good.
    In an AJA speed test over a 10Gb connection with a 16GB test file, I get 950MB/s in write and 900MB/s in read.
    If I set the file size to 64GB, I still get 950MB/s in write but only around 100MB/s in read.
    I also get the same result when copying large files through Windows Explorer.

    It's the exact same effect if I disable the cache when doing benchmarks:
    Benchmark filesystem: /hdd12z2/_Pool_Benchmark
    Read: filebench+dd, Write: filebench_sequential, date: 11.23.2018

    time dd if=/dev/zero of=/hdd12z2/_Pool_Benchmark/syncwrite.tst bs=500000000 count=10
    5000000000 bytes transferred in 3.431949 secs (1456898318 bytes/sec)

    hostname XST24BA Memory size: 32661 Megabytes
    pool hdd12z2
    (recsize=256k, compr=off, readcache=none)
    slog -
    remark



    Fb3                        sync=always       sync=disabled

    Fb4 singlestreamwrite.f    sync=always       sync=disabled
                               246 ops           7426 ops
                               49.197 ops/s      1483.963 ops/s
                               10802us cpu/op    2042us cpu/op
                               20.2ms latency    0.7ms latency
                               49.0 MB/s         1483.8 MB/s
    ____________________________________________________________________________
    read fb 7-9 + dd (opt)     randomread.f   randomrw.f   singlestreamr   dd
    pri/sec cache=none         0.4 MB/s       0.8 MB/s     81.2 MB/s       119.9 MB/s
    ____________________________________________________________________________

    If I set the record size to 1M, I get a little over 200MB/s in read, but it's still far less than expected.

    Benchmark filesystem: /hdd12z2/_Pool_Benchmark
    Read: filebench+dd, Write: filebench_sequential, date: 11.23.2018

    time dd if=/dev/zero of=/hdd12z2/_Pool_Benchmark/syncwrite.tst bs=500000000 count=10
    5000000000 bytes transferred in 1.931137 secs (2589148332 bytes/sec)

    hostname XST24BA Memory size: 32661 Megabytes
    pool hdd12z2
    (recsize=1M, compr=off, readcache=none)
    slog -
    remark



    Fb3                        sync=always       sync=disabled

    Fb4 singlestreamwrite.f    sync=always       sync=disabled
                               212 ops           9971 ops
                               42.397 ops/s      1994.163 ops/s
                               12158us cpu/op    1552us cpu/op
                               23.4ms latency    0.5ms latency
                               42.2 MB/s         1994.0 MB/s
    __________________________________________________________________________
    read fb 7-9 + dd (opt)     randomread.f   randomrw.f   singlestreamr   dd
    pri/sec cache=none         0.4 MB/s       0.8 MB/s     200.0 MB/s      243.9 MB/s
    __________________________________________________________________________

    While I can somehow understand the result when the cache is disabled, why do I get the same result when the cache is enabled, but only with files bigger than my RAM size?
    What's stranger to me is that it's slow even at the beginning of the copy/AJA read test, as if the cache was never used.
    I used to work with XFS shares over SMB, using a dedicated LSI MegaRAID card with a BBU, and I never saw such disparities between read and write performance at any file size.
    I'm quite new to ZFS, so I'm trying to understand its strengths (and there are plenty!) and limits.
     
    #1
  2. gea

    gea Well-Known Member

    First, a few remarks.

    ZFS is a highly secure filesystem with data/metadata checksums and Copy on Write. Both are needed for the security level ZFS wants to achieve: the first produces more data to process, the second a higher fragmentation. To overcome these limitations, ZFS has superior cache strategies.

    So the first thing is, you need the read caches. Disable them only when you want to check some details, not for regular use or regular benchmark checks. Even on writes you need the read cache to read metadata.

    The read cache also does not store files, otherwise it would fill up with a single file. It stores ZFS data blocks based on a read-most/read-last strategy. It does not help with large sequential files but is intended for small random reads. For large sequential files it caches only metadata. On a pool with higher fragmentation this can lead to lower read performance than write performance, as reading sequential files is IOPS-limited.

    Only the L2ARC (an addition to the read cache on SSDs or NVMe) can enable read-ahead, which can improve the reading of sequential data.
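    As a sketch, attaching an L2ARC is a single command, e.g. for your pool (the device name is just a placeholder):
    Code:
    # add an SSD/NVMe as L2ARC (cache vdev) to the pool -- device name is hypothetical
    zpool add hdd12z2 cache c2t0d0
    # the cache vdev and its fill level then show up in
    zpool iostat -v hdd12z2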

    On writes, everything is always cached in RAM. The size of this write cache is 10% of RAM, capped at 4GB, per default. Its intention is to turn many small and slow random writes into a single fast sequential write. This allows high write performance even on a slower pool, and often even higher values than on reads, where you lack this level of cache support.
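    You can check the dirty-data (write cache) limits the kernel actually uses with mdb, for example:
    Code:
    # show the current dirty-data (RAM write cache) limits -- roughly 10% of RAM, capped by default
    echo "::zfs_params" | mdb -k | grep dirty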

    If you want secure/crash-safe write behaviour (like with the BBU on a hardware RAID) you can enable/force sync write. In this case you additionally log every single write commit to the on-pool ZIL or an extra Slog device. In such a case a low-IOPS pool is slow unless you add a fast Slog, e.g. an Intel Optane 800P or better.
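    A minimal sketch of that setup (the filesystem and log device names are hypothetical):
    Code:
    # force sync write behaviour on a filesystem
    zfs set sync=always hdd12z2/projects
    # add a fast Slog device, e.g. an Optane -- device name is hypothetical
    zpool add hdd12z2 log c3t0d0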

    I have also seen 900 MB/s writes and 100 MB/s reads in some cases, but such huge differences were always related to a bad cable, bad settings, or driver problems. On Windows, for example, you should use the newest drivers and disable interrupt throttling. Optionally try another cable. With large files, 800 MB/s writes and 600 MB/s reads are values I have often seen with disk pools.
     
    #2
  3. RonanR

    RonanR Member

    Hi Gea,

    Thanks for your explanation.
    As my server is going to be used for reading large video files, I will add an NVMe for the L2ARC cache to improve sequential reads. Correct me if I'm wrong, but if I use a SATA SSD, my reads will be limited by the SSD speed, so around 500MB/s, right?

    Regarding the ARC, I understand its importance. I just find it very strange that I only get slow read performance with files bigger than my RAM size, as if the ARC was never used in that case. I didn't see the "classic" behavior of fast reads while the data is in the cache and then slow reads as soon as it isn't cached anymore.

    For the secure write behavior, I don't really need it: if the system crashes while I'm copying a media file, the file will be corrupted anyway.
    The only case where it could potentially help is if the system crashes while saving a project, but for that case I prepared a separate pool with sync write activated.

    I don't think I have a bad cable/settings/driver problem, as I get really good performance (950MB/s in write and 900MB/s in read) with smaller files.
     
    #3
  4. gea

    gea Well-Known Member

    With enough RAM the improvements from an L2ARC can be quite low, as the ARC already delivers the data. But in your case you should use an Intel Optane if you want an L2ARC (800P or better).

    You must enable read-ahead manually, and the size of the L2ARC should be between 5x and 10x your RAM.

    Btw:
    It's not normal that read performance goes down from 800 MB/s to 100 MB/s without a cable, driver, or setting problem. Your pool should be able to deliver 800 MB/s in any case.

    What you can check is iostat of the disks during load. All disks should behave similarly; if you have one weak disk, this can explain it as well.
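    For example, either of these run during a large read will make a weak disk stand out:
    Code:
    # per-disk service times and throughput, refreshed every 5 seconds
    iostat -xn 5
    # or the per-vdev/per-disk view of the pool itself
    zpool iostat -v hdd12z2 5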

    If you need better performance for random reads of large files, use a multi-mirror (RAID-10 style) pool.
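    A sketch of such a pool of striped mirrors, with hypothetical device names:
    Code:
    # 6 mirror vdevs instead of 1-2 raidz2 vdevs -- more vdevs means more read IOPS
    zpool create tank \
      mirror c0t0d0 c0t1d0 \
      mirror c0t2d0 c0t3d0 \
      mirror c0t4d0 c0t5d0 \
      mirror c0t6d0 c0t7d0 \
      mirror c0t8d0 c0t9d0 \
      mirror c0t10d0 c0t11d0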
     
    #4
    Last edited: Nov 23, 2018
  5. RonanR

    RonanR Member

    Can you tell me how to enable read-ahead manually? I did some searching but wasn't able to find a proper answer.

    Since last Friday I've done a lot of tests, and there really is a problem with files larger than the RAM size.
    Here is what I've done:
    First, I checked whether one bad disk could be slowing everything down: that's not the case, all disks are used the same way when I look at iostat.
    I then tested with the record size set to 512K; in this case I get 980MB/s in write and around 230MB/s in read.
    With the record size set to 1M, I get 1020MB/s in write and 350MB/s in read.

    I removed 16GB of memory to validate that it's linked to RAM size, and indeed I now get the same slow read performance with a 16GB file (when I had 32GB of memory, I got around 950MB/s in write and 900MB/s in read using a 256K record size and the same 16GB file).

    I did the test from two different client systems, using two different 10Gb cards in my server, although the iperf tests didn't reveal any flaw with my cards (tested both ways, as a client and as a server).
    I ordered an NVMe SSD so I can try an L2ARC, but I still find this very strange.
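    (For reference, a basic iperf check of the 10Gb path looks like this; the hostname is a placeholder:)
    Code:
    # on the server
    iperf -s
    # on the client -- several parallel streams to saturate the 10Gb link
    iperf -c storage-server -P 4 -t 30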
     
    #5
  6. gea

    gea Well-Known Member

    The parameter to enable read-ahead (add it to /etc/system) is:
    set zfs:l2arc_noprefetch=0

    Chris's Wiki :: blog/solaris/ZFSL2ARCNoprefetchTunable
    ZFS L2ARC
    A reboot is required after changing values in /etc/system
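    After the reboot you can confirm the value the kernel actually uses, for example:
    Code:
    # 0x0 means prefetched (sequential) reads may now be cached on the L2ARC
    echo "::zfs_params" | mdb -k | grep l2arc_noprefetch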

    In napp-it Pro you can set parameters in tuning sets to compare results (menu System > Appliance Tuning).
     
    #6
    Last edited: Nov 28, 2018
  7. RonanR

    RonanR Member

    OK, now it makes sense, thanks.
    I will post my test results as soon as I get my NVMe SSD.
     
    #7
  8. carcass

    carcass New Member

    Try tuning your prefetcher -- start with "echo 67108864 > /sys/module/zfs/parameters/zfetch_max_distance"
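    (That sysfs path is the ZFS-on-Linux interface; on OmniOS the same tunable goes into /etc/system, as used later in this thread:)
    Code:
    # /etc/system on illumos/OmniOS -- 64MB prefetch distance (0x4000000 = 67108864)
    set zfs:zfetch_max_distance=0x4000000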
     
    #8
  9. RonanR

    RonanR Member

    OK, I got a SanDisk Extreme Pro 1TB NVMe and did some tests, using the AJA disk test with a 64GB file size.
    I properly set zfs:l2arc_noprefetch=0.

    Without the prefetcher tuning suggested by carcass:
    Standalone NVMe drive:
    Whether with a 128k, 256k, 512k or 1M record size, I get roughly the same bandwidth:
    1016 MB/s in write and 810 MB/s in read.
    The read performance seems to be capped by the network, so I have to test some tuning on it to see if I can get the same bandwidth on read and write.

    2x8 10TB HDD in RAID-Z2, no L2ARC (write / read, MB/s):
    128k : 948 / 120
    256k : 950 / 144
    512k : 957 / 188
    1m : 950 / 522

    This time with my NVMe added as L2ARC (write / read, MB/s):
    128k : 931 / 110
    256k : 959 / 135
    512k : 958 / 200
    1m : 918 / 505

    Adding an L2ARC NVMe cache didn't change anything in my case.

    With carcass's tuning suggestion (zfetch_max_distance set to 64MB, i.e. 0x4000000, as reported by echo "::zfs_params" | mdb -k):
    Without L2ARC (write / read, MB/s):
    128k : 950 / 110
    256k : 965 / 145
    512k : 960 / 200
    1m : 970 / 525

    With L2ARC (write / read, MB/s):
    128k : 975 / 120
    256k : 970 / 145
    512k : 970 / 205
    1m : 975 / 562

    I get a little more write bandwidth with this setting, but reads stay the same. The L2ARC NVMe seemed to have a small effect on read performance, but the difference with and without it is close to none.

    Any suggestions?
     
    #9
  10. carcass

    carcass New Member

    L2ARC would have zero impact on your streaming read performance unless the data has already been read.
    What is the ashift value for this pool (zdb -C <pool_name> | grep ashift)?
    When you read that large file, what is the actual IO size (zpool iostat -rp 1 in ZoL)?
    Maybe you can also post all your current ZFS tunables, along with zfs get all <pool_name>?
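    (On OmniOS that boils down to roughly the following, using the pool name from the posts below:)
    Code:
    # pool sector alignment (ashift=12 means 4K sectors)
    zdb -C hdd2x8z2 | grep ashift
    # per-vdev/per-disk IO while the large read is running (the -v form is what OmniOSce shows below)
    zpool iostat -v hdd2x8z2 1
    # all dataset properties
    zfs get all hdd2x8z2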
     
    #10
  11. RonanR

    RonanR Member

    For the L2ARC, that's what I thought.
    The ashift value is 12 (I have 4Kn HDDs).

    Here are the iostat results (zpool iostat -v on OmniOSce). Tests done without L2ARC, but with zfetch_max_distance set to 64MB.

    With a 1M record size:
    In write:
    Code:
                                 capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    hdd2x8z2                   6.61T   139T      0  1.30K      0   888M
      raidz2                   3.31T  69.4T      0    739      0   449M
        c0t5000CCA273C5FF84d0      -      -      0    485      0  70.5M
        c0t5000CCA273CE0A68d0      -      -      0    488      0  71.8M
        c0t5000CCA273CE9FE7d0      -      -      0    486      0  71.9M
        c0t5000CCA273CEB7CDd0      -      -      0    525      0  77.8M
        c0t5000CCA273CED387d0      -      -      0    483      0  71.0M
        c0t5000CCA273CED4B5d0      -      -      0    481      0  71.4M
        c0t5000CCA273CEE080d0      -      -      0    471      0  68.9M
        c0t5000CCA273D1BBB2d0      -      -      0    488      0  71.2M
      raidz2                   3.30T  69.5T      0    588      0   439M
        c0t5000CCA273CF0DBFd0      -      -      0    484      0  74.2M
        c0t5000CCA273CF0DCCd0      -      -      0    483      0  74.4M
        c0t5000CCA273CF381Ed0      -      -      0    471      0  72.2M
        c0t5000CCA273CF4C17d0      -      -      0    477      0  73.7M
        c0t5000CCA273CF9BF4d0      -      -      0    472      0  73.2M
        c0t5000CCA273D07A91d0      -      -      0    488      0  76.0M
        c0t5000CCA273D186AAd0      -      -      0    483      0  75.3M
        c0t5000CCA273D1BB78d0      -      -      0    486      0  75.0M
    -------------------------  -----  -----  -----  -----  -----  -----
    In read:
    Code:
                                  capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    hdd2x8z2                   6.68T   139T    492      0   492M      0
      raidz2                   3.35T  69.4T    238      0   239M      0
        c0t5000CCA273C5FF84d0      -      -    184      0  30.7M      0
        c0t5000CCA273CE0A68d0      -      -    180      0  30.0M      0
        c0t5000CCA273CE9FE7d0      -      -    184      0  30.8M      0
        c0t5000CCA273CEB7CDd0      -      -    178      0  29.8M      0
        c0t5000CCA273CED387d0      -      -    186      0  31.1M      0
        c0t5000CCA273CED4B5d0      -      -    187      0  31.3M      0
        c0t5000CCA273CEE080d0      -      -    188      0  31.4M      0
        c0t5000CCA273D1BBB2d0      -      -    186      0  31.1M      0
      raidz2                   3.33T  69.4T    253      0   253M      0
        c0t5000CCA273CF0DBFd0      -      -    189      0  31.6M      0
        c0t5000CCA273CF0DCCd0      -      -    191      0  32.0M      0
        c0t5000CCA273CF381Ed0      -      -    193      0  32.2M      0
        c0t5000CCA273CF4C17d0      -      -    195      0  32.6M      0
        c0t5000CCA273CF9BF4d0      -      -    190      0  31.7M      0
        c0t5000CCA273D07A91d0      -      -    189      0  31.6M      0
        c0t5000CCA273D186AAd0      -      -    194      0  32.4M      0
        c0t5000CCA273D1BB78d0      -      -    194      0  32.4M      0
    -------------------------  -----  -----  -----  -----  -----  -----
    Small write operations occur every 4 to 5 seconds:
    Code:
                                  capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    hdd2x8z2                   6.68T   139T    401    268   402M  2.37M
      raidz2                   3.35T  69.4T    220     94   221M   830K
        c0t5000CCA273C5FF84d0      -      -    162     30  27.1M   237K
        c0t5000CCA273CE0A68d0      -      -    166     26  27.8M   208K
        c0t5000CCA273CE9FE7d0      -      -    162     23  27.0M   191K
        c0t5000CCA273CEB7CDd0      -      -    168     24  28.0M   208K
        c0t5000CCA273CED387d0      -      -    157     27  26.2M   212K
        c0t5000CCA273CED4B5d0      -      -    159     27  26.6M   210K
        c0t5000CCA273CEE080d0      -      -    157     31  26.3M   233K
        c0t5000CCA273D1BBB2d0      -      -    160     28  26.7M   228K
      raidz2                   3.33T  69.4T    181    173   181M  1.56M
        c0t5000CCA273CF0DBFd0      -      -    135     54  22.6M   440K
        c0t5000CCA273CF0DCCd0      -      -    132     50  22.0M   426K
        c0t5000CCA273CF381Ed0      -      -    130     47  21.8M   391K
        c0t5000CCA273CF4C17d0      -      -    129     45  21.6M   387K
        c0t5000CCA273CF9BF4d0      -      -    134     42  22.4M   381K
        c0t5000CCA273D07A91d0      -      -    133     43  22.3M   375K
        c0t5000CCA273D186AAd0      -      -    130     44  21.7M   377K
        c0t5000CCA273D1BB78d0      -      -    130     48  21.7M   408K
    -------------------------  -----  -----  -----  -----  -----  -----


    With a 256k record size:
    In write:
    Code:
                                  capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    hdd2x8z2                   6.63T   139T      0  4.39K      0   989M
      raidz2                   3.32T  69.4T      0  2.27K      0   511M
        c0t5000CCA273C5FF84d0      -      -      0    862      0  87.0M
        c0t5000CCA273CE0A68d0      -      -      0    882      0  86.9M
        c0t5000CCA273CE9FE7d0      -      -      0    881      0  86.9M
        c0t5000CCA273CEB7CDd0      -      -      0    864      0  86.5M
        c0t5000CCA273CED387d0      -      -      0    892      0  87.3M
        c0t5000CCA273CED4B5d0      -      -      0    873      0  87.2M
        c0t5000CCA273CEE080d0      -      -      0    879      0  86.8M
        c0t5000CCA273D1BBB2d0      -      -      0    887      0  86.8M
      raidz2                   3.30T  69.4T      0  2.12K      0   478M
        c0t5000CCA273CF0DBFd0      -      -      0    825      0  81.1M
        c0t5000CCA273CF0DCCd0      -      -      0    802      0  80.9M
        c0t5000CCA273CF381Ed0      -      -      0    809      0  81.4M
        c0t5000CCA273CF4C17d0      -      -      0    813      0  81.6M
        c0t5000CCA273CF9BF4d0      -      -      0    802      0  82.3M
        c0t5000CCA273D07A91d0      -      -      0    803      0  82.2M
        c0t5000CCA273D186AAd0      -      -      0    813      0  82.0M
        c0t5000CCA273D1BB78d0      -      -      0    817      0  81.7M
    -------------------------  -----  -----  -----  -----  -----  -----
    In read:
    Code:
                                  capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    hdd2x8z2                   6.68T   139T    528      0   132M      0
      raidz2                   3.35T  69.4T    273      0  68.4M      0
        c0t5000CCA273C5FF84d0      -      -    127      0  8.36M      0
        c0t5000CCA273CE0A68d0      -      -    120      0  8.15M      0
        c0t5000CCA273CE9FE7d0      -      -    117      0  8.16M      0
        c0t5000CCA273CEB7CDd0      -      -    117      0  8.15M      0
        c0t5000CCA273CED387d0      -      -    128      0  8.67M      0
        c0t5000CCA273CED4B5d0      -      -    122      0  8.09M      0
        c0t5000CCA273CEE080d0      -      -    120      0  8.07M      0
        c0t5000CCA273D1BBB2d0      -      -    122      0  8.25M      0
      raidz2                   3.33T  69.4T    254      0  63.7M      0
        c0t5000CCA273CF0DBFd0      -      -    123      0  7.95M      0
        c0t5000CCA273CF0DCCd0      -      -    126      0  7.95M      0
        c0t5000CCA273CF381Ed0      -      -    128      0  7.97M      0
        c0t5000CCA273CF4C17d0      -      -    119      0  7.98M      0
        c0t5000CCA273CF9BF4d0      -      -    128      0  8.02M      0
        c0t5000CCA273D07A91d0      -      -    132      0  7.99M      0
        c0t5000CCA273D186AAd0      -      -    123      0  7.96M      0
        c0t5000CCA273D1BB78d0      -      -    119      0  7.99M      0
    -------------------------  -----  -----  -----  -----  -----  -----
    Once every 2 to 3 seconds, there are also small write operations:
    Code:
                                  capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    hdd2x8z2                   6.68T   139T    555    173   139M   742K
      raidz2                   3.35T  69.4T    277     59  69.3M   254K
        c0t5000CCA273C5FF84d0      -      -    188     18  8.80M   119K
        c0t5000CCA273CE0A68d0      -      -    190     18  8.91M   113K
        c0t5000CCA273CE9FE7d0      -      -    170     16  8.65M   119K
        c0t5000CCA273CEB7CDd0      -      -    185     14  8.84M   103K
        c0t5000CCA273CED387d0      -      -    165     16  8.78M   105K
        c0t5000CCA273CED4B5d0      -      -    174     14  8.89M  91.8K
        c0t5000CCA273CEE080d0      -      -    191     14  8.84M  86.4K
        c0t5000CCA273D1BBB2d0      -      -    185     16  8.79M  94.5K
      raidz2                   3.33T  69.4T    278    114  69.7M   488K
        c0t5000CCA273CF0DBFd0      -      -    148     27  8.82M   197K
        c0t5000CCA273CF0DCCd0      -      -    203     29  9.03M   197K
        c0t5000CCA273CF381Ed0      -      -    188     29  8.86M   200K
        c0t5000CCA273CF4C17d0      -      -    172     24  9.09M   181K
        c0t5000CCA273CF9BF4d0      -      -    192     25  9.09M   184K
        c0t5000CCA273D07A91d0      -      -    192     30  9.06M   189K
        c0t5000CCA273D186AAd0      -      -    189     27  8.99M   178K
        c0t5000CCA273D1BB78d0      -      -    174     29  8.98M   178K
    -------------------------  -----  -----  -----  -----  -----  -----
    Actually I specified only two ZFS tunables:
    zfs:l2arc_noprefetch = 0
    zfs:zfetch_max_distance = 0x4000000

    Everything else is at its default value:
    Code:
    arc_lotsfree_percent = 0xa
    arc_pages_pp_reserve = 0x40
    arc_reduce_dnlc_percent = 0x3
    arc_swapfs_reserve = 0x40
    arc_zio_arena_free_shift = 0x2
    dbuf_cache_hiwater_pct = 0xa
    dbuf_cache_lowater_pct = 0xa
    dbuf_cache_max_bytes = 0x3d434780
    mdb: variable dbuf_cache_max_shift not found: unknown symbol name
    ddt_zap_indirect_blockshift = 0xc
    ddt_zap_leaf_blockshift = 0xc
    ditto_same_vdev_distance_shift = 0x3
    dmu_find_threads = 0x0
    dmu_rescan_dnode_threshold = 0x20000
    dsl_scan_delay_completion = 0x0
    fzap_default_block_shift = 0xe
    l2arc_feed_again = 0x1
    l2arc_feed_min_ms = 0xc8
    l2arc_feed_secs = 0x1
    l2arc_headroom = 0x2
    l2arc_headroom_boost = 0xc8
    l2arc_noprefetch = 0x0
    l2arc_norw = 0x1
    l2arc_write_boost = 0x800000
    l2arc_write_max = 0x800000
    metaslab_aliquot = 0x80000
    metaslab_bias_enabled = 0x1
    metaslab_debug_load = 0x0
    metaslab_debug_unload = 0x0
    metaslab_df_alloc_threshold = 0x20000
    metaslab_df_free_pct = 0x4
    metaslab_fragmentation_factor_enabled = 0x1
    metaslab_force_ganging = 0x1000001
    metaslab_lba_weighting_enabled = 0x1
    metaslab_load_pct = 0x32
    metaslab_min_alloc_size = 0x2000000
    metaslab_ndf_clump_shift = 0x4
    metaslab_preload_enabled = 0x1
    metaslab_preload_limit = 0x3
    metaslab_trace_enabled = 0x1
    metaslab_trace_max_entries = 0x1388
    metaslab_unload_delay = 0x8
    mdb: variable metaslabs_per_vdev not found: unknown symbol name
    mdb: variable reference_history not found: unknown symbol name
    mdb: variable reference_tracking_enable not found: unknown symbol name
    send_holes_without_birth_time = 0x1
    spa_asize_inflation = 0x18
    spa_load_verify_data = 0x1
    spa_load_verify_maxinflight = 0x2710
    spa_load_verify_metadata = 0x1
    spa_max_replication_override = 0x3
    spa_min_slop = 0x8000000
    spa_mode_global = 0x3
    spa_slop_shift = 0x5
    mdb: variable space_map_blksz not found: unknown symbol name
    vdev_mirror_shift = 0x15
    zfetch_max_distance = 0x4000000
    zfs_abd_chunk_size = 0x1000
    zfs_abd_scatter_enabled = 0x1
    zfs_arc_average_blocksize = 0x2000
    zfs_arc_evict_batch_limit = 0xa
    zfs_arc_grow_retry = 0x0
    zfs_arc_max = 0x0
    zfs_arc_meta_limit = 0x0
    zfs_arc_meta_min = 0x0
    zfs_arc_min = 0x0
    zfs_arc_p_min_shift = 0x0
    zfs_arc_shrink_shift = 0x0
    zfs_async_block_max_blocks = 0xffffffffffffffff
    zfs_ccw_retry_interval = 0x12c
    zfs_commit_timeout_pct = 0x5
    zfs_compressed_arc_enabled = 0x1
    zfs_condense_indirect_commit_entry_delay_ticks = 0x0
    zfs_condense_indirect_vdevs_enable = 0x1
    zfs_condense_max_obsolete_bytes = 0x40000000
    zfs_condense_min_mapping_bytes = 0x20000
    zfs_condense_pct = 0xc8
    zfs_dbgmsg_maxsize = 0x400000
    zfs_deadman_checktime_ms = 0x1388
    zfs_deadman_enabled = 0x1
    zfs_deadman_synctime_ms = 0xf4240
    zfs_dedup_prefetch = 0x1
    zfs_default_bs = 0x9
    zfs_default_ibs = 0x11
    zfs_delay_max_ns = 0x5f5e100
    zfs_delay_min_dirty_percent = 0x3c
    zfs_delay_resolution_ns = 0x186a0
    zfs_delay_scale = 0x7a120
    zfs_dirty_data_max = 0xcc0a7e66
    zfs_dirty_data_max_max = 0x100000000
    zfs_dirty_data_max_percent = 0xa
    mdb: variable zfs_dirty_data_sync not found: unknown symbol name
    zfs_flags = 0x0
    zfs_free_bpobj_enabled = 0x1
    zfs_free_leak_on_eio = 0x0
    zfs_free_min_time_ms = 0x3e8
    zfs_fsync_sync_cnt = 0x4
    zfs_immediate_write_sz = 0x8000
    zfs_indirect_condense_obsolete_pct = 0x19
    zfs_lua_check_instrlimit_interval = 0x64
    zfs_lua_max_instrlimit = 0x5f5e100
    zfs_lua_max_memlimit = 0x6400000
    zfs_max_recordsize = 0x100000
    zfs_mdcomp_disable = 0x0
    zfs_metaslab_condense_block_threshold = 0x4
    zfs_metaslab_fragmentation_threshold = 0x46
    zfs_metaslab_segment_weight_enabled = 0x1
    zfs_metaslab_switch_threshold = 0x2
    zfs_mg_fragmentation_threshold = 0x55
    zfs_mg_noalloc_threshold = 0x0
    zfs_multilist_num_sublists = 0x0
    zfs_no_scrub_io = 0x0
    zfs_no_scrub_prefetch = 0x0
    zfs_nocacheflush = 0x0
    zfs_nopwrite_enabled = 0x1
    zfs_object_remap_one_indirect_delay_ticks = 0x0
    zfs_obsolete_min_time_ms = 0x1f4
    zfs_pd_bytes_max = 0x3200000
    zfs_per_txg_dirty_frees_percent = 0x1e
    zfs_prefetch_disable = 0x0
    zfs_read_chunk_size = 0x100000
    zfs_recover = 0x0
    zfs_recv_queue_length = 0x1000000
    zfs_redundant_metadata_most_ditto_level = 0x2
    zfs_remap_blkptr_enable = 0x1
    zfs_remove_max_copy_bytes = 0x4000000
    zfs_remove_max_segment = 0x100000
    zfs_resilver_delay = 0x2
    zfs_resilver_min_time_ms = 0xbb8
    zfs_scan_idle = 0x32
    zfs_scan_min_time_ms = 0x3e8
    zfs_scrub_delay = 0x4
    zfs_scrub_limit = 0xa
    zfs_send_corrupt_data = 0x0
    zfs_send_queue_length = 0x1000000
    zfs_send_set_freerecords_bit = 0x1
    zfs_sync_pass_deferred_free = 0x2
    zfs_sync_pass_dont_compress = 0x5
    zfs_sync_pass_rewrite = 0x2
    zfs_sync_taskq_batch_pct = 0x4b
    zfs_top_maxinflight = 0x20
    zfs_txg_timeout = 0x5
    zfs_vdev_aggregation_limit = 0x20000
    zfs_vdev_async_read_max_active = 0x3
    zfs_vdev_async_read_min_active = 0x1
    zfs_vdev_async_write_active_max_dirty_percent = 0x3c
    zfs_vdev_async_write_active_min_dirty_percent = 0x1e
    zfs_vdev_async_write_max_active = 0xa
    zfs_vdev_async_write_min_active = 0x1
    zfs_vdev_cache_bshift = 0x10
    zfs_vdev_cache_max = 0x4000
    zfs_vdev_cache_size = 0x0
    zfs_vdev_max_active = 0x3e8
    zfs_vdev_queue_depth_pct = 0x3e8
    zfs_vdev_read_gap_limit = 0x8000
    zfs_vdev_removal_max_active = 0x2
    zfs_vdev_removal_min_active = 0x1
    zfs_vdev_scrub_max_active = 0x2
    zfs_vdev_scrub_min_active = 0x1
    zfs_vdev_sync_read_max_active = 0xa
    zfs_vdev_sync_read_min_active = 0xa
    zfs_vdev_sync_write_max_active = 0xa
    zfs_vdev_sync_write_min_active = 0xa
    zfs_vdev_write_gap_limit = 0x1000
    zfs_write_implies_delete_child = 0x1
    zfs_zil_clean_taskq_maxalloc = 0x100000
    zfs_zil_clean_taskq_minalloc = 0x400
    zfs_zil_clean_taskq_nthr_pct = 0x64
    zil_replay_disable = 0x0
    zil_slog_bulk = 0xc0000
    zio_buf_debug_limit = 0x0
    zio_dva_throttle_enabled = 0x1
    zio_injection_enabled = 0x0
    zvol_immediate_write_sz = 0x8000
    zvol_maxphys = 0x1000000
    zvol_unmap_enabled = 0x1
    zvol_unmap_sync_enabled = 0x0
    zfs_max_dataset_nesting = 0x32

    Here is the output of zfs get all for my pool:
    Code:
    NAME      PROPERTY              VALUE                  SOURCE
    hdd2x8z2  type                  filesystem             -
    hdd2x8z2  creation              Wed Nov 28 17:26 2018  -
    hdd2x8z2  used                  4.69T                  -
    hdd2x8z2  available             95.5T                  -
    hdd2x8z2  referenced            230K                   -
    hdd2x8z2  compressratio         1.00x                  -
    hdd2x8z2  mounted               yes                    -
    hdd2x8z2  quota                 none                   default
    hdd2x8z2  reservation           none                   default
    hdd2x8z2  recordsize            1M                     local
    hdd2x8z2  mountpoint            /hdd2x8z2              default
    hdd2x8z2  sharenfs              off                    default
    hdd2x8z2  checksum              on                     default
    hdd2x8z2  compression           off                    default
    hdd2x8z2  atime                 on                     default
    hdd2x8z2  devices               on                     default
    hdd2x8z2  exec                  on                     default
    hdd2x8z2  setuid                on                     default
    hdd2x8z2  readonly              off                    default
    hdd2x8z2  zoned                 off                    default
    hdd2x8z2  snapdir               hidden                 default
    hdd2x8z2  aclmode               passthrough            local
    hdd2x8z2  aclinherit            passthrough            local
    hdd2x8z2  createtxg             1                      -
    hdd2x8z2  canmount              on                     default
    hdd2x8z2  xattr                 on                     default
    hdd2x8z2  copies                1                      default
    hdd2x8z2  version               5                      -
    hdd2x8z2  utf8only              off                    -
    hdd2x8z2  normalization         none                   -
    hdd2x8z2  casesensitivity       sensitive              -
    hdd2x8z2  vscan                 off                    default
    hdd2x8z2  nbmand                off                    default
    hdd2x8z2  sharesmb              off                    default
    hdd2x8z2  refquota              none                   default
    hdd2x8z2  refreservation        none                   default
    hdd2x8z2  guid                  13382713124067928909   -
    hdd2x8z2  primarycache          all                    default
    hdd2x8z2  secondarycache        all                    default
    hdd2x8z2  usedbysnapshots       0                      -
    hdd2x8z2  usedbydataset         230K                   -
    hdd2x8z2  usedbychildren        4.69T                  -
    hdd2x8z2  usedbyrefreservation  0                      -
    hdd2x8z2  logbias               latency                default
    hdd2x8z2  dedup                 off                    default
    hdd2x8z2  mlslabel              none                   default
    hdd2x8z2  sync                  disabled               local
    hdd2x8z2  refcompressratio      1.00x                  -
    hdd2x8z2  written               230K                   -
    hdd2x8z2  logicalused           4.94T                  -
    hdd2x8z2  logicalreferenced     45K                    -
    hdd2x8z2  filesystem_limit      none                   default
    hdd2x8z2  snapshot_limit        none                   default
    hdd2x8z2  filesystem_count      none                   default
    hdd2x8z2  snapshot_count        none                   default
    hdd2x8z2  redundant_metadata    all                    default
     
    #11
    Last edited: Dec 17, 2018
  12. RonanR

    RonanR Member

    And here is the output of zfs get all for one of the filesystems I used for these tests:
    Code:
    NAME            PROPERTY              VALUE                  SOURCE
    hdd2x8z2/test2  type                  filesystem             -
    hdd2x8z2/test2  creation              Fri Dec 14 16:14 2018  -
    hdd2x8z2/test2  used                  188K                   -
    hdd2x8z2/test2  available             95.5T                  -
    hdd2x8z2/test2  referenced            188K                   -
    hdd2x8z2/test2  compressratio         1.00x                  -
    hdd2x8z2/test2  mounted               yes                    -
    hdd2x8z2/test2  quota                 none                   default
    hdd2x8z2/test2  reservation           none                   default
    hdd2x8z2/test2  recordsize            1M                     local
    hdd2x8z2/test2  mountpoint            /hdd2x8z2/test2        default
    hdd2x8z2/test2  sharenfs              off                    default
    hdd2x8z2/test2  checksum              on                     default
    hdd2x8z2/test2  compression           off                    default
    hdd2x8z2/test2  atime                 off                    local
    hdd2x8z2/test2  devices               on                     default
    hdd2x8z2/test2  exec                  on                     default
    hdd2x8z2/test2  setuid                on                     default
    hdd2x8z2/test2  readonly              off                    default
    hdd2x8z2/test2  zoned                 off                    default
    hdd2x8z2/test2  snapdir               hidden                 local
    hdd2x8z2/test2  aclmode               passthrough            local
    hdd2x8z2/test2  aclinherit            passthrough            local
    hdd2x8z2/test2  createtxg             27546                  -
    hdd2x8z2/test2  canmount              on                     default
    hdd2x8z2/test2  xattr                 on                     default
    hdd2x8z2/test2  copies                1                      default
    hdd2x8z2/test2  version               5                      -
    hdd2x8z2/test2  utf8only              on                     -
    hdd2x8z2/test2  normalization         formD                  -
    hdd2x8z2/test2  casesensitivity       insensitive            -
    hdd2x8z2/test2  vscan                 off                    default
    hdd2x8z2/test2  nbmand                on                     local
    hdd2x8z2/test2  sharesmb              name=test2             local
    hdd2x8z2/test2  refquota              none                   default
    hdd2x8z2/test2  refreservation        none                   default
    hdd2x8z2/test2  guid                  10297845018154907042   -
    hdd2x8z2/test2  primarycache          all                    default
    hdd2x8z2/test2  secondarycache        all                    default
    hdd2x8z2/test2  usedbysnapshots       0                      -
    hdd2x8z2/test2  usedbydataset         188K                   -
    hdd2x8z2/test2  usedbychildren        0                      -
    hdd2x8z2/test2  usedbyrefreservation  0                      -
    hdd2x8z2/test2  logbias               latency                default
    hdd2x8z2/test2  dedup                 off                    default
    hdd2x8z2/test2  mlslabel              none                   default
    hdd2x8z2/test2  sync                  disabled               inherited from hdd2x8z2
    hdd2x8z2/test2  refcompressratio      1.00x                  -
    hdd2x8z2/test2  written               188K                   -
    hdd2x8z2/test2  logicalused           36.5K                  -
    hdd2x8z2/test2  logicalreferenced     36.5K                  -
    hdd2x8z2/test2  filesystem_limit      none                   default
    hdd2x8z2/test2  snapshot_limit        none                   default
    hdd2x8z2/test2  filesystem_count      none                   default
    hdd2x8z2/test2  snapshot_count        none                   default
    hdd2x8z2/test2  redundant_metadata    all                    default
     
    #12
  13. carcass

    carcass New Member

    Are you getting the same result when reading that file locally (with dd) with an empty ARC?
     
    #13
  14. RonanR

    RonanR Member

    Internally with dd, I see differences, but not this big.
    Here is what I get for a 256k record size:
    write:
    time sh -c "dd if=/dev/zero of=/hdd2x8z2/test2/dd-256k-256 bs=256k count=160000"
    160000+0 records in
    160000+0 records out
    41943040000 bytes transferred in 36.283510 secs (1155980780 bytes/sec)

    read
    time sh -c "dd if=/hdd2x8z2/test2/dd-256k-256 of=/dev/zero bs=256k"
    160000+0 records in
    160000+0 records out
    41943040000 bytes transferred in 63.418068 secs (661373669 bytes/sec)

    And for 1M record size:
    write
    time sh -c "dd if=/dev/zero of=/hdd2x8z2/test2/dd-1m bs=1M count=40000"
    40000+0 records in
    40000+0 records out
    41943040000 bytes transferred in 31.749132 secs (1321076742 bytes/sec)

    read
    time sh -c "dd if=/hdd2x8z2/test2/dd-1m of=/dev/zero bs=1M"
    40000+0 records in
    40000+0 records out
    41943040000 bytes transferred in 38.201376 secs (1097945805 bytes/sec)
     
    #14
  15. carcass

    carcass New Member

    So, assuming that you cleared the ARC between the write and the read, you got 660MB/s read with a 256k recordsize and 1.1GB/s read with a 1M recordsize?
     
    #15
  16. RonanR

    RonanR Member

    Yes, that's right. As I don't know how to properly clear the ARC, I rebooted before each read test to be sure.
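    (For what it's worth, short of a reboot, exporting and re-importing the pool should also evict its data from the ARC -- a sketch:)
    Code:
    # drop the pool's cached data from the ARC without rebooting
    zpool export hdd2x8z2
    zpool import hdd2x8z2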
     
    #16
  17. carcass

    carcass New Member

    #17
  18. RonanR

    RonanR Member

    Thanks for your time. I already tried following Gea's guide and applied the network tuning parameters, without success. I will play with the network parameters on both my server and my client, and also try another 10Gb network card.
     
    #18