HowTo Improve ZFS sync writes with old junk

Discussion in 'Guides' started by rootgremlin, Apr 3, 2018.

  1. rootgremlin

    rootgremlin New Member

    Jun 9, 2016
So I got myself a new 2TB STEC s840 SAS SSD (Z16IZF2E-2TBUCZ) to use as a VM datastore for multiple ESXi hosts.
The storage server is an all-in-one ESXi host with the following specs:

    • Mainboard: X9SRL-F
    • CPU: Xeon E5-1620v2
    • RAM: 96GB
    • Chassis: SM836
    • Controller: SAS2008, P410i with 1GB BBWC
    • Backplane: SAS1-EL1
    • Disks: 12x WD60EFRX, 6TB SATA in RaidZ2
    • 2x HUH721212AL4200, 12TB SAS as Mirror
    • 1x STEC S840 Z16IZF2E-2TBUCZ, 2TB SAS as single disk
    • 1x ST2000LM003 HN-M, 2TB SATA as single backup disk
    • OS: Solaris 11.3 with Per-FS Encryption on most of the Filesystems

The Solaris VM has both controllers passed through and 72GB of (locked) RAM.
I NFS-exported the single ZFS SSD disk to ESXi and moved the test Win7 VM to that datastore. The backplane is connected to one port of the SAS2008; the second port had the s840 SSD.
Behind the P410i, the SSD is configured as a single RAID0, and the 1GB BBWC is set to 100% write cache, 0% read cache.
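The controller-side setup boils down to a few hpssacli commands. A sketch, assuming slot 3 and drive bay 1I:0:1 as in the config listing further down in this thread (adjust to your hardware):

```shell
# Create a single-drive RAID0 logical drive from the SSD
/opt/hp/sbin/hpssacli ctrl slot=3 create type=ld drives=1I:0:1 raid=0

# Dedicate the BBWC entirely to writes: 0% read / 100% write cache
/opt/hp/sbin/hpssacli ctrl slot=3 modify cacheratio=0/100

# Verify the setting
/opt/hp/sbin/hpssacli ctrl slot=3 show detail | grep "Cache Ratio"
```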

    with sync=standard on "uncached" HBA

    with sync=standard on P410i with Cache

Between 20% and 50% sync write speed improvement with something that would otherwise get tossed in the trash!!!! Even the latency got reduced!

Just for comparison, the speed
with sync=disabled on P410i with Cache
    Last edited: Apr 3, 2018
    SlickNetAaron likes this.
  2. i386

    i386 Well-Known Member

    Mar 18, 2016
That's a RAID controller, right?
  3. rootgremlin

    rootgremlin New Member

    Jun 9, 2016
Yes, in my case it's the "HP Smart Array P410i RAID Controller" with an add-on 1GB memory cache, backed by a battery/capacitor module.

I would say, if you have any kind of RAID controller with onboard cache lying around, you could at least try it and evaluate its use case.

If you think about it, it's basically a 1GB ZeusRAM in front of your SATA/SAS SSD.
  4. Rand__

    Rand__ Well-Known Member

    Mar 6, 2014
Not a new idea, but always entertaining. There are some discussions about this in the FN forum. I think the consensus was "don't do it", but I can't remember whether there was a profound reason or it was the usual FreeNAS forum "if you don't do it our way it's no good" issue.
Of course it's at your own risk, but everything is :)
  5. rootgremlin

    rootgremlin New Member

    Jun 9, 2016
These controllers have proven their stability and driver quality, so no concern there.
The whole "always use a straight HBA for ZFS" advice exists mainly because of the hotplug / one-RAID0-per-disk hassle, and because ZFS can't be sure that sync writes are actually committed to stable storage. That's how the "myth" arose that you should only use HBAs with ZFS.

But a RAID controller can also work, as long as it ensures sync writes reach stable storage and can present every single disk to ZFS. This is mostly an academic "discussion", though, since nobody would buy an expensive RAID controller when a simple, cheap HBA will do.
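For anyone wanting to reproduce the sync behaviour being discussed: it's a per-dataset property in ZFS, so a quick test only needs the property toggled between runs (pool/dataset names here are placeholders):

```shell
# Honour the client's sync requests (what ESXi issues over NFS) - the default
zfs set sync=standard tank/vmstore

# Acknowledge sync writes immediately without committing them - fast but
# unsafe on power loss, only useful as an upper-bound comparison
zfs set sync=disabled tank/vmstore

# Check the current setting
zfs get sync tank/vmstore
```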

In my case, though, I already had the controller lying around collecting dust.

Also, since I can't use hot-pluggable NVMe disks in my chassis, this was just another option. In fact, all those shiny new PCIe SSDs aren't really hot-pluggable either.

I mean, I could live with the performance of the bare STEC S840 SAS SSD.

    But where's the fun in that.
  6. Joel

    Joel Active Member

    Jan 30, 2015
From what I recall, the typical arguments against using RAID cards on the FreeNAS forums centered around two issues:

    • RAID controllers don't pass SMART info through to the OS, so you can't check drive health
• Some RAID controllers obfuscate the on-disk format enough that you can't use the drive elsewhere if the RAID card dies. You'd need another identical RAID card just to see whether the underlying data was still there.
Both of those reasons are why ZFS folks tend to reflash HBAs to IT mode, since that passes the drives through to the host OS unmolested.
    T_Minus likes this.
  7. T_Minus

    T_Minus Moderator

    Feb 15, 2015
    Yes there are threads here on STH about this too.
  8. rootgremlin

    rootgremlin New Member

    Jun 9, 2016
Those two points were actually the reason I began evaluating this.

For the HP P410i, the full SMART info can be accessed with smartmontools via the --device=cciss,DISKNUM parameter. See the

    Linux: smartctl --help:
    Linux: -d TYPE, --device=TYPE
             Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test
    OSX: -d TYPE, --device=TYPE
             Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, auto, test
Unfortunately this doesn't work on Solaris or BSD; smartctl needs kernel driver support for it.
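On Linux, a query against the first physical drive behind the controller would look something like this (device node and disk number are examples; DISKNUM counts the drives behind the controller, starting at 0):

```shell
# Full SMART report for physical drive 0 behind the cciss controller
smartctl -a -d cciss,0 /dev/sda
```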

Also, the S840 has a non-standard SMART table that is only fully accessible with a custom-compiled smartmontools build.

BUT!!! SAS drives run automatic scheduled self-tests, and the RAID controller freaks out if a drive's SMART values are not OK.

And the temperature is still reported through the hpssacli utility (which you need anyway to configure the array and the cache ratio setting):

root@ZFS01:~# /opt/hp/sbin/hpssacli ctrl all show config detail
    Smart Array P410 in Slot 3
       Bus Interface: PCI
       Slot: 3
       Serial Number: SMCDEADBEEF66
       Cache Serial Number: CSACDEADBEEF00
       RAID 6 (ADG) Status: Disabled
       Controller Status: OK
       Hardware Revision: C
       Firmware Version: 6.64
       Rebuild Priority: Medium
       Expand Priority: Medium
       Surface Scan Delay: 3 secs
       Surface Scan Mode: Idle
       Parallel Surface Scan Supported: No
       Queue Depth: Automatic
       Monitor and Performance Delay: 60  min
       Elevator Sort: Enabled
       Degraded Performance Optimization: Disabled
       Inconsistency Repair Policy: Disabled
       Wait for Cache Room: Disabled
       Surface Analysis Inconsistency Notification: Disabled
       Post Prompt Timeout: 15 secs
       Cache Board Present: True
       Cache Status: OK
       Cache Ratio: 0% Read / 100% Write
       Drive Write Cache: Enabled
       Total Cache Size: 1024 MB
       Total Cache Memory Available: 912 MB
       No-Battery Write Cache: Disabled
       Cache Backup Power Source: Capacitors
       Battery/Capacitor Count: 1
       Battery/Capacitor Status: OK
       SATA NCQ Supported: True
       Number of Ports: 2 Internal only
       Driver Name: cpqary3 For Interface cciss
       Driver Supports HP SSD Smart Path: False
       PCI Address (Domain:Bus: Device.Function): 0000:13:00.0
       Host Serial Number: DEADBEEF
       Array: A
          Interface Type: Solid State SAS
          Unused Space: 0  MB
          Status: OK
          Array Type: Data
          Logical Drive: 1
             Size: 1.8 TB
             Fault Tolerance: 0
             Heads: 255
             Sectors Per Track: 32
             Cylinders: 65535
             Strip Size: 256 KB
             Full Stripe Size: 256 KB
             Status: OK
             Caching:  Enabled
             Disk Name: /dev/dsk/c11t0d0
             Mount Points: /bigfish0
             OS Status: LOCKED
             Logical Drive Label: DEADBEEFDEADBEEFDEADBEEF
             Drive Type: Data
             LD Acceleration Method: Controller Cache
          physicaldrive 1I:0:1
             Port: 1I
             Box: 0
             Bay: 1
             Status: OK
             Drive Type: Data Drive
             Interface Type: Solid State SAS
             Size: 2 TB
             Native Block Size: 512
             Firmware Revision: C23F
             Serial Number: STM000202C85
             Model: STEC    Z16IZF2E-2TBUCZ
             Current Temperature (C): 40
             SSD Smart Trip Wearout: Not Supported
             PHY Count: 2
             PHY Transfer Rate: 6.0Gbps, Unknown
       SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250
          Device Number: 250
          Firmware Version: RevC
          WWID: 5002549123E4348F
          Vendor ID: PMCSIERA
          Model: SRC 8x6G

After ZFS-formatting the drive behind the RAID controller and moving it to the SAS2008 HBA, I could still use it with all data intact.
Unfortunately, the way back, from HBA -> RAID, destroyed the partition table and all filesystem info, because of the RAID initialization.
  9. Aluminum

    Aluminum Active Member

    Sep 7, 2012
$55 for a 32GB Optane M.2 and $5 for a PCIe converter card is another way to speed the hell out of ZFS sync writes. No battery/PLP or RAID card shenanigans to worry about.
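For completeness, the Optane route needs no controller at all: you would just attach it to the pool as a dedicated log (SLOG) device. Pool and device names here are placeholders:

```shell
# Add the Optane as a dedicated ZIL/SLOG device to an existing pool
zpool add tank log /dev/disk/by-id/nvme-example-optane-32gb

# It should now show up under a "logs" section
zpool status tank
```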
  10. rootgremlin

    rootgremlin New Member

    Jun 9, 2016
    Still, you can't beat ZERO$ for using "old junk".
    On the plus side:
• The RAID controller has real PLP; the Optane M.2 has no power-loss protection (though possibly not needed thanks to the 3D XPoint technology)
• The RAM cache is orders of magnitude faster than SSDs (nanoseconds for RAM vs. microseconds for SSDs)
• You can use the controller's full x8 PCIe bandwidth vs. the Optane's limited x2 PCIe interface

    My focus for this post was to "Leverage things you may have lying around"

Granted, I would really like to compare this setup against one of those Optane 800p modules (just for shits and giggles).
But as I said, I was already satisfied with the raw (uncached) S840 SSD performance, and I'm not dumping one more € into this setup to "improve" speed.
  11. _alex

    _alex Active Member

    Jan 28, 2016
If I remember right, the Areca 1882/1883 can enable writeback cache on a pass-through disk, so no need for RAID0 with them. I have some lying around with BBU/caps NAND for the 1883, but haven't found the time to benchmark the cache with ZFS yet. It would be interesting to see how it looks with two cached drives in a mirror.
