HowTo: Improve ZFS sync writes with old junk


rootgremlin

Member
Jun 9, 2016
So I got myself a new 2TB STEC s840 SAS SSD (Z16IZF2E-2TBUCZ) to use as a VM datastore for multiple ESXi hosts.
The storage server is an all-in-one ESXi host with the following specs:

  • Mainboard: X9SRL-F
  • CPU: Xeon E5-1620v2
  • RAM: 96GB
  • Chassis: SM836
  • Controller: SAS2008, P410i with 1GB BBWC
  • Backplane: SAS1-EL1
  • Disks: 12x WD60EFRX, 6TB SATA in RaidZ2
  • 2x HUH721212AL4200, 12TB SAS as Mirror
  • 1x STEC S840 Z16IZF2E-2TBUCZ, 2TB SAS as single disk
  • 1x ST2000LM003 HN-M, 2TB SATA as single backup disk
  • OS: Solaris 11.3 with Per-FS Encryption on most of the Filesystems

The Solaris VM has both controllers passed through and 72GB of (locked) RAM.
I NFS-exported the single ZFS SSD disk to the ESXi hosts and moved the test Win7 VM to that datastore. The backplane is connected to one port of the SAS2008; the second port had the s840 SSD.
Behind the P410i, the SSD is configured as a single RAID-0, and the 1GB BBWC is set to 100% write cache, 0% read cache.
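For anyone who wants to reproduce the layout, the controller and pool setup boils down to roughly a handful of commands. Rough sketch only; the slot number, drive address, pool name and device name are assumed from the hpssacli output further down in this thread and will differ on other systems:

Code:
# create a single-drive RAID-0 logical volume on the P410i (slot 3, drive 1I:0:1)
/opt/hp/sbin/hpssacli ctrl slot=3 create type=ld drives=1I:0:1 raid=0
# give the whole 1GB BBWC to writes (0% read / 100% write)
/opt/hp/sbin/hpssacli ctrl slot=3 modify cacheratio=0/100

# inside the Solaris VM: single-disk pool on the exported logical drive
zpool create bigfish0 c11t0d0
# share it over NFS (Solaris 11 share syntax; other platforms use sharenfs=on)
zfs set share.nfs=on bigfish0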


With sync=standard on the "uncached" HBA:


With sync=standard on the P410i with cache:


Between 20% and 50% sync write speed improvement with something that would otherwise get tossed in the trash! Even the latency went down!


Just for comparison, the speed with sync=disabled on the P410i with cache:
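For reference, the sync behavior in these runs is just a per-dataset ZFS property, so switching between the test modes is a one-liner; the dataset name below is assumed from the config output further down in the thread:

Code:
zfs set sync=standard bigfish0    # honor sync requests (default)
zfs set sync=disabled bigfish0    # ignore sync requests - fast, but unsafe on power loss
zfs get sync bigfish0             # verify the current setting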
 

rootgremlin

Member
Jun 9, 2016
That's a RAID controller, right?
Yes, in my case it's the "HP Smart Array P410i RAID Controller" with an add-on 1GB memory cache, backed by a battery/capacitor module.

I would say that if you have any kind of RAID controller with onboard cache lying around, you could at least try it and evaluate whether it has a use case for you.

If you think about it, it's basically a 1GB ZeusRAM in front of your SATA/SAS SSD.
 

Rand__

Well-Known Member
Mar 6, 2014
Not a new idea, but always entertaining. There are some discussions about this in the FN forum. I think the consensus was don't do it, but I can't remember whether there was a profound reason or it was the usual FreeNAS forum 'if you don't do it our way it's no good' issue.
Of course it's at your own risk, but everything is :)
 

rootgremlin

Member
Jun 9, 2016
These controllers have proven their stability and driver quality, so no concern there.
The whole "always use a straight HBA for ZFS" advice is only because of the hotplug / one-RAID-0-per-disk hassle and because ZFS cannot be sure that sync writes are actually committed to stable storage. That is how the "myth" arose that you should only use HBAs with ZFS.

But RAID controllers can also work, as long as they ensure that sync writes reach stable storage and can present every single disk to ZFS. This is more of an academic discussion, though, since nobody would buy an expensive RAID controller when a simple, cheap HBA will do.

In my case, I already had the controller lying around collecting dust anyway.

Also, since I cannot use hot-pluggable NVMe disks in my chassis, this was just another option. In fact, all those shiny new PCIe SSDs aren't really hot-pluggable either.

I mean, I could live with the performance of the bare STEC S840 SAS SSD.

But where's the fun in that?
 

Joel

Active Member
Jan 30, 2015
Not a new idea, but always entertaining. There are some discussions about this in the FN forum. I think the consensus was don't do it, but I can't remember whether there was a profound reason or it was the usual FreeNAS forum 'if you don't do it our way it's no good' issue.
Of course it's at your own risk, but everything is :)
From what I recall, the typical argument against using RAID cards on the FreeNAS forums was centered around two issues:

  • RAID controllers don't pass SMART info through to the OS, so you can't check drive health
  • Some RAID controllers obfuscate the on-disk layout enough that you can't use the drive elsewhere if the RAID card dies. You'd need another identical RAID card just to see whether the underlying data is still there.
Both of those reasons are why ZFS folks tend to reflash HBAs to IT mode, since that passes the drives through to the host OS unmolested.
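For illustration, with an IT-mode HBA the drives show up as plain SCSI devices, so a health check needs nothing special (the device path is just an example):

Code:
# a drive behind an IT-mode HBA appears as a normal disk to the OS
smartctl -a /dev/sdb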
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
From what I recall, the typical argument against using RAID cards on the FreeNAS forums was centered around two issues:

  • RAID controllers don't pass SMART info through to the OS, so you can't check drive health
  • Some RAID controllers obfuscate the on-disk layout enough that you can't use the drive elsewhere if the RAID card dies. You'd need another identical RAID card just to see whether the underlying data is still there.
Both of those reasons are why ZFS folks tend to reflash HBAs to IT mode, since that passes the drives through to the host OS unmolested.
Yes there are threads here on STH about this too.
 

rootgremlin

Member
Jun 9, 2016
From what I recall, the typical argument against using RAID cards on the FreeNAS forums was centered around two issues:

  • RAID controllers don't pass SMART info through to the OS, so you can't check drive health
  • Some RAID controllers obfuscate the on-disk layout enough that you can't use the drive elsewhere if the RAID card dies. You'd need another identical RAID card just to see whether the underlying data is still there.
Both of those reasons are why ZFS folks tend to reflash HBAs to IT mode, since that passes the drives through to the host OS unmolested.
Those two points were actually the reason I began evaluating this.

For the HP P410i, the full SMART info can be accessed with smartmontools via the --device=cciss,DISKNUM parameter. See smartctl --help on Linux vs. OSX:
Code:
Linux: -d TYPE, --device=TYPE
         Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test
Code:
OSX: -d TYPE, --device=TYPE
         Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, auto, test
Unfortunately this works neither on Solaris nor on BSD; smartctl needs kernel driver support for it.
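On Linux, where cciss support is available, such a query would look roughly like this (device node and drive index are only examples):

Code:
# SMART data for physical drive 0 behind the HP Smart Array controller
smartctl -a -d cciss,0 /dev/sda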

Also, the S840 has a non-standard SMART table whose full extent is only accessible with a custom-compiled smartmontools build.

See:
Hello,

good news coming: I have a working recipe for how to repair the drives and thereby fix the mentioned speed degradation.

1) You need to download some files from Index of /public/hgst_2tb/util

E4Z1.G4-SK04-72R35-YN-v4.0.4.RC11-b1975 - newest firmware
hdm-core_3.4.0-8.ga_amd64.deb - utility for loading firmware
sdmcmd64.2.0.0.124.tar.gz - utility for STEC drives
smartmontools-6.0-stec.tar.gz - modified smartmontools which can read extended STEC info

2) Update the drive firmware (maybe not necessary, but wise). A scan is needed first; without the scan, hdm can't update the firmware.

Code:
hdm scan

[5000A72030097FE9]
  Device Type         = SCSI Device
  Device Path         = /dev/sdd
  UID                 = 5000A72030097FE9
  Alias               = @scsi0
  Vendor Name         = STEC
  Model Name          = Z16IZF2E-2TBUCZ

hdm manage-firmware --load --activate --file E4Z1.G4-SK04-72R35-YN-v4.0.4.RC11-b1975 --path /dev/sdd
Results for manage-firmware: Operation succeeded.
3) Use the sdmcmd64 utility to scan for available drives

Code:
./sdmcmd64 scanLocal
Results for ScanLocal
                       operationResult = Success
                         devices.count = 7
                            devices[0] = other:Drive0
                            devices[1] = gen4sas:Drive6
                            devices[2] = gen4sas:Drive5
                            devices[3] = gen4sas:Drive4
                            devices[4] = gen4sas:Drive3
                            devices[5] = other:Drive2
                            devices[6] = other:Drive1
4) Use the sdmcmd64 utility to clear the SMART errors

Code:
./sdmcmd64 ClearSmartAlerts target=gen4sas:Drive3
Results for ClearSmartAlerts
                       operationResult = Success
                                target = gen4sas:Drive3
5) Use the sdmcmd64 utility to format the drive. The format needs to be done twice: after the first format the drive reports zero size (0 sectors capacity), but after the second format everything is OK and the drive gets its 2TB capacity back.
You can see how long the first format takes (1 minute 42 seconds) versus the second one (only 4 seconds):

Code:
time ./sdmcmd64 Format target=gen4sas:Drive3 sectorSize=512 difLevel=None
Results for Format
                       operationResult = Success
                                target = gen4sas:Drive3

real    1m42.699s
user    0m0.000s
sys     0m0.005s
time ./sdmcmd64 Format target=gen4sas:Drive3 sectorSize=512 difLevel=None
Results for Format
                       operationResult = Success
                                target = gen4sas:Drive3

real    0m4.739s
user    0m0.003s
sys     0m0.002s
6) Check the formatted drive; it should look like the following:

Code:
./sdmcmd64 GetDriveSize target=gen4sas:Drive3
Results for GetDriveSize
                       operationResult = Success
                                target = gen4sas:Drive3
                            hostBlocks = 3907029168
                            userBlocks = 6821 0x1aa5
Now the drive is back in a good state: no speed problems, no errors.

That's all. It's that easy :)
Even if it now looks easy and simple, the way to this recipe was relatively long and time-consuming.

Jan
BUT!!! SAS drives run automatic scheduled self-tests, and the RAID controller freaks out if a drive's SMART values are not OK.

And the temperature is still reported through the hpssacli utility (which you need anyway to configure the array and the cache ratio setting):

Code:
root@ZFS01:~# . /opt/hp/sbin/hpssacli  ctrl all show config detail

Smart Array P410 in Slot 3
   Bus Interface: PCI
   Slot: 3
   Serial Number: SMCDEADBEEF66
   Cache Serial Number: CSACDEADBEEF00
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Hardware Revision: C
   Firmware Version: 6.64
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 0% Read / 100% Write
   Drive Write Cache: Enabled
   Total Cache Size: 1024 MB
   Total Cache Memory Available: 912 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Number of Ports: 2 Internal only
   Driver Name: cpqary3 For Interface cciss
   Driver Supports HP SSD Smart Path: False
   PCI Address (Domain:Bus: Device.Function): 0000:13:00.0
   Host Serial Number: DEADBEEF

   Array: A
      Interface Type: Solid State SAS
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 1
         Size: 1.8 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: DEADBEEFDEADBEEFDEADBEEFDEADBEEF
         Disk Name: /dev/dsk/c11t0d0
         Mount Points: /bigfish0
         OS Status: LOCKED
         Logical Drive Label: DEADBEEFDEADBEEFDEADBEEF
         Drive Type: Data
         LD Acceleration Method: Controller Cache

      physicaldrive 1I:0:1
         Port: 1I
         Box: 0
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SAS
         Size: 2 TB
         Native Block Size: 512
         Firmware Revision: C23F
         Serial Number: STM000202C85
         Model: STEC    Z16IZF2E-2TBUCZ
         Current Temperature (C): 40
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown


   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250
      Device Number: 250
      Firmware Version: RevC
      WWID: 5002549123E4348F
      Vendor ID: PMCSIERA
      Model: SRC 8x6G

root@ZFS01:~#

After formatting the drive for ZFS behind the RAID controller and then moving it to the SAS2008 HBA, I could still use it with all data intact.
Unfortunately, the way back from HBA to RAID destroyed the partition table and all filesystem info because of the RAID initialization.
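For the curious, verifying that the pool survived the move is just an export/import cycle (pool name assumed, as above):

Code:
# in the Solaris VM, before pulling the disk from behind the RAID controller
zpool export bigfish0
# after re-cabling the SSD to the SAS2008 HBA
zpool import bigfish0
zpool status bigfish0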
 

Aluminum

Active Member
Sep 7, 2012
A $55 32GB Optane M.2 and a $5 PCIe converter card are another way to speed the hell out of ZFS sync writes. No battery/PLP or RAID card shenanigans to worry about.
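For completeness, the Optane route would mean adding the module as a dedicated SLOG device rather than putting a cache in front of the data disk; a minimal sketch, with pool and device names as placeholders only:

Code:
# add a small Optane as a dedicated ZIL/SLOG device
zpool add tank log c0t1d0
# check that the log vdev shows up
zpool status tank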
 

rootgremlin

Member
Jun 9, 2016
A $55 32GB Optane M.2 and a $5 PCIe converter card are another way to speed the hell out of ZFS sync writes. No battery/PLP or RAID card shenanigans to worry about.
Still, you can't beat $0 for using "old junk".
On the plus side:
  • The RAID controller has real PLP; the Optane M.2 has no power-loss protection (although it arguably may not need it because of the 3D XPoint technology)
  • The RAM cache is orders of magnitude faster than an SSD (nanoseconds for RAM vs. microseconds for flash)
  • You can use the full PCIe bandwidth of up to x8 PCIe 3.0 vs. the limited x2 PCIe interface of the small 3D XPoint modules

My focus for this post was "leverage things you may have lying around".

Granted, I would really like to compare this setup against one of those Optane 800p modules (just for shits and giggles).
But as I said, I was already satisfied with the raw (uncached) S840 SSD performance, and I'm not dumping one more € into this setup to "improve" speed.
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
Nice.
If I remember right, the Areca 1882/1883 can enable write-back cache on a pass-through disk, so no need for a RAID-0 with them. I have some lying around with BBU/caps plus NAND for the 1883 but haven't found the time to benchmark the cache with ZFS yet. It would be interesting to see how it looks with two cached drives in a mirror.