ZFS SLOG (ZIL) Drive Upgrade?

mattlach

Member
Aug 1, 2014
153
14
18
Hey all,

I have a ZFS setup as follows:

Code:
     raidz2-0     
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
     raidz2-1     
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
       WD RED 4TB
    logs
     mirror-2               
       Intel S3700 100GB
       Intel S3700 100GB
    cache
     Samsung 850 Pro 512GB
     Samsung 850 Pro 512GB
So, I have two 100GB Intel S3700 mirrored for my SLOG device.

This allows me to do sync writes at about 119MB/s as tested locally with random data.

If I do async writes I am able to write at about 400MB/s

(these are imperfect benchmarks, as I did not shut down other actively running things on the server for this test)
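For anyone who wants to repeat a comparison like this, here is a minimal Python sketch of the idea, not the exact tool used for the numbers above. It writes random data once through the page cache and once with `O_SYNC`, so that every write has to reach stable storage before returning (on ZFS that means going through the ZIL/SLOG). The function name `write_throughput` and its parameters are made up for this example.

```python
import os
import time


def write_throughput(path, total_bytes, block_size=1 << 20, sync=False):
    """Write total_bytes of random data to path and return throughput in MB/s.

    With sync=True the file is opened with O_SYNC, so each write() returns
    only once the data is on stable storage. With sync=False the writes are
    buffered and may still be in RAM when the timer stops.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        # O_SYNC is not defined on every platform; fall back to 0 if missing.
        flags |= getattr(os, "O_SYNC", 0)
    block = os.urandom(block_size)  # random data, so compression cannot help
    fd = os.open(path, flags, 0o600)
    written = 0
    start = time.perf_counter()
    try:
        while written < total_bytes:
            # os.write may write fewer bytes than asked; count what it reports.
            written += os.write(fd, block[: total_bytes - written])
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6
```

To use it, point `path` at a file on the dataset under test and call it once with `sync=False` and once with `sync=True`, using a size large enough that caching does not dominate the async figure. A dataset with sync=always will treat both runs as sync writes.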

One of my biggest pet peeves with this system has always been the relatively slow sync writes over 10Gig Ethernet.

I don't run any VMs off of this pool; it is strictly for storage of large files that occasionally get accessed. When I do write large files to the pool, however, I get impatient.

The S3700s were essentially the best thing on the market for a SLOG when I got them in 2014, but I have not kept up lately with what may have changed.

I have seen lately that 8GB ZeusRAM devices have gotten a lot cheaper (they can be had for $400 apiece on eBay right now), and there are also a number of PCIe and M.2 solutions.

Would a ZeusRAM 8GB unit be a significant upgrade? One would think so, since they are RAM-based, but on the other hand I have heard that 100-150MB/s sync writes is the most you can expect from a pool when using them as SLOG devices.

Is there any drive out there for under $500 (so I can get two for a mirror under $1000) that performs notably better than the S3700 does for this purpose these days?

Key requirements are:

- Low latency writes
- Sustained high speed writes
- Relatively high write endurance
- Battery or capacitor backed cache, so all data is committed if power goes out

The size can be tiny, I don't care. Does not need to be larger than 10GB.
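As a rough sanity check on that size, here is a small Python sketch of the usual back-of-the-envelope estimate: the SLOG only has to hold the sync writes that arrive before the corresponding transaction groups are committed to the pool. It assumes the default 5 second transaction group timeout and, as a rule of thumb rather than a guarantee, two transaction groups' worth of data in flight. The function name and defaults are illustrative only.

```python
def slog_bytes_needed(ingest_bytes_per_s, txg_timeout_s=5.0, txgs_in_flight=2):
    """Rough upper bound on SLOG space in bytes.

    Assumptions (not measured on this pool): a 5 second txg timeout and
    two txgs' worth of sync data outstanding at any one time.
    """
    return ingest_bytes_per_s * txg_timeout_s * txgs_in_flight


TEN_GBE_BYTES_PER_S = 10e9 / 8   # 10 Gbit/s line rate, ignoring protocol overhead
MEASURED_SYNC_BYTES_PER_S = 119e6  # the ~119MB/s sync figure quoted above

print(slog_bytes_needed(TEN_GBE_BYTES_PER_S) / 1e9)        # 12.5 (GB)
print(slog_bytes_needed(MEASURED_SYNC_BYTES_PER_S) / 1e9)  # about 1.19 (GB)
```

Under those assumptions even full 10GbE line rate needs only about 12.5GB, and at the sync speeds this pool actually reaches the figure is closer to 1.2GB, so capacity is not what matters when picking a SLOG device.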

Appreciate any input!
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,004
1,569
113
CA
If you don't run VMs and you are doing large sequential writes why/how are you doing sync writes at all? Are you forcing it?
 

mattlach

Certain datasets of my pool (where I really care about the data) have "sync=always" set. I also do this for datasets where I keep working data (like large project files that are still in progress, etc.).

Other areas, where I mostly store replaceable large files or finished work that I am transferring from my working directories, have "sync=disabled" set. I generally don't use the default "sync=standard" setting, where sync vs. async is determined by what the application requests.

I agree that the risks are smaller with large sequential writes, so you could probably run with sync disabled on those, but there are still corner cases in which I'd rather be safe than sorry, like when writing batch jobs from my camera processing software.
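To make the per-dataset policy concrete, here is a small Python sketch that only builds and prints the matching `zfs set` commands without running them. The pool and dataset names (`tank/photos`, `tank/work`, `tank/media`) are hypothetical placeholders, not my real layout. The three values checked are the ones the ZFS `sync` property accepts: `standard`, `always` and `disabled`.

```python
import shlex

# Hypothetical dataset names, used only to illustrate the policy.
SYNC_POLICY = {
    "tank/photos": "always",    # data I really care about
    "tank/work": "always",      # project files still in progress
    "tank/media": "disabled",   # replaceable large files
}

VALID_SYNC_VALUES = {"standard", "always", "disabled"}


def zfs_sync_commands(policy):
    """Return one `zfs set sync=...` argument list per dataset, sorted by name."""
    commands = []
    for dataset, mode in sorted(policy.items()):
        if mode not in VALID_SYNC_VALUES:
            raise ValueError(f"invalid sync value {mode!r} for {dataset}")
        commands.append(["zfs", "set", f"sync={mode}", dataset])
    return commands


for cmd in zfs_sync_commands(SYNC_POLICY):
    print(shlex.join(cmd))  # e.g. zfs set sync=always tank/photos
```

Printing instead of executing keeps this safe to run anywhere; the resulting lines can be reviewed and pasted into a shell on the storage box.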
 

mattlach

+1

to not write everything twice, compare
Would a Faster ZFS Slog drive for ZIL make sense?
Yup, and see my response there too :p

The HardForum is my main forum for all things tech, but I tend to post server-specific stuff either here or on one of the platform-specific forums. I realized after posting over there that this might be a better place for this discussion, to get more of the appropriate eyeballs on it.
 

Rand__

Well-Known Member
Mar 6, 2014
4,494
878
113
I am slightly surprised by your S3700 performance; maybe it is down to using the 100GB model. In my tests and experience the S3700 usually tops out around 200 MB/s (I used the 200GB model). The ZeusRAM went to 250 MB/s and the P3700 to about 500 MB/s. Any other Intel NVMe drive is somewhere in between, even the 750, which of course might drop with longer utilization.

I got a ZeusRAM for €200 and have often seen them below $300 recently. I got a P3700 (400GB) for €250 (quite a good deal), but you should be able to get a pair for under $1000 if you can wait for a deal.
 

mattlach

Thanks for that info.

It could be the 100GB models that are slowing me down. Back when I bought them I was less educated on the topic than I am now, and didn't know that the larger versions would perform better. I was merely focused on getting the smallest drives possible, as the large ones seemed like such a waste when I would only be using a very small portion of them.

I also wonder if mirroring two of them as I do results in increased write latency and thus slower SLOG performance.

Googling around I find these results, which indicate the 100GB S3700 performing at 99MB/s, so I'm guessing you are right and the size is probably the cause.

A pair of used P3700s seems like a great-performing option, albeit a somewhat expensive one. I'd be somewhat concerned about buying SSDs used, given their limited number of write cycles, but maybe that isn't worth worrying about, considering my S3700s have been in use as SLOGs for about 3 years and are still listed at 100% on the wearout indicator in SMART.

Now I have to remember what kind of PCIe slots I have available in the server. I haven't opened it in a while so I can't recall.

I know my board has four x8 PCIe 2.0 slots, one x4 PCIe 2.0 slot and one x4 PCIe 1.0 slot.

I have two LSI 9211-8i SAS controllers in x8 2.0 slots, and one Intel 10Gbit Ethernet adapter also in an x8 2.0 slot.

I believe the x4 1.0 slot is taken up by a quad-port Intel Gigabit adapter. This leaves one x8 slot and one x4 slot, both PCIe 2.0, for PCIe SSDs, which should work as long as the lack of PCIe 3.0 doesn't hurt performance too much.

It's a Westmere-EP-era dual-socket Xeon system with two L5640s, so not the absolute newest. I've been having the upgrade itch for some time now. AMD's new EPYC systems seem quite awesome. I could easily get away with a single 16-core chip, but the RAM cost is what has been keeping me from doing it. I have 192GB of registered DDR3. Replacing it with DDR4 will cost A LOT.
 

Rand__

Well, officially NVMe needs PCIe 3.0 x4 slots. In bandwidth terms that is the same as PCIe 2.0 x8, but I have no clue whether PCIe 2 and 3 have different signal speeds, i.e. whether you lose some of the precious latency.
You won't need the bandwidth, of course, but I'd test it (in the x4 slot) before going that route.

And yes, RAM prices are quite high :(
 

Stux

Member
May 29, 2017
30
10
8
42
A PCIe 3.0 x4 card will not use eight lanes in a PCIe 2.0 slot; it still links at x4, so its maximum bandwidth is halved. Of course, the P3700 in a 100% write scenario will probably not be bottlenecked by being stuck at 2GB/s instead of 4GB/s.
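The arithmetic behind those round numbers can be sketched in a few lines of Python. It uses the per-lane signaling rates and line encodings from the PCIe generations involved (8b/10b for 1.0 and 2.0, 128b/130b for 3.0) and ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The function and table names are just for this example.

```python
# Per-lane raw signaling rate (transfers/s) and line-encoding efficiency.
PCIE_GENERATIONS = {
    "1.0": (2.5e9, 8 / 10),     # 2.5 GT/s, 8b/10b encoding
    "2.0": (5.0e9, 8 / 10),     # 5 GT/s, 8b/10b encoding
    "3.0": (8.0e9, 128 / 130),  # 8 GT/s, 128b/130b encoding
}


def pcie_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction link bandwidth in GB/s, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes / 8 / 1e9


for gen, lanes in [("3.0", 4), ("2.0", 4), ("2.0", 8), ("1.0", 4)]:
    print(f"PCIe {gen} x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

This gives roughly 3.94 GB/s for PCIe 3.0 x4 and 2.00 GB/s for PCIe 2.0 x4, which is where the "4GB/s vs 2GB/s" figures come from. Either is far above the few hundred MB/s of sync writes being discussed in this thread.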