Setting up test bed for ZIL and L2ARC SSDs


levak

Member
Sep 22, 2013
Hello!

I'm setting up a test bed for testing SAS SSDs to pick one for my ZIL (and maybe L2ARC as well). From what I've seen on the internet, 'fio' seems to be the tool of choice for benchmarking (on Linux, that is).

My test bed:
- IBM xServer 3550 M4
- 2x Intel Xeon E5-2640
- 164GB memory
- 2x SAS drives for system
- LSI SAS 9207-8e SAS HBA
- Supermicro SC837E26-RJBOD1 JBOD case for drives

OS: CentOS 7.0 1508 ISO + EPEL repository, from which I will get 'fio'.

As far as tests go, I plan on the following scenario:
- insert fresh new drive
- warming run: run 'fio' for 6h with options:
fio --filename=/dev/sdx --direct=1 --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --iodepth=16 --numjobs=16 --runtime=21600 --time_based --group_reporting --name=4k-write-test
- final run: run 'fio' for 1h with the same options as above (runtime 3600) and record the results for comparison. A job-file version of both runs is sketched below.
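I added --time_based to the command so fio keeps looping over the device for the full runtime instead of stopping once it has written the device's capacity. For convenience, the same options can go in a job file (just a sketch; /dev/sdX and the job names are placeholders):

; ssd-test.fio - 6h warm run followed by the recorded 1h final run
[global]
filename=/dev/sdX
direct=1
ioengine=libaio
rw=randwrite
bs=4k
iodepth=16
numjobs=16
refill_buffers
norandommap
randrepeat=0
time_based
group_reporting

[warm-6h]
runtime=21600

; stonewall makes this job wait until the warm run has finished
[final-1h]
stonewall
runtime=3600

Run it with 'fio ssd-test.fio' and record the numbers from the [final-1h] group.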

Options were picked from 'storagereviews.com' and seem reasonable.

What do you think of the test?
As far as I know, the ZIL needs as many write IOPS as possible and as low latency as possible, is that correct?
How about bandwidth? Probably best if it matches my network speed?

For L2ARC tests, I would test random reads instead. What should I look for in SSDs that will be used for L2ARC?

What do you think of my scenario?
Would you do it differently?
Any other tools that I should look at?

Matej
 

Entz

Active Member
Apr 25, 2013
Canada Eh?
Looks good. Very good platform (lots of memory etc). Longer tests are better, as they also bring out the consistency side, which is important as well. Most disks can survive a 5 or 10 minute onslaught, but hit them hard for a long time and some drop massively in performance.

As far as I know, the ZIL needs as many write IOPS as possible and as low latency as possible, is that correct?
How about bandwidth? Probably best if it matches my network speed?
High write IOPS/Low latency and the ability to commit sync writes quickly. The latter seems to be where most drives fail miserably.

Bandwidth will matter, as will protocol, if you are going out over the network. Gigabit isn't too hard, but on 10GbE, >200 MB/s writes are difficult in my experience.
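If you want to look at the sync-commit side in isolation before a pool is involved, a QD1 sync-write run against the raw device is a reasonable probe (just a sketch, /dev/sdX is a placeholder):

fio --name=sync-commit --filename=/dev/sdX --rw=write --bs=4k --iodepth=1 --numjobs=1 --direct=1 --sync=1 --runtime=300 --time_based

Watch the completion latencies more than the MB/s; that is roughly what a single stream of sync writes hitting a SLOG looks like.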

Look forward to the tests.
 

levak

Member
Sep 22, 2013
49
10
8
Looks good. Very good platform (lots of memory etc). Longer tests are better, as they also bring out the consistency side, which is important as well. Most disks can survive a 5 or 10 minute onslaught, but hit them hard for a long time and some drop massively in performance.
That is why I want to go with longer tests, to see how drives perform in the long run. They are expected to be hit continually in my case.

High write IOPS/Low latency and the ability to commit sync writes quickly. The latter seems to be where most drives fail miserably.

Bandwidth will matter, as will protocol, if you are going out over the network. Gigabit isn't too hard, but on 10GbE, >200 MB/s writes are difficult in my experience.
Ability to commit sync writes quickly... Do you mean writing to the SLOG device quickly, or flushing from the SLOG device to the spindles? What test gives me this info? Random write speed at 4k?

I will post results when I get the drives, which will probably be in 2-3 months. Also, I'm not sure exactly which drives I will get, but I will have to pick the best from whatever arrives.

Matej
 

Entz

Active Member
Apr 25, 2013
Canada Eh?
SLOG is never flushed to disk unless there is a power outage or some other condition that requires replaying (this will be done on boot). It is the writing of the sync transactions to the SLOG that needs to be done quickly.

Filesystem sync write -> commit/write to the SLOG as a sync write (to ensure it is actually there) -> write to pool (IIRC this is done as part of a normal transaction group, not sync)

The test is whatever you want to test. Remember a SLOG is only used for sync writes (unless you force sync on), so if you want to test sequential sync writes you can do that, or if you want to test 4K random writes you can test that. Real workloads are typically random, which is the better test, but sequential is good for an "upper bounds" test. The only requirement is that you are doing sync writes and/or force sync on. My suggestion, with the amount of RAM you have, is to use a 64GB ramdisk to rule out pool performance.
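Something along these lines is what I have in mind (a sketch only, assuming the test box runs ZFS on Linux; the pool name and device names are placeholders):

# 64GB ramdisk as the data vdev, candidate SSD partition as the SLOG
modprobe brd rd_nr=1 rd_size=67108864       # rd_size is in KiB -> 64GB /dev/ram0
zpool create testpool /dev/ram0
zpool add testpool log /dev/sdX1            # candidate SLOG partition
zfs set sync=always testpool                # push every write through the SLOG

# sync-heavy fio run against the pool's filesystem
fio --name=slog-test --directory=/testpool --rw=randwrite --bs=4k \
    --iodepth=1 --numjobs=1 --fsync=1 --size=4g --runtime=600 --time_based

sync=always forces all writes through the SLOG even if the client isn't issuing them synchronously, and the ramdisk keeps the data vdev from becoming the bottleneck.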
 

levak

Member
Sep 22, 2013
I did some more reading and I think I understand how ZIL works a little better.

So, a good SLOG device will have high IOPS, low latency and high random throughput (to be able to commit sync writes as fast as possible, so this should be as fast as the network connection if the server is a SAN).

I've done some testing with an Intel S3700 100GB SSD, as this is supposed to be a good SLOG device, with the following settings: 16 threads, queue depth 16, block size 4k:
- average IOPS: 20,900/s
- latency: 50% of IOs between 4-10 ms, 36% between 10-20 ms
- average write speed: 81 MB/s

To get better results: is writing to the SLOG multi-threaded?
What queue depth can I expect when writing to the SLOG?

One more thing. If my pool is all 4k drives, is it possible or smart to format the SLOG as 512b, or should it also be 4k?

Matej
 

Entz

Active Member
Apr 25, 2013
Canada Eh?
IIRC the writes are QD1, but I'm not sure on the block size. ZFS will format the drive for you, no need to do it yourself. I would just create an 8-16GB partition, make sure it is aligned, and pass that over.
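Something like this is all it takes (a sketch; 'tank' and /dev/sdX are placeholders, and sgdisk aligns new partitions to 1MiB by default):

sgdisk --new=1:0:+16G /dev/sdX              # 16GB partition 1, 1MiB-aligned
zpool add -o ashift=12 tank log /dev/sdX1   # ZFS on Linux lets you force 4K sectors here

If you leave the ashift override off, ZFS picks it from what the drive reports.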

81 MB/s doesn't seem too far outside the realm of possibility for a 100GB S3700, but testing it as an actual SLOG in an actual pool, with all the stuff ZFS does to help (or hinder), will give more accurate numbers.
 

levak

Member
Sep 22, 2013
Great, thanks for the info!

I will create a zpool with a few vdevs and test it. I think I have around 20 hard drives in the JBOD to test with.

Matej
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Where did you read that a SLOG needs good "random" performance? AFAIK a SLOG needs 3 important things:
- Low Latency
- Fast sequential writes
- Good Endurance


How do random writes affect the SLOG / where did you read that you want a SLOG with fast random access?
 

levak

Member
Sep 22, 2013
My imagination:)

I always assumed writes to SLOG are random, not sure why:) It seems reasonable:)

And it looks like you are right, SLOG writes are sequential. The best test will probably be to set up a ZFS pool and run 'fio' on it.

Matej
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Well, no SATA drive is going to beat a PCIe drive. For the $, you won't beat a Fusion-io.
 

levak

Member
Sep 22, 2013
I will have a mirrored SLOG, but I can't use PCIe devices for it, since the JBODs will be connected to two different servers ('controllers').
 

dswartz

Active Member
Jul 14, 2011
Correct. This is the killer for an HA setup. You can have (as I do) a dual-port JBOD chassis with SAS drives, with a server connected to each SFF-8088 port. Works fine. However... if you put a SATA SSD in as SLOG, only one server will be able to talk to it (barring some complicated/flaky SATA interposer deal). If the server that connects to the SATA SSD dies and has to be fenced by the other host, any writes that were in the SLOG and not yet committed will not be in the pool when the other host forcibly imports it (which will fail anyway, unless you give the '-m' option). And even if you do get the pool imported, the client will have had some amount of data magically vanish, because the first host lied and told it 'your data is on stable storage, buddy!' SAS SSDs with acceptable parameters for SLOG usage are NOT cheap!
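For reference, the forced import on the surviving head looks something like this (pool name is a placeholder):

zpool import -f -m tank    # -f: pool was last in use by the other host; -m: import despite the missing log device

Anything that only existed in the dead host's SLOG is gone at that point, which is exactly the data-loss window described above.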
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Forgive my ignorance, but why can't you mirror your SLOG on PCIe devices in both boxes?
 

dswartz

Active Member
Jul 14, 2011
We're talking about two different things. Mirroring an SLOG device is for redundancy. The HA issue is that the storage MUST be visible from both hosts. That can't work for PCI-E devices, AFAIK. The issue is that the ZIL is basically a transaction log - if you crash and reboot, any writes in it need to be replayed to main storage. If a host has a non-shared SLOG, and there are writes in it, and it crashes (or is fenced by the other host), then when the other host takes over the pool, it will import it, but it will be missing any writes that were in the SLOG on the failed host. If the HA is properly configured, the intent is that the guests will not see any outage - so someone could write A, B, C to storage, but have B never show up on disk (because it was in the SLOG on the host that failed...)
 

Entz

Active Member
Apr 25, 2013
Canada Eh?
Never really thought of doing real HA in the context of a SLOG; most implementations I have seen are based on duplication on top of ZFS (both redundant systems and disks). If it works, that would be great (SAS SLOG aside). If you have done it or are going to do it, you should create a thread to outline any issues you come across, or whether it just works. Especially the import-on-failure side; I assume the disk IDs would (need to) be consistent, so it should just work. Would be an interesting read.
 

levak

Member
Sep 22, 2013
dswartz: I know SAS SSDs with the right parameters for SLOG are not cheap, but that is the only way I can go. They start at around 500€ for a 100GB model (HGST SSD800MH.B or Seagate S1200). As far as SATA drives go, I can't afford SATA drives with interposers. I tried, but they didn't work the way they should and had a bunch of problems. Drives don't fit into the case well with interposers, either, even if I use Supermicro ones. Also, it's cheaper to buy 2 SAS SSDs that are known to work as they should and be done with it. That way, if trouble arises, I'm sure the SLOGs are not the problem, since they are true SAS SSDs without any emulation. Also, that way I can get some support from OmniTI, since I will run a whitelisted configuration.

Entz: I will do it, it's just a matter of time. Probably in 2-3 months. I will try to post as much info as possible. I will have to talk to my boss about whether it's OK to post results, but I don't think he will mind. As far as HA goes, we already have all the hardware except hard drives and SSDs (SAS ones, that is; we already have SATA drives, but they don't work for HA, since 2 controllers can't connect to a single drive). It will be an interesting project and I will open a thread when the project starts...

Matej
 

Deci

Active Member
Feb 15, 2015
Weren't you interested in the HGST ZeusRAM drives before? As far as an infinite-write-life, incredibly low latency SAS ZIL/SLOG goes, it's got no competitors currently. One is enough for 10Gbit, two if you want mirroring.
 