ZLOG Benchmark is coming

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,053
1,598
113
CA
Hardware RAID cards don't have to be used for RAID :)

Configure it as a 1-drive RAID 0 (or a 2-drive RAID 1 if you prefer redundancy), and you can benefit from the RAID card's write-back cache. It would be interesting to see which performs better: a 400GB DC S3700 as a "1-drive hardware RAID 0", or a much faster NVMe SSD.

Also, I bring up some of these use cases because the workloads I'm most familiar with are similar in some ways, so perhaps those performance stats are also relevant (but perhaps not).
You clearly do not understand the basics of ZFS and your comments are dangerous for anyone else reading this thread who may be getting into ZFS and researching their SLOG device.

You want ZFS exposed to the disks directly so it can have as much information as possible. You don't want caching done that ZFS doesn't know about, and you don't want disks managed by hardware RAID, even if it's JBOD or 1-drive RAID 0s.


It wouldn't be interesting, because if you knew those drives and how SATA and NVMe work, you would understand that a SATA SSD, even with a hardware RAID card with 2GB of cache in front of it, can't keep up with NVMe sustained over a period of time (not even getting into mixed-workload differences, which for most users are a factor in deciding on a configuration).

Intel SATA DC S3700 - 35,000 write IOPS (source: Intel SSD DC S3700 Series Enterprise SSD Review, StorageReview.com)
Intel NVMe DC P3700 - 170,000 write IOPS (source: Intel SSD DC P3700 2.5" NVMe SSD Review, StorageReview.com)

The lower-capacity P3700 will be down to around 80k write IOPS, still more than double the SATA drive.

Once the 2GB cache ran out, the S3700 itself could not keep up with the cache flushes. The NVMe drive not only has that much cache onboard already (or more, I forget the exact number), it has at least 2x the write performance and lower latency. SATA is also limited to a single command queue with a depth of only 32, whereas NVMe can go up to 65K queues with 65K commands each.
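The queue gap alone makes the comparison lopsided. A quick back-of-the-envelope sketch of the spec-sheet maximums (protocol ceilings, not measured throughput from any particular drive):

```python
# Maximum outstanding commands per protocol, from the AHCI and NVMe
# specs. These are architectural ceilings, not what any one drive does.
sata_queues, sata_depth = 1, 32            # AHCI/NCQ: one queue, 32 slots
nvme_queues, nvme_depth = 65_535, 65_535   # NVMe: up to 64K queues x 64K cmds

sata_outstanding = sata_queues * sata_depth
nvme_outstanding = nvme_queues * nvme_depth

print(f"SATA max outstanding: {sata_outstanding}")     # 32
print(f"NVMe max outstanding: {nvme_outstanding:,}")   # 4,294,836,225
```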


People have tried using the hardware RAID card's 2GB cache as a SLOG device, but the capacity isn't enough for those who actually need the increased performance; today's fast enterprise networks call for 8-16GB or more, and it's a hassle to use it like this even if you can get it to work as expected.

Just to be clear, no one is saying ZFS is faster than a hardware RAID card, or any other file system for that matter; that's not the point of this discussion, or of ZFS.
 

Rand__

Well-Known Member
Mar 6, 2014
4,610
918
113
I remember that using a HW RAID was quite fast compared to plain disks back in the day, over in the FN forum.
Tried it once with a Dell 5-series with 512MB, but that was not too good (slower than a HUSML400, but that's a good ZIL drive).
It might be interesting to see what a good card can do - o/c you limit yourself to 2 (or 4?) GB of RAM, and for the price of that card you can get good drives too.
But if you can get one for cheap...

Edit:
O/c it's one of those things you shouldn't do unless you know exactly what you're doing, as @T_Minus is correctly pointing out :)
 

vrod

Active Member
Jan 18, 2015
233
33
28
@funkywizard - As @T_Minus says, nobody should follow those recommendations. The SLOG only needs enough space for the amount of data that can theoretically be written in 5 seconds (I believe that's the standard).

So if you have a storage box with a 10Gbps link to a storage client, a theoretical 6.25GB could be written to the SLOG before it is committed to the pool the SLOG device belongs to. Therefore, you just need something fast, not big, and a hardware RAID would definitely not help. Get NVMe for lower latency plus higher queue depths and transfer speeds. Overprovision the SSD and it will last a loooooooong time.
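That 6.25GB figure is just link speed times the flush interval; a minimal sketch of the rule of thumb, assuming the ~5-second interval mentioned above:

```python
# Rule-of-thumb SLOG sizing: enough capacity to absorb whatever the
# network can deliver during one transaction-group interval (~5 s by
# default). The 10 Gb/s link is the example used in this thread.
def slog_size_gb(link_gbps: float, txg_seconds: float = 5.0) -> float:
    bytes_per_second = link_gbps * 1e9 / 8        # bits -> bytes
    return bytes_per_second * txg_seconds / 1e9   # -> GB

print(slog_size_gb(10))   # → 6.25 (GB) for a 10Gbps link
```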
 

funkywizard

mmm.... bandwidth.
Jan 15, 2017
687
289
63
USA
ioflood.com
@funkywizard - As @T_Minus says, nobody should follow those recommendations. The SLOG only needs enough space for the amount of data that can theoretically be written in 5 seconds (I believe that's the standard).

So if you have a storage box with a 10Gbps link to a storage client, a theoretical 6.25GB could be written to the SLOG before it is committed to the pool the SLOG device belongs to. Therefore, you just need something fast, not big, and a hardware RAID would definitely not help. Get NVMe for lower latency plus higher queue depths and transfer speeds. Overprovision the SSD and it will last a loooooooong time.
I wouldn't take my comments as recommendations. More like "I wonder what would happen if...."

I can say that for many of the disk I/O tests I've run as a result of "what if", the results are not what I could have predicted, so running the tests is a good way to learn how the storage behaves in different circumstances.

For example, the performance tests linked earlier in the thread show abysmal performance I would not have expected. Even the "good performance" example drive performs far worse than I would expect. Therefore I think it's premature to simply assume that one (untested) hardware configuration will perform better than another, when the ones tested thus far performed far differently than expected.

When I said "it would be interesting to see if....", I literally meant, it would be interesting to see the results. What makes sense in production is a different story. Sorry for not being more clear up front.
 

_alex

Active Member
Jan 28, 2016
874
94
28
Bavaria / Germany
I think a SLOG behind HW RAID could benefit short spikes, as the card surely handles QD1 better than the SSD would.
Worst case might be the SLOG's performance dropping to that of the SSD once the RAID card's cache is full.
But up to that point it could indeed be similar to an NVMe drive, or even faster.

Not sure if ZFS really needs direct access to the SLOG device too. For the data drives it's beyond question.

Such a setup would need a dedicated HW RAID card just for the SLOG, plus an HBA in IT mode for the data drives (if they don't run from onboard SATA).
That means one more PCIe slot occupied, plus a few more watts for the RAID card.
 

gea

Well-Known Member
Dec 31, 2010
2,535
856
113
DE
The SLOG must be able to log the contents of the RAM-based ZFS write cache, by default up to 4GB, so a hardware RAID card + cache + BBU may be a solution only if its RAM cache is large enough.

Mostly, I would not see hardware RAID as a suitable SLOG solution, because of cache size, price, and the reliability of BBUs.
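gea's argument reduces to a single capacity comparison; a sketch assuming the 4GB default write-cache ceiling he quotes (the helper name is made up for illustration):

```python
# A write-back cache standing in for a SLOG must be able to hold the
# worst-case contents of the RAM-based ZFS write cache (default up to
# ~4 GB per gea's figure above), or it is undersized for the job.
def cache_covers_writecache(controller_cache_gb: float,
                            zfs_writecache_gb: float = 4.0) -> bool:
    return controller_cache_gb >= zfs_writecache_gb

print(cache_covers_writecache(2.0))   # → False: typical 2GB RAID cache
print(cache_covers_writecache(8.0))   # → True
```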
 

BackupProphet

Well-Known Member
Jul 2, 2014
806
289
63
Stavanger, Norway
kingmakers.no
You clearly do not understand the basics of ZFS and your comments are dangerous for anyone else reading this thread who may be getting into ZFS and researching their SLOG device.

You want ZFS exposed to the disks directly so it can have as much information as possible. You don't want caching done that ZFS doesn't know about, and you don't want disks managed by hardware RAID, even if it's JBOD or 1-drive RAID 0s.

People have tried using the hardware RAID card's 2GB cache as a SLOG device, but the capacity isn't enough for those who actually need the increased performance; today's fast enterprise networks call for 8-16GB or more, and it's a hassle to use it like this even if you can get it to work as expected.
Actually, ZFS does not care much about "direct access" to disk drives. Talking to a "virtual" storage platform is fine. For example, for encryption on FreeBSD, you talk to the drive through the GELI (GEOM) virtual storage layer.

Using a disk controller with a write-back cache is in fact a great way to speed up a SLOG. The issue is cost: a good, fast controller with write-back cache is very expensive, and battery lifetime is short, so batteries also cost a lot of money over time. For most of us, a pool with a few good enterprise SSDs, skipping the dedicated SLOG entirely, is a much better investment.

And even 2GB is plenty for a SLOG. We're talking about an insane amount of sync writes per second if you need a larger cache.
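For a sense of scale, a rough estimate of the sync rate needed to fill 2GB within one ~5-second flush interval (the 8KiB record size is an assumption, not a figure from the thread):

```python
# How many sync writes per second would overflow a 2 GiB cache before
# the next transaction-group flush? Record size is an assumed example.
cache_bytes = 2 * 1024**3      # 2 GiB controller cache
txg_seconds = 5                # default ZFS txg interval
record_bytes = 8 * 1024        # assumed sync-write record size

writes_per_second = cache_bytes / txg_seconds / record_bytes
print(f"{writes_per_second:,.0f} sync writes/s")   # → 52,429
```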
 

T_Minus

Actually, ZFS does not care much about "direct access" to disk drives. Talking to a "virtual" storage platform is fine. For example, for encryption on FreeBSD, you talk to the drive through the GELI (GEOM) virtual storage layer.

Using a disk controller with a write-back cache is in fact a great way to speed up a SLOG. The issue is cost: a good, fast controller with write-back cache is very expensive, and battery lifetime is short, so batteries also cost a lot of money over time. For most of us, a pool with a few good enterprise SSDs, skipping the dedicated SLOG entirely, is a much better investment.

And even 2GB is plenty for a SLOG. We're talking about an insane amount of sync writes per second if you need a larger cache.
Why would you want to put a hardware controller in front of your disks that may not pass the proper SMART info (or all of it) to ZFS, or not pass it fast enough, so that ZFS cannot do its job and alert you of issues? Makes zero sense to me. This is one point driven home in practically every ZFS implementation guide.

This doesn't mean it won't work; it means it's a terrible idea in the first place. People do run ZFS on hardware cards, or on virtual drives passed through hypervisors, and it may work, but it's not the proper implementation. In my opinion, having safe data is WHY people use ZFS in the first place, so why start putting things in the path that degrade or eliminate its safety features?

RE: SLOG on a HW card... If the system had a problem or power failure, the data might be safe in the CACHE, but as far as ZFS knows the SLOG device will be empty upon reboot, so it won't be used as intended and the data will be lost, defeating the entire purpose of the SLOG. An even more likely worst case is that the battery fails and the cache doesn't store the data at all; worse still, the cache and the SLOG device both die. We all know SLOGs have died, so by relying on battery + HW controller + SLOG you're just adding more points of failure.

As you guys have said, cost is another factor... so now it costs more money, is less reliable, carries more unknowns about whether it will be safe and work properly, and delivers less performance than an NVMe or Optane SLOG device.

I also stated, "People have tried using the hardware RAID card's 2GB cache as a SLOG device, but the capacity isn't enough for those who actually need the increased performance; today's fast enterprise networks call for 8-16GB or more, and it's a hassle to use it like this even if you can get it to work as expected."

HGST has implementation guides calling for numerous ZeusRAMs - not one mirrored with another, but 2, 4, etc. mirrored - and that was back when network performance was nowhere near today's speeds...

As you said, 2GB is plenty for most home users, and an enterprise SSD SLOG device is fine... but my comments weren't about what a home user can get by with; they were about not trying something ridiculous that degrades the security and safety of ZFS in the first place, which matters to home users, enterprises, and businesses of all sizes alike :)

There are many reasons why it's a bad idea - pick one and go with it... but suggesting people "try out" HW RAID in front of drives, or in front of a SLOG, for ZFS goes against the entire purpose of the SLOG and of ZFS as a whole, so in my opinion it's ridiculous to even bring it up as an option.
 

Rand__

Still wonder whether it actually would be slower ;) It would run at the full RAM speed of the cache.
O/c nowadays with Optane the basic idea behind this gets less and less attractive, for the various uncertainties that you voiced, but back in the day when this idea was born there was no PCIe/NVMe, and HUSSLs were top-notch drives for a ton of money (and "only" providing ~100MB/s+ of logged speed; not sure what they actually do).
 

T_Minus

Still wonder whether it actually would be slower ;) It would run at the full RAM speed of the cache.
O/c nowadays with Optane the basic idea behind this gets less and less attractive, for the various uncertainties that you voiced, but back in the day when this idea was born there was no PCIe/NVMe, and HUSSLs were top-notch drives for a ton of money (and "only" providing ~100MB/s+ of logged speed; not sure what they actually do).
As I mentioned in my first reply, there are threads on various forums where people have tried using the CACHE as the SLOG device itself - not putting it in front of an actual SLOG, but in fact using it as the SLOG.

It's much more challenging to get this working, and I gave up playing with it a couple of years ago, but if you're interested I would suggest this over the HW cache + SLOG device combo.
 

Rand__

O/c, the idea is that the cache acts as the SLOG, since it instantly confirms the write? Everything else wouldn't make any sense at all.
Must have misread the discussion if that was not the point :O
 

T_Minus

O/c, the idea is that the cache acts as the SLOG, since it instantly confirms the write? Everything else wouldn't make any sense at all.
Must have misread the discussion if that was not the point :O
The other user was suggesting putting the HW RAID w/cache in FRONT of an SSD SLOG device to improve its performance, which is silly when you could implement the HW RAID w/cache as the SLOG itself.
 

T_Minus

I'm still looking forward to an NVMe vs Optane SLOG device comparison :)

We can all say how overpriced Optane is at $1,500 for 384GB, but that's cheaper than the 8GB ZeusRAM at $1k+ when they were retail/new ;)
 

_alex

I understood the idea was to put SSDs on a RAID card with cache and BBU and see how it compares to an NVMe SLOG?

It's beyond doubt that this is too expensive, introduces more failure possibilities, and for sure should not be recommended.

But I think for some workloads, e.g. a database with a few hundred transactions per minute (which wouldn't saturate the cache's capacity before a flush), the RAID cache could work well in terms of latency.

Would really love to see someone benchmark it, just to know exactly what the outcome is :D
 

T_Minus

For clarity, I'm not against hardware RAID ;) but suggesting it in combination with ZFS - no, no, no.

Someone start a new thread comparing HW RAID SSD SAS3 vs ZFS SSD SAS3 vs ZFS with an NVMe SLOG, and let's try to make that happen :) I don't think there's much doubt that ZFS will be slower in all tests; it's not known as the fastest.
 

_alex

I think the idea was not about ZFS on HW RAID, but about using the DRAM/write-back cache for a RAID 0 or RAID 1 holding only the SLOG. But right, this is quite OT and should be another thread.

I'm really no friend of HW RAID; I don't have a single box running one, just some cards left over.
 

BackupProphet

ZFS doesn't need SMART data. ZFS behaves correctly as long as it can read bytes from a storage device. It is very common to run ZFS on hardware RAID. A failure on a hardware RAID will behave exactly the same as it would on any drive.
 

_alex

Bavaria / Germany
Are you sure ZFS would detect bitrot behind a HW RAID, with no direct access to the raw disk - or would this rely on the controller to ensure integrity?

I've wondered for some time how this would work on an iSCSI or SRP LUN, which should be a better case than HW RAID.