HBA with cache and BBU?

Skud

Active Member
Jan 3, 2012
115
54
28
Does anyone know if such a product exists? I think such a product would be good to find because I believe it would complement ZFS. Here is why:

1) ZFS likes to work with disks directly
2) ZFS uses the ZIL to log writes to the pool
3) In the absence of a dedicated log device ZFS will use the pool which slows random writes down considerably
4) Cache on an HBA/RAID controller speeds up random writes

It seems to be an ongoing issue where someone is asking "what is the best log device?" or "what is the best SSD to use for a ZIL?" As of now, there is no "best" device short of a FusionIO or DDRDrive, and most SSDs suitable for the job are expensive. I've also found that when you do add a dedicated log device and that device isn't "fast" enough, you will artificially limit the speed of the pool.

While perusing the Sun tech docs trying to figure out some other issues I came across an article which says that one can disable cache flushing if the back-end storage has a protected cache. It has been an issue that some storage devices don't properly honour the ZFS "flush if your cache isn't protected" command and will instead always flush the data, even if the cache is protected. The solution is to enable a switch that disables cache flushing or, instead of globally setting "nocacheflush", you can add a line to a config file which tells the system that this particular storage device doesn't need to be told to flush its cache.

I think that if there was an HBA or RAID controller which supported JBOD or passthrough mode and still used the on-board cache this would somewhat negate the need for a separate SLOG device. ZFS would use the pool as a ZIL, but these random writes would go to the controller's cache instead to be dealt with later.

I don't know how much of a performance increase this would give, but I think it would be better than using a current-gen lower cost SSD. One way to find out would be to see the performance of a pool on a regular non-cached HBA vs the same configuration on a RAID card with each device in a RAID0.
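For anyone who wants to run that comparison, here is a minimal sketch of a sync-write timing test in Python. It is only an approximation of NFS/iSCSI sync traffic: it writes small blocks and calls fsync after each one, so every write has to be acknowledged as stable before the next is issued. The function names and the target path are made up for this example; point it at a file on the pool under test and run it once per controller configuration.

```python
import os
import statistics
import tempfile
import time


def measure_sync_write_latency(path, block_size=4096, count=200):
    """Write `count` blocks to `path`, calling fsync after each one.

    Returns the per-write latencies in seconds. Each fsync asks the
    filesystem to commit the write to stable storage before returning.
    """
    block = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return (mean, median, worst) latency in milliseconds."""
    ms = [t * 1000 for t in latencies]
    return statistics.mean(ms), statistics.median(ms), max(ms)


if __name__ == "__main__":
    # Hypothetical target: replace with a file on the pool being tested.
    target = os.path.join(tempfile.gettempdir(), "sync_write_test.bin")
    mean, median, worst = summarize(measure_sync_write_latency(target))
    print(f"mean {mean:.2f} ms, median {median:.2f} ms, worst {worst:.2f} ms")
    os.remove(target)
```

The numbers are only comparable between runs on the same machine with the same block size and count; a proper benchmark tool would give more trustworthy results, but this is enough to see whether a controller cache changes sync-write latency at all.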

Thoughts?

Thanks!!
Riley
 

Jeggs101

Well-Known Member
Dec 29, 2010
1,497
231
63
These don't really exist. ZFS and other software schemes generally just like a dumb HBA so they can work with resources across an entire pool. These things are designed for huge multi-rack storage systems, not really home office server equipment.
 

mobilenvidia

Moderator
Sep 25, 2011
1,806
120
63
New Zealand
HBAs very rarely have caches without becoming RAID controllers.
Even JBOD on a caching controller adds a layer between the OS and the drives.

A quote from WIKI that might apply.
Hardware RAID on ZFS
When using ZFS on high end storage devices or any hardware RAID controller it is important to realize that ZFS needs access to multiple devices to be able to perform the automatic self-healing functionality.
If hardware-level RAID is used then it is most efficient to configure it in JBOD or RAID 0 mode (i.e. turn off redundancy-functionality). For ZFS to be able to guarantee data integrity it needs to either have access to multiple storage units (disks) or to enable redundancy (copies) in ZFS which duplicates the data on the same disk.
Using ZFS copies is a good feature to use on notebooks and desktop computers since the disks are large and it provides some redundancy with a single drive.
Note that hardware RAID configured as JBOD may still detach disks that do not respond in time; and as such may require TLER/CCTL/ERC-enabled disks to prevent drive dropouts
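
To illustrate why the quoted passage says ZFS needs more than one copy to self-heal, here is a toy sketch in Python. It is not ZFS code: the function names are invented for this example, and SHA-256 simply stands in for whatever block checksum the pool uses. With two copies a bad block can be detected and rewritten from the good one; with a single copy the corruption can only be detected.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum stored separately from the data it describes."""
    return hashlib.sha256(data).hexdigest()


def read_with_self_heal(copies, expected):
    """Return good data from `copies`, repairing any bad copy in place.

    `copies` is a list of bytearrays standing in for the same block on
    different devices. Raises IOError when no copy matches `expected`,
    which is the "detected but not repairable" single-copy case.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("checksum mismatch on every copy; cannot repair")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # rewrite the damaged copy from the good one
    return good


block = b"important data"
expected = checksum(block)

mirror = [bytearray(block), bytearray(block)]
mirror[0][0] ^= 0xFF  # simulate silent corruption on one device
healed = read_with_self_heal(mirror, expected)
print(healed == block, bytes(mirror[0]) == block)
```

This is also why a single hardware RAID volume handed to ZFS only gets detection: from ZFS's point of view there is one copy, so there is nothing to repair from.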
 

Skud

Well, I sent off a query to Areca regarding the 1880 and 1882 series and received the following reply:

Dear Sir/Madam,

yes, when the controller or disk is in JBOD mode, the controller's onboard cache and BBU are used both.

Best Regards,

Kevin Wang
Also, some googling of "areca jbod cache" has brought forth the results of others who seem to agree with the above statement.

So, I think that would be a good test to see how ZFS behaves as a NFS or iSCSI server (sync writes) when running off of an Areca in JBOD mode. Unfortunately, I don't have one to test.

Riley
 

mobilenvidia

The 1880 and 1882 are Areca's versions of the LSI 9260 and LSI 9265, both RAID controllers.
They use the SAS2108 and SAS2208 controllers respectively.

LSI for some reason can't enable JBOD on the LSI 9260 controllers, but these can use single-drive RAID0.
With a tweak in MegaCLI you can enable JBOD on LSI 9265/6 and LSI 9270/1 controllers, including the IBM ServeRAID M5016.

I can do a test to see if the cache is used.
Will need to do it tonight after work, as long as I remember; may need reminding :)
 

mobilenvidia

Alrighty then.

LSI9261-8i

1x 2TB Hitachi 7k2000 in single-drive RAID 0, with Read Ahead, Write Back and Cached IO set to make sure the cache was used:


While at the same time a different:
1x 2TB Hitachi 7k2000 in JBOD; there are no settings to play with.

Can easily tell RAID 0 gets full cache treatment, JBOD gets nothing (as it should)

Conclusion of SAS2208 JBOD
Uses no Cache
Nothing to tweak in LSI MSM utility.
Drive shows with name in Windows Device properties.

JBOD on a SAS2208 looks like it is directly connected to the drive.
In reality this makes the SAS2208 just an expensive SAS2008.
BUT you can mix JBOD and arrays on the same controller, i.e. enjoy a fast caching RAID0 boot volume with possibly fast caching L2ARC/ZIL, and leave the other drives in JBOD for ZFS to do its thing.
And with CacheVault there is no need for a BBU; the ultimate setup.

I should now go and test this, shouldn't I?
 

Skud

Interesting!!

So, obviously, ZFS does/would benefit from a controller cache and the LSI models do not/cannot enable the cache in JBOD mode. Areca must be doing something differently to allow for this.

Thanks!!
Riley
 

mobilenvidia

ZFS would not benefit from caching at all; it's really detrimental to how it works.

Very simply, ZFS needs DIRECT access to the drives in the pool.

To me, you cannot have direct access to drives if you have cache.
Something has to intervene to check whether the cache has the data needed or get it from the HDDs
And the same the other way: data will be written to cache first and then written to the HDDs, even in write-through mode (write-back can take longer to do so).
This would wreak havoc with ZFS self-healing.

To me ZFS = avoid caches, RAID, or anything that handles the data between the HDDs and ZFS.
You can fiddle around with L2ARC and ZIL to tweak the ZFS caches.
ZFS is great at what it does because it does everything itself.
 

Skud

I'm not sure I fully agree. I think that a JBOD with caching would be the best solution. While there would still be a "device" between ZFS and the disk it should be transparent to ZFS. Essentially, it would become the individual disks' cache, only much much larger, smarter, and protected.

ZFS was built to run on pretty much any type of storage and abstract that storage for the user. As such, it's possible and supported to use storage from RAID arrays, exported LUNs from a SAN, JBOD, as well as individual disks.

From: ZFS Best Practices Guide - Siwiki

Code:
 - Set up one storage pool using whole disks per system, if possible.

 - Keep vdevs belonging to one zpool of similar sizes; Otherwise, as the pool fills up, new allocations will be forced to favor larger vdevs over
   smaller ones and this will cause subsequent reads to come from a subset of underlying devices leading to lower performance.

 - For production systems, use whole disks rather than slices for storage pools for the following reasons:
   
   - Allows ZFS to enable the disk's write cache for those disks that have write caches. If you are using a RAID array with a non-volatile write cache, 
     then this is less of an issue and slices as vdevs should still gain the benefit of the array's write cache.
   
   - For JBOD attached storage, having an enabled disk cache, allows some synchronous writes to be issued as multiple disk writes followed by a single 
     cache flush allowing the disk controller to optimize I/O scheduling. Separately, for systems that lacks proper support for SATA NCQ or SCSI TCQ, having 
     an enabled write cache allows the host to issue single I/O operation asynchronously from physical I/O.
From: ZFS Evil Tuning Guide - Siwiki

When issuing cache flushes, ZFS issues the O_DSYNC command to the storage (disk, array controller, SAN, etc.) which basically means "flush your cache if it's not protected". ZFS uses this method because it's designed to work with storage that has some sort of protected cache. However, certain storage devices do not handle the O_DSYNC command and instead treat it as a forced flush command. The result is that instead of taking advantage of the on-board cache and only flushing when full or not busy, the device flushes each time an O_DSYNC is received, severely hampering performance.

To work around this, ZFS allows for options to be added to the sd.conf file that identify which storage devices behave poorly and should not receive cache flush commands. Or, the "nocacheflush" switch can be set, which will turn off cache flushing system-wide, but that is generally a bad idea.
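
To make the difference concrete, here is a toy model in Python of the behaviour described above. It is purely illustrative and not how ZFS or any controller firmware is written; the class, field, and parameter names are all made up. It just counts how many physical cache flushes happen for a run of sync writes under each behaviour.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Toy model of a storage device with a write cache."""
    cache_is_protected: bool      # battery- or flash-backed cache
    honors_protected_hint: bool   # skips flushes when its cache is protected
    physical_flushes: int = 0

    def flush_request(self):
        """Handle one cache-flush request from the filesystem."""
        if self.cache_is_protected and self.honors_protected_hint:
            return  # data is already safe in the protected cache
        self.physical_flushes += 1  # drain cache to disk before acknowledging


def run_sync_writes(controller, count, suppress_flush=False):
    """Issue `count` sync writes, each followed by a flush request.

    `suppress_flush` stands in for telling the host not to send flush
    requests to this device (the per-device or system-wide workaround).
    """
    for _ in range(count):
        if not suppress_flush:
            controller.flush_request()
    return controller.physical_flushes


well_behaved = Controller(cache_is_protected=True, honors_protected_hint=True)
always_flushes = Controller(cache_is_protected=True, honors_protected_hint=False)
print(run_sync_writes(well_behaved, 1000))    # 0
print(run_sync_writes(always_flushes, 1000))  # 1000
```

The workaround only makes sense in the protected-cache case: suppressing flushes to a device with a volatile cache trades the performance problem for a data-loss risk on power failure.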

Riley
 

mobilenvidia

I refer to another wiki ;)

Hardware RAID on ZFS
When using ZFS on high end storage devices or any hardware RAID controller it is important to realize that ZFS needs access to multiple devices to be able to perform the automatic self-healing functionality.[39] If hardware-level RAID is used then it is most efficient to configure it in JBOD or RAID 0 mode (i.e. turn off redundancy-functionality). For ZFS to be able to guarantee data integrity it needs to either have access to multiple storage units (disks) or to enable redundancy (copies) in ZFS which duplicates the data on the same disk. Using ZFS copies is a good feature to use on notebooks and desktop computers since the disks are large and it provides some redundancy with a single drive.
Note that hardware RAID configured as JBOD may still detach disks that do not respond in time; and as such may require TLER/CCTL/ERC-enabled disks to prevent drive dropouts:[40]
 

Dragon

Banned
Feb 12, 2013
77
0
0
I'm not sure I fully agree. I think that a JBOD with caching would be the best solution. While there would still be a "device" between ZFS and the disk it should be transparent to ZFS. Essentially, it would become the individual disks' cache, only much much larger, smarter, and protected.
Hi, I haven't seen that statement before and I am intrigued. Can you elaborate on what you mean by "best solution"? Why is having more points of failure to lose data the best solution?

My current understanding is that with ZFS, the motherboard is the new raid controller, the OS is the new firmware, ZIL is the new BBU, and RAM is still RAM, just a lot more.

There's no way a conventional Raid-controller/HBA cache is "smarter" than ZFS cache, at least not v28+, so what exactly can that extra layer of cache do that ZFS cannot?

If having more redundant caching layers is "better", then how about we extend that logic and have 50 layers of these "Smart Cache"? Will it still be better?

If there are 50 miles of cable and 5 satellites that disconnect from time to time between the HDD and the HBA and the motherboard, then I can understand, but why cache the same thing in memory over and over in the same box?
 

Skud

Well, my thoughts are based on the following:

1) When ZFS receives data to be written to disk in a synchronous manner it will group those writes into a transaction group for approximately 5 seconds. It does this in RAM and the ZIL, with the ZIL being a "back-up" for what is in RAM. Without a dedicated log device, the pool itself is the ZIL. So, essentially, everything gets written to the pool twice, though not at the same time.

2) During a commit, all the data in RAM is flushed to disk and ZFS waits for confirmation that the data has been written. This is what slows down synchronous writes and can be mitigated somewhat by putting the ZIL on a faster disk or SSD.

3) The problem is that while current generation SSDs are much faster than spinning disks, they seem to have issues with small sequential I/O. Things are definitely getting better (the Intel S3700 SSDs look promising!), but short of a DDRDrive, FusionIO, or ZeusRAM there aren't many options. Also, you need an SSD with power failure protection.

4) While ZFS is designed to work with "whole disks", it really only wants this because it can enable the disk's cache. ZFS is designed to work with pretty much any storage. The only catch is that in order for ZFS to heal itself you need to give it more than one disk or copy of the data. So, for example, running a single ZFS vdev on a single hardware RAID 5 array would be fairly pointless because ZFS only has one copy of the data (you would be told of corruption, though). However, running a mirror vdev on two hardware RAID 5 arrays would be acceptable from ZFS's requirements since ZFS has two copies of the data and can heal any corruption (not really a good idea, but it's just an example).

5) When we hook disks to a hardware RAID controller with protected cache the controller will usually disable the individual disks' cache because 1) the disk's cache isn't protected, 2) the disk may lie about having flushed the cache, and 3) the controller has a whole lot more cache than the disks do. If the controller is capable of operating in JBOD mode and still using its own cache then you have substantially increased the size of the disk cache AND protected it from power failure (or the disk lying). Assuming you can put up to 4GB of cache on your controller you would need to have 64 disks to equal the same amount of cache and it still wouldn't be protected.
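
The 64-disk figure works out if each disk carries 64 MB of on-board cache. That per-disk size is an assumption here (it isn't stated above), and the helper name is made up for the example:

```python
def disks_to_match(controller_cache_mib, disk_cache_mib):
    """Number of disks whose combined cache equals the controller's cache.

    Rounds up, since a fraction of a disk is not an option.
    """
    return -(-controller_cache_mib // disk_cache_mib)


controller_cache_mib = 4 * 1024  # 4 GB of controller cache, as in the post
disk_cache_mib = 64              # assumed per-disk cache size
print(disks_to_match(controller_cache_mib, disk_cache_mib))  # 64
```

With 32 MB per disk the same calculation gives 128 disks, so the exact count depends on the drives, but the point about the size gap stands either way.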

So, the benefits could be:

1) Large, pooled, protected write cache for the disks.
2) Disabled cache on the individual disks.
3) I suspect increased system responsiveness. ZFS issues and completes writes to the controller much more quickly than it can to disks (this is not ZFS specific, but any system, and why RAID controllers have cache). ZFS doesn't have to wait for the controller to flush its cache to disk because it's protected and the controller will take care of it.
4) No more worrying about drives which "lie" about flushing the caches.
5) Could replace the need for a dedicated log device (up to a point).
6) You get the best of both worlds - the intelligence and flexibility of ZFS (since it's working with individual disks), the caching capabilities of a hardware RAID card, and, if the JBOD is true JBOD, you can still move the array to a regular HBA should the RAID card have issues.

Drawbacks could be:

1) As per mobilenvidia's wiki link you might need to use "enterprise" drives (not WD Green, Blue or Black) to prevent the drive from being dropped from the JBOD (if that's what the controller does in that situation).
2) Cost. Controller is more expensive, but it would be cheaper than purchasing a ZeusRAM, DDRDrive, or FusionIO.
3) Complexity. There would be an extra layer involved in the disk subsystem, but that's no different than any other hardware RAID system.

Riley