ZFS on an Open-iSCSI LUN in VMware


johngillespie

New Member
Jun 20, 2012
Paris, France
Hi,

I've been designing a new storage + virtualisation solution for my home over the last few months and am very interested in the features ZFS provides.
However, I don't like that you can't grow an existing ZFS vdev by adding a single drive, and I've been looking for a way around that issue... I believe I may have found it.

Here's the idea :

VMware ESXi boots off a USB stick and then starts a Linux storage VM with a RAID card running IT-mode firmware passed through to it.
The Linux VM would run mdraid + LVM and export a large LUN via iSCSI to a second VM running OpenSolaris.
The OpenSolaris VM would then attach the LUN via iSCSI and format it with ZFS.

The idea is that I'd then be able to benefit from ZFS dedup, compression, snapshots, etc., and still be able to increase the storage size by resizing the LUN that ZFS uses.
My understanding is that ZFS would then automatically expand in order to use all the available space.
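
Roughly the expansion workflow I have in mind (pool, volume and device names below are just placeholders, and the exact rescan step depends on the iSCSI target/initiator used):

# On the Linux storage VM: grow the logical volume backing the iSCSI LUN
lvextend -L +1T /dev/vg_storage/lun0

# Re-advertise the new LUN size on the iSCSI target, then rescan the
# session from the OpenSolaris initiator (details depend on the iSCSI stack)

# On the OpenSolaris VM: let the pool grow into the bigger LUN
zpool set autoexpand=on tank
zpool online -e tank c2t0d0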

Does this sound right to you? Has anyone ever tried this?
What sort of performance hit do you think I should expect from this solution versus a standard ZFS setup (also under VMware)?


Regards,
John
 

Nnyan

Active Member
Mar 5, 2012
That does sound interesting (if I'm not mistaken I've read about something like this somewhere else; I'll have to search for it). My only concern would be performance, so I would LOVE to hear about anyone's experience doing something like this. Hmmm, maybe use SmartOS instead of ESXi?
 

Patrick

Administrator
Staff member
Dec 21, 2010
My biggest concern there is that you are adding a lot of complexity. Basically you have a VMware layer, a Linux layer and a Solaris layer, plus the clients and other VMs accessing the storage.

Not a bad thing in itself, but it may make troubleshooting difficult when something goes wrong.
 

johngillespie

New Member
Jun 20, 2012
Paris, France
I'm more worried about performance than about troubleshooting the system when/if it goes wrong.
Proper monitoring will help me figure out what has failed (drive, iSCSI export, filesystem problem, etc.).
 

cactus

Moderator
Jan 25, 2011
CA
Could you use ZFS on Linux together with mdadm?

SmartOS is still going to limit you to adding a whole vdev whenever you want to expand a pool.
 

sotech

Member
Jul 13, 2011
Australia
Is scrubbing going to be as effective here, given that ZFS doesn't have access to the underlying disks? How will you do online data integrity checks of the mdadm/LVM-managed array underneath it all?
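
The only online check I'm aware of on the Linux side is md's own consistency check, something like this (md device name is just an example):

# kick off an online consistency check of the md array
echo check > /sys/block/md0/md/sync_action

# watch progress, then look at the mismatch count afterwards
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt

# a non-zero mismatch_cnt tells you something is inconsistent, but unlike a
# ZFS scrub there are no checksums to say which copy is the good one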
 

johngillespie

New Member
Jun 20, 2012
Paris, France
Good question, it's kept me thinking for a couple of hours ;-)

Since (as you pointed out) ZFS doesn't have access to the base disks, I don't expect scrubbing to be as effective. ZFS will only be sure that the data was correctly written to the LUN; if something goes wrong underneath that layer, ZFS won't detect it. That said, from what I've read on the Oracle web site, nothing other than a hardware failure is expected to go wrong below that layer, and mdraid is there to handle hardware failures.

The way I see things is that yes this solution will mean losing out on some of the neat data integrity tricks that ZFS has to offer. However, not being able to expand an existing pool is the reason why I've never really bothered with ZFS in the past even though it provides great features. I'm willing to stick with "old-fashioned" RAID if it means being able to use ZFS (with decent performance).

Does any of this help? :)
 

sotech

Member
Jul 13, 2011
Australia
Good question, it's kept me thinking for a couple of hours ;-)

Since (as you pointed out) ZFS doesn't have access to the base disks, I don't expect scrubbing to be as effective. ZFS will only be sure that the data was correctly written to the LUN; if something goes wrong underneath that layer, ZFS won't detect it. That said, from what I've read on the Oracle web site, nothing other than a hardware failure is expected to go wrong below that layer, and mdraid is there to handle hardware failures.

The way I see things is that yes this solution will mean losing out on some of the neat data integrity tricks that ZFS has to offer. However, not being able to expand an existing pool is the reason why I've never really bothered with ZFS in the past even though it provides great features. I'm willing to stick with "old-fashioned" RAID if it means being able to use ZFS (with decent performance).


Does any of this help? :)
I see where you're coming from - ease of expansion is certainly an advantage of mdadm here vs. ZFS.

Given that the data integrity side of ZFS is at least partially negated by the layering on top of mdadm would you just be better off sticking with a straight mdadm setup? It's been a while since I set up a server with anything but ZFS and my benchmarks of mdadm arrays vs. ZFS arrays are pretty long in the tooth, but what advantage will running ZFS on top of mdadm get you?

To my mind - and the reason we use ZFS despite the inconvenience of having to expand using whole vdevs rather than single disks - ZFS is all about maximum data integrity first and foremost. When combined with ECC RAM, server-grade hardware and regular scrubbing, I don't think you can get a better setup in terms of ensuring that all your 1s and 0s stay where you left them and that the file you'll need in 6 years' time will still be accessible. If you take away the maximum-data-integrity part of the equation, I think ZFS starts to make a lot less sense once ease of expansion is a notable consideration.


Looking back to the original post - dedup is close to worthless in the current implementation except in very specific circumstances (we tried it a number of ways on a number of systems and came to the same conclusion as you'll find many places online, sadly) - compression is nice, though... is compression an option for LVM/mdadm? That's outside my area of expertise. Snapshots are handy, too, and again I don't know whether that's an option for other filesystems.

Edit: I like these kinds of discussions... it's always interesting having to think about things in a different way.
 

cactus

Moderator
Jan 25, 2011
CA
This gives a good explanation of why ZFS is normally used without a RAID layer between it and the disks. The problem lies in trying to recover the corrupted data. ZFS checksums will find the bad block, something most filesystems don't do, and then use the data from the mirror or parity block, together with the checksum, to fix the error. When you use LVM or mdadm to give ZFS a single volume, it has no way of getting mirror or parity data to fix the error, so a restore from backup is required. You retain the error detection, but lose the error correction. Jörg also points out that you can use the copies property, which can be set per dataset, to keep a second copy of every block on a single volume, which will let you detect and correct errors.
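
A quick sketch of the copies idea (pool/dataset names are only examples):

# keep two copies of every block in this dataset, even on a single LUN
zfs set copies=2 tank/important

# a scrub can then repair a bad block from the second copy
zpool scrub tank

The obvious trade-off is that the dataset uses roughly twice the space, and it still won't help if the whole LUN disappears.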

Looking back to the original post - dedup is close to worthless in the current implementation except in very specific circumstances (we tried it a number of ways on a number of systems and came to the same conclusion as you'll find many places online, sadly) - compression is nice, though... is compression an option for LVM/mdadm? That's outside my area of expertise. Snapshots are handy, too, and again I don't know whether that's an option for other filesystems.
I have read similar things about ZFS dedup. Dedup, in general, requires many common blocks in your data to be effective, so by design it is not efficient with highly random data like video and pictures. The benefit of dedup shows up when many files share common data, like multiple instances of the same OS or lots of files with similar headers or metadata, such as text documents.
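
If you want to know whether your data would actually dedup before paying the RAM cost for the dedup table, something like this (pool/dataset names made up) gives an estimate:

# simulate dedup on the existing data and print the estimated ratio
zdb -S tank

# dedup is enabled per dataset
zfs set dedup=on tank/vms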

Compression is done at the filesystem level, not at the RAID/volume-manager level, so you will still get compression from ZFS even when it is running on an LVM/mdadm volume.
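
For example (dataset name is just a placeholder):

# enable compression on a dataset that sits on the LVM/mdadm-backed LUN
zfs set compression=on tank/data

# see how much space it is actually saving
zfs get compressratio tank/data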
 

johngillespie

New Member
Jun 20, 2012
Paris, France
Given that the data integrity side of ZFS is at least partially negated by the layering on top of mdadm would you just be better off sticking with a straight mdadm setup? It's been a while since I set up a server with anything but ZFS and my benchmarks of mdadm arrays vs. ZFS arrays are pretty long in the tooth, but what advantage will running ZFS on top of mdadm get you?
- Ease of management
- The fact that you can enable/disable options on a per-folder (per-dataset) basis
- Dedup, compression, snapshots

Looking back to the original post - dedup is close to worthless in the current implementation except in very specific circumstances (we tried it a number of ways on a number of systems and came to the same conclusion as you'll find many places online, sadly) - compression is nice, though... is compression an option for LVM/mdadm? That's outside my area of expertise. Snapshots are handy, too, and again I don't know whether that's an option for other filesystems.
I'll be running VMs from the ZFS volume, so dedup should (based on what I know from working with NetApp SANs) provide me with a 40-50% dedup gain. That being said, I have been wondering about the REAL gain from all of this. Disk space doesn't cost that much and I'll only be running about 8-10 VMs (including VDI instances), so I might just be better off adding an extra disk to the array...

The other data I'll be storing that would benefit from dedup is backups; there could be a decent gain with that too.

This gives a good explanation of why ZFS is normally used without a RAID layer between it and the disks. The problem lies in trying to recover the corrupted data. ZFS checksums will find the bad block, something most filesystems don't do, and then use the data from the mirror or parity block, together with the checksum, to fix the error. When you use LVM or mdadm to give ZFS a single volume, it has no way of getting mirror or parity data to fix the error, so a restore from backup is required. You retain the error detection, but lose the error correction. Jörg also points out that you can use the copies property, which can be set per dataset, to keep a second copy of every block on a single volume, which will let you detect and correct errors.
Thanks for that info as well as the link!
 

gea

Well-Known Member
Dec 31, 2010
DE
Using ZFS on any sort of non-ZFS RAID means losing one of the most valuable ZFS features: self-healing of errors discovered on a disk in a RAID array, based on data checksums and RAID redundancy. Using Linux to build a software RAID underneath only complicates the setup, without any advantage (besides price) over a hardware RAID controller, which may at least offer RAID-level conversion or RAID expansion but equally lacks data checksumming.

Summary:
While vdev expansion and vdev RAID-level conversion are not yet available in ZFS, it should be enough in all cases to expand a pool by adding new vdevs. If you need redundancy, two disks (a mirror vdev) is the minimum needed to increase capacity. Adding complexity that results in worse reliability is not the way I would go.
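
For reference, growing a pool this way is a single command (pool and disk names are only examples):

# add a new two-disk mirror vdev; the pool capacity grows immediately
zpool add tank mirror c3t4d0 c3t5d0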
 