ZFS on an Open-iSCSI LUN in VMware


johngillespie

New Member
Jun 20, 2012
Paris, France
Hi,

I've been designing a new storage + virtualisation solution for my home over the last few months and am very interested in the features ZFS provides.
However, I don't like that you can't grow an existing ZFS vdev by adding a single drive, and I've been looking for a way around that issue... I believe I may have found it.

Here's the idea :

VMware ESXi boots off a USB stick and then starts a Linux storage VM with a RAID card running IT-mode firmware passed through to it.
The Linux VM would run mdraid + LVM and export a large LUN via iSCSI to a second VM running OpenSolaris.
The OpenSolaris VM would then attach the LUN via iSCSI and format it with ZFS.

The idea is that I'd then be able to benefit from ZFS dedup, compression, snapshots, etc., and still be able to increase the storage size by resizing the LUN that ZFS uses.
My understanding is that ZFS would then automatically expand in order to use all the available space.
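
Roughly the expansion workflow I have in mind (pool, volume and device names below are just placeholders, and the exact rescan step depends on the iSCSI target/initiator used):

# On the Linux storage VM: grow the logical volume backing the iSCSI LUN
lvextend -L +1T /dev/vg_storage/lun0

# Re-advertise the new LUN size on the iSCSI target, then rescan the
# session from the OpenSolaris initiator (details depend on the iSCSI stack)

# On the OpenSolaris VM: let the pool grow into the bigger LUN
zpool set autoexpand=on tank
zpool online -e tank c2t0d0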

Does this sound right to you? Has anyone ever tried this?
What sort of performance hit do you think I should expect from this solution versus a standard ZFS setup (also under VMware)?


Regards,
John
 

Nnyan

Active Member
Mar 5, 2012
That does sound interesting (if I'm not mistaken I've read about something like this somewhere else; I'll have to search for it). My only concern would be performance, so I would LOVE to hear about anyone's experience doing something like this. Hmmm, maybe use SmartOS instead of ESXi?
 

Patrick

Administrator
Staff member
Dec 21, 2010
My biggest concern there is that you are adding a lot of complexity. Basically you have a VMware layer, a Linux layer and a Solaris layer, plus the clients and other VMs accessing the storage.

Not a bad thing in itself, but it may make troubleshooting difficult when something goes wrong.
 

johngillespie

New Member
Jun 20, 2012
Paris, France
I'm more worried about performance than about troubleshooting the system when/if it goes wrong.
Proper monitoring will help me figure out what has failed (drive, iSCSI export, filesystem problem, etc.).
 

cactus

Moderator
Jan 25, 2011
CA
Could you use ZFS on Linux together with mdadm?

SmartOS is still going to limit you to adding a whole vdev whenever you want to expand a pool.
 

sotech

Member
Jul 13, 2011
Australia
Is scrubbing going to be as effective here, given that ZFS doesn't have access to the underlying disks? How will you do online data integrity checks of the mdadm/LVM-managed array underneath it all?
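
The only online check I'm aware of on the Linux side is md's own consistency check, something like this (md device name is just an example):

# kick off an online consistency check of the md array
echo check > /sys/block/md0/md/sync_action

# watch progress, then look at the mismatch count afterwards
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt

# a non-zero mismatch_cnt tells you something is inconsistent, but unlike a
# ZFS scrub there are no checksums to say which copy is the good one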
 

johngillespie

New Member
Jun 20, 2012
Paris, France
Good question, it's kept me thinking for a couple of hours ;-)

Since (as you pointed out) ZFS doesn't have access to the base disks, I don't expect scrubbing to be as effective. ZFS will only be sure that the data was correctly written to the LUN; if something goes wrong underneath that layer, ZFS won't detect it. That said, from what I've read on the Oracle web site, nothing other than a hardware failure is expected to go wrong below that layer, and mdraid is there to handle hardware failures.

The way I see things is that yes this solution will mean losing out on some of the neat data integrity tricks that ZFS has to offer. However, not being able to expand an existing pool is the reason why I've never really bothered with ZFS in the past even though it provides great features. I'm willing to stick with "old-fashioned" RAID if it means being able to use ZFS (with decent performance).

Does any of this help? :)
 

sotech

Member
Jul 13, 2011
Australia
Good question, it's kept me thinking for a couple of hours ;-)

Since (as you pointed out) ZFS doesn't have access to the base disks, I don't expect scrubbing to be as effective. ZFS will only be sure that the data was correctly written to the LUN; if something goes wrong underneath that layer, ZFS won't detect it. That said, from what I've read on the Oracle web site, nothing other than a hardware failure is expected to go wrong below that layer, and mdraid is there to handle hardware failures.

The way I see things is that yes this solution will mean losing out on some of the neat data integrity tricks that ZFS has to offer. However, not being able to expand an existing pool is the reason why I've never really bothered with ZFS in the past even though it provides great features. I'm willing to stick with "old-fashioned" RAID if it means being able to use ZFS (with decent performance).


Does any of this help? :)
I see where you're coming from - ease of expansion is certainly an advantage of mdadm here vs. ZFS.

Given that the data integrity side of ZFS is at least partially negated by the layering on top of mdadm would you just be better off sticking with a straight mdadm setup? It's been a while since I set up a server with anything but ZFS and my benchmarks of mdadm arrays vs. ZFS arrays are pretty long in the tooth, but what advantage will running ZFS on top of mdadm get you?

To my mind - and the reason we use ZFS despite the inconvenience of having to expand using whole vdevs rather than single disks - ZFS is all about maximum data integrity first and foremost. When combined with ECC RAM, server-grade hardware and regular scrubbing, I don't think you can get a better setup in terms of ensuring that all your 1s and 0s stay where you left them and that the file you'll need in 6 years' time will still be accessible. If you take away the maximum-data-integrity part of the equation, I think ZFS starts to make a lot less sense once ease of expansion is a notable consideration.


Looking back to the original post - dedup is close to worthless in the current implementation except in very specific circumstances (we tried it a number of ways on a number of systems and came to the same conclusion as you'll find many places online, sadly) - compression is nice, though... is compression an option for LVM/mdadm? That's outside my area of expertise. Snapshots are handy, too, and again I don't know whether that's an option for other filesystems.

Edit: I like these kinds of discussions... it's always interesting having to think about things in a different way.
 

cactus

Moderator
Jan 25, 2011
CA
This gives a good explanation of why ZFS is normally used without a RAID layer between it and the disks. The problem lies in trying to recover the corrupted data. ZFS checksums will find the bad block, something most filesystems don't do, and then use the data from the mirror or parity block, together with the checksum, to fix the error. When you use LVM or mdadm to give ZFS a single volume, it has no way of getting mirror or parity data to fix the error, so a restore from backup is required. You retain the error detection, but lose the error correction. Jörg also points out that you can use the copies property, which can be set per dataset, to keep a second copy of every block on a single volume, which will let you detect and correct errors.
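
A quick sketch of the copies idea (pool/dataset names are only examples):

# keep two copies of every block in this dataset, even on a single LUN
zfs set copies=2 tank/important

# a scrub can then repair a bad block from the second copy
zpool scrub tank

The obvious trade-off is that the dataset uses roughly twice the space, and it still won't help if the whole LUN disappears.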

Looking back to the original post - dedup is close to worthless in the current implementation except in very specific circumstances (we tried it a number of ways on a number of systems and came to the same conclusion as you'll find many places online, sadly) - compression is nice, though... is compression an option for LVM/mdadm? That's outside my area of expertise. Snapshots are handy, too, and again I don't know whether that's an option for other filesystems.
I have read similar things about ZFS dedup. Dedup, in general, requires many common blocks in your data to be effective, so by design it is not efficient with highly random data like video and pictures. The benefit of dedup shows up when many files share common data, like multiple instances of the same OS or lots of files with similar headers or metadata, such as text documents.
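
If you want to know whether your data would actually dedup before paying the RAM cost for the dedup table, something like this (pool/dataset names made up) gives an estimate:

# simulate dedup on the existing data and print the estimated ratio
zdb -S tank

# dedup is enabled per dataset
zfs set dedup=on tank/vms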

Compression is done at the filesystem level, not at the RAID/volume-manager level, so you will still get compression from ZFS even when it is running on an LVM/mdadm volume.
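
For example (dataset name is just a placeholder):

# enable compression on a dataset that sits on the LVM/mdadm-backed LUN
zfs set compression=on tank/data

# see how much space it is actually saving
zfs get compressratio tank/data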
 

johngillespie

New Member
Jun 20, 2012
Paris, France
Given that the data integrity side of ZFS is at least partially negated by the layering on top of mdadm would you just be better off sticking with a straight mdadm setup? It's been a while since I set up a server with anything but ZFS and my benchmarks of mdadm arrays vs. ZFS arrays are pretty long in the tooth, but what advantage will running ZFS on top of mdadm get you?
- Ease of management
- The fact that you can enable/disable options on a per-folder (per-dataset) basis
- Dedup, compression, snapshots

Looking back to the original post - dedup is close to worthless in the current implementation except in very specific circumstances (we tried it a number of ways on a number of systems and came to the same conclusion as you'll find many places online, sadly) - compression is nice, though... is compression an option for LVM/mdadm? That's outside my area of expertise. Snapshots are handy, too, and again I don't know whether that's an option for other filesystems.
I'll be running VMs from the ZFS volume, so dedup should (based on what I know from working with NetApp SANs) provide me with a 40-50% dedup gain. That being said, I have been wondering about the REAL gain from all of this. Disk space doesn't cost that much and I'll only be running about 8-10 VMs (including VDI instances), so I might just be better off adding an extra disk to the array...

The other data I'll be storing that would benefit from dedup is backups; there could be a decent gain with that too.

This gives a good explanation of why ZFS is normally used without a RAID layer between it and the disks. The problem lies in trying to recover the corrupted data. ZFS checksums will find the bad block, something most filesystems don't do, and then use the data from the mirror or parity block, together with the checksum, to fix the error. When you use LVM or mdadm to give ZFS a single volume, it has no way of getting mirror or parity data to fix the error, so a restore from backup is required. You retain the error detection, but lose the error correction. Jörg also points out that you can use the copies property, which can be set per dataset, to keep a second copy of every block on a single volume, which will let you detect and correct errors.
Thanks for that info as well as the link!
 

gea

Well-Known Member
Dec 31, 2010
DE
Using ZFS on any sort of non-ZFS RAID means losing one of the most valuable ZFS features: self-healing of errors discovered on a disk in a RAID array, based on data checksums and RAID redundancy. Using Linux to build a software RAID underneath only complicates the setup, without any advantage (besides price) over a hardware RAID controller, which may at least offer RAID-level conversion or RAID expansion but equally lacks data checksumming.

Summary:
While vdev expansion and vdev RAID-level conversion are not yet available in ZFS, it should be enough in all cases to expand a pool by adding new vdevs. If you need redundancy, two disks (a mirror vdev) is the minimum needed to increase capacity. Adding complexity that results in worse reliability is not the way I would go.
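
For reference, growing a pool this way is a single command (pool and disk names are only examples):

# add a new two-disk mirror vdev; the pool capacity grows immediately
zpool add tank mirror c3t4d0 c3t5d0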
 