Does Illumos have any high-availability file storage platforms?

AveryFreeman

consummate homelabber
Mar 17, 2017
254
23
18
40
Near Seattle
averyfreeman.com
Just what the subject says -

I want to create the simplest high-availability storage system possible: three hosts, with two mirrored file servers plus one witness.

Is this possible on the Illumos platform? I have a single OmniOS file server right now, and I love it for its stability and Windows domain compatibility.

But I fear it may be lacking in the distributed or mirrored file storage area compared to OSes like Linux. Am I right about this?

If anyone knows of any Illumos solutions for something like DRBD, Gluster, Lustre, AFS, etc. (maybe a port of pve-zsync?), please let me know.

It must be able to continue serving files while one file server is temporarily down, so I don't think a simple zfs send will work...
 

gea

Well-Known Member
Dec 31, 2010
2,675
915
113
DE
Basically, there are three HA options, as ZFS is not a cluster filesystem:

1. Network mirror via FC/iSCSI targets
2. Continuous replication (NexentaStor offers this)
3. Multipath storage (two heads/servers, common mpio SAS storage)

The easiest and fastest is 3.
In its simplest form, you use two servers, either of which can import the pool. You only have to take care never to import it on both simultaneously, so at minimum you need a scripted failover with an alive check of the other head.

As NFS and SMB shares are pure ZFS properties, you do not need separate failover management, as you would with SMB on non-Solaris systems or via Samba. For cloud storage, an S3 cluster, e.g. via MinIO, would be another option.

For napp-it, I have added management support for option 3,
with one server active and the second in standby;
see http://www.napp-it.org/doc/downloads/z-raid.pdf

Another similar mpio option would be RSF-1 from High Availability,
which additionally offers an active/active mode (both servers can offer active services).
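The scripted failover described above boils down to: the standby head repeatedly probes the active head, and only after several consecutive missed probes does it take over the pool. A minimal sketch of that logic follows; the hostnames, pool name, and thresholds are illustrative, and this is not napp-it's actual implementation:

```shell
# Standby-head loop: import the pool only after repeated failed alive checks.
# alive() and takeover() are stubs standing in for the real probe and import.
ACTIVE_HEAD=head1.example.lan   # hypothetical hostname of the active head
POOL=tank                       # hypothetical pool name
INTERVAL=5                      # seconds between alive checks

alive()    { ping "$ACTIVE_HEAD" 2 >/dev/null 2>&1; }  # illumos ping syntax
takeover() { zpool import -f "$POOL"; }

standby_loop() {
  misses=0
  while :; do
    if alive; then
      misses=0                  # active head answered; stay in standby
    else
      misses=$((misses + 1))    # require 3 consecutive misses, not one
      [ "$misses" -ge 3 ] && { takeover; return; }
    fi
    sleep "$INTERVAL"
  done
}
```

In a production failover you would also move the shared service IP and reset (STONITH) the old head before the import, since a flaky network link otherwise risks a dual import.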
 

BoredSysadmin

Active Member
Mar 2, 2019
590
200
43
Current TrueNAS hardware supports active-passive HA at the controller level, but the upcoming TrueNAS SCALE might achieve what you're looking for.
 

AveryFreeman

Basically, there are three HA options, as ZFS is not a cluster filesystem:

1. Network mirror via FC/iSCSI targets
2. Continuous replication (NexentaStor offers this)
3. Multipath storage (two heads/servers, common mpio SAS storage)

The easiest and fastest is 3.
In its simplest form, you use two servers, either of which can import the pool. You only have to take care never to import it on both simultaneously, so at minimum you need a scripted failover with an alive check of the other head.

As NFS and SMB shares are pure ZFS properties, you do not need separate failover management, as you would with SMB on non-Solaris systems or via Samba. For cloud storage, an S3 cluster, e.g. via MinIO, would be another option.

For napp-it, I have added management support for option 3,
with one server active and the second in standby;
see http://www.napp-it.org/doc/downloads/z-raid.pdf

Another similar mpio option would be RSF-1 from High Availability,
which additionally offers an active/active mode (both servers can offer active services).
Fascinating. I saw a whitepaper from 2009 somewhere that discussed making a zpool mirror over the network in Solaris using iSCSI LUNs, but for some reason it required five hosts. Other than that, I haven't really seen much regarding HA on Solaris/Illumos (except Oracle's new commercial framework).

I read the PDF you linked. I hadn't thought of a shared-disk cluster; I was thinking of something more like #1, a network mirror, as I've been looking at solutions like FreeBSD HAST, ctl HA, DRBD, StarWind vSAN, etc. I don't really have any money to invest in an external storage enclosure, besides the fact that they're usually REALLY loud, and this is a homelab.

I have two SM E5 servers in rackmount cases with 3.5" backplanes, and about eighteen 2TB-8TB drives. I have one more host that handles the "always on" workload: firewall/gateway, 1 DC/DNS, VCSA, etc. So I had this notion that my current setup would lend itself best to a 1+1 10GbE storage mirror w/ witness.

It's nice to know what options are available, though...
 

gea

Options 1 and 3 are nearly identical in terms of ZFS management.
Option 1 is slower but allows mirroring a whole pool.

Option 3 is fastest and a lot easier to handle; it is based on single disks.
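For reference, option 1 on an illumos head comes down to discovering the other head's COMSTAR target and then mirroring a local disk against the resulting LUN. A rough sketch, with the target address and device names purely illustrative:

```shell
# Discover the iSCSI target exported by the second head (hypothetical IP):
iscsiadm add discovery-address 192.168.10.2:3260
iscsiadm modify discovery --sendtargets enable

# The LUN appears as an ordinary disk; mirror it against a local disk:
zpool create hapool mirror c1t0d0 c0t600144F0000000000000000000000001d0
```

This gives the whole-pool mirroring described above, at the cost of every write traversing the network.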
 

AveryFreeman

Options 1 and 3 are nearly identical in terms of ZFS management.
Option 1 is slower but allows mirroring a whole pool.

Option 3 is fastest and a lot easier to handle; it is based on single disks.
So when you say multipath, is this limited to the NFSv4.1 multipath protocol, or is there a way to do multipath with a block storage target (e.g. iSCSI, FCoE, etc.)? Sorry, I'm having to imagine all the scenarios from what I've gathered through reading, as it's all very new to me.
 

gea

This is not network multipath but disk multipath with SAS disks.

Every SAS disk has two ports, which you can use to double performance and gain redundancy and HA by connecting one port to one server and the second to the other server.

FC/iSCSI would allow multipath over the network to LUNs.
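On the OS side, a dual-ported SAS disk wired to both heads simply shows up with two paths, which illumos's mpathadm can confirm (the device name below is hypothetical):

```shell
# List logical units and their path counts; a dual-ported SAS disk
# connected through both ports reports a total path count of 2:
mpathadm list lu
mpathadm show lu /dev/rdsk/c0t5000C500AB12CD34d0s2
```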
 

dswartz

Active Member
Jul 14, 2011
496
49
28
Question: for #3 (which I am interested in trying), does illumos support the zpool 'multihost' property? I was using ZoL for a while, and it had that, which provides some insurance against more than one host importing a pool at the same time.
 

AveryFreeman

This is not network multipath but disk multipath with SAS disks.

Every SAS disk has two ports, which you can use to double performance and gain redundancy and HA by connecting one port to one server and the second to the other server.

FC/iSCSI would allow multipath over the network to LUNs.
I've been exploring RDMA to reduce storage network latency. Much of the RDMA hardware and protocol support is extremely limited in scope with ESXi 7.0u1 now. The PVRDMA driver is only compatible with a handful of cards (ConnectX-5 or newer), only compatible with Linux VMs, and only available through SR-IOV. And I *think* only the Ethernet protocol, no more IB.

It can't really be used for general networking tasks (AFAIK); it must be on a separate vDS or vSwitch from the TCP/IP NICs. See reference:

Even if I could not use PVRDMA, only passthrough, does RDMA-capable hardware offer enough benefit to still consider it? How big is the latency reduction?

So what options are there for a multipath network LUN: iSER? RoCE? FCoE?

What are the basic requirements: just get an HBA supported by OmniOS and set up the fabric?

Would that allow the target to be used in OmniOS ZFS as a basic block device, used directly as a drive in a zpool mirror?

What about another technology like FCoE HBAs? Is there another protocol I'm missing that could offer this functionality?

Being able to expose a target as a disk to each machine would be really cool, though. I'm thinking something really basic and low-level, like this:

        zmirror                                  zmirror
Host1 [ local disk + target disk ] <---> [ local disk + target disk ] Host2


Edit: I found an answer to the first question. This article benchmarks throughput, latency, IOPS, etc. of several different network technologies using a RAM disk, so the storage system is not the bottleneck: iSCSI vs iSER vs SRP on Ethernet & InfiniBand

This is also a good ESXi-related article regarding iSER vs iSCSI: vSphere with iSER - How to release the full potential of your iSCSI storage! - VROOM! Performance Blog
 

gea

You can build a ZFS pool or ZFS RAID on top of anything that "smells" like a block device; it does not matter whether it's a disk, a file, or a network LUN. A LUN can be backed by a single disk or a whole pool, depending only on the target settings, so a whole network/network or network/local pool mirror is doable, as is any "disk"-based vdev.

The only "problem":
the more special the setup, the less experience from others you can draw on.
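That flexibility is easy to demonstrate: even plain files qualify as block-like backing, which is handy for testing a layout before committing real disks (pool and file names here are arbitrary):

```shell
# Create two 1 GiB backing files and mirror them:
mkfile 1g /var/tmp/d1 /var/tmp/d2
zpool create demopool mirror /var/tmp/d1 /var/tmp/d2
zpool status demopool

# Clean up the experiment:
zpool destroy demopool
rm /var/tmp/d1 /var/tmp/d2
```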
 

AveryFreeman

Well, I've been looking into all these options for what basically amounts to a RAID 1 over the network using two hosts for HA:

Linux DRBD (most common)
StarWind vSAN
FreeBSD HAST + ctl (iSCSI) + ucarp

AFAIK, all offer some sort of active-active configuration; all would present a single IP/iSCSI target, and HA would be negotiated at the storage OS layer.

I'm trying to imagine how to do this with two OmniOS VMs on different hosts. Maybe have them share a disk with each other via COMSTAR iSER, and set up a ZFS mirror with one local disk and one iSER target.

Then set up ucarp for a target based on both VMs:

Code:
Last login: Sun Jan 24 00:41:35 2021 from 192.168.1.25
OmniOS r151036  omnios-r151036-c874c7527f       December 2020

[avery@hedgehoggrifter:~] $ pkgin search ucarp
ucarp-1.5.2nb2       Common Address Redundancy Protocol (CARP) for Unix
Lemme know if you think I'm crazy...
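For what it's worth, a ucarp invocation for that plan might look roughly like the following; the addresses, interface name, and scripts are all hypothetical, and this is untested on OmniOS:

```shell
# Both heads run ucarp with the same vhid/password; the elected master
# brings up the shared portal IP that clients use as the iSCSI target.
ucarp -i ixgbe0 -s 192.168.1.11 -v 1 -p sharedsecret \
      -a 192.168.1.100 \
      --upscript=/opt/ha/vip-up.sh \
      --downscript=/opt/ha/vip-down.sh
```

The up/down scripts would plumb and remove the virtual IP; ucarp itself only does the master election.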
 

gea

I have played with my HA solution and mirroring a local disk against an iSCSI LUN to overcome the cable-length limitation (10 m) of SAS. It worked, but performance was not comparable to mpio SAS pools with many disks.

Setup and maintenance are also much more complicated.
 

AveryFreeman

I have played with my HA solution and mirroring a local disk against an iSCSI LUN to overcome the cable-length limitation (10 m) of SAS. It worked, but performance was not comparable to mpio SAS pools with many disks.

Setup and maintenance are also much more complicated.
Thanks for sharing your experience! So you recommend setting up multipath SAS (as per the very helpful PDF you shared).

I was looking into that, but it looks like there are no SAS2/3 backplanes for one of my server chassis, only backplanes with one individual SATA connector per drive...

I have a 2U 825TQ-R740LPB and an SC836-xx (not sure which). Both have the individual-SATA-port backplanes.

I could get a BPN-SAS2/3-836EL2 backplane for the 3U chassis, but I haven't been able to find a multipath SAS backplane for the 825. It's an odd size: 2U with 8 drives (instead of 12 like the 826). It's old.

Eventually, if I can find a suitable backplane for the 825 (or a new chassis), I was thinking of getting something like this for each server so they can share each other's backplanes:



I found a really good general article about different SAS configurations:


Do you need a "witness" (sometimes called a "quorum" device) for multipath SAS, as you do with DRBD, etc., to prevent split-brain? (a 1+1+W config)
 

gea

You need either a SAS enclosure with two SATA ports per disk (mostly 4-5 disks in a 3x 5.25" drive bay) or a JBOD with a dual expander, where each expander is SAS-connected to one of the heads.

If you mean a "witness" as protection against a concurrent dual mount: no, it is not needed. The napp-it solution relies on an alive check + STONITH (independent hardware reset of the former head) and the ZFS multihost feature to prevent a dual mount.
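On the multihost question: on OpenZFS-based platforms the property is set per pool and depends on each head having a unique hostid; whether a given illumos distribution ships it depends on its ZFS version, so check before relying on it. A sketch (pool name hypothetical):

```shell
# Each head needs a distinct hostid for multihost (MMP) to work:
hostid

# Enable multihost protection on the pool:
zpool set multihost=on tank
zpool get multihost tank

# With this set, an attempted concurrent import on the other head
# is refused unless forced.
```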

maybe read also

 

AveryFreeman

You need either a SAS enclosure with two SATA ports per disk (mostly 4-5 disks in a 3x 5.25" drive bay) or a JBOD with a dual expander, where each expander is SAS-connected to one of the heads.

If you mean a "witness" as protection against a concurrent dual mount: no, it is not needed. The napp-it solution relies on an alive check + STONITH (independent hardware reset of the former head) and the ZFS multihost feature to prevent a dual mount.

maybe read also

Cool, thanks for the resources. Yeah, my current backplanes will not work. I'll keep hunting for a replacement, or a decent deal on a new chassis.