ZFS mirror across servers


rthorntn

Member
Apr 28, 2011
Hi,

Thanks for taking a look at my post!

I have a couple of the mITX Supermicro c2550 boards and want to achieve low-cost redundant clustering at home.

8TB disks are expensive and I have two at this point. I would like to put one in each c2550 server and synchronise the data across Ethernet, i.e. mirroring over Ethernet.

Is this possible using Solaris 11.3 (I believe the c2550 boards work now)? If it will work, can I take advantage of all the funky ZFS resiliency features to prevent silent data corruption through bit rot (the disks are rated at <1 unrecoverable error in 10^14 bits read), or would I need two mirrored disks per c2550 for that?

Thanks again.

Richard
 

Patrick

Administrator
Staff member
Dec 21, 2010
Richard,

Usually clustering will require 3+ nodes. The reason is that if there is a network issue and both nodes are still up but cannot talk to each other, it is very hard to figure out which node's changes are valid. A third node provides a quorum: whichever side of the split still holds a majority keeps running.

Are you looking for a solution to basically keep a copy of all changed data on the second server (e.g. zfs send/receive or rsync could work), actual RAID 1 mirroring across the servers (writes happen to both simultaneously), or a clustered system (scale out to, say, 5-100 nodes one day)?
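
For the first option, the send/receive route is conceptually just a snapshot and a stream; a minimal sketch (pool, dataset and host names are placeholders):

    # snapshot the dataset, then stream it to the second box over SSH
    zfs snapshot tank/data@copy1
    zfs send tank/data@copy1 | ssh server2 zfs receive backuppool/data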
 

rthorntn

Member
Apr 28, 2011
Thanks Patrick!

Ah OK, three nodes to handle split-brain decisions :)

I guess it might help everyone if I state my requirements:

Home network
6TB of data stored at this point
Plex server
Hypervisor to spin up research VMs (prefer vSphere)
N+1 disk redundancy but server redundancy would be cool also
ZFS (prefer Solaris)
NFS
Solution capable of correcting bit-rot
CPU assisted AES disk encryption
Easy storage capacity upgrades
Very happy in the Unix CLI

I have a few servers:
2 x 6-core ATX
1 x c2750 mITX
4 x c2550 mITX
1 x 4-core ATX

I have a bunch of SSDs, 2TB drives, 10GbE cards and SAS HBAs. I like overkill if it's running on fairly energy-efficient hardware. The minimum number of servers that lets me recover fairly quickly from a hardware failure would be great; it helps me learn, and I don't need amazing performance.

I think I would like to have Plex running in Docker at least, but maybe in a container or separate VM. The Plex requirement probably means Solaris running on the c2550 for storage and a separate server for vSphere (it doesn't look like people are having much luck running Solaris on vSphere with a SAS HBA in passthrough, and I'm not sure I could get Plex running with Solaris's Linux support).

Hope that helps and thanks again!

Richard
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,519
5,827
113
How fast is your data growing? Where do you project it to be in 24 months?

The reason I am asking is that I would probably think about server-level redundancy and a backup solution. For example, I have a pretty nifty ZFS server. I use RAID 1 on the disks there with L2ARC and ZIL devices. I then have another NAS box whose sole purpose is being a backup target.

The benefit I get is building a bigger/better main storage server and having a bit more redundancy with the dedicated backup target.
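
In practice a dedicated backup target like that is fed by scheduled incremental sends; a minimal sketch, assuming a source pool tank, a backup pool backup, and a host called backupnas (all names illustrative):

    # one-time full copy to seed the backup target
    zfs snapshot tank@base
    zfs send tank@base | ssh backupnas zfs receive -F backup/tank
    # afterwards, ship only the delta between the last two snapshots
    zfs snapshot tank@daily1
    zfs send -i tank@base tank@daily1 | ssh backupnas zfs receive backup/tank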
 

rthorntn

Member
Apr 28, 2011
Thanks again Patrick!

I think it will grow at about 2TB per year.

Do you use Solaris ZFS? Is your NAS box another DIY server backing up to JBOD/RAID?

Where do all your "services" live (Plex in my case)?

Richard
 

whitey

Moderator
Jun 30, 2014
There are no issues with VT-d (pass-through) of an LSI HBA to a Solaris/Illumos/Linux/FreeBSD storage appliance VM. Dunno where you got that sentiment from.

There WAS an odd issue where we found a Linux kernel regression on LSI 2008 chipsets, and I documented the hell out of it in several threads here and over on the Rockstor forum.
 

unwind-protect

Active Member
Mar 7, 2016
Is this mostly for the case where one server blows up completely, destroying all drives?

And you want it faster than sending ZFS snapshots can do?

One kinda-straightforward way is to base the filesystem on the active server on devices that are not the raw disks directly. Instead, each "disk" that ZFS sees is a RAID 1 device, backed twice: once locally and once over iSCSI to the other server.

Obviously, if there is ever a connection problem you have the mother of all resyncs to do. And the backup machine can at best make casual read-only use of the filesystem (with ZFS, probably not even that).
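
A rough sketch of that layout on the active server, assuming Solaris initiator commands and that the other box already exports an iSCSI LUN (addresses and device names are placeholders):

    # point the iSCSI initiator at the partner server and enable discovery
    iscsiadm add discovery-address 192.168.1.2:3260
    iscsiadm modify discovery --sendtargets enable
    # build the pool so each vdev is a mirror of one local disk and
    # one iSCSI-backed disk (use the device name that format(1M) shows)
    zpool create tank mirror c1t0d0 c0t600144F0XXXXd0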

It isn't quite clear to me what kind of scenario you want to prepare against.
 

gea

Well-Known Member
Dec 31, 2010
The intention is not clear to me either.

You usually start with a regular storage server, where you can replicate a ZFS filesystem to another ZFS filesystem on the same or another server. This is the usual procedure.

BTW
Rethink the archive disks. They are a bad idea for any RAID; performance on a resilver can be a disaster.

You can then think about virtualising services, either via zones on Solaris, or you can use ESXi as a bare-metal virtualiser and virtualise everything including the storage VM. This gives you more flexibility regarding your VMs (they can be anything from BSD and OSX to Linux, Solaris and Windows) and very fast recovery after a crash (under a minute for the storage VM from an ESXi OVA template).
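
For the zones route, defining and booting a service zone is only a few commands; a minimal sketch (zone name and path are just examples, and whether a given app runs inside is a separate question):

    # define, install and boot a native Solaris zone for a service
    zonecfg -z svczone 'create; set zonepath=/zones/svczone'
    zoneadm -z svczone install
    zoneadm -z svczone boot
    zlogin svczone        # get a shell inside the zone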

Server mirroring or building RAID-Z arrays over servers is a very special use case.
You should only think of it when you really need a huge capacity, a recovery option for realtime data (where you cannot allow a delay of, say, 1-5 minutes like with replication), or an HA solution where you want to survive a whole system failure (server + storage). In such a case you can:

- use 2+ ZFS storage servers, or a storage head + 2 or more storage nodes with disks
- create a pool of the same size on every node and create an iSCSI target on the pool, up to the pool size
- on the storage head (which can be combined with a node), create an iSCSI initiator and build a pool from a mirror or RAID-Z over the targets; this allows cheap nodes with SATA disks to achieve a cheap but fast petabyte box

If the head fails, you can use an initiator on another server to import the disks/pool/data and keep services up with current data.
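
On the Solaris/Illumos side that maps to COMSTAR on the nodes and the plain iSCSI initiator on the head; a minimal sketch with illustrative names, sizes and addresses (the GUIDs are placeholders):

    # on each storage node: enable COMSTAR, back a LUN with a zvol, export it
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default
    zfs create -V 900G nodepool/lun0
    stmfadm create-lu /dev/zvol/rdsk/nodepool/lun0
    stmfadm add-view 600144f0XXXXXXXX    # use the LU GUID printed by create-lu
    itadm create-target
    # on the head: discover every node's target, then build RAID-Z across them
    iscsiadm add discovery-address 192.168.1.11:3260
    iscsiadm modify discovery --sendtargets enable
    zpool create bigpool raidz c0tXXX1d0 c0tXXX2d0 c0tXXX3d0   # iSCSI LUN device names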
 