Storage (re)Architecture


Zombielinux

Member
Jun 14, 2019
I know this is posted in Chassis and Enclosures and is really a broader storage question, but it does relate to chassis and enclosures.

I currently have 5 Sandy Bridge/Haswell-era Proxmox nodes, each with about 30TB worth of drives (in 8, 4, 3, 2, and 1TB sizes), all spun up and pooled together with Ceph. This works well for the most part, but I'm not sure it's optimal.

I currently use CephFS to get data on/off and treat it as my NAS and storage for my docker containers.

I'm considering re-architecting to two storage nodes, each with ~75TB (half the total pool) in some BTRFS/LVM span, and then mirroring between the two nodes with GlusterFS. Then selling all the other components and buying a beefy compute server (probably Epyc).

I have working on-prem tape backup (tested). This is connected to a dedicated server over Fibre Channel.

I also have offsite cloud backup (also tested).

Question 1: Is this a wise(r) idea than what I currently have?

All these nodes are in 2U Supermicro chassis (SC825s and SC826s).

Question 2: Am I better off keeping these 2U systems, or should I investigate disk shelves? The systems are currently stock with no modifications.

I'm less familiar with the noise and power draw of disk shelves than I am with these servers (which draw about 65W each, without disks).

The ultimate goal would be to reduce power draw while maintaining HA storage and a minimal HA core infrastructure (the core services could run on a Raspberry Pi easily, so the hardware requirements are low).
 

Sean Ho

seanho.com
Nov 19, 2019
BC, Canada
The most substantive part of your proposed downsize is moving from 5-node clustered storage to 2-node plus witness. You could in theory achieve this with ceph (you'd still have to recreate your pools and cephfs) using size=2 min_size=1 replication, with MONs on both storage nodes plus a witness. There are very good reasons why you don't see people doing this, though.
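
For reference, the pool side of that change is trivial; the risk is operational, not in the commands. Per pool it would be something like this (pool name is a placeholder):

ceph osd pool set <pool> size 2
ceph osd pool set <pool> min_size 1

With min_size=1 the pool keeps accepting writes with only one copy left, so any second failure before recovery finishes means data loss -- which is the main reason this layout is discouraged.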

Is HA storage necessary? A single NAS with disk shelves and ZFS may save on power and be easier to administer. Also consider upgrading to denser drives and getting rid of your old small spinners.
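
If you did drop the HA requirement, the single-box version could be one raidz2 pool across a shelf of matched drives -- a rough sketch only, with device, pool, and dataset names as placeholders:

zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
zfs set compression=lz4 tank
zfs create tank/media

A pool of equal-sized disks also sidesteps the mixed-drive-size juggling that ceph has been doing for you.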
 

Zombielinux

Member
Jun 14, 2019
The reason I have 5 nodes to start with is that I found a 3-node Ceph cluster to be dangerous to data integrity and not terribly performant.

As for HA storage, yes. Required for sufficient levels of SAF.

I've considered upgrading to denser drives; however, with Ceph that means I lose IOPS and the whole homelab slows to a crawl (I've kind of tried that already). It seems that in small clusters, Ceph REALLY wants SSDs.

The closest Gluster architecture I can find is distributed dispersed, making each disk a brick. The alternative would be distributed replicated, where each brick would be some underlying FS that manages multiple drives.
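
Roughly, the two layouts would be created along these lines (hostnames, brick paths, and counts are placeholders, just to show the shape of each):

# distributed dispersed, one brick per disk
gluster volume create gv0 disperse 4 redundancy 1 \
  node1:/bricks/d1 node1:/bricks/d2 node2:/bricks/d1 node2:/bricks/d2

# distributed replicated, bricks paired across the two nodes,
# with an arbiter brick on a small third box to avoid split-brain
gluster volume create gv0 replica 3 arbiter 1 \
  node1:/bricks/b1 node2:/bricks/b1 witness:/arbiter/b1 \
  node1:/bricks/b2 node2:/bricks/b2 witness:/arbiter/b2

One caveat from my reading: with only two data hosts, a dispersed volume can't survive a node failure (redundancy must be less than half the disperse count), so the replicated layout seems to be the only one that actually gives node-level HA here.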

I guess the next question would be what performance would look like with either of the two glusterfs modes.
 

Sean Ho

seanho.com
Nov 19, 2019
BC, Canada
My point is that ceph vs gluster is less important than your essential clustered storage design -- how many nodes, how many disks, what failure tolerance, and what latency / client load expectations.
 

rtech

Active Member
Jun 2, 2021
Wouldn't it be better to solve redundancy/data integrity at the FS level and export the data via NFS?
 

Zombielinux

Member
Jun 14, 2019
Given the different drive sizes, that may be challenging. Further, exporting via NFS doesn’t solve the HA aspect as I’m not sure you could put a load balancer in front of NFS.
 

Sean Ho

seanho.com
Nov 19, 2019
BC, Canada
I think rtech was thinking of a single NAS with software raid or similar, exporting shares over NFS. How you expose the storage for client use is kind of a separate issue; e.g., run the NFS server using an orchestrator (e.g., k8s) that restarts / live-migrates it to another node if the first node fails.
 

Zombielinux

Member
Jun 14, 2019
And that's kind of the architecture I'm looking at: use Gluster to mirror data between the NASes and containerize an NFS/SMB server for clients that can't speak Gluster natively.
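
For the gluster-aware clients the failover should come from the mount itself; something like this (hostnames and volume name are placeholders):

mount -t glusterfs -o backup-volfile-servers=nas2 nas1:/gv0 /mnt/storage

The containerized NFS/SMB gateway would then just re-export a directory under that mount for the legacy clients.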

The goal is to be as performant as the current Ceph cluster (which occasionally has significant latency issues and uses enough memory that some of my VMs get OOM-killed), while also allowing for node maintenance, HA, and reduced power consumption.