Storage (re)Architecture


Zombielinux

Member
Jun 14, 2019
I know this is posted in Chassis and Enclosures and is really a broader storage question, but it does relate to chassis and enclosures.

I currently have 5 Sandy Bridge/Haswell-era Proxmox nodes, each with about 30TB worth of drives (in 8, 4, 3, 2, and 1TB sizes), all spun up and pooled together with Ceph. This works well for the most part, but I'm not sure it's optimal.

I currently use CephFS to get data on/off and treat it as my NAS and storage for my docker containers.

I'm considering re-architecting to two storage nodes, each with ~75TB (half the total pool) in some BTRFS/LVM span, and then mirroring between the two nodes with GlusterFS. Then selling all the other components and buying a beefy compute server (probably Epyc).

I have working on-prem tape backup (tested). This is connected to a dedicated server over Fibre Channel.

I also have offsite cloud backup (also tested).

Question 1: Is this a wise(r) idea than what I currently have?

All these nodes are in 2U Supermicro chassis (SC825s and SC826s).

Question 2: Am I better off keeping these 2U systems, or should I investigate disk shelves? The systems are currently stock with no modifications.

I'm less familiar with the noise and power draw of disk shelves than I am with these servers (which draw about 65W each, without disks).

The ultimate goal would be to reduce power draw while maintaining HA storage and a minimal HA core infrastructure (the core services could run on a Raspberry Pi easily, so the hardware requirements are low).
 

Sean Ho

seanho.com
Nov 19, 2019
BC, Canada
The most substantive part of your proposed downsize is moving from 5-node clustered storage to 2-node plus witness. You could in theory achieve this with ceph (you'd still have to recreate your pools and cephfs) using size=2 min_size=1 replication, with MONs on both storage nodes plus a witness. There are very good reasons why you don't see people doing this, though.
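
For reference, the pool side of that change is trivial; the risk is operational, not in the commands. Per pool it would be something like this (pool name is a placeholder):

ceph osd pool set <pool> size 2
ceph osd pool set <pool> min_size 1

With min_size=1 the pool keeps accepting writes with only one copy left, so any second failure before recovery finishes means data loss -- which is the main reason this layout is discouraged.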

Is HA storage necessary? A single NAS with disk shelves and ZFS may save on power and be easier to administer. Also consider upgrading to denser drives and getting rid of your old small spinners.
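
If you did drop the HA requirement, the single-box version could be one raidz2 pool across a shelf of matched drives -- a rough sketch only, with device, pool, and dataset names as placeholders:

zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
zfs set compression=lz4 tank
zfs create tank/media

A pool of equal-sized disks also sidesteps the mixed-drive-size juggling that ceph has been doing for you.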
 

Zombielinux

Member
Jun 14, 2019
The reason I have 5 nodes to start with is that I found a 3-node Ceph cluster to be dangerous to data integrity and not terribly performant.

As for HA storage, yes. Required for sufficient levels of SAF.

I've considered upgrading to denser drives; however, with Ceph that means I lose IOPS and the whole homelab slows to a crawl (I've kind of tried that already). It seems that in small clusters, Ceph REALLY wants SSDs.

The closest Gluster architecture I can find is distributed dispersed, making each disk a brick. The alternative would be distributed replicated, where each brick would be some underlying FS that manages multiple drives.
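
Roughly, the two layouts would be created along these lines (hostnames, brick paths, and counts are placeholders, just to show the shape of each):

# distributed dispersed, one brick per disk
gluster volume create gv0 disperse 4 redundancy 1 \
  node1:/bricks/d1 node1:/bricks/d2 node2:/bricks/d1 node2:/bricks/d2

# distributed replicated, bricks paired across the two nodes,
# with an arbiter brick on a small third box to avoid split-brain
gluster volume create gv0 replica 3 arbiter 1 \
  node1:/bricks/b1 node2:/bricks/b1 witness:/arbiter/b1 \
  node1:/bricks/b2 node2:/bricks/b2 witness:/arbiter/b2

One caveat from my reading: with only two data hosts, a dispersed volume can't survive a node failure (redundancy must be less than half the disperse count), so the replicated layout seems to be the only one that actually gives node-level HA here.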

I guess the next question would be what performance would look like with either of the two glusterfs modes.
 

Sean Ho

seanho.com
Nov 19, 2019
BC, Canada
My point is that ceph vs gluster is less important than your essential clustered storage design -- how many nodes, how many disks, what failure tolerance, and what latency / client load expectations.
 

rtech

Active Member
Jun 2, 2021
Wouldn't it be better to solve redundancy/data integrity at the FS level and export the data via NFS?
 

Zombielinux

Member
Jun 14, 2019
Given the different drive sizes, that may be challenging. Further, exporting via NFS doesn’t solve the HA aspect as I’m not sure you could put a load balancer in front of NFS.
 

Sean Ho

seanho.com
Nov 19, 2019
BC, Canada
I think rtech was thinking of a single NAS with software raid or similar, exporting shares over NFS. How you expose the storage for client use is kind of a separate issue; e.g., run the NFS server using an orchestrator (e.g., k8s) that restarts / live-migrates it to another node if the first node fails.
 

Zombielinux

Member
Jun 14, 2019
And that's kind of the architecture I'm looking at: use Gluster to mirror data between the NASes and containerize an NFS/SMB server for clients that can't speak Gluster natively.
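
For the gluster-aware clients the failover should come from the mount itself; something like this (hostnames and volume name are placeholders):

mount -t glusterfs -o backup-volfile-servers=nas2 nas1:/gv0 /mnt/storage

The containerized NFS/SMB gateway would then just re-export a directory under that mount for the legacy clients.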

The goal is to be as performant as the current Ceph cluster (which occasionally has significant latency issues and uses enough memory that some of my VMs get OOM-killed), while also allowing for node maintenance, HA, and reduced power consumption.