Proxmox and shared (remote) storage question - what happens when the storage host is down, and how do I guard against it?

EngChiSTH

Member
Jun 27, 2018
94
32
18
Chicago
Proxmox newbie here.
Building a basic cluster of 3 nodes (SFF business desktops from HP with i7-10700 CPUs, 32GB RAM, and a 1TB SSD as the OS drive).

I was evaluating both distributed local storage (Ceph) and remote storage (an NFS/iSCSI share), and have a question on remote storage.

Assume I have my cluster running and expose an NFS or iSCSI share from a QNAP 332X over the LAN to Proxmox - how do I guard against failure of the NAS itself?
Is there a way to transfer/fail over the iSCSI share, or is any NAS maintenance (e.g. firmware installs for security patches) a full system-down event?

For people who use Proxmox and consolidate storage on TrueNAS or equivalent - isn't this the same problem?

I think I have all the pieces in place for a decent homelab setup - redundant hosts, decent CPUs, local storage with Proxmox installed, a NAS with redundant storage, and 10GbE connectivity to the switch. Wondering what else I may be missing or should be thinking about.

Thank you!
 

Terry Wallace

PsyOps SysOp
Aug 13, 2018
184
112
43
Central Time Zone
So I manage 3 production Proxmox clusters and 2 home ones.
It's a matter of scale and cost. iSCSI and NFS can be set up to fail over, but then you're talking a dual-path NAS system with redundant controllers, a decent switching stack (either dual switches or managed stacked switches), and a lot of cabling.

Whenever you consolidate storage, you're setting up a SPOF (single point of failure). Which can be fine if you're running some media services and experiments at home and you back up the NAS to something else in case of NAS failure.

A 3-node Proxmox cluster lets you run replication (which is not the same as distributed storage, but has some perks).
3 nodes with Ceph will need a couple of extra network cards, an extra switch (or VLAN), and some decent bandwidth (but switches are cheap these days), but will give you the most reliability.
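As a sketch of what setting up that per-VM replication looks like on the CLI (the VM ID, target node name, and schedule here are examples, and this assumes ZFS-backed local storage, which replication requires):

```shell
# Create replication job "100-0" for VM 100, syncing to node "pve2"
# every 15 minutes (schedule uses Proxmox calendar-event syntax)
pvesr create-local-job 100-0 pve2 --schedule "*/15"

# Check the state of all replication jobs on this node
pvesr status
```

The same jobs can be created and monitored from the web GUI under the VM's Replication tab.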

Tell me what you're looking to accomplish with your setup and I can give a little better advice.


Here's one of my smaller Ceph clusters

 

EngChiSTH

What I have:
- 1 Windows Server 2016 instance running on bare hardware, a Lenovo SFF PC from 2016
- the same Lenovo hardware hosts two Hyper-V VMs (one for Pi-hole and one for a UniFi controller); every time Windows updates run, internet access (Pi-hole) gets disrupted. I have Veeam backup running and a recovery key to do a full restore from; however, what do I do if I have, say, a motherboard failure?
- spare hardware I bought (3x HP SF001, upgraded to Intel i7-10700 CPUs, 32GB non-ECC RAM)
- storage layer: a QNAP 332X NAS (15TB of RAID-5 storage, connected to a 10G Brocade 6450 switch); I back up the NAS to a secondary Synology NAS
- a physical machine I want to turn into a VM, containing software I care about (mostly tax returns over the last X years)
- a few virtual disks (all in Microsoft Hyper-V format) for older VMs that I may need in the future
- I also have a secondary Brocade 6450, currently turned off, that I can bring online and load the config onto if the primary fails


What I want:
- keep infrastructure running by virtualizing the domain controller into a Proxmox guest. Being able to snapshot (in case I get a poisonous Windows update) is a nice bonus. I don't want to depend 100% on a Lenovo desktop from 2016 (its PSU, etc.)
- move the existing VMs for Pi-hole/UniFi controller off the domain controller into their own VMs/containers; Windows Update should not disrupt internet access
- spin up new VMs :) Kids are asking for a Minecraft server, and I want to retire the physical machine I use for taxes into a guest that wakes up for a few weeks a year, etc.

How do I structure this?
a. Keep all VMs locally, create Ceph OSDs, and use the distributed storage. Each node contains both VMs and an OSD; shared storage is only used for things like backups to an NFS share and as an ISO repository.
This gives auto-failover for VMs, the NAS can do its restarts without impacting VMs, etc. However, it requires a fast network and would take a good chunk of CPU power just to keep Ceph running. I asked on the Proxmox subreddit, and almost every comment was 'Ceph would never work for you' (you don't have ECC, an 8C/16T CPU is not sufficient, a single OSD is too little, 3 nodes is too few, the network is too slow without at least 10G).
b. Keep VMs locally, create ZFS storage on each node, and replicate/copy VMs on a schedule between nodes. No auto-failover, but no Ceph resource requirements.
c. Keep very little locally - Proxmox is just compute. VMs are stored on NFS or iSCSI exposed to the cluster, and data storage is used similarly over NFS/iSCSI.

What would you recommend? My uptime requirement is: get internet access back within a day :) and don't lose all data (i.e., have a backup you can recover from).
No zero-downtime requirements.

I can budget/spend money too, if buying a second QNAP for failover would help, and/or I can repurpose spare hardware into a TrueNAS host (though that's back to the question of what happens when I have to reboot my cluster's remote storage).
 

Terry Wallace

Okay, with the hardware you have.. yes, Ceph is out. My Ceph nodes are 16-core/32-thread with 256GB of RAM and 2x 40-gig backbones, and I don't run VMs on them directly.
Best scenario I can see for you:
Make a 3-node cluster (note: probably buy a $50 SFF PC with no storage or CPU to speak of to act as an extra node, to keep 3 nodes in quorum should a real node go down).

On the cluster nodes, make sure you have enough RAM, storage space, and network connectivity (if possible; 10-gig NICs off eBay can be had for $20).
Run VMs on each node (spread the load) and make sure you replicate nightly to another node; then also back up the machines to your storage NAS.
If you need extra-large storage for a VM, you can mount an NFS/SMB share from your NAS inside the VM.
I would recommend you run at least 2 domain controllers on different hosts if you're relying on Windows domain credentials for your home.
Each of those should have the DNS service installed, and all your machines should point to those two machines as DNS servers. When you reboot one for Windows updates, client machines will use the second domain controller.
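One way to hand both DNS servers to every client at once is via DHCP. As a hypothetical fragment for an ISC dhcpd-style server (the addresses are placeholders for your two domain controllers; the same two addresses would go in the DNS field of whatever router/DHCP server you actually run):

```
# dhcpd.conf (fragment) - addresses are examples only
option domain-name-servers 192.168.1.10, 192.168.1.11;
```

Clients then automatically retry the second server when the first is down for updates.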
The only thing that should take down your machines is Proxmox root updates, but you can migrate a machine to another host to do that (assuming local space and enough RAM).
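That migrate step can be done from the GUI or the CLI; a sketch, where the VM ID and node name are examples:

```shell
# Live-migrate VM 100 to node "pve2" before taking this host down.
# With local (non-shared) disks, the extra flag is required for
# online migration; the disk contents are streamed to the target.
qm migrate 100 pve2 --online --with-local-disks
```

Offline migration of a stopped VM (`qm migrate 100 pve2`) works the same way without the flags.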

That's how I would do it. (Others will probably have other opinions.)

This is a smaller 3-node setup I run for a dev house that's similar to what you're talking about.

P.S. I couldn't find any "HP SF001" models, so I'm not sure what machines you're running.

 

EngChiSTH

Thank you
First answers, then additional questions :)

- Hardware for the cluster nodes is the HP S01-PF1013W (link to details: HPS01-PF1013W); it comes with the specs below (3 SATA ports max, 2 low-profile PCIe slots, 1x M.2 NVMe). The PCIe slot's bandwidth does not reach 10G, so installing an SFP+ card would give at most ~7Gb.

- Second domain controller :( - I don't own a license to run a second copy of Windows Server. Buying a license would cost more than the cluster hardware and is hard to justify for home use (I bought the first one a full year ago). Unless Microsoft has changed its licensing for virtualization, it would not be proper for me to spin up another one.

Now to questions
- Regularly copying VMs from primary to secondary/secondaries - how exactly does it work? Say I have node prox-01 running VM "100": is it copied to both prox-02 and prox-03, or is it copied to one of them and the cluster knows where to fail over should prox-01 become unavailable? What happens if prox-02 (the copy target) is not available when prox-01 goes down - does the VM just stay down, since there is nowhere to recover it from? Do I have to maintain a map of what copies where?

- The $20 10G cards - any recommendations? I typically look for Mellanox ConnectX-3, searching for '311a' models, and those are $35-40; my offers of less ($25) were rejected.

- Getting a 'spare node' ("an extra node to keep") - is there a guide for this? I've read there is a way to spin up a basic Debian install and set up corosync (?), and that would allow the cluster to keep quorum should a single node go down. Is this an actual install of Proxmox, even though it would never host VMs on that weak spare hardware?

On storage - I am coming full circle back to local storage. I can put at most 3 drives into any single node, limited by the 2L case size (2 SATA SSDs and 1 NVMe); any thoughts on what to put where?

Thank you!
 

Terry Wallace

Does that hardware have onboard video, or are you required to use a card?
Ideally I'd stick a low-profile dual-port 10-gig card (or 40-gig if you're feeling adventurous) in the x16 slot, and a dual- or quad-port 1-gig card in the PCIe x1 slot.
Then use the NVMe as the boot/Proxmox drive, add the 2 SSDs in each node as your data drives, and then one 3.5" spinner as your slow-tier data drive. (The slow tier would be for stuff that runs but is not speed-dependent - my firewall, my DNS, etc.; those run in RAM once the instance starts and have very low disk needs.)
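Assuming that layout, the data drives could be wired up roughly like this (pool names and device paths are examples; in practice use stable /dev/disk/by-id paths rather than /dev/sdX):

```shell
# Mirror the two SSDs as the fast VM pool
zpool create fast mirror /dev/sda /dev/sdb
# Single spinner as the slow tier
zpool create slow /dev/sdc

# Register both pools with Proxmox as VM/container storage
pvesm add zfspool fast --pool fast --content images,rootdir
pvesm add zfspool slow --pool slow --content images,rootdir
```

Mirroring the SSDs halves capacity but means one SSD failure doesn't take the node's VMs down.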

eBay (also where I pick up a lot of my Mellanox cards) goes up and down; the last time, I bought a bunch of 50 that were being dumped and got it down to $20 each, but that was probably due to bulk.
 

EngChiSTH

The hardware does not require any video card (the CPU I use came with an iGPU), so that slot is empty.

I stood up a cluster to host VMs and am in the process of adding shares for things like backups and ISOs.
My next step is setting up throwaway VMs for testing failover (regularly scheduled ZFS->ZFS copies), and then what I really want this for is an attempt to P2V a Windows 10 install using Clonezilla, with Proxmox as the destination.

Overall, very pleased with Proxmox - logically laid out and simple to understand. Yes, a lot of features to (slowly) learn.


Still got things to do on the networking front - pick up a few more 10Gb cards for the nodes, either stack my Brocade 6450 switches or upgrade to a 7250 so I can get enough SFP+ ports, and probably increase the node count to 5 (4 nodes plus a quorum device). However, that is all not as urgent as having backups/a virtual version of key infrastructure.


 

Terry Wallace

The 7250 would do well for the port count.
Did you run separate primary and cluster networks, or is everything on vmbr0?
Also, you're not going to get hot failover with replication, only with shared storage, so hot failover is a Ceph solution. What you can do is make sure your machines are replicated to another node; then, should you need to take a host down, you can migrate to the other node in a matter of milliseconds. (If you migrate to a node where you don't have a replica, it can't transfer just the snapshot delta and will copy the whole disk, which will be slow.)
Also, to support snapshotting, I really hope you formatted those 1TB disks as ZFS on install and not ext4.
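On ZFS-backed (or otherwise snapshot-capable) VM storage, snapshots are near-instant; a sketch with an example VM ID and snapshot name:

```shell
# Take a snapshot before a risky Windows update
qm snapshot 100 pre-update --description "before patch Tuesday"

# List the VM's snapshots
qm listsnapshot 100

# Roll back if the update goes bad
qm rollback 100 pre-update
```

On plain ext4/LVM (non-thin) storage, `qm snapshot` simply isn't available, which is why the install-time filesystem choice matters.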
 

EngChiSTH

Terry Wallace said:
"The 7250 would do well for the port count.
Did you run separate primary and cluster networks, or is everything on vmbr0?
Also, you're not going to get hot failover with replication, only with shared storage, so hot failover is a Ceph solution. What you can do is make sure your machines are replicated to another node; then, should you need to take a host down, you can migrate to the other node in a matter of milliseconds. (If you migrate to a node where you don't have a replica, it can't transfer just the snapshot delta and will copy the whole disk, which will be slow.)
Also, to support snapshotting, I really hope you formatted those 1TB disks as ZFS on install and not ext4."
Single network right now - everything is on vmbr0. Once I add the 10G cards I think I will redo/upgrade that layer; what do you recommend?
For storage: I have an OS drive intended for Proxmox's own operations; that one is ext4, the default Linux FS.
For VMs, I use ZFS on a drive created with the same name on each of the nodes, then designated as shared storage, so if/when a VM moves it comes up on storage with the same hardware, same name, and same filesystem on a separate node. Obviously I need to test...

ISOs, backups, etc. go to an NFS share on the NAS.
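For reference, that arrangement comes out looking something like this in /etc/pve/storage.cfg (a sketch only - the storage names, pool, server address, and export path are placeholders, not my exact values):

```
# /etc/pve/storage.cfg (fragment) - names and paths are examples
zfspool: local-zfs-vm
        pool tank/vms
        content images,rootdir
        sparse 1

nfs: nas-share
        server 192.168.1.20
        export /proxmox
        path /mnt/pve/nas-share
        content backup,iso
```

Since storage.cfg is cluster-wide, the zfspool entry only works if a pool of the same name exists on every node, which is exactly the same-name-per-node setup described above.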