Understanding Corosync Quorum for Proxmox Cluster


firworks

May 7, 2021
I've been running a 3-node hyperconverged cluster on Proxmox with Ceph for almost a year now. Recently I had a single-threaded workload that really benefits from fast cores, so I added a fourth node to the cluster with a much faster CPU but otherwise limited resources. It lets me migrate VMs to it when better performance is needed. However, it is not part of the Ceph cluster; it is only a client that uses the Ceph storage for VM disks.

Essentially, the cluster should be able to run without this fourth node, so my thinking is that I should be able to configure Corosync in Proxmox to ignore it for quorum purposes by setting expected votes to 3 and setting the fourth node's number of votes to zero. I couldn't find any examples of someone running this way, though, so I was hoping to talk it through here first and see if I'm missing something. Also, if a node has no votes for Corosync quorum, can it still be used to run HA VMs/CTs?
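
The vote arithmetic behind this idea can be sketched roughly as follows. This is only an illustration, assuming the usual majority rule (a partition is quorate when it holds more than half of the expected votes). The node names and both helper functions are hypothetical, and the sketch does not read or write any real Corosync configuration.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes: dict, online: set) -> bool:
    """True if the online nodes together hold a majority of all votes."""
    expected = sum(votes.values())
    present = sum(v for node, v in votes.items() if node in online)
    return present >= quorum_threshold(expected)


# Hypothetical node names; node4 stands in for the fast, non-Ceph node.
default_votes = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}
proposed_votes = {"node1": 1, "node2": 1, "node3": 1, "node4": 0}

# Scenario: only two of the original storage nodes are reachable.
online = {"node1", "node2"}

print(is_quorate(default_votes, online))   # False: 2 of 4 votes, 3 needed
print(is_quorate(proposed_votes, online))  # True: 2 of 3 votes, 2 needed
```

With one vote per node, four nodes need three votes for quorum, so adding the fourth node means the cluster still tolerates only one failure. Under the proposed weighting the fourth node never affects the count either way, and the cluster behaves like the original three-node setup for quorum purposes.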

I'm wondering about all this because just a few minutes ago I heard the fans spin up, went down to check, and found that all nodes had rebooted. This has happened twice now since adding the fourth node a few days ago. I believe this is expected behavior: when the cluster loses quorum, all nodes restart after a short timeout. I'm not sure why it lost quorum, but as I said, it ran reliably for almost a year with only the original 3 nodes. From examining the Corosync log on some of the nodes, there are a tremendous number of warnings about incompatible MTU between nodes, but I don't know whether that could prevent them from syncing. I've since bumped the new node's MTU up to 8000 to match the other nodes.