Sorry for the long post - it was an even longer process, but I am sure it will be of interest to some, or at least entertaining for the others.
Prelude:
I have been running 3- and 4-node vSAN clusters for 3+ years now, since I was looking for reliable VM hosting for my home environment. I run almost everything on VMs: firewall, monitoring, AD, file services, and especially desktops & gaming boxes. My wife opted out in the early days (too much downtime), so it's only the kid and me on it, but I do have quite a high bar set, and he is of course not known for his patience.
The original reason to move everything to VMs was noise and power saving - the former was achieved (zero clients), the latter... well, let's not talk about that.
I have been running various vSAN configurations during that period (2-node + remote witness, 3-node, and finally 4-node) and decided to stay at 4 nodes for the increased resilience. Of course the 2-node vSAN with the witness on my VPN-connected remote backup server worked fine - until one day, for reasons I can't remember, both of my local hosts went down at the same time...
Of course I was running vCenter on vSAN so it could move around... but no vCenter, no dvSwitch; no dvSwitch, no vSAN; no vSAN, no vCenter... which of course also meant no firewall/VPN, no AD, and no vSAN-hosted desktops. What a fun day.
Since then I keep an 'emergency admin box' (physical) in my office, directly connected to the router, so I can at least google things when something breaks.
The move from 3 to 4 nodes came when, during maintenance, one node did not come back up properly for whatever reason - no fun running on two nodes when downtime of either would cause the cluster to fail again... so eventually I moved to 4 for peace of mind.
This was reinforced when an ESXi update went haywire one day and knocked two of my 4 boxes out of the cluster due to dvSwitch issues (no communication despite the vmks being there and the hosts being members of the dvSwitch). I had to remove each affected host from the switch and recreate the vmks by re-adding it. This issue happened at least 10 times over those years - there was no proper CLI for the dvSwitch either, and the clone option in the GUI was poor. I think they have it under control now; it hasn't happened in a while.
At that point I had a more or less stable cluster, so I could think about other aspects - performance.
During that unstable phase I had been moving VMs on and off vSAN a lot - to local SSDs or a FreeNAS box - and it was painful. vSAN-hosted VMs also did not perform that well with file operations, so I decided I needed more performance. I was aiming for 500 MB/s for vMotions - it sounded like a nice number and would keep vMotion times acceptable even for the 40 GB Windows boxes or the 100 GB+ game VMs.
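To put that target in perspective, here is a back-of-the-envelope sketch of full-disk transfer times at a sustained 500 MB/s. This is my own rough arithmetic, not a vMotion benchmark - it assumes the entire disk is copied at the full rate, ignoring thin provisioning, compression, or network overhead:

```python
# Rough transfer-time estimate at a sustained rate (assumption: whole
# disk is moved, no savings from thin provisioning or dedup).
def transfer_time_seconds(size_gib: float, rate_mib_s: float = 500.0) -> float:
    # Convert GiB to MiB, then divide by the sustained rate in MiB/s.
    return size_gib * 1024 / rate_mib_s

for size in (40, 100):
    print(f"{size} GiB VM at 500 MB/s: ~{transfer_time_seconds(size):.0f} s")
```

So a 40 GiB Windows box moves in under a minute and a half, and even a 100 GiB+ game VM stays in the few-minutes range - acceptable for a home cluster.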