vCenter High Availability trigger maintenance/shutdown


Rand__

Well-Known Member
Mar 6, 2014
Hi,
I am running the native 6.5 vCHA in my home cluster, as I couldn't enable FT for the vCenter VM.

It's working well so far, but it requires manual intervention on host reboot, which is kind of stupid. Basically this is because vCenter HA maintenance mode is not integrated with ESXi host maintenance mode.

So at the moment I have to put the host into maintenance mode (which does not complete while the vCenter VM is active), then set the vCenter to maintenance, optionally fail over to the secondary, and manually shut down the vCenter VM before host maintenance mode completes and I can reboot or do whatever. Then the reverse procedure after maintenance is done.

Now the question is: is there a way to make this smarter? Trigger a script when I trigger host maintenance mode, for example...

Or should I just wait it out until this gets improved from VMware's side?
Not doing this every day after all (at least not planned) :)

Thanks
 

NetWise

Active Member
Jun 29, 2012
Edmonton, AB, Canada
First, this won't get improved from VMware's side, as it just doesn't really have to be. Your struggles only really exist for ultra-small/micro vSphere environments (SMB), and I'm pretty sure VMware spends more time having conference calls with Sasquatch than they do with SMBs. But secondly, you just need to flesh out what you're trying to do better, and break out some PowerCLI.

It sounds like you may want to check out the other thread regarding UPS shutdown, and my long posts in there about the high-level process - ESXi / VCenter - New HomeLab Question

Some questions:
- what's vCHA? I thought I was up on my acronyms. Is this the VCSA, the vCenter Server Appliance, or something else? I don't want to go into the weeds and sound like an idiot, only to find out it's one of the hundreds of Flings they have that I forgot about :)
- FT is almost certainly to be avoided; I wouldn't worry about it. All it does is prevent an HA restart by just keeping the VM running. Your VCSA can bounce all day long; things will be just fine, in general.
- If you don't have DRS, then yes, of course when you put the host in maintenance mode you will have to migrate your VMs manually. You either need DRS to automate it, do it manually, or script it. The answer to "is there a way to make this smarter" is "DRS, automated mode" (back to VMware _already_ having improved/solved this) - quick sketch below.
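If it helps, the PowerCLI one-liner for that - cluster name "HomeLab" is made up, and it assumes an open Connect-VIServer session:

# Enable DRS in fully automated mode on the cluster
Set-Cluster -Cluster (Get-Cluster -Name "HomeLab") -DrsEnabled:$true -DrsAutomationLevel FullyAutomated -Confirm:$false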

So I'd trigger a script.

RebootHost1.ps1
===
- vMotion all VMs from Host1 to Host2+
- loop/wait/watch
- maintenance mode Host1 (can be alternated with the above, as it will go into maintenance once it's empty anyway) - do you NEED maintenance mode? Could you just reboot? Do you need/want to prevent the host from auto-rejoining the cluster upon restart so you can manually intervene to put it back? If you don't need that pause/manual verification, then don't use maintenance mode, just reboot it.
- reboot
- if your script runs from a surviving VM on NotHost1, it can now sit in a "watch/wait/loop" cycle waiting for the host to come back up, then take it out of maintenance mode - but this is likely an unnecessary step if you just rebooted instead of using maintenance mode.
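Fleshed out, a rough PowerCLI take on that outline - a sketch only, with hypothetical host names "esx01" (the one to reboot) and "esx02" (the migration target), assuming the VMware.PowerCLI module is loaded and you already have a Connect-VIServer session open to vCenter:

$source = Get-VMHost -Name "esx01"
$target = Get-VMHost -Name "esx02"

# vMotion every powered-on VM off the source host
Get-VM -Location $source | Where-Object { $_.PowerState -eq "PoweredOn" } |
    Move-VM -Destination $target -Confirm:$false

# Enter maintenance mode (completes once the host is empty), then reboot
Set-VMHost -VMHost $source -State Maintenance | Out-Null
Restart-VMHost -VMHost $source -Confirm:$false

# If this runs from a VM surviving on another host: wait for the rebooted
# host to come back in maintenance mode, then reconnect it
do {
    Start-Sleep -Seconds 30
} while ((Get-VMHost -Name "esx01").ConnectionState -ne "Maintenance")
Set-VMHost -VMHost (Get-VMHost -Name "esx01") -State Connected | Out-Null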

Make a similar RebootHost2.ps1 for the other host(s).

If you only have one host, or you want the VCSA to live on Host1 and start/stop with the host, then use the other link. Use your script to PUT the VCSA on Host1, connect to the host directly, and disable DRS/HA if needed. Set the VM startup order so that the VCSA starts automatically (this must be reset each time you shut down: if the cluster moves the VM off and back on, the host won't pick that up again, so it's a step you have to do during your shutdown scripting). Set the VM stop option to shut down or suspend the VM, then tell the host to reboot - do NOT use maintenance mode. The reboot will use the VM stop option to trigger the shutdown/suspend of all the VMs. Because the host is not in maintenance mode, it will automatically start the VMs you specified in order - or use "any order" if you set the rest of them to that - and bring everything back up.
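Something like this rough sketch, connected straight to the host - the VM name "vcsa" and the host address are made up, and the start-policy parameters are worth double-checking against your PowerCLI version:

# Connect directly to the ESXi host, not to vCenter
Connect-VIServer -Server "esx01.lab.local" -User root

# Turn on the host's autostart/autostop feature
Get-VMHost | Get-VMHostStartPolicy |
    Set-VMHostStartPolicy -Enabled:$true -StartDelay 120 -StopAction GuestShutdown

# VCSA powers on first at boot, shuts down gracefully at host shutdown
Get-VM -Name "vcsa" | Get-VMStartPolicy |
    Set-VMStartPolicy -StartAction PowerOn -StartOrder 1 -StopAction GuestShutdown

# Reboot the host directly - no maintenance mode, so the stop/start
# actions actually fire
Restart-VMHost -VMHost (Get-VMHost) -Force -Confirm:$false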

That's what you typically have to do if you have an SMB single host, local storage, and VCSA/DC on the same box. That precludes you from using VUM for patching, since it can't be offline during host updates. But you can install it on a workstation somewhere if needed, and solve for that.
 

Rand__

Well-Known Member
Mar 6, 2014
Hi,
actually I saw the other thread - that's what triggered the question :)

vCHA => vCenter High Availability, a 6.5 feature with a mirrored vCenter and a witness.
Those are not on shared storage, so they can't be moved around.

FT - I have a bunch of VMs on FT, and they just get moved off as expected.

I will need to check PowerCLI to see whether it can deal with vCHA (hopefully better than it deals with FT machines - I didn't find a 6.5 way to temporarily stop FT for my backup pre-script :( The old way does not seem to work any more, but that's for another post I think).
 

NetWise

Active Member
Jun 29, 2012
Edmonton, AB, Canada
Right. Too many acronyms. What's the use case that requires HA for vCenter in your environment, other than as a learning exercise? I've never seen it actually implemented anywhere.
 

NetWise

Active Member
Jun 29, 2012
596
133
43
Edmonton, AB, Canada
So anyway, if they're on local storage and they're HA, just shut them down. Since they're on local storage, tell the host directly that they're part of the startup/shutdown order and let them shut down or suspend with the host. Then just shut down the host. There shouldn't be anything more you have to do: the host will go down, run the stop action on the VM that's left (which can't move because of the local disk), reboot, come back up, and start the VM again. Once it boots, it should reconfigure and rejoin the HA setup, I would think.
 

Rand__

Well-Known Member
Mar 6, 2014
6,636
1,768
113
I used to have vCenter on my vSAN so it would survive an (accidental or intentional) reboot of the ESXi host running vCenter. Unfortunately, in my experience vSAN does not start up properly if there is no vCenter around. So I ended up in a catch-22 when at some point my ROBO cluster with a remote-location witness went down completely: I couldn't bring up vSAN without vCenter (and the witness), and I couldn't bring up the witness (connected via a Sophos RED tunnel) without the Sophos VM hosted on vSAN for HA :p

So I have now set up a 3-node vSAN cluster locally and wanted to FT-enable vCenter (I used to have that in 6.0), but that didn't work. vCenter HA worked, so this hopefully ensures that I can get vSAN up to start all the critical basic stuff (Sophos, AD, Horizon VMs) that runs on top - at least as long as 2 of my 3 hosts are up or can be brought up.
I am still contemplating standing up a 4th node, but I have no idea what to do with all the compute power. And my electric bill is sky high anyway ;)

So back on track - my issue is that ESXi host maintenance mode is not in sync with vCenter HA maintenance mode; the two are managed independently.
My workflow currently looks like this:

- Set maintenance mode on the ESXi host
- Wait until any running HA-enabled VMs are moved off
- Shut down any locally running VMs (FreeNAS, or maybe a gaming VM with a passthrough device)
- Set vCenter HA maintenance mode
- Shut down the vCenter component on that host

Once all VMs are down, maintenance mode completes (before that it cannot).
Do the maintenance, then reverse the procedure.

I'd had bad experiences with vSAN-enabled boxes when I simply rebooted them without putting them in maintenance mode first, so I prefer to avoid that.
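For reference, PowerCLI can at least do the vSAN-safe part of this - a sketch only, host name "esx01" made up:

$vsanHost = Get-VMHost -Name "esx01"
# "EnsureAccessibility" evacuates just enough data to keep vSAN objects available
Set-VMHost -VMHost $vsanHost -State Maintenance -VsanDataMigrationMode EnsureAccessibility | Out-Null
Restart-VMHost -VMHost $vsanHost -Confirm:$false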

So basically I'd need to trigger a PowerCLI script when I activate maintenance mode on a host.
A simple solution would be to poll regularly to see whether maintenance mode is in progress; more elegant would be a trigger, of course, but I'm not sure VMware supports that. I might be able to trigger on a syslog message or log entry, but I have not dabbled in that area with ESXi yet.
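For the polling variant, a rough sketch - the task name filter is an assumption (check what Get-Task actually reports in your environment), and Invoke-HostMaintPrep is a hypothetical helper that would run the vCHA/FreeNAS shutdown steps:

while ($true) {
    # Look for an in-progress "enter maintenance mode" task on any host
    $mmTasks = Get-Task -Status Running |
        Where-Object { $_.Name -like "*EnterMaintenanceMode*" }
    foreach ($task in $mmTasks) {
        Invoke-HostMaintPrep -Task $task   # hypothetical helper
    }
    Start-Sleep -Seconds 60
}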


In the end it's probably easier to just continue the way I do now and run the steps manually :)
Hmm, alternatively I could start the whole maintenance mode activity via an external script - then I don't need a trigger at all... Edit: Just saw that's what you said ;)
 