Then how do I get the whole setup to start up again?
Does that mean my pfSense router VM also needs to be on the same Host1? Which should be the one to start up first, pfSense or VCSA? All my hosts are also connected to an APC network UPS via a PowerChute Network Shutdown VM. I was told THAT VM should be the first to start, and last to shut down.
If the host is in an HA cluster, setting the VM startup/shutdown order is disabled. So there's that to understand. But that's if you're configuring the host in vCenter. If you configure the host directly - via the Host Client (I'm on v6.5) - you can use PowerShell or the GUI to set the host's VM startup/shutdown options.
You want to set:
HOST -> Manage -> Settings -> System tab -> Autostart. Enabled=Yes. Start Delay of 120 seconds or whatever you like. I'd enable "wait for heartbeat". STOP ACTION should be Shutdown or Suspend, according to your wishes. Once you do THAT, when you tell the host to reboot or shut down, it will/should attempt to shut down/stop any VM's it can - or DRS will vacate them.
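If you'd rather script that than click through the Host Client, PowerCLI has cmdlets for the host-level autostart policy. A rough sketch - the hostname is made up, and you should check the cmdlet parameters against your PowerCLI version before trusting it:

```powershell
# Sketch only - assumes VMware PowerCLI is installed and you can
# connect straight to the host (not through vCenter).
Connect-VIServer -Server esxi-host1.lab.local -User root

# Host autostart defaults: enabled, 120s delay between VMs,
# wait for the Tools heartbeat, shut guests down cleanly on host stop.
Get-VMHost esxi-host1.lab.local | Get-VMHostStartPolicy |
    Set-VMHostStartPolicy -Enabled:$true `
        -StartDelay 120 `
        -WaitForHeartBeat:$true `
        -StopAction GuestShutDown
```

Same effect as the GUI settings above, just repeatable across hosts.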
So you need to make sure that DRS doesn't vacate them. You can:
- disable HA and DRS if you like; this helps you have some control over what's happening.
- ensure that Host1 has your core VM's - your router if it's software, your VCSA, your Domain Controller. Maybe your SQL server, or whatever is needed for other systems. Personally, I like to have a list/matrix of "homehost=" for each VM, so when I do the shutdown I can move the VM's to an appropriate host. That way, not only can I control the startup of Host1, but I can also control the startup of VM's on Host2/Host3, which helps me get more VM's running if it takes 60-120 seconds for them to boot or "wait for heartbeat".
- set the VM startup order for those "known VM's" on each host to be what you want for a specific startup order. Set the rest to "any order" and they'll start after all the numbered ones, and keep going.
- stop all the VM's you don't need. They're going to take time to shut down, especially all at once. Use a loop to wait for them to shut down on Host2/Host3.
- shutdown the non-primary hosts now that the VM's are off.
- now that only one host is up, HA and DRS can't move your VM's around anyway.
- finally, shut down the last remaining host (eg: Host1) - the one that keeps your core VM's on it. You've already set your startup order, so they'll start on boot of the host. Because you also set the STOP ACTION to shutdown or suspend, those VM's will do that before the host powers off, as it does that in a controlled fashion.
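The per-VM ordering and the drain-and-shutdown steps above can be sketched in PowerCLI roughly like this. VM and host names are invented, and I'd verify the `Set-VMStartPolicy` parameters against your PowerCLI version before running any of it:

```powershell
# Assumes an existing PowerCLI session (Connect-VIServer).
# 1. Pin the startup order of the core VMs on Host1
#    ('pfSense','DC01','VCSA' are placeholder names).
$order = 1
foreach ($name in 'pfSense','DC01','VCSA') {
    Get-VM $name | Get-VMStartPolicy |
        Set-VMStartPolicy -StartAction PowerOn -StartOrder $order
    $order++
}

# 2. Gracefully stop everything still running on the other hosts...
$others = Get-VM -Location (Get-VMHost host2,host3) |
    Where-Object { $_.PowerState -eq 'PoweredOn' }
$others | Stop-VMGuest -Confirm:$false

# 3. ...and loop until they're actually off.
while ($others | Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' }) {
    Start-Sleep -Seconds 10
}

# 4. Power off the now-empty hosts.
Stop-VMHost host2,host3 -Force -Confirm:$false
```

The loop in step 3 is the "wait for them to shut down" piece - don't power off a host while guests are still mid-shutdown.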
How do you get it all to start up again?
- start Host1
- it will autostart the VM's in the order you specified. You can do your router before VCSA if you want, but until VCSA boots up some VM's, your router can't do much anyway - inbound or outbound. It's really personal preference. The only reason you need your PowerChute VM up is if you are worried about a secondary power failure during your startup, and you need to emergency-shutdown again. Personally, I feel that's a multiple cascade compound failure, and pretty hard to know where you're going with it. But it is a discussion to have, and design around. In your case, I'd start it up third. If you're in a Windows environment, you probably want your DC to come up before your router and before your VCSA, especially if your VCSA is AD-integrated in any way.
- Power on Host2/Host3 - manually or via a script on the VCSA or another VM that sends a poweron signal to the IPMI/IDRAC, or the PDU ports to enable power to the host with the BIOS set to auto-power on, on power resume. Whatever will get your hosts to turn on.
- Host2/Host3 will start their specific startup VM's in order, then their "any order" ones - while Host1 continues to do its own.
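For the "send a poweron signal to the IPMI/IDRAC" part, one common route is ipmitool run from the VCSA or a helper VM. The BMC addresses and credentials here are hypothetical; `chassis power on` is a real ipmitool subcommand:

```powershell
# Placeholder iDRAC/IPMI addresses for Host2/Host3 - use your own,
# and don't hardcode the password in a real script.
foreach ($bmc in '10.0.0.12','10.0.0.13') {
    ipmitool -I lanplus -H $bmc -U admin -P 'secret' chassis power on
    Start-Sleep -Seconds 120   # stagger the power draw between hosts
}
```

If you go the PDU-port route instead, the same loop shape applies - turn on an outlet, wait, turn on the next.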
At this point you're started up.
Bonus points if you have to figure out how to auto-start it ALL from scratch, with no humans - switches, SAN, and all. In that case, I:
* Had a "building power PDU" that did NOT auto power on on power resume.
* Had a "UPS PDU" that powered on via the port on the UPS - ONLY if the UPS was sufficiently recharged to tolerate a second emergency shutdown. So not upon power resumption, but upon battery level reaching 60% or whatever the level is.
* The PDU then, upon having power resume, turned on the primary PDU power to the switches - which took 6 minutes to fully boot.
* The PDU then turned on the primary PDU power to the SAN - which took 4.2 minutes to start, but shouldn't/couldn't come up before the switches, or the network could go into a panic mode.
* The PDU then turned on the primary PDU power to Host1 - which had the BIOS set to Auto-Power On. This was set to be about 1 minute before the SAN was booted, so it had that time to get to the point that it needed the SAN.
* Host2/Host3 then get their primary PDU power enabled, powering them up - about 2 minutes after Host1's PDU port comes on.
* The VCSA/Windows vCenter runs a startup script to tell the Building Power PDU powering all the secondary PSU's that it can now turn on and restore secondary power to all devices.
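The cascade above is really just outlet-on, sleep, outlet-on. How you flip an outlet depends entirely on the PDU (SNMP set, SSH, REST), so the `Set-PduOutlet` function below is a pure placeholder - it doesn't exist in any module, and the sleep values come from the boot times I measured above:

```powershell
# Set-PduOutlet is a stand-in for whatever your PDU actually speaks.
function Set-PduOutlet { param($Pdu, $Outlet) <# your PDU call here #> }

Set-PduOutlet -Pdu 'ups-pdu' -Outlet 'switches'   # ~6 min to fully boot
Start-Sleep -Seconds 360
Set-PduOutlet -Pdu 'ups-pdu' -Outlet 'san'        # ~4.2 min to start
Start-Sleep -Seconds 192    # Host1 comes on ~1 min before the SAN finishes
Set-PduOutlet -Pdu 'ups-pdu' -Outlet 'host1'      # BIOS auto-power-on does the rest
Start-Sleep -Seconds 120
Set-PduOutlet -Pdu 'ups-pdu' -Outlet 'host2'
Set-PduOutlet -Pdu 'ups-pdu' -Outlet 'host3'
```

The point isn't the exact numbers - it's that you measure each device's boot time with a stopwatch and bake the delays in, then re-measure after firmware upgrades.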
Not saying it's perfect. But it works well enough for a 24U 3 Host/2 Switch/1 SAN environment, in a PLC/Gas Plant that has to be able to shutdown completely AND power on, with no trained or IT staff of ANY kind, or even humans period, as it might be in a remote location that no one can even get to.
I'm sure I can dig up my scripts for this, to help out. But the above is the basic process. There's absolutely no need for VCSA to be off-cluster. You just need to look at each piece, its dependency, and solve for it. Took me a lot of trial and error, and some stopwatch timers, to figure out the startup order, and I had to re-time things after firmware upgrades to ensure things still took the same time. But it worked like a charm.
I hope that helps?