HA Error

Rand__

Well-Known Member
Mar 6, 2014
6,634
1,767
113
So after upgrading one of my boxes (new CPUs, latest BIOS, new chassis), I experienced an issue with HA on my ESX box:

upload_2019-2-10_21-3-50.png

I tried for ages to find a reason yesterday, and today tried reinstalling, to no avail.
I used a host profile to reconfigure the dvSwitch and everything, so maybe that also carried over the cause of the HA issue, but I have found no way to fix this :(

Any ideas before I remove the customizations and then start recreating everything from scratch?
 

Rand__

So disconnecting *and* removing from inventory helped with that particular error, but of course that's only a trade for another one...

upload_2019-2-10_23-42-23.png
 

Rand__

Turns out that a host profile does *not* actually create the VMkernel adapters on the target host; it just adds them to the switch and *pretends* it did the heavy lifting.

Manually created all my vmk's again (since you still can't do that from cli for a dvswitch afaik)... and now we are again at:

upload_2019-2-11_0-24-13.png

So much fun ...
 

Rand__

OK, rebuilt the box from scratch now; as soon as I add it to the cluster, same HA error again...

Makes me start to think there is something wrong with the installation drive (32GB SATA DOM)... will switch to a regular SSD...
 

Rand__

Nope, same issue.
So the only thing left that I can think of is downgrading the BIOS again... else I am lost :(
 

Rand__

Yeah, thanks:)
If it only were so easy;)

ls /var/run/log/fdm-installer.log
ls: /var/run/log/fdm-installer.log: No such file or directory
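
For anyone following along, a minimal sketch for checking more than one candidate log location in one go. The `check_logs` helper is hypothetical, and the second path (`/var/log/fdm.log`) is an assumption about where the HA agent log usually lives on the host, not something confirmed in this thread:

```shell
# Hypothetical helper: report which of the given files exist.
check_logs() {
  for f in "$@"; do
    if [ -e "$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
    fi
  done
}

# Candidate paths: the first is the one from the KB, the second is an assumption.
check_logs /var/run/log/fdm-installer.log /var/log/fdm.log
```

If both come back as missing, the agent was probably never installed on the host, which would point at the install step rather than at the agent itself.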

Edit1
But thanks, your KB linked to this (outdated) VMware Knowledge Base article, which describes copying the vib from a Windows vCenter (which I don't have).
But then I looked at my VCSA and found I can't see the vib there at all... so maybe it's not the box but the current vCenter appliance patch level...
This box is the first and only one I added to the cluster with the latest release, so that might be it... makes way more sense than a BIOS issue...

Edit2:

And this greets me when I log in to vCenter management:
upload_2019-2-12_8-35-25.png

Incomplete installation might explain a bit;)
Very weird that all else was working fine, have not noticed that in days with
upload_2019-2-12_8-38-37.png

Edit3
Nope. Failed over to second box, looks good there (in admin gui), still same HA issue, still file missing on cli (vcsa)

Just FYI, that's VCSA 6.7.0.21000

Edit4:
If I move the host out of the cluster and run Reconfigure for HA outside it, it works just fine.
Then I move it back into the cluster and it's broken again.
Also the dvSwitch is acting up: sometimes entries are missing, but turn out to exist when I try to recreate them... weird s**t
 

Dawg10

Associate
Dec 24, 2016
220
114
43
Yeah; I didn't think the link would be the answer, but it might get you thinking about a vib or VCSA version mismatch holding you out. Tough to offer direct help when you're only an apprentice...
 

Rand__

No worries, really appreciate it.
Gave up for now, began to believe it's a bug.
Will need to add another box to the cluster to verify that though.
 

Rand__

Second box has the same issue, so either it's a bug or my VCSA has an issue...

Can anyone confirm it's working with their VCSA (and HA cluster) at 6.7.0.21000?
 

Rand__

Since rebuilding my existing vCenter kept failing on importing the backup, I set up a new one and imported that particular host into it.

Result - same error. Note this is the same .21 version, so will now have to try with an older one.
 

Rand__

Since it is a pain in the behind to rebuild a vCenter Server with everything, I was very happy to discover I had an actual VM backup of the vCenter server from just a month ago.

So I spent the day trying to get that to run... had to find out that of course I needed to remove the HA state first... and then of course, without a running vCenter server, dvSwitches are not working, so I needed to move the vCenter interface to another switch... of course I normally only have the dvSwitch, so I needed to use one with another IP range... which then of course did not work either...
Took me a while, but now the backup is up and running and I reconnected all the ESX boxes again, fixed the vSAN issues caused by mixed vCenter servers and the incorrect dvSwitch config flying around and... all is working again (I hope).

At least I can say that the issue with HA on that one particular node is not occurring with 6.7.0.20000.

So while I have learned a lot, the most important lesson is: do actually take backups of all vCenter VMs, don't rely on internal vCenter backups and don't rely on VMware's quality control.
 