ESXi 7 vMotion issue


marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
Hey All,

Adding to the list of ESXi 7 annoyances: I've recently been trying to get vMotion working between two similar server builds. Originally the primary build had an E-2288G CPU and the second build had an E-2176G CPU. I thought the problem with live vMotion was due to the CPU generation difference, so I recently got a replacement E-2286G CPU for the second build. Both builds use the same motherboard, memory, etc.; the only major difference is the CPU. I didn't realize the replacement was 6-core instead of 8-core when I got it off eBay; I wasn't paying close attention since I was mainly trying to find one that wasn't severely overpriced. The replacement CPU came in a used Dell Precision tower, so I was planning to reuse the old CPU in that.

Well, last night I replaced the CPU in the second unit and tried to vMotion a live VM. I still got the CPU feature mismatch error telling me to set up EVC.

Am I missing something here? Does vMotion now require identical CPUs for live moves? Or is the E-2286G really so different from the E-2288G that there are feature differences? As far as I could tell, only the core count differs; the features are the same.

I tried to add the second host to a cluster with EVC enabled for Ice Lake and it failed with CPU feature errors again. So I don't know if this is the ESXi host not picking up the CPU properly, a BIOS issue, or actual CPU differences even within the same generation.

Any ideas?
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
Shut down all running VMs on all hosts. Set EVC mode to Intel/Skylake on both hosts. Power the VMs back up. If your network is set up right, vMotion should work.
You'd want EVC enabled in any case, in the event of future cluster expansion with a newer CPU.
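For anyone who'd rather script that than click through the UI, here's a minimal pyVmomi sketch of the same idea (EVC is actually configured at the cluster level, via the cluster's EVC manager). The vCenter address, credentials and the cluster name "LabCluster" are placeholders, and it assumes all VMs on the member hosts are already powered off.

Code:
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()        # home lab only: skip cert checks
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

# find the cluster by name
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "LabCluster")
view.DestroyView()

evc = cluster.EvcManager()                    # ClusterEVCManager for this cluster
print("current EVC mode:", evc.evcState.currentEVCModeKey)

# report compatibility problems first, then apply the new baseline
check = evc.CheckConfigureEvcMode_Task("intel-skylake")
WaitForTask(check)
print("pre-check result:", check.info.result)  # any incompatibilities show up here

WaitForTask(evc.ConfigureEvcMode_Task("intel-skylake"))
Disconnect(si)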
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
Shut down all running VMs on all hosts. Set EVC mode to Intel/Skylake on both hosts. Power the VMs back up. If your network is set up right, vMotion should work.
You'd want EVC enabled in any case, in the event of future cluster expansion with a newer CPU.
Thanks, I did see that that's the general consensus for fixing vMotion. What I want to know is why the two CPUs don't report the same features. My understanding is that the only difference between them is the core count.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
VMware is quite sensitive to even the smallest difference in CPU microcode. Generally, identical CPUs aren't required, but don't get stuck on it.
My home VMware lab has three identical HP desktops in it and I still enabled EVC, for the reasons in my last post.
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
Fair enough. ESXi 7 has been a mixed bag lately, so I'm not going to question it. Just wish I'd known before I spent money on another CPU, lol.

Anyway, I tried to move the host into the new cluster set up with Skylake and still get an error on the host:


The host's CPU hardware does not support the cluster's current Enhanced vMotion Compatibility mode. The host CPU lacks features required by that mode. MDS_NO is not supported. RSBA_NO is not supported. IBRS_ALL is not supported. RDCL_NO is not supported. AVX-512 Vector Neural Network Instructions (AVX512VNNI) are unsupported. XSAVE of high 256 bits of ZMM registers ZMM0-ZMM15 is unsupported (ZMM_Hi256). XSAVE of Protection Key Register User State (PKRU) is unsupported. XSAVE of opmask registers k0-k7 is unsupported. XSAVE of ZMM registers ZMM16-ZMM31 is unsupported (ZMM_Hi16). Protection Keys For User-mode Pages (PKU) is not supported. Cache line write back (CLWB) is unsupported. AVX-512 Vector Length Extensions (AVX512VL) are unsupported. Advanced Vector Extensions 512 Foundation (AVX512F) are unsupported. Advanced Vector Extensions 512 Doubleword and Quadword (AVX512DQ) are unsupported. Advanced Vector Extensions 512 Conflict Detection (AVX512CD) are unsupported. Advanced Vector Extensions 512 Byte and Word Instructions (AVX512BW) are unsupported. See KB 1003212 for more information. Host is of type: vendor intel family 0x6 model 0x9e

Any idea what gives? I updated all my hosts to the latest patch level in case that matters.
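One way to see exactly which flags differ is to diff the CPU feature list each host advertises to vCenter, since that is essentially what the EVC admission check compares. A sketch that reuses the connection from the earlier example; the two host names are placeholders for the E-2288G and E-2286G builds, and the cpuid.* key names are whatever ESXi itself reports.

Code:
# reuses si/content from the earlier sketch
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
hosts = {h.name: h for h in view.view}
view.DestroyView()

a = hosts["esxi-2288g.lab.local"]     # placeholder host names
b = hosts["esxi-2286g.lab.local"]

def features(host):
    # every CPU feature flag the host advertises to vCenter (cpuid.* keys)
    return {f.key: f.value for f in host.config.featureCapability}

fa, fb = features(a), features(b)
for key in sorted(set(fa) | set(fb)):
    if fa.get(key) != fb.get(key):
        print(f"{key}: {a.name}={fa.get(key)}  {b.name}={fb.get(key)}")

Flags like MDS_NO, IBRS_ALL and RDCL_NO depend on the microcode/BIOS level, while the AVX-512 entries are features these Xeon E CPUs simply don't have, so the EVC baseline has to drop until none of the listed features are required.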
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
Can you provide a bit more detail: you have an existing Skylake cluster with EVC enabled(?) and you are trying to add a new host with what CPU?
Just FYI: you'd get the same errors adding a new ESXi host to a cluster or later trying to vMotion, for basically the same reasons.
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
No, I wasn't using clusters at all prior to this. I just had a datacenter with three hosts set up. Two of the hosts are similar builds; the third is a Dell R730 server.
I wanted to vMotion the VCSA, network VMs, etc. between the two similar hosts when I do host updates. That's what I did in the past on 6.7. Right now I can only vMotion a powered-off VM; I can't do live migrations.

So last night, under the datacenter, I added a new cluster, enabled EVC on it, and tried to move host 2 into the cluster. It kept failing with the CPU error.

I then disabled EVC on the cluster and was able to move host 2 into it. But when I try to enable EVC it won't let me, saying host 2 has issues.

I think something is broken with the ESXi host not seeing the CPU correctly. I'm going to power off the host, unplug it, restart it, and clear out the BIOS logs etc. to see if that helps. Something feels broken here, lol.
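Before tearing into the BIOS, it may be worth checking what vCenter actually thinks the CPU is; a small sketch, again assuming the same session and host inventory as the earlier examples.

Code:
# reuses the hosts dict from the earlier sketch
for name, host in hosts.items():
    hw = host.summary.hardware
    print(f"{name}: {hw.cpuModel} ({hw.numCpuPkgs} socket(s), {hw.numCpuCores} cores)")

If the model string there isn't the E-2286G, the host really isn't picking the CPU up correctly; if it is, the mismatch comes down to feature flags rather than identification.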
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,050
437
83
What EVC level did you set on the new cluster, and what is the CPU model in the Dell R730? Did you shut down all running VMs? You can't add new hosts to an EVC cluster with running VMs.
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
I tried Cascade Lake, Skylake, and Ice Lake; all three fail on host 2 with the E-2286G CPU. All VMs were shut down on host 2 before moving it into the cluster. The Dell R730's CPUs are E5-2680 v3, but I have a lot of VMs on it so I don't want to bring it down ATM.

I have the host in the cluster now, after disabling EVC on the cluster. Now I can't change the EVC mode, as it complains about the CPU.
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
E-2286G:
1637191930102.png

E-2288G:
1637191970768.png

CPU/EVC matrix:
1637192137514.png


Looks like Intel® "Broadwell" Generation is the highest EVC mode I can set with these CPUs.
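That matches what the API reports: each connected host exposes the highest EVC baseline it can join, so the cluster's ceiling is just the lowest of those. A sketch under the same assumptions as the earlier examples; using EVCMode.vendorTier as the least-to-most-featured ordering within a vendor is how I'd rank the keys.

Code:
# reuses si and the hosts dict from the earlier sketches
modes = {m.key: m for m in si.capability.supportedEVCMode}

host_max = {name: h.summary.maxEVCModeKey for name, h in hosts.items()}
print(host_max)     # expected to show intel-broadwell for the E-22xxG boxes

# vendorTier orders modes least- to most-featured within a vendor,
# so the cluster ceiling is the host maximum with the lowest tier
ceiling = min(host_max.values(), key=lambda k: modes[k].vendorTier)
print("highest common EVC mode:", ceiling)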
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
A weird thing I noticed: when I stop a VM, the VMware EVC setting shows as Disabled on that VM's Configure tab.
1637198000432.png
But when I start the VM, it changes to Enabled and lists a CPU mode that isn't supported per the EVC mode screenshots in my prior post.
1637198064257.png

Also, I cannot enable EVC on a VM that has performance counters enabled, i.e. one set up for nested virtualization.

Is this normal behavior or is my VCSA screwed up?


VCSA
Version: 7.0.3
Build: 18901211

ESXi hosts
Hypervisor: VMware ESXi, 7.0.3, 18905247

Edit: probably why I can vMotion the VM when it is off... the EVC is off.

Edit 2: I found out this is standard behavior when powering on a VM.
Configure the EVC Mode of a Virtual Machine (vmware.com)
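The per-VM side of this is queryable too: a powered-on VM carries a minimum required EVC baseline in its runtime info, which is only populated while it's running, and that's also why a powered-off VM migrates freely. A sketch, same assumed session as the earlier examples.

Code:
# reuses si/content from the earlier sketches
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
for vm in view.view:
    if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
        print(vm.name, "->", vm.runtime.minRequiredEVCModeKey)
view.DestroyView()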
 

arnbju

New Member
Mar 13, 2013
26
11
3
EVC is only required for live vMotion. When a VM is powered off, there is no issue powering it on again on a host with a different CPU.

Different BIOS settings can cause issues too, such as enabling/disabling AES-NI.

vMotion between CPUs with the same instruction set should be no issue. In my homelab I can vMotion between a Xeon D-1520 and a Xeon E5-2650 v2 without any EVC set.
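To check the BIOS-toggle angle across hosts, the same feature list from the earlier sketch can be filtered down to a single flag; the exact key name ("cpuid.AES" here) is my assumption about how ESXi labels AES-NI.

Code:
# reuses the hosts dict from the earlier sketches
for name, host in hosts.items():
    aes = next((f.value for f in host.config.featureCapability
                if f.key == "cpuid.AES"), None)
    print(name, "cpuid.AES =", aes)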
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
EVC is only required for live vMotion. When a VM is powered off, there is no issue powering it on again on a host with a different CPU.

Different BIOS settings can cause issues too, such as enabling/disabling AES-NI.

vMotion between CPUs with the same instruction set should be no issue. In my homelab I can vMotion between a Xeon D-1520 and a Xeon E5-2650 v2 without any EVC set.
Yeah, for most use cases shutting down and vMotioning a VM will work. My issue is that I need to move the VCSA VM from one host to another to do host upgrades. It's a pain to have to migrate its storage to an NFS share, power off the VCSA, remove it from one host and register it on the other, then start it back up. I've had the VCSA get corrupted a few times doing that. After everything is done on the host being updated, I reverse the process.

So for my setup now, I added a fourth host using the Dell tower with the E-2176G CPU and 32 GB of RAM. I added an old 750 Ti and a USB card to the new host. I moved my VM that requires nesting to the new host and set up a new W10 VM with video and USB passthrough. I am using that host for server room access.

Next I plan on adding a new cluster, moving my two hosts with E-2200 series CPUs into it, and setting the EVC mode to the highest common CPU baseline. I am hoping this will fix my VCSA migration issue.
 

dswartz

Active Member
Jul 14, 2011
610
79
28
Did you really mean 'power off' the VCSA? That might explain the corruption. Why not shut it down instead?
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
In the past it was due to a newly set up QNAP NAS that had issues, so it was data corruption. I had just moved from a FreeNAS build to QNAP and assumed it all worked the same, lol.

Also a side note: I was kindly told U3 has been pulled by VMware.

So much for keeping systems up to date. In my case this is a mostly-prod home lab, so these issues suck but I can usually find a workaround. I feel bad for admins dealing with these patches on large-scale prod systems.
 

marcoi

Well-Known Member
Apr 6, 2013
1,532
288
83
Gotha Florida
I will preface this with: I'm not a VMware expert, so my experience is just home lab.

So I finally got the new clusters set up. I set up two clusters for different purposes. First thing to note: if your VM does nested virtualization (performance counters etc.), that will not work in the EVC cluster, hence part of the need for a second cluster.
I set up a cluster called EVC_Enabled and set the EVC mode to the Broadwell generation, since that is the highest supported on the E-2200 CPUs.
1637606269196.png

Under 7.0.3 the cluster looks like the screenshot above: Home --> Cluster --> Hosts --> VMs. I don't like how they are listed, but that's just me. I moved one host at a time. Once the first host was set up and running in the cluster:
1. I vMotioned the VCSA's storage to an NFS-mounted datastore.
2. I shut down the VCSA VM on the non-clustered host.
3. From the non-clustered host's ESXi interface, I removed the VCSA VM from inventory.
4. On the clustered ESXi host, which also had the NFS mount, I added the VCSA VM back and started it. It came up with EVC enabled.
5. Once the VCSA was up and running, I shut down the remaining VMs on the non-clustered host and moved it into the cluster.
6. Then I was able to vMotion the VCSA from the second host back to the original host and move its storage back to a local datastore.
A rough scripted equivalent of steps 2-4 is sketched below.
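For reference, here's a rough pyVmomi sketch of the unregister/register hop (steps 2-4), talking to the two ESXi hosts directly since vCenter itself is the VM being moved. Host names, credentials, the VM display name and the .vmx path are placeholders, and the VCSA's storage is assumed to already be on the shared NFS datastore.

Code:
import ssl
import time
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # home lab only

def find_vm(si, name):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next((v for v in view.view if v.name == name), None)
    view.DestroyView()
    return vm

# Steps 2-3: on the source host, clean guest shutdown, then remove from
# inventory (UnregisterVM leaves the files on the datastore untouched).
src = SmartConnect(host="esxi-old.lab.local", user="root", pwd="secret",
                   sslContext=ctx)
vcsa = find_vm(src, "VMware vCenter Server")      # placeholder display name
vcsa.ShutdownGuest()                              # needs VMware Tools in the guest
while vcsa.runtime.powerState != vim.VirtualMachinePowerState.poweredOff:
    time.sleep(5)
vcsa.UnregisterVM()
Disconnect(src)

# Step 4: on the destination host, register the .vmx from the shared NFS
# datastore and power it on.
dst = SmartConnect(host="esxi-new.lab.local", user="root", pwd="secret",
                   sslContext=ctx)
dc = dst.RetrieveContent().rootFolder.childEntity[0]      # "ha-datacenter"
pool = dc.hostFolder.childEntity[0].resourcePool
WaitForTask(dc.vmFolder.RegisterVM_Task(
    path="[nfs-ds] VMware vCenter Server/VMware vCenter Server.vmx",
    asTemplate=False, pool=pool))
vcsa = find_vm(dst, "VMware vCenter Server")
WaitForTask(vcsa.PowerOnVM_Task())   # may raise the "moved or copied?" question
Disconnect(dst)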

So far things are working; I don't know what other headaches will come down the line.
I also set up another cluster called TPM_Enabled and moved my third host into it. I disabled EVC at the cluster level since it will not allow nested VMs. I wanted to run a W11 VM, and to do so I had to set up a key provider and the cluster before I could add a TPM module to a VM. Once I got that set up, I added the TPM to the W10 VM and upgraded it. One thing that is nice with ESXi 7 and Nvidia: GPU passthrough works without having to "fix" things at the VM and OS level; it just passes through and the drivers install. So I have two VMs running on host 3 in this cluster. One is W11 with GPU and USB passthrough; it acts as my server room PC and is connected to a monitor/keyboard/mouse. The other VM is what I use for business and needs to run VMs inside it.
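For what it's worth, once the key provider exists, adding the vTPM itself is a one-device reconfigure; a hedged sketch against vCenter (same connection pattern as the earlier examples, the VM name is a placeholder, and it assumes the VM already uses EFI firmware).

Code:
# reuses si/content from the earlier vCenter sketches
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "W10-to-W11")    # placeholder name
view.DestroyView()

spec = vim.vm.ConfigSpec(deviceChange=[
    vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=vim.vm.device.VirtualTPM())])
WaitForTask(vm.ReconfigVM_Task(spec))   # fails if no default key provider is configured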

So far everything is working with this setup; my distributed switches still work, etc.

One last thing I noticed was that a vCLS folder/VM was being created on my NFS mount. I didn't know what that was until I looked it up; it's something new as of 7.0.1. I had to tell the clusters to use a local datastore for those system VMs, which removed the dependency on NFS.

Well, I hope this post helps others dealing with ESXi 7 issues.