Descriptor corrupt on every new disk


Tinkerer

Member
Sep 5, 2020
83
28
18
I'm running a freshly installed ESXi 8.0 with one VM on a local datastore in my homelab for testing. That VM has my LSI controller passed through and runs ZFS on Arch Linux.

I exported an NFS 4 share with no_root_squash and mounted it from ESXi using root credentials. The mount works, and I can upload files from the ESXi console using the datastore browser.

When I create a VM on that share, creation succeeds, but power-on fails with the following message:
Code:
Failed - The file specified is not a virtual disk
Errors   The file specified is not a virtual disk
         Cannot open the disk '/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test.vmdk' or one of the snapshot disks it depends on.
         Module 'Disk' power on failed.
         Failed to start the virtual machine.
I found an article about this error, and I can confirm this is the issue it describes:
Code:
[root@esxi:/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test] vmkfstools -v6 -d thin -i test.vmdk test_clone.vmdk
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test-flat.vmdk" : open successful (269) size = 4096, hd = 0. Type 3
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test-flat.vmdk" : closed.
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test-flat.vmdk" : open successful (14) size = 17179869184, hd = 1056288. Type 3
Destination disk format: VMFS thin-provisioned
Cloning disk 'test.vmdk'...
DISKLIB-LIB_CLONE   : DiskLibCreateNativeClone: Incompatible object type 'file' specified.
DISKLIB-LIB_CLONE   : DiskLibCloneGrowInt: Failed to clone disk using Object Cloning.
Clone: 9% done.DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone-flat.vmdk" : open successful (33554433) size = 4096, hd = 0. Type 3
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone-flat.vmdk" : closed.
DISKLIB-VMFS  : VmfsExtentCommonOpen: possible extent truncation (?) realSize is 0, size in descriptor 33554432.
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone-flat.vmdk" : failed to open (The file specified is not a virtual disk): Size of extent in descriptor file larger than real size. Type 3
DISKLIB-LINK  : DiskLinkOpen: Failed to open '/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone.vmdk': : The file specified is not a virtual disk
DISKLIB-CHAIN : DiskChainOpen: "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone.vmdk": failed to open: The file specified is not a virtual disk.
DISKLIB-LIB   : Failed to open 'test_clone.vmdk' with flags 0x8208 The file specified is not a virtual disk (15).
DISKLIB-LIB_CLONE   : DiskLibCloneGrowInt: Failed to open: The file specified is not a virtual disk
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone-flat.vmdk" : open successful (1115137) size = 0, hd = 0. Type 3
DISKLIB-VMFS  : "/vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test_clone-flat.vmdk" : closed.
DISKLIB-LIB   : DiskLibUnlinkInt: Disk delete successfully completed { result:0, Msg: 'The operation completed successfully', fileName:'test_clone.vmdk'}
Failed to clone disk: The file specified is not a virtual disk (15).
I can follow the article to fix corrupt disk descriptors, but the problem is that every VM I create is affected.

I tried adjusting permissions on the datastores, explicitly granting root admin permissions, but that didn't help.

I removed the disk, deleted it from the datastore, and mounted a bootable ISO instead. The VM still fails to power on, this time with this error:
Code:
State Failed - Module 'Nvman' power on failed.
Errors         Module 'Nvman' power on failed.
               NVRAM file open /vmfs/volumes/60549f42-58c23f57-0000-000000000000/test/test.nvram (One of the parameters supplied is invalid).
               Failed to start the virtual machine.
This tells me all the files are actually corrupt.

One other thing I tested was forcing sync writes on the NFS share. It didn't help.

So I'm at a loss. This has taken me several days now and I could really use some help :).
Since I picked up this idea somewhere on these forums, I'm hoping someone here has an idea how to fix this.

Thanks!
 

DavidWJohnston

Active Member
Sep 30, 2020
242
191
43
I don't know for sure, but maybe the datastore browser file upload and the vmdk/nvram access require different I/O operations, and for whatever reason the latter are being denied.

When you say you can "follow the article to fix..." do you mean the VM then works normally after? If so, which article are you following?

Also, maybe try:

- Use NFS3 instead - I've had some odd issues with 4
- Enable the NFS debug log facility on the server-side and see if there's any interesting info
- Create a user account that isn't root and chown your exported folder to that user
- If you use a thick provisioned disk, is the error different?
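
For the NFS3 and thick-disk suggestions, a rough sketch from the ESXi shell could look like the following. The server address, export path, datastore name, and test directory are all placeholders I made up, and the `run` helper only prints the commands so you can check them before running anything. Whether thick provisioning is accepted on an NFS datastore depends on the server side, so treat the second command as an experiment.

```shell
# Hypothetical values; replace with your NFS server address, export path, and datastore name.
NFS_HOST="192.168.1.20"
NFS_SHARE="/tank/vmstore"
DS_NAME="zfs-nfs3"

# Dry-run helper: prints the command instead of executing it.
run() { echo "+ $*"; }

# Mount the export as an NFS 3 datastore
# (NFS 4.1 datastores are handled by "esxcli storage nfs41" instead).
run esxcli storage nfs add --host "$NFS_HOST" --share "$NFS_SHARE" --volume-name "$DS_NAME"

# Create a small thick-provisioned test disk to compare with the thin-provisioned failure.
# Assumes the directory "test" already exists on the datastore.
run vmkfstools -c 1G -d zeroedthick "/vmfs/volumes/$DS_NAME/test/thick-test.vmdk"
```

Once the printed commands look right, replace the body of `run` with `"$@"` to actually execute them.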
 

Tinkerer

DavidWJohnston said:
I don't know for sure, but maybe the datastore browser file upload and the vmdk/nvram access require different I/O operations, and for whatever reason the latter are being denied.

When you say you can "follow the article to fix..." do you mean the VM then works normally after? If so, which article are you following?

No, it doesn't work: I can fix one file, but then another file fails. ESXi creates the whole structure, allocates the files, and writes to them. When I open the config files they look fine, but something is still wrong with them.

I also have two local datastores on single disks. They work fine, and VMs create and run normally on them.

DavidWJohnston said:
Also, maybe try:

- Use NFS3 instead - I've had some odd issues with 4
- Enable the NFS debug log facility on the server-side and see if there's any interesting info
- Create a user account that isn't root and chown your exported folder to that user
- If you use a thick provisioned disk, is the error different?
Just want to say thank you for your suggestions. I will give these a shot today (just waiting for a data migration to finish) and let you know how it works out.
 

Tinkerer

It has something to do with how ZFS exports NFS shares. ZFS uses NFS4, and that is where the ID mapping gets confusing. I tried creating a new user on ESXi with the same UID as the user on the Linux server that exports the share, but ESXi refuses to connect, no matter what I try.

I disabled sharenfs on ZFS and created an export with the system NFS utilities using `all_squash,anonuid=948,anongid=948`. UID/GID 948 is the esx user that has permissions on the export directory.

This works.
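
For anyone landing here later, this is roughly what that looks like on the Linux side. It is only a sketch: the dataset path and ESXi address are placeholders, and `rw,sync,no_subtree_check` are common extra options I am assuming here, not something I listed above. Only `all_squash,anonuid=948,anongid=948` comes from my actual setup.

```shell
# Hypothetical path and client address; adjust to your dataset mountpoint and ESXi host.
EXPORT_DIR="/tank/vmstore"
ESXI_HOST="192.168.1.10"
ANON_ID=948   # UID/GID that owns the export directory

# Build an /etc/exports entry that maps every client UID/GID to ANON_ID.
# rw, sync and no_subtree_check are assumed options, not taken from the post.
export_line() {
    printf '%s %s(rw,sync,no_subtree_check,all_squash,anonuid=%s,anongid=%s)\n' \
        "$1" "$2" "$3" "$3"
}

# Print the entry for review.
export_line "$EXPORT_DIR" "$ESXI_HOST" "$ANON_ID"

# On the NFS server, as root (shown as comments, not executed here):
#   zfs set sharenfs=off tank/vmstore      # stop ZFS from managing the export
#   chown -R 948:948 /tank/vmstore         # give the squashed user ownership
#   export_line "$EXPORT_DIR" "$ESXI_HOST" "$ANON_ID" >> /etc/exports
#   exportfs -ra                           # reload the export table
```

With `all_squash`, the server ignores whatever UID ESXi presents and treats every request as UID/GID 948, which sidesteps the ID mapping problem entirely.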

I have no idea whether NFS4 is faster (I thought I had read that somewhere). For now it works.

Thanks for the suggestions!