ESXi 6.0 NFS with OmniOS Unstable - consistent APD on VM power off


socra
Member · Feb 4, 2011
No, the PSU is not the issue. I'll post progress as soon as I've restarted my OI VM (~30 min). Typing from a tablet sucks.
 

socra
Member · Feb 4, 2011
So this is what I did tonight:
- shut down all VMs
- removed all VMs from inventory
- shut down my host
- pulled the power cable
- waited 5 minutes
- connected the 2.5" 160 GB SATA drive to the M1015
- powered on... it works! (it must have been fully pulling the power cable that fixed it)

Onward to the testing

TEST 1
- powered on OI
- created a new test pool (v28) on the 160 GB drive
- exported the test pool
- shut down OI, disconnected the M1015 and gave it to OmniOS
- imported the pool
- enabled NFS
- reset the ACL from the napp-it menu
- created a VM
- started the VM
Result: APD after starting the VM


TEST 2
- upgraded the pool to the OmniOS ZFS version

before zfs upgrade:


after zfs upgrade:

- created a new VM on this datastore, then started the VM
Result: APD after starting the VM

TEST 3
- destroyed the OI-imported pool and created a new pool from OmniOS itself, plus a new ZFS filesystem
- created a new VM on this datastore, then started the VM
Result: APD after starting the VM

TEST 4
- upgraded my host from ESXi 5.5 patch 3 to the latest patch 5 (build 2718055)
- created a new VM on the OmniOS datastore, then started the VM
Result: APD after starting the VM (the OmniOS NFS IP stays pingable)

Code:
2015-08-06T18:03:45.683Z cpu2:41888)VSCSI: 271: handle 8196(vscsi4:0):Input values: res=0 limit=-1 bw=-1 Shares=-1
2015-08-06T18:03:45.939Z cpu3:41889)VMMVMKCall: 224: Received INIT from world 41889
2015-08-06T18:03:45.968Z cpu1:33844)Config: 346: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
2015-08-06T18:03:45.968Z cpu0:41889)WARNING: NetDVS: 547: portAlias is NULL
2015-08-06T18:03:45.968Z cpu0:41889)Net: 2312: connected TESTVM02 eth0 to CIFS-VM Network, portID 0x6000008
2015-08-06T18:03:45.969Z cpu0:41889)NetPort: 1426: enabled port 0x6000008 with mac 00:00:00:00:00:00
2015-08-06T18:04:00.002Z cpu1:36612)World: 14302: VC opID hostd-da76 maps to vmkernel opID 78eeee4
2015-08-06T18:04:10.213Z cpu1:32790)StorageApdHandler: 265: APD Timer started for ident [1a961821-6de33ef6]
2015-08-06T18:04:10.213Z cpu1:32790)StorageApdHandler: 414: Device or filesystem with identifier [1a961821-6de33ef6] has entered the All Paths Down state.
2015-08-06T18:04:10.213Z cpu1:32790)StorageApdHandler: 856: APD Start for ident [1a961821-6de33ef6]!
2015-08-06T18:04:20.004Z cpu0:33849)World: 14302: VC opID 42CF685C-00000290 maps to vmkernel opID c0f5563c
2015-08-06T18:04:24.237Z cpu2:32790)NFSLock: 610: Stop accessing fd 0x4108be70f9e8  3
2015-08-06T18:04:24.237Z cpu2:32790)NFSLock: 610: Stop accessing fd 0x4108be712908  3
2015-08-06T18:04:24.237Z cpu2:32790)NFSLock: 610: Stop accessing fd 0x4108be6e2c28  3
2015-08-06T18:04:24.237Z cpu2:32790)NFSLock: 610: Stop accessing fd 0x4108be69d718  3
2015-08-06T18:04:28.623Z cpu2:41889)WARNING: VSCSI: 3565: handle 8196(vscsi4:0):WaitForCIF: Issuing reset;  number of CIF:1
2015-08-06T18:04:28.623Z cpu2:41889)VSCSI: 2447: handle 8196(vscsi4:0):Reset request on FSS handle 152607 (1 outstanding commands) from (vmm0:TESTVM02)
2015-08-06T18:04:28.623Z cpu1:32879)VSCSI: 2728: handle 8196(vscsi4:0):Reset [Retries: 0/0] from (vmm0:TESTVM02)
2015-08-06T18:04:28.623Z cpu1:32879)VSCSI: 2521: handle 8196(vscsi4:0):Completing reset (0 outstanding commands)

TEST 5
- added an E1000 NIC to Gea's OmniOS appliance (disabled the VMXNET3 NFS adapter)
- gave the E1000 NIC a different IP address to be sure
- mounted the pool, enabled the NFS share on the datastore, added the datastore to ESXi
- created a new VM on the datastore and powered it on
Result: APD after starting the VM


During all tests the OmniOS NFS IP address stayed pingable from ESXi.

So I shut OmniOS down, gave the M1015 back to OI, and boom, I'm back (so it can't be a hardware issue).
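As a side note, the flapping is easy to quantify from the ESXi shell by counting APD transitions in vmkernel.log. A minimal sketch; the heredoc sample stands in for the real `/var/log/vmkernel.log`, and its two lines are copied from the log output above:

```shell
# Count APD start events in a vmkernel log. The sample file stands in for
# /var/log/vmkernel.log on a real ESXi host.
cat > /tmp/vmkernel.sample <<'EOF'
2015-08-06T18:04:10.213Z cpu1:32790)StorageApdHandler: 265: APD Timer started for ident [1a961821-6de33ef6]
2015-08-06T18:04:10.213Z cpu1:32790)StorageApdHandler: 856: APD Start for ident [1a961821-6de33ef6]!
EOF
grep -c 'APD Start' /tmp/vmkernel.sample   # prints 1 for this sample
```

On a live host you would run the `grep` against `/var/log/vmkernel.log` directly; a count that keeps climbing while the NFS IP stays pingable points at the NFS session rather than the network path.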

I mean WTF...
 

whitey
Moderator · Jun 30, 2014
WTF indeed. OK, can you provide the OmniOS release/build? I've used versions from 008 to, I think, 014 now with no issues.

Can you try to set up iSCSI from OmniOS/napp-it and connect to the ESXi host via the software iSCSI initiator, and see if it behaves the same / exhibits the same symptoms? If you need help with this, I can assist further or provide more details on how to accomplish it.

EDIT: Hey bro, by the way, you should be vmkping'ing that NFS export interface, not using regular pings, but I digress since OI is working. Are you on a flat network, or do you have the IP storage traffic VLAN'ed/carved off?
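For anyone following along, the vmkping suggestion spelled out as a sketch; `vmk1` and the target IP are placeholders for your storage vmkernel port and the OmniOS NFS address:

```shell
# On the ESXi host via SSH: list the vmkernel interfaces, then ping the NFS
# server from the vmkernel TCP/IP stack instead of the management network.
esxcli network ip interface ipv4 get
vmkping -I vmk1 192.168.20.44
```

A regular `ping` from the ESXi shell uses the management interface, so it can succeed even when the storage vmkernel path is broken; `vmkping -I` tests the path the NFS traffic actually takes.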
 

whitey
Moderator · Jun 30, 2014
Can you try iSCSI to humor me? Even with an AIO setup this should be possible (napp-it/OmniOS VM running on the ESXi host, HBA passed through to it; create an iSCSI volume instead of an NFS filesystem, set up the software iSCSI initiator on ESXi, and you're off to the races).

Just for reference, though, I am running pure OmniOS v11 r151010 (no napp-it), but I know I have recently done an AIO setup on r151014. Scratching my head for sure...
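The iSCSI path described above looks roughly like this on the OmniOS/COMSTAR side. This is a hedged sketch, not tested on this particular setup; the zvol name, size, adapter name, and addresses are illustrative, and exact package/service names can vary by release:

```shell
# On OmniOS (COMSTAR):
svcadm enable stmf                                  # STMF framework
zfs create -V 50g testpool02/iscsivol01             # a zvol to export
sbdadm create-lu /dev/zvol/rdsk/testpool02/iscsivol01
stmfadm add-view <LU-GUID-from-sbdadm-output>       # expose the LU to all hosts
svcadm enable -r svc:/network/iscsi/target:default  # iSCSI target service
itadm create-target                                 # creates a default target IQN

# On ESXi: enable the software initiator and point it at the OmniOS box.
esxcli iscsi software set --enabled=true
esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 192.168.20.44
```

After a storage adapter rescan, the LUN should show up under the software iSCSI adapter and can be formatted as VMFS; if that datastore stays stable where NFS flaps, the problem is narrowed to the NFS service rather than the pool, HBA, or network.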
 

socra
Member · Feb 4, 2011
I'm truly happy you want to help (and @everyone, thanks so far for pitching in). Damn these timezones (I'm in the EU, you're in the US, I guess); it's already nighttime over here.
I'll see if I can get iSCSI to work... but what then? Do I need to create a VM on my local SSD and then give it the iSCSI LUN as an extra disk?
Is there also somewhere in OmniOS where I can check if things are going haywire?

I was also thinking I could try installing OmniOS from scratch to see what happens, but I love the idea of just downloading and importing the appliance, giving it some IPs, and off we go.

Maybe when I move the M1015 from OI to OmniOS I should reboot ESXi first before trying to start a VM? (It's a stretch, but I'm getting desperate.)
 

socra
Member · Feb 4, 2011
Also, as mentioned earlier, I didn't even need to start a VM; just adding the NFS share, then going into SSH and running an ls command, was enough to get an APD.

What firmware are you running on the LSI M1015 with the AIO setup?
 

whitey
Moderator · Jun 30, 2014
v19, on 9211-8i's (LSI) and H310's (Dell, re-flashed to LSI firmware v19 as well).

Can you access/mount that NFS share and access data from another NFS client (a Linux box around, or another VM) to try to isolate the issue further? Grasping at straws here...
 

socra
Member · Feb 4, 2011
v19, on 9211-8i's (LSI) and H310's (Dell, re-flashed to LSI firmware v19 as well).

Can you access/mount that NFS share and access data from another NFS client (a Linux box around, or another VM) to try to isolate the issue further? Grasping at straws here...
I could try that too. I have VMware Workstation on my PC; I could try installing a Linux VM and then accessing the NFS share to see what happens.
 

socra
Member · Feb 4, 2011
OK, last post from me for now. Any tips on which Linux I should use to test connecting to the NFS share I'm going to create? (Preferably with a GUI :) )
I'll create a new OmniOS pool on that one 160 GB disk I have and then try to access it from a Linux VM on the local SSD of my ESXi machine, because I just realized I can't connect from outside the ESXi box, since it is an AIO after all.

Also, where can I best check OmniOS for errors in real time when I access the NFS share?
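On the question of watching OmniOS for errors in real time, a few standard illumos-side commands are worth running in a second SSH session while reproducing the APD (the pool name is the one from this thread; none of this is napp-it-specific):

```shell
tail -f /var/adm/messages       # kernel/driver messages as they happen
fmdump -eV | tail               # FMA error telemetry (disk/HBA faults)
zpool status -v testpool02      # pool health and per-device error counters
nfsstat -s                      # NFS server-side operation counters
dladm show-link -s -i 5         # per-link traffic statistics every 5 seconds
```

If the APD hits while `/var/adm/messages` and `fmdump` stay quiet and the pool shows no errors, that points away from the disks and HBA and toward the NFS service or the virtual network.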
 

socra
Member · Feb 4, 2011
Well, I didn't go the Linux route just yet; I wanted to create a VM from the ground up, so I did, using plain OmniOS 151014 (just E1000 adapters).

After that I installed napp-it, imported the test pool, and enabled NFS using root=@<esxi ip address>.
Is it enough to just enable NFS, or should you always use root=@<ip address>?

I went into SSH, and at first I got an APD after accessing the share.
I waited for it to come back and then tried creating a folder, which worked!
After that I was able to create a VM, start it, and even install Windows 8 on the VM in the test pool!

I thought I was home free, but after a short while the APD issues came back. :(


some info from hostd.log
Code:
2015-08-07T10:16:23.709Z [58240B70 verbose 'Hostsvc.FSVolumeProvider'] RefreshOneNasVolume: calling ProcessVmfs on 192.168.20.44:/testpool02/testds02
2015-08-07T10:16:23.709Z [58240B70 verbose 'Hostsvc.FSVolumeProvider'] DeleteApdStarted: Clearing volume testds02 from APD Started list
2015-08-07T10:16:23.709Z [FF99FB70 info 'Vimsvc.ha-eventmgr'] Event 113 : Device or filesystem with identifier 85f7d75e-e65b5768 has entered the All Paths Down state.
2015-08-07T10:16:23.709Z [58240B70 verbose 'Hostsvc.FSVolumeProvider'] ProcessNas: Inaccessible datastore testds02, volUid 85f7d75e-e65b5768, reason APD Start
2015-08-07T10:16:23.710Z [58240B70 verbose 'Hostsvc.Datastore'] NotifyIfAccessibleChanged -- notify that datastore 192.168.20.44:/testpool02/testds02 at path /vmfs/volumes/85f7d75e-e65b5768 now has accessibility of false due to AllPathsDown_Start
2015-08-07T10:16:23.710Z [FF99FB70 verbose 'Vmsvc.vm:/vmfs/volumes/4fff0ea3-5b924ce8-9acd-001b785d0f15/VNAS02/VNAS02.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x58d7f758]UPDATE-NOW-DISCONNECTED, 192.168.20.44:/testpool02/testds02, /vmfs/volumes/85f7d75e-e65b5768;
2015-08-07T10:16:23.710Z [FF99FB70 verbose 'Vmsvc.vm:/vmfs/volumes/4fff0ea3-5b924ce8-9acd-001b785d0f15/DUMMYVM/DUMMYVM.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x58d7f758]UPDATE-NOW-DISCONNECTED, 192.168.20.44:/testpool02/testds02, /vmfs/volumes/85f7d75e-e65b5768;
2015-08-07T10:16:23.710Z [58240B70 verbose 'Hostsvc.DatastoreSystem'] DatastoreSystemImpl.UpdateSystemSwapConfigIssue : System swap is not active
2015-08-07T10:16:23.710Z [FF99FB70 verbose 'Vmsvc.vm:/vmfs/volumes/4fff0ea3-5b924ce8-9acd-001b785d0f15/VNASTEST03/VNASTEST03.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x58d7f758]UPDATE-NOW-DISCONNECTED, 192.168.20.44:/testpool02/testds02, /vmfs/volumes/85f7d75e-e65b5768;
2015-08-07T10:16:23.710Z [FF99FB70 verbose 'Vmsvc.vm:/vmfs/volumes/4fff0ea3-5b924ce8-9acd-001b785d0f15/VNAS01/VNAS01.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x58d7f758]UPDATE-NOW-DISCONNECTED, 192.168.20.44:/testpool02/testds02, /vmfs/volumes/85f7d75e-e65b5768;
2015-08-07T10:16:23.710Z [FF99FB70 verbose 'Vmsvc.vm:/vmfs/volumes/4fff0ea3-5b924ce8-9acd-001b785d0f15/VNASTEST02/napp-it_15b.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x58d7f758]UPDATE-NOW-DISCONNECTED, 192.168.20.44:/testpool02/testds02, /vmfs/volumes/85f7d75e-e65b5768;
2015-08-07T10:16:23.710Z [FF99FB70 verbose 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x58d7f758]UPDATE-NOW-DISCONNECTED, 192.168.20.44:/testpool02/testds02, /vmfs/volumes/85f7d75e-e65b5768;
2015-08-07T10:16:23.710Z [FF99FB70 warning 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 192.168.20.44:/testpool02/testds02 is not accessible
2015-08-07T10:16:23.710Z [FF99FB70 info 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] UpdateStorageAccessibilityStatusInt: Vm's storage accessibility status changed to false
2015-08-07T10:16:23.710Z [FF99FB70 info 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] VM config backing gone -- marking VM invalid.
2015-08-07T10:16:23.710Z [FF99FB70 info 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] Marking VirtualMachine invalid
2015-08-07T10:16:23.710Z [FF99FB70 warning 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] Failed to find activation record, event user unknown.
2015-08-07T10:16:23.711Z [FF99FB70 info 'Vimsvc.ha-eventmgr'] Event 114 : Configuration file for Unknown 2 on esxihost.domain.local in ha-datacenter cannot be found
2015-08-07T10:16:23.711Z [FF99FB70 info 'Vmsvc.vm:/vmfs/volumes/85f7d75e-e65b5768/TESTVM07/TESTVM07.vmx'] State Transition (VM_STATE_OFF -> VM_STATE_INVALID_LOAD)
2015-08-07T10:16:23.728Z [59240B70 verbose 'Default' opID=E49A80CC-00000099 user=root] AdapterServer: target='vmodl.query.PropertyCollector:session[99604911-1ab0-f762-195e-fde5af866344]525f88af-11c8-0f3f-1839-c0ee4497ccdc', method='waitForUpdates'
2015-08-07T10:16:23.729Z [58281B70 verbose 'Default' opID=E49A80CC-0000009A user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdates'
2015-08-07T10:16:24.716Z [58281B70 verbose 'Default' opID=E49A80CC-0000009B user=root] AdapterServer: target='vmodl.query.PropertyCollector:session[99604911-1ab0-f762-195e-fde5af866344]525f88af-11c8-0f3f-1839-c0ee4497ccdc', method='waitForUpdates'
2015-08-07T10:16:24.731Z [FF93C920 verbose 'Default' opID=E49A80CC-0000009C user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdates'
2015-08-07T10:18:07.239Z [57EE2B70 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 9483MHz/28571MB to 9483MHz/28570MB
2015-08-07T10:18:11.881Z [FF93C920 info 'Hostsvc.VmkVprobSource'] VmkVprobSource::Post event: (vim.event.EventEx) {
-->    dynamicType = <unset>,
-->    key = 0,
-->    chainId = -5913944,
-->    createdTime = "1970-01-01T00:00:00Z",
-->    userName = "",
-->    datacenter = (vim.event.DatacenterEventArgument) null,
-->    computeResource = (vim.event.ComputeResourceEventArgument) null,
-->    host = (vim.event.HostEventArgument) {
-->       dynamicType = <unset>,
-->       name = "esxihost.domain.local",
-->       host = 'vim.HostSystem:ha-host',
-->    },
-->    vm = (vim.event.VmEventArgument) null,
-->    ds = (vim.event.DatastoreEventArgument) null,
-->    net = (vim.event.NetworkEventArgument) null,
-->    dvs = (vim.event.DvsEventArgument) null,
-->    fullFormattedMessage = <unset>,
-->    changeTag = <unset>,
-->    eventTypeId = "esx.problem.vmfs.nfs.server.disconnect",
-->    severity = <unset>,
-->    message = <unset>,
-->    arguments = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          dynamicType = <unset>,
-->          key = "1",
-->          value = "192.168.20.44",
-->       },
-->       (vmodl.KeyAnyValue) {
-->          dynamicType = <unset>,
-->          key = "2",
-->          value = "/testpool02/testds02",
-->       },
-->       (vmodl.KeyAnyValue) {
-->          dynamicType = <unset>,
-->          key = "3",
-->          value = "85f7d75e-e65b5768-0000-000000000000",
-->       },
-->       (vmodl.KeyAnyValue) {
-->          dynamicType = <unset>,
-->          key = "4",
-->          value = "testds02",
-->       }
-->    ],
-->    objectId = "ha-eventmgr",
-->    objectType = "vim.HostSystem",
-->    objectName = <unset>,
-->    fault = (vmodl.MethodFault) null,
--> }
2015-08-07T10:18:11.882Z [FF93C920 info 'Vimsvc.ha-eventmgr'] Event 115 : Lost connection to server 192.168.20.44 mount point /testpool02/testds02 mounted as 85f7d75e-e65b5768-0000-000000000000 (testds02).
2015-08-07T10:18:18.325Z [FF93C920 verbose 'Hostsvc.FSVolumeProvider'] RefreshOneNasVolume called on 192.168.20.44:/testpool02/testds02
2015-08-07T10:18:18.325Z [57EE2B70 verbose 'Hostsvc.DatastoreSystem'] StorageApdUpdate: got Storage APD message [N11HostdCommon24VmkernelUpdateStorageApdE:0x58686780] timestamp=864018786 updated=85f7d75e-e65b5768 eventtype=2 devicetype=2 with identifier 85f7d75e-e65b5768
2015-08-07T10:18:18.325Z [57EE2B70 verbose 'Hostsvc.DatastoreSystem'] StorageApdUpdate: Received APD_EXIT

2015-08-07T10:18:18.326Z [59240B70 info 'Vimsvc.ha-eventmgr'] Event 116 : Restored connection to server 192.168.20.44 mount point /testpool02/testds02 mounted as 85f7d75e-e65b5768-0000-000000000000 (testds02).
I've been at this for three nights straight now; I'm going to take a small break. Reading the threads above, I might not be the only one struggling with this.
 

TechIsCool
Active Member · Feb 8, 2012 · Clinton, WA · techiscool.com
So I just read through all your posts since I last looked.

The last option is to assume something is funky in ESXi.

Do a clean install of ESXi on the media after a full format. You can still keep the VMFS, but make sure to keep only the VMDKs, not the virtual machine files.
 

socra
Member · Feb 4, 2011
That could be; I also have a spare USB drive, so I could try re-installing ESXi 5.5 U2. I've now downloaded a Solaris ISO to see what that brings me with regard to creating a new pool from that 160 GB drive I've still got connected to my LSI.
Installing napp-it as we speak...
 

socra
Member · Feb 4, 2011
Also: is it enough to just enable NFS, or should you always use root=@<ip address> when enabling the NFS share?
I've been reading a lot and can't help but wonder why people get so much diversity in their experiences with OmniOS and networking (VMXNET3 vs. E1000, for example); most people are using the same hardware, and still... so... much... randomness...
This post says you should move away from E1000 and use VMXNET3:
[H]ard|Forum - View Single Post - OpenSolaris derived ZFS NAS/ SAN (Nexenta*, OpenIndiana, Solaris Express)
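On the recurring root=@ question: ESXi mounts NFSv3 datastores as root, so without a root= entry in the share options the server squashes that access to nobody and VM creation fails with permission errors, which is why the guides include it. A sketch of setting it directly with ZFS; the subnet and host address below are placeholders for your storage network and the ESXi vmkernel IP:

```shell
# On OmniOS: grant the ESXi vmkernel IP root access to the dataset.
# 192.168.20.0/24 and 192.168.20.11 are illustrative addresses.
zfs set sharenfs='rw=@192.168.20.0/24,root=@192.168.20.11' testpool02/testds02
zfs get sharenfs testpool02/testds02    # verify the options took effect
```

Plain `sharenfs=on` exports the filesystem read/write to everyone but still applies root squashing, so it is usually not enough for an ESXi datastore.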