Datastore not found


Rand__

Well-Known Member
Mar 6, 2014
6,635
1,768
113
Looking for some help here from more proficient VMware guys...

So I made a stupid mistake today and hot-unplugged a Mellanox card from my box running ESXi 6.5.

I also accidentally shook a network cable loose, but that happens once in a while, so I was not worried when the box didn't react. I didn't think of it at first, though, and rebooted the box via the power switch. Never had an issue with that before.
Now after the reboot the box came up, but only some of my VMs started as expected; some indicated a missing datastore.

Turns out the NVMe drive (Intel P3500) those (o/c important) VMs resided on was not mounted as a datastore any more.
In fact, it's not recognized as a datastore/storage adapter at all any more.

It's still there: visible in lspci and in passthrough, but not in storage adapters.
I then went on to remove potential older PCI passthrough configs and rebooted: nothing.
I moved the adapter to another host, same issue.
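
For anyone wanting to script the "in lspci but not in storage adapters" check, here is a minimal sketch. It assumes you have saved the text output of `lspci` and of `esxcli storage core adapter list` from the host, and that both print PCI addresses in the `0000:07:00.0` form; that is an assumption about the output format, so check it against your own host. The function names are made up for illustration.

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:07:00.0
PCI_ADDR = re.compile(
    r"\b[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\b"
)


def pci_addresses(text):
    """Return the set of PCI addresses mentioned anywhere in text."""
    return {m.group(0).lower() for m in PCI_ADDR.finditer(text)}


def missing_from_adapters(lspci_text, adapter_text):
    """Addresses present in the lspci text but absent from the adapter list.

    Both arguments are plain strings captured from the host. Nothing is
    assumed about the column layout, only about the address format.
    """
    return sorted(pci_addresses(lspci_text) - pci_addresses(adapter_text))
```

Note that this reports every device that is not a storage adapter (NICs, bridges and so on), so feed it only the lspci lines for the storage controllers you care about.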

I see some errors in the vmkernel log but can't make much of them:
2017-01-13T21:12:02.675Z cpu4:66024)VMK_PCI: 915: device 0000:07:00.0 pciBar 0 bus_addr 0xfb710000 size 0x4000
2017-01-13T21:12:02.675Z cpu4:66024)DMA: 646: DMA Engine 'nvmeCtrlrDmaEngine' created using mapper 'DMANull'.
2017-01-13T21:12:02.675Z cpu4:66024)VMK_PCI: 765: device 0000:07:00.0 allocated 2 MSIX interrupts
2017-01-13T21:12:08.393Z cpu2:65641)nvme:nvmeCoreLogError:370:command failed: 0x4305592b8f90.
2017-01-13T21:12:08.393Z cpu2:65641)nvme:nvmeCoreLogError:370:command failed: 0x4305592b9110.
2017-01-13T21:12:08.394Z cpu2:65551)nvme:nvmeCoreLogError:370:command failed: 0x4305592b9290.
2017-01-13T21:12:10.396Z cpu4:66024)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9410 failed, putting to abort queue.
2017-01-13T21:12:10.396Z cpu4:66024)nvme:NvmeCtrlr_RequestIoQueues:1164:Failed requesting nr_io_queues 0x0
2017-01-13T21:12:10.396Z cpu4:66024)nvme:NvmeCtrlr_Start:1647:Failed to allocate hardware IO queues.
2017-01-13T21:12:10.396Z cpu7:66275)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9590 failed, putting to abort queue.
2017-01-13T21:12:12.496Z cpu7:66275)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9710 failed, putting to abort queue.
2017-01-13T21:12:12.496Z cpu7:66275)nvme:NvmeCtrlr_ConfigAsyncEvents:2763:Async event config failed
2017-01-13T21:12:14.498Z cpu7:66275)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9890 failed, putting to abort queue.
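
To make sense of an excerpt like this, it can help to group the nvme lines by the driver function that logged them. The sketch below does that for the nvme lines pasted above. The regex is only my reading of the line layout in this excerpt (timestamp, cpu:world, module, function, source line, message) and not an official log format; `summarize` and `controller_messages` are names invented for the example.

```python
import re
from collections import Counter

# The nvme lines from the excerpt above, copied verbatim.
LOG = """\
2017-01-13T21:12:08.393Z cpu2:65641)nvme:nvmeCoreLogError:370:command failed: 0x4305592b8f90.
2017-01-13T21:12:08.393Z cpu2:65641)nvme:nvmeCoreLogError:370:command failed: 0x4305592b9110.
2017-01-13T21:12:08.394Z cpu2:65551)nvme:nvmeCoreLogError:370:command failed: 0x4305592b9290.
2017-01-13T21:12:10.396Z cpu4:66024)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9410 failed, putting to abort queue.
2017-01-13T21:12:10.396Z cpu4:66024)nvme:NvmeCtrlr_RequestIoQueues:1164:Failed requesting nr_io_queues 0x0
2017-01-13T21:12:10.396Z cpu4:66024)nvme:NvmeCtrlr_Start:1647:Failed to allocate hardware IO queues.
2017-01-13T21:12:10.396Z cpu7:66275)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9590 failed, putting to abort queue.
2017-01-13T21:12:12.496Z cpu7:66275)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9710 failed, putting to abort queue.
2017-01-13T21:12:12.496Z cpu7:66275)nvme:NvmeCtrlr_ConfigAsyncEvents:2763:Async event config failed
2017-01-13T21:12:14.498Z cpu7:66275)nvme:NvmeCore_SubmitCommandWait:1044:command 0x4305592b9890 failed, putting to abort queue.
"""

# Assumed layout: <timestamp> cpu<N>:<world>)nvme:<function>:<line>:<message>
NVME_LINE = re.compile(
    r"^(?P<ts>\S+) cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"nvme:(?P<func>\w+):(?P<line>\d+):(?P<msg>.*)$"
)


def parse(log_text):
    """Yield a dict per nvme line; lines that do not match are skipped."""
    for raw in log_text.splitlines():
        m = NVME_LINE.match(raw.strip())
        if m:
            yield m.groupdict()


def summarize(log_text):
    """Count nvme log lines per driver function."""
    return Counter(entry["func"] for entry in parse(log_text))


def controller_messages(log_text):
    """Messages logged by NvmeCtrlr_* functions, in log order."""
    return [e["msg"] for e in parse(log_text) if e["func"].startswith("NvmeCtrlr_")]


if __name__ == "__main__":
    for func, count in summarize(LOG).most_common():
        print(f"{count:2d}  {func}")
    for msg in controller_messages(LOG):
        print("controller:", msg)
```

Read this way, the excerpt shows individual commands failing first, after which the controller start-up gives up on allocating I/O queues, which would fit the device never showing up as a storage adapter.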

So I am kinda stumped now ... any ideas?
And no, o/c I don't have a recent backup, since I am currently redesigning the whole env...
And if those VMs are gone then shit, but I'll survive ;), it's just 'vCenter, PDC and all end user VMs'. Just means a whole bunch of work and some complaints from the in-house (home) users.

Edit:
I have now booted up Windows on that box and it does see the controller but does not see it as a drive either.
Not good :/
So I don't think it's a VMware issue at all, more like a hardware problem :(

Will run Intel Drive tool when all the prereqs are installed...
 

azev

Well-Known Member
Jan 18, 2013
769
251
63
Ouch... were you doing a hot-unplug test? I am not sure this is supported on just any box.
Reminds me of the old days of ProLiant servers, with special software that allowed you to shut down a PCI slot prior to removing a card.
 

Rand__

No, plain stupidity :(
The cards were not in use and I didn't want the hassle of restarting the box, which takes a while, so I thought 'What can happen' ... now I know :(
 

Rand__

So, the drive is in a "disable logical state", which means contacting Intel ...
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,650
2,066
113
Hmmm, first thought was PCI address issue but it not working on other systems is strange.

Since it's detected, can you throw commands at it? Since the controller is seen but the 'drive' is not, you may be able to issue commands to tell it what to do, or to get a status. I forget what tool Intel has for these, Enterprise something-or-other; I can't recall the name.
 

Rand__

I used the Intel Enterprise tool (only thing running on R2), that's where I got the disabled state from.
Can't run anything else atm while in this state (it seems, still searching)
 

Rand__

So, good news (for me o/c): found a backup of the PDC, just had to restore Nakivo functionality first. Manual clones of the client VMs were on the NFS dump as well.
Now the only thing missing is the vSphere Server, but that is not a big deal, I only need to run the upgrade to 6.5 again on the old one.
Then to remap the VMs in Horizon and perform a bunch of updates & see what changes I am missing ;)

So looks like I got off with a slap on the wrist... data wise at least.
Not sure if the drive is toast or recoverable and whether this will be covered by warranty... will see :)
 

Rand__

Just fyi
Attempts to restore (as directed by Intel support) were limited to updating the firmware and a low-level format, both of which didn't work due to the state
=> The drive needs to be replaced.
 

Rand__

Ah well, could have been worse :)

/me lost only a day or two of time and a few things I had on my desktop
I assume the drive will be replaced under warranty unless they are especially picky

In general it's not so reassuring that you can't recover a drive in this state :/