hot swap/plug esxi RDM disks?


dragonme

Active Member
Apr 12, 2016
282
25
28
just like the title says.. ESXi 6.0. I have a couple of SATA disks RDM'd to a napp-it VM, and it would seem that exporting the pool, pulling the RDM drive, then plugging it back in freezes the VM...

any thoughts.. is this a known limitation, that RDM-mapped disks don't hot plug?

thanks
 

Dev_Mgr

Active Member
Sep 20, 2014
133
48
28
Texas
Hot swapping at the VM level is definitely possible with ESXi; however, hot swapping at the physical level is pretty limited (you can hot swap a drive in a redundant RAID set, but not much more than that).

RDM is probably not meant for hot-removal.

If you want to be able to hot-remove a drive, you may need to get a separate SAS controller, pass this through to the VM and attach the removable drive to this SAS HBA. This way the hot-removal of that drive is between the guest OS and the SAS HBA.
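
If you go the passthrough route, you can identify the HBA from the ESXi shell first; something like this should work (the "lsi" pattern below is just an example, match whatever your card reports itself as):

Code:
esxcli hardware pci list | grep -i lsi
Then mark that device for passthrough in the client, reboot the host, and add it to the VM as a PCI device.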
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@Dev_Mgr

thanks that is what I thought, and was afraid of.

my chassis is a half-depth Rackable Systems unit.. it's a 3U but a 12-disk cage sits over the top of the PCI slots.

it's an Intel 5520 board so there are plenty of slots, but only 1U (or less) of clearance, so there's only room for one card lying down on a 90-degree adapter, and that spot is currently filled with an LSI 4e4i card that is passed through.

that card is passed through to napp-it and it works fine; I can turn off the disk shelf attached to the 8088 4e port and all is well, it shows back up when turned back on

the RDM drives are attached to SATA ports on the motherboard, but presented through virtual LSI SAS controllers in the VM
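
(for reference, these were set up with vmkfstools more or less like this; the device ID and datastore path below are placeholders, not my actual disk:)

Code:
# create a physical-compatibility RDM pointer file for a local SATA disk
vmkfstools -z /vmfs/devices/disks/t10.ATA_____EXAMPLE_DISK_ID /vmfs/volumes/datastore1/napp-it/rdm_disk1.vmdk
# the resulting .vmdk is then added to the napp-it VM on a virtual LSI SAS controller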

I figured either ESXi does not like to see the physical backing come and go.. or the virtual adapter can't deal with it.

thanks

so I am thinking of using another LSI 8i card.. perhaps relocated somewhere in the case on a mining-style 4x PCIe ribbon cable!!

very ghetto !!!
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
RDM-mapped disks were designed to be provided from a SAN which presents a virtual disk / LUN to the ESXi hypervisor. They're by no means designed for local drives.

When removing HDDs from a running ESXi host, you first need to offline/unmount the devices, otherwise you will get a PDL (Permanent Device Loss).

The only option you have to hot swap your drives is to pass through a SAS HBA/controller and connect your disks to it. Much easier this way anyway.
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

thanks..

I have also been told that if I:

1: removed the RDM drive from the VM it is attached to...

then

2: went into ESXi .. (he didn't elaborate how) and removed the drive..

I should then be able to pull it safely..

have not tried it yet.
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
Yes, you can do this; you will have to detach the device in ESXi before you remove it to avoid a PDL. After that you can easily pull it, insert a new one and then rescan storage to make it available to the host.

It's definitely possible, but the big con is the number of extra steps you need to take to make it work, not to mention that you risk the VM halting if the device dies and ESXi locks the VM because the .vmdk for the RDM device no longer points to a valid device.

In any case I would recommend going the passthrough way unless you can accept the higher risk of downtime in case anything happens.
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

thanks for the confirmation...

yeah.. I know I am pretty much fighting how ESXi is 'supposed to work', but it's just a home server and I have some challenges with my hardware / financial setup.

if you could take a minute here is what I have

it's a Rackable Systems 3U chassis with ports facing front, and a 12-drive cage basically sitting atop a 1U space on the motherboard (Intel S5520HC), which provides like 5 PCIe slots but only physical room for 1 on a 90-degree riser facing front (currently an LSI 9212 4E4I).

I have a backup disk shelf for ZFS backups connected to the LSI 4E, and 4 internal drives connected to the 4I that are currently Intel 3500 SSDs. The LSI card is passed to a napp-it VM which provides ZFS storage for both host VMs and data; the SSD pool holds the VMs. I have 4 drives in the backplane connected to onboard SATA, and these drives are physical-mode RDMs (independent) being passed to napp-it for a storage pool


instead of using SATA, which (with the exception of hot swap/plug) seems to be pretty stable and offers enough speed for bulk data, I would love to use another PCIe LSI card for the remaining 8 drive slots instead of using SATA at all.. but there is no physical room unless I use something like a 4x or 8x PCIe ribbon cable and relocate the card somewhere else in the chassis ...

as for detaching the drive at the ESXi host level: can you outline the esxcli commands to do that.. or the steps through the vCenter/host web GUI for removing and rescanning a SATA drive..

thanks for your assistance... I am a bit of an ESXi noob so bear with the stupidity...
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
@dragonme

Sure, no problem. :) I have been where you are (using local disks as RDM), but I learned that it's actually really important (in regard to SMART and error detection) to use passthrough. That way you should be able to catch a failure before it occurs.
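
For example, with the controller passed through, the guest can query SMART directly; assuming smartmontools is installed in the napp-it VM, something along these lines (the device name is just an example, and you may need -d sat depending on the controller):

Code:
# read SMART health and attributes for one disk from inside the guest
smartctl -a /dev/rdsk/c2t1d0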

I have to be honest, I tried to picture the setup in my head but I am not exactly sure if I understood you correctly when you explained how the server is put together.

I understand you have the S5520HC board, which unfortunately just has a single SATA controller. I am using an S2600CP in one of my boxes, which has both an onboard SAS and an onboard SATA controller. I looked at the Intel page and found some expansion options for a RAID controller...

Intel® Integrated Server RAID Module AXXRMS2LL040 Product Specifications <- this one could be an option, although it does not have a "JBOD" mode listed. However, this is the controller you could use for ESXi-based storage, and then pass through the onboard SATA controller to your ZFS VM. This module has 4 ports, but there are other options too: Intel® Server Board S5520HC Product Specifications

What version of ESXi are you running, 6.0 or 6.5? In the Windows vSphere Client you are able to do it under Storage -> Devices. I have not seen an option for this in the ESXi HTML5 client yet. In the vSphere Web Client (vCenter only), you can go to Host -> Configure -> Storage Devices -> select your device -> All Actions -> Detach.

For the CLI way, there are commands to do this through SSH or in the ESXi shell. Be sure you detach the correct device. You should not be able to detach one which is already in use, but you never know; it can cost valuable data and time if you mess it up. :)

First, get a list of the devices:

esxcli storage core device list <- Will list all your devices with a lot of info. You need the ID which is right above the "Display Name" field, example:

Code:
[root@esxi01:~] esxcli storage core device list
t10.NVMe____INTEL_SSDPE2MD020T4C____________________CVFT4280002C2P0KGN__00000001
   Display Name: Local NVMe Disk (t10.NVMe____INTEL_SSDPE2MD020T4CCVFT4280002C2P0KGN__00000001)
...
   Status: on
...
Note that the "Status" shows as "on"

Now, I have an NVMe disk I wanna detach, so I use the device name "t10.NVMe____INTEL_SSDPE2MD020T4C____________________CVFT4280002C2P0KGN__00000001"

Now, again, be sure you aren't using the device for anything. It should fail if it's in use, but again, you never know. :)

So, to then detach the device, in this example my NVMe drive:

Code:
[root@esxi01:~] esxcli storage core device set -d t10.NVMe____INTEL_SSDPE2MD020T4C____________________CVFT4280002C2P0KGN__00000001 --state=off
It should not return any message if successful. If unsuccessful it might say "device/system busy" or similar.

If you do another "esxcli storage core device list" command again, note the "Status" line..

Code:
t10.NVMe____INTEL_SSDPE2MD020T4C____________________CVFT4280002C2P0KGN__00000001
   Display Name: Local NVMe Disk (t10.NVMe____INTEL_SSDPE2MD020T4CCVFT4280002C2P0KGN__00000001)
...
   Status: off
...
In the ESXi HTML5 Client you should see that it is now showing the status as "Unknown" since the disk is now detached.
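
As an extra check from the CLI, you can also list what ESXi currently considers detached:

Code:
[root@esxi01:~] esxcli storage core device detached list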

Do a rescan of all storage in the ESXi HTML5 Client or do it through esxcli as well...

Code:
[root@esxi01:~] esxcli storage core adapter rescan --all
Now you should be able to remove the disk and insert a new one. Do a rescan again after inserting the new one and it should show up as online. If the new one doesn't show up as online, you can manually attach it with (using my NVMe device as example again)

Code:
[root@esxi01:~] esxcli storage core device set -d t10.NVMe____INTEL_SSDPE2MD020T4C____________________CVFT4280002C2P0KGN__00000001 --state=on
I hope this helped.
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

thanks for the detailed notes... I will try that next time I bring most everything down for a zfs cold snapshot of the VM pool

I picked up a 24U B-Line networking rack/cabinet, so it's not full depth.. max is about 25" deep I think, so I have to stay with half-depth type servers.. which made these Rackable Systems chassis perfect as they are true half depth.. in a standard rack you can put them back to back and essentially double the density of the rack.

in the 3U box.. the PCIe sockets as well as the I/O panel are in the front and the 12-drive cage sits on top.. essentially leaving 1U of space under it, hence only being able to use the 1 LSI card at 90 degrees on a riser under it. The 4E connector presents out the front, allowing the 8088 cable to go to an external drive shelf that I power on when needed for backups.

inside the case, again it's half depth.. that leaves me about 2U of room by about 6-8 inches of depth behind the drive cage. The drive cage has 4 drives per connector but no multiplier, so 1 row of 4 goes to the LSI card. One row of 4 on a reverse breakout goes to the onboard SATA. 2 of the onboard SATA ports hold an SSD for ESXi to use during boot to bring up napp-it, which provides the ZFS-backed storage for everything else. The napp-it VM presents NFS storage to the ESXi (6.0U3) host.

for control I do have a vCSA (6.5) VM running, 'BUT' it's only of real use when things are working, since the host has to be up and the napp-it storage VM has to be healthy because it provides the storage that vCSA boots from.. If I get a bigger SSD I may put vCSA on the ESXi native SSD with napp-it so it comes up first.

I found one of those Intel daughter-board 'LSI' controllers and was told it could be cross-flashed to IT mode.. but it bricked and I could not find a way to bring it back.. it fell off the PCIe list in the shell and no LSI utilities ever saw it again. To use that for napp-it without cross-flashing, I don't know if ESXi would allow me to pass the device to napp-it.. or if I would have to pass each drive via RDM. And even if it did allow passing the controller through, I would have had to set up each drive as a single-drive RAID to present the disks to ZFS... not the best solution either, but it did sit on the board in a place where there was room.

if this remains stable with just the loss of hot plug I will be happy enough, as the drives on the SATA controller are bulk storage and don't get moved around much. So for ZFS, if a drive should fail I can bring it all down to swap the drive.

by the by.. in ESXi.. is the RDM a path to a controller slot or tied to the actual drive.. i.e... if an RDM drive moves to a different slot, its RDM just moves with it at re-attach, as I understand it..
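
(I guess I can check what the mapping file actually points at with vmkfstools; the path below is just an example, not my real layout:)

Code:
# query an RDM descriptor; it reports the vml.* device ID it maps to, not a controller port
vmkfstools -q /vmfs/volumes/datastore1/napp-it/rdm_disk1.vmdk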

anyhow.. thanks again VROD.. learning esxi has been fun..
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
No worries, always happy to help. :) In regard to the controller, I meant that you should pass through the onboard controller to the VM. This would give you 6 ports which will work with the napp-it VM. Then use the additional LSI-based controller, without cross-flashing, to host the VM files for the napp-it VM. So in the end you would have the 4I4E (8 disks) and onboard SATA (6 disks) passed through, while the additional RAID controller takes care of ESXi-only storage. At least you would be able to have 10 drives inside directly passed through, which is what you wanted, right?
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

I can't pass the SATA controller through.. I need one drive at boot time to spin up the napp-it VM that provides the storage backing for this all-in-one.

yes.. if I was not using ZFS for the storage backing and wanted to keep everything on VMDKs, I could have done that too.

I tried a hack that permits the use of USB for datastores.. but unfortunately I need to pass a couple of USB devices to VMs, like UPS status and a Z-Wave controller, and that hack shuts down the USB arbitrator service.. and it was super flaky.. so not an option. There is no other storage controller on board the S5520 that I am aware of..

I boot ESXi off the internal USB header

then ESXi boots the napp-it VM off the SSD connected to SATA 1

napp-it has 2 storage pools: 1 on the SSDs connected to the passed-through LSI adapter (for VMs) and another pool for bulk media on 3x SATA RDM passthrough drives

then napp-it presents ESXi with an NFS target

ESXi then sees and boots the VMs on the NFS datastore
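
(for reference, the host-side NFS mount is just the standard esxcli one, something like this, with placeholder IP and share path:)

Code:
# mount the napp-it NFS export as a datastore on the host
esxcli storage nfs add -H 192.168.1.50 -s /tank/nfs_vm -v napp-it-nfs
# verify it is mounted
esxcli storage nfs list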

so I get all the benefits and data integrity of ZFS without the additional power overhead of an external SAN, the complexity of iSCSI, or the need for additional physical networking (the S5520 only has 1 working NIC; the other was firmware-crippled by the previous owners to only provide access to IPMI).

all told.. this rig only draws about 170 watts.. not bad for $400 (less drives) for 48GB and 24 virtual cores on dual L5640's and 24 TB of ZFS-backed media storage.
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
@dragonme

I think you misunderstood me. :) Yes, you need something for ESXi to boot from, but this is where the RAID module from Intel would come into play... You could use THIS instead of the onboard SATA controller to host the VMDKs and OS. No need to cross-flash or risk anything. I've done exactly what you are describing and it works really well!
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

ok.. I can see that.. so it would eliminate RDM by passing the SATA controller through to napp-it and using the ESXi drivers on the Intel module for native VMFS storage.. for napp-it and perhaps vCSA/vCenter.. compelling.. now if only I hadn't bricked the one I bought for 12 bucks, because the cheapest one I have found is like 100, so I might as well just get another 8-port LSI card and relocate it with a PCIe ribbon.. hehe
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

I found an AXXROMBSASMR, which is based on an LSI 1078; I am assuming that as a 3Gb/s board it would be limited to 2TB drives? I would only be hanging a couple of small SSDs on it so I don't think that would be a problem? Being an older board and not 6Gb/s, I do wonder if it would keep up with VM IOPS

what say you..
 

vrod

Active Member
Jan 18, 2015
241
43
28
31
If you are just going to use it for the VM files for your napp-it VM, then it will be fine. If the controller doesn't have a cache module, avoid parity-based RAIDs. Go for a mirror. :)
 

dragonme

Active Member
Apr 12, 2016
282
25
28
found a Fujitsu D2607 that I forgot I had.. SAS2008 based..

just have to see if I can flash it to IT mode, then get an 8x PCIe ribbon cable to move it somewhere in the case... developing...
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

sounds like the Fujitsu is 'difficult' to flash.. and I already bricked that Intel daughterboard last year because, in part, the UEFI boot on this Intel board sucks.

adding an 8i card.. all 12 drive spots in the cage would be attached to LSI controllers, leaving another 6 SATA ports for some native-to-ESXi VMFS SSDs, but those would have to be placed somewhere in the case.

I might just go for that Intel daughterboard based on the LSI 1078 to use for the ESXi datastores, just to boot napp-it and probably vCenter.. probably 2 SSDs either striped or mirrored.. and pass the 6-port SATA controller through to napp-it.. that amount of storage would take longer to fill than the server's lifetime.

save the Fujitsu for a different project.. provided I can flash it at some point.. I found the modified bin files that are purported to work..
 

dragonme

Active Member
Apr 12, 2016
282
25
28
@vrod

to any and all who care, here are some updates

I was able to hot remove a SATA (ICH10) hard drive that was RDM physical-mode passed to a napp-it VM that was serving the ESXi host an NFS path.. it was a bit involved but it didn't crash this time..


1. in vCSA, take the NFS datastore offline.. requires moving those VMs somewhere else or out of inventory

2. in napp-it.. export the pool

3. in vCSA, remove the hard drive from the VM's inventory using edit settings

4. SSH into napp-it.. I ran some Solaris commands to offline the hard disk and clean up some things.. I don't have the notes in front of me (roughly along the lines of the sketch below)

5. in vCSA -- host -- configure -- storage, remove/detach the disk from the ESXi host

6. cross fingers and yank the drive...
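
the Solaris-side commands from step 4 were roughly these (I don't have my exact notes, and the controller/target names below are just examples):

Code:
# inside the napp-it (OmniOS/Solaris) guest, after the zpool export:
cfgadm -al                            # list attachment points, find the one for the disk
cfgadm -c unconfigure c2::dsk/c2t1d0  # unconfigure that disk from the virtual controller
devfsadm -Cv                          # clean up stale device links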


I don't think I will be able to resurrect the Intel LSI 2008 mezzanine RAID controller that I bricked a year ago trying to flash it to IT mode.. I was going to pass it to napp-it

but the suggestion here was to use the mezz board for napp-it and perhaps vCSA to boot from before the ZFS backing pools are up.. passing the SATA ICH10 controller to napp-it. Good idea.. I had not thought of that option.

so folks sent me some files to try and unbrick the LSI 2008 based card, but if that fails I picked up an LSI 1078 based Intel mezz board for this motherboard for 5 bucks on eBay.. 12 with the shipping.. it will be limited to 2TB LUNs but I plan on just putting a couple of 80GB Intel 3500 SSDs into a RAID 0 or 10 just for napp-it and vCSA, so it should work ok..

would have preferred the faster lsi 2008 one that I bricked last year but oh well..