Local storage options for ESXi


Tinkerer

Member
Sep 5, 2020
83
28
18
Currently I am running a home server with an Asus Z11PA-U12/10G-2S. It has 8 disks connected to the onboard SATA controller.

I also have an old IBM M1015 lying around.

The server is running Red Hat 8.4 with ZFS and RHV virtualization. ZFS is configured with 4 pairs of mirrors. I don't boot from ZFS; there's a dedicated SSD for the OS and booting.

I am considering moving the hypervisor to ESXi, but this poses a problem: ESXi does not support ZFS. I am not married to ZFS, but I do need redundancy, so I need some way of creating a logical volume that ESXi can work with.

I tried the M1015. I flashed it back to IR mode with the BIOS, but man, this makes me cringe and gnash my teeth in agony. It's like burning in hell: a slow, tedious UI, and boot takes well over 5 minutes before it even jumps to "initializing". My attention span for watching "nothing happen" isn't 5 minutes, so half the time I miss the "PRESS THIS BUTTON TO CONFIGURE" prompt and have to reboot and sit through those >5 minutes again. It makes me want to break something. A keyboard, or some random object lying around on my desk ... anything. Configuring a logical volume through this controller's BIOS is not an option for me. Moreover, once I had a logical volume configured and activated, it wasn't visible in RHEL. I didn't want to live with >5 minute delays during every boot, so I dumped the M1015 back in its corner. I can't handle that.

I do wonder whether I actually need the BIOS ... I don't boot from it, so maybe I can create the logical volumes without that cringy BIOS interface and the 5 minute delay. Linux didn't see the volume, but maybe ESXi will? I don't know, any ideas?
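Something like this is what I have in mind, assuming the IR (9211) firmware can be driven from a running OS with LSI's sas2ircu tool (untested on my card, and the enclosure:slot numbers are made up):

    sas2ircu list                                  # find the controller index
    sas2ircu 0 display                             # show each disk's enclosure:slot
    sas2ircu 0 create RAID1 MAX 2:0 2:1 mirror01   # build a mirror without touching the BIOS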

So I wonder, would the onboard controller suffice? Back in the day I got burned by onboard RAID; is it still that bad? The data on ZFS is important enough that I run 4 mirrored pairs instead of RAIDZ1 or 2, and I make offsite and local backups. I do NOT want to lose any of that stuff, so it has to be reliable. Not enterprise class or anything like that, but I don't want to go through a reboot or a BIOS upgrade one day and find my mirrors gone or corrupt when nothing actually happened. It also needs to reliably resync a new disk when one gets replaced. ZFS has never disappointed me (I've been using it since version 0.6.x and have replaced quite a few disks over the years).

I don't mind spending some money on another controller. All it needs to do is RAID 0/1/10, do it reasonably fast (it should match or exceed ZFS, which admittedly isn't the fastest), and be reliable. No need for RAID 5 or 6.

Thanks in advance for any suggestions!
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
You can follow my All-in-One idea that I have offered since 2008.
This means creating a ZFS storage VM on a local datastore, passing the HBA through to it and managing a ZFS pool there. Share a ZFS filesystem via NFS (and SMB for ZFS snap access and VM copy/move/clone) and place your VMs on it; see my manual: https://napp-it.org/doc/downloads/napp-in-one.pdf
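A minimal sketch of the two sides, assuming a pool "tank", a filesystem "vmstore" and 172.16.0.10 as the storage VM's IP (all placeholders):

    # inside the OmniOS storage VM (disks come from the passed-through HBA)
    zpool create tank mirror c1t0d0 c1t1d0    # device names are just examples
    zfs create tank/vmstore
    zfs set sharenfs=on tank/vmstore          # export the filesystem via NFS

    # on the ESXi host: mount the NFS export as a datastore
    esxcli storage nfs add -H 172.16.0.10 -s /tank/vmstore -v zfs-vmstore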

I offer a ready-to-use free server template based on OmniOS Community Edition. It is about the most compact full-featured ZFS OS there is (NFS, SMB, iSCSI, S3, lowest resource needs, ultra stable, LTS, often bi-weekly bug and security updates, ZFS in its native environment).
 

Tinkerer

Member
Sep 5, 2020
83
28
18
Very interesting read, thanks for all the effort you have put into that! Must have been quite a few hours there ;).

Do you think it would be possible to import my current ZFS pools? All my volumes use native 2.0.x encryption. I can easily import those keys into your AiO appliance, if I may call it that. Does it run the latest 2.x ZFS version, and does the OmniOS version offer native ZFS encryption compatible with what I am using on RHEL 8?

Have you done any performance tests from within a VM running on that virtualized ZFS storage via NFS? I run some I/O heavy workloads, like Red Hat Satellite for example.

Last but not least, in the past it was impossible to take snapshots of a VM that had PCI passthrough enabled. Do you know if that has changed with ESXi 6.x or 7.x?
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
OmniOS is based on the Solaris fork Illumos. Illumos has its own ZFS repository but adopts newer ZFS features like encryption from ZoL, just as ZoL took features from Illumos in the past. The main advantage of the release-based OmniOS is the additional quality check per release: OmniOS is often more stable than the newest ZoL or Illumos, its upstream.

OmniOS and ZoL are quite on par regarding ZFS features, so a pool move between them usually works.
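Roughly, such a move looks like this, assuming a pool named "tank" with passphrase-encrypted datasets (names are placeholders, and the encryption feature flags on both sides need to match):

    # on RHEL, before shutting the box down
    zpool export tank

    # on OmniOS, once the disks are visible through the passed-through HBA
    zpool import tank
    zfs load-key -a      # prompts for the passphrases of encrypted datasets
    zfs mount -a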

You do not use virtualized storage with AiO, as OmniOS has direct hardware access due to passthrough. Given the same RAM and CPU, you get near-bare-metal performance.

With pass-through there are two restrictions:
- All RAM assigned to OmniOS is fixed (fully reserved). I see this as an advantage: ZFS uses all available RAM anyway, so overbooking RAM is a bad idea, and with the dedicated assignment OmniOS uses exactly the RAM you give it.

- For an ESXi snapshot you must shut down the VM; ESXi cannot guarantee the state of a PCI device otherwise. And since an ESXi snap is a new delta file that collects all modifications made after snap creation, you are quite restricted anyway: only one or two short-lived snaps are a good idea in ESXi.

For an AiO this is not a real restriction, as it only affects the storage VM. You do not need to back that VM up (beyond maybe an offline copy), as there is nothing special in it. If it crashes, just re-deploy the template, import the pool and you are up again; in ESXi, re-mount the NFS filesystem and re-register the VMs. Crash recovery from scratch takes about half an hour.
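Roughly, recovery looks like this (pool, datastore and VM names are placeholders):

    # in the re-deployed storage VM: bring the pool back
    zpool import -f tank
    zfs load-key -a && zfs mount -a    # only needed for encrypted datasets

    # on the ESXi host: re-mount the NFS datastore and re-register a VM
    esxcli storage nfs add -H 172.16.0.10 -s /tank/vmstore -v zfs-vmstore
    vim-cmd solo/registervm /vmfs/volumes/zfs-vmstore/myvm/myvm.vmx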

All other user VMs do not have passthrough devices, so snapping them with ESXi is no problem. Usually you use ZFS snaps anyway; thousands of ZFS snaps are no problem. You can even take an ESXi hot snap (incl. memory state) prior to the ZFS snap and delete it afterwards. This allows you to go back to a hot, running ESXi state from ZFS snaps.
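A sketch of that workflow for a single VM (the VM id 42, the dataset name and the storage VM IP are placeholders):

    vmid=42                                             # look it up with: vim-cmd vmsvc/getallvms
    vim-cmd vmsvc/snapshot.create $vmid pre-zfs "" 1 0  # name, description, include memory, quiesce
    ssh root@172.16.0.10 zfs snapshot tank/vmstore@$(date +%F-%H%M)
    vim-cmd vmsvc/snapshot.removeall $vmid              # drop the ESXi delta again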
 

Tinkerer

Member
Sep 5, 2020
83
28
18
Sounds too good to be true :D.

ESXi is booting as we speak on another (test) host, an old ML10 Gen9 with a few spare disks. I'll test your setup in the coming days.

Thanks!
 

Tinkerer

Member
Sep 5, 2020
83
28
18
I changed the napp-it web UI password for admin and operator, but now I can't log in anymore. How can I reset that password through the console?

The weird thing is that I use Bitwarden to generate and store the password, so typos are impossible. Maybe a character limit or a special character gets in the way?

Thanks
 

Tinkerer

Member
Sep 5, 2020
83
28
18
I was able to reset it by removing the napp-it.cfg file. Editing out the password hashes didn't work ( adminpw|| ). I hadn't changed any other settings yet, so removing the file and starting over was the easiest fix.
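For reference, roughly what I did from the OmniOS console (the config sits somewhere under /var/web-gui on my install; check where your napp-it version keeps napp-it.cfg):

    cfg=$(find /var/web-gui -name napp-it.cfg)   # locate the config file
    cp "$cfg" /root/napp-it.cfg.bak              # keep a copy, just in case
    rm "$cfg"                                    # napp-it falls back to the default login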

Still, it's weird that I couldn't log in with a 32-character randomly generated string with specials.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
Info for the pw field in About > Settings:
allowed in pw: [a-zA-Z0-9,.-;:_#], max 16 chars
 

nickf1227

Active Member
Sep 23, 2015
197
128
43
33
A couple of important things to remember:
Your network card's speed is the limiting factor for how fast you can access your NFS or iSCSI mount, even though the traffic never really leaves the box. This is because of a limitation in how standard virtual switches work.

You are using some CPU cycles to facilitate this that you otherwise wouldn't be spending if ESXi had native ZFS.

If you auto-start your VMs after a reboot, you need to allow adequate time for the storage VM to boot up before the next VMs can boot.
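If you script that on the ESXi host, the per-VM start order and delay can be set like this (the VM ids 1 and 2 and the 180/120 second delays are just example values; get the ids from vim-cmd vmsvc/getallvms):

    vim-cmd hostsvc/autostartmanager/enable_autostart true
    # args: vmid  startAction  startDelay  startOrder  stopAction  stopDelay  waitForHeartbeat
    vim-cmd hostsvc/autostartmanager/update_autostartentry 1 powerOn 180 1 guestShutdown 60 systemDefault
    vim-cmd hostsvc/autostartmanager/update_autostartentry 2 powerOn 120 2 guestShutdown 60 systemDefault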

:cool:
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,182
113
DE
You should use the vmxnet3 vNIC, which is much faster (with lower CPU needs) than e1000. NIC performance on the internal switch can be several Gb/s.
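A quick way to check which adapter type a VM got, run on the ESXi host (datastore and VM name are placeholders):

    grep virtualDev /vmfs/volumes/zfs-vmstore/myvm/myvm.vmx
    # ethernet0.virtualDev = "vmxnet3"   <- what you want to see instead of "e1000"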
 

Tinkerer

Member
Sep 5, 2020
83
28
18
Yeah, I think you're right; internal traffic should not be limited to physical NIC speeds, but maybe e1000 is (I don't know). But even if it were, I could bond 10 of them together :).

Good point on the auto boot, Nick, will keep that in mind! :D