ESXi free is dead. Alternative?


unwind-protect

Active Member
Mar 7, 2016
418
156
43
Boston
This is just making it worse with another layer that is not perfectly synced. Snapshotting is not the problem. Guaranteeing that the data in the snapshot is perfectly consistent is the problem, and you cannot solve that by changing the backend. ZFS supports properly synchronized writes, so the filesystem itself won't be corrupt. It's the software that might have data in flight, because it isn't written to handle what amounts to having the plug pulled at an arbitrary moment.
But NFS' internal consistency model makes the situation a whole lot better than having a block device inside the VM.
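To make the "data in flight" point concrete: an application that only treats a write as durable after fsync() returns will come back cleanly from a crash-consistent snapshot, while one that just write()s may lose whatever was still sitting in the page cache. A minimal sketch of the safe pattern (the file name is just an example):

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data and return only once it is on stable storage."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)          # data may still sit in the page cache here
        f.flush()
        os.fsync(f.fileno())   # force it out to the disk / NFS server
    os.replace(tmp, path)      # atomic rename: readers see old or new, never half
    dirfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
    try:
        os.fsync(dirfd)        # persist the rename itself
    finally:
        os.close(dirfd)

durable_write("checkpoint.bin", b"application state")
```

Software that skips the fsync step is exactly the software that can end up inconsistent in a snapshot, no matter what the backend does.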
 

Zedicus

Member
Jul 12, 2018
52
21
8
But NFS' internal consistency model makes the situation a whole lot better than having a block device inside the VM.
you have just stacked 2 band-aids on top of a paper cut.

the VM OS does report a 'power cut', but there is no disk data out of sync as far as ZFS is concerned. you are just taking a snapshot of that exact point in time (of a RUNNING system) and then asking it to boot up again; the OS has to regain its composure, period.

adding multiple layers to get around 'the disk issue' is actually going to push the disk's point-in-time freeze even further away from what the software thought was happening, on top of the data that was missing from RAM at the time.
 

NPS

Active Member
Jan 14, 2021
147
44
28
But NFS' internal consistency model makes the situation a whole lot better than having a block device inside the VM.
Can I have my root FS on NFS? And does the guest OS handle the mounted NFS share differently than a typical FS?
 

ghost792

New Member
Jun 19, 2023
27
20
3
I'm probably an outlier here, but I replaced my free ESXi hypervisor with Windows Hyper-V on the same hardware. I'm using the evaluation version of Windows Server 2019 Datacenter.

Under ESXi, I had a Windows Server, piHole, and Xpenology NAS. I transferred the Xpenology data to file shares on my new Windows server install and recreated the piHole as a Hyper-V VM.
 
  • Like
Reactions: i386

Zedicus

Member
Jul 12, 2018
52
21
8
Windows Hyper-V
we did a test deployment with Hyper-V when we were looking at options to replace VMware. It was actually high on our list of potential solutions, as with our MS Enterprise Agreement at the time it was a cost-effective option. The deployment worked fine, and one of the Windows server admins said he liked it way more than vSphere. But trying to get answers from vendors about Hyper-V support, or getting VMware-trained staff to work on Hyper-V, was basically impossible.
 
  • Like
Reactions: ghost792 and i386

gea

Well-Known Member
Dec 31, 2010
3,163
1,195
113
DE
The problem with ZFS snaps and VM storage:

If a ZFS filer crashes during a write (pull the power plug):
the ZFS filesystem remains ok due to Copy on Write, with the proper state prior to the crash.
If you need to protect committed writes: enable sync so that missing writes from the RAM cache are replayed on the next reboot.

For a VM the situation is different (it does not matter how you offer ZFS storage to the VM: file, NFS, zvol, iSCSI).
If you pull the power plug or take a snap during a write (which, from the VM's view, is like pulling the power plug),
it can happen that during atomic writes that must always be done completely, like write data + adjust metadata,
some data is on disk and the needed metadata for it is not. This means a corrupted guest filesystem, even if the guest filesystem is ZFS.

If you enable sync on the ZFS below the VMs, the VM filesystems mostly remain ok after a crash, as missing atomic writes are done on the next reboot.
For snaps of the underlying ZFS filesystem there is nothing ZFS can do to ensure valid guest filesystems. A filesystem halt prior to a snap,
or a guest filesystem switch to a secure state, is the only option.

To make it clear: this is a statistical problem, and the chance of problems is not very high. If VMs are critical though, it is something to consider, either with ZFS snaps taken while the VM is off, a filesystem halt/freeze/hot-memory option, or a backup method that guarantees validity of backups from running VMs.
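As a rough sketch of the freeze option (not any particular tool, just an illustration): quiesce a guest filesystem over SSH with fsfreeze from util-linux, take the ZFS snapshot on the backing dataset, then thaw. The guest address, the mountpoint and the dataset name are made-up placeholders, and you would not freeze the guest's root filesystem this way since the SSH session itself depends on it.

```python
import subprocess
from datetime import datetime

GUEST = "root@guest-vm.example"        # placeholder guest address
GUEST_MOUNT = "/data"                  # guest filesystem to quiesce (not /)
DATASET = "tank/vmstore/guest-vm"      # placeholder backing ZFS dataset

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

snap = f"{DATASET}@quiesced-{datetime.now():%Y%m%d-%H%M%S}"

run(["ssh", GUEST, "fsfreeze", "--freeze", GUEST_MOUNT])         # halt guest writes
try:
    run(["zfs", "snapshot", snap])                               # point-in-time snap on the host
finally:
    run(["ssh", GUEST, "fsfreeze", "--unfreeze", GUEST_MOUNT])   # always thaw, even on error
```

With a QEMU guest agent installed, the freeze/thaw pair can be driven through the agent instead of SSH, which is what most backup tools do.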

btw:
Is there really no one here with experience of a current Solaris/OmniOS Unix under Proxmox or Hyper-V to replace an ESXi AiO?

For Hyper-V or Windows 10/11/Server: the OpenZFS 2.2 for Windows release candidate with the napp-it beta is out.
 
  • Like
Reactions: itronin

Zedicus

Member
Jul 12, 2018
52
21
8
it can happen that during atomic writes that must always be done completely, like write data + adjust metadata,
some data is on disk and the needed metadata for it is not. This means a corrupted guest filesystem, even if the guest filesystem is ZFS
the guest filesystem will not be corrupted as far as ZFS is concerned. And the issue you speak of is the same as if you yank the power cord on a real PC; it is why server SSDs have PLP. It is not a problem of the hypervisor or filesystem, and it is not fixable at the level of the hypervisor or filesystem.

if you do not want to risk a missed write, the OS you are backing up cannot be in use at the time. It does not matter whether it is bare metal or a VM.

Is there really no one here with experience of a current Solaris/OmniOS Unix under Proxmox or Hyper-V to replace an ESXi AiO?
Solaris in its current form does work on Proxmox (with an occasional install workaround needed for some people), but it has been years since I have seen a build with napp-it done that way. Might be a question for the [H] forums though; napp-it seems to have a large user base there.
 
  • Like
Reactions: itronin

unwind-protect

Active Member
Mar 7, 2016
418
156
43
Boston
Can I have my root FS on NFS? And does the guest OS handle the mounted NFS share differently than a typical FS?
Yeah, most Unixes support that. I have been running PXE+NFS for most of my machine zoo for many years.

The guest OS handles NFS completely differently from the situation where it has a local block device with a non-networked filesystem on it. It is always consistent against the server. That is why you can now safely snapshot on the server.

Note that this is not true if you use iSCSI.
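If you want to convince yourself from inside such a guest, it is easy to check that the root filesystem really is NFS and to see the mount options the client negotiated. A small Linux-only sketch that just parses /proc/mounts:

```python
# Print the filesystem type and mount options of the root filesystem.
# On a PXE + NFS-root machine the type shows up as "nfs" or "nfs4".
with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype, options, *_ = line.split()
        if mountpoint == "/":
            print(f"root is {fstype} from {device}")
            print("options:", options)
            break
```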
 

SRussell

Active Member
Oct 7, 2019
327
152
43
US
Proxmox is amazing!!

It has incredible features including replication, migration, snapshots, LXC containers, linked clones, etc.
You can build a 2-node cluster and the sky is the limit on what you can configure.

You will never miss ESXi ever again, and if you really need to you can just run ESXi nested as a guest VM :)
When you run a two-node Proxmox cluster, do you run a simple third node for quorum?
 

SlowmoDK

Active Member
Oct 4, 2023
141
77
28
When you run a two-node Proxmox cluster, do you run a simple third node for quorum?
You can use corosync-qnetd to provide the 3rd vote. It can be installed on any Linux node/VM.

I used to run it in a VM on my bare-metal TrueNAS box that was separate from the 2 Proxmox nodes, but you could run it on anything (think Raspberry Pi).
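For anyone setting this up, the usual Proxmox procedure is: install corosync-qnetd on the external box, corosync-qdevice on both cluster nodes, then register the QDevice from one node with pvecm qdevice setup. A rough automation sketch over SSH; the hostnames are placeholders and it assumes root SSH access to all three machines:

```python
import subprocess

QNETD_HOST = "qdevice.example"                 # third box / Pi / VM that holds the extra vote
PVE_NODES = ["pve1.example", "pve2.example"]   # the two Proxmox nodes

def ssh(host: str, command: str) -> None:
    print(f"[{host}] {command}")
    subprocess.run(["ssh", f"root@{host}", command], check=True)

# 1. The external vote daemon lives on the third machine.
ssh(QNETD_HOST, "apt-get install -y corosync-qnetd")

# 2. Both cluster nodes need the qdevice client.
for node in PVE_NODES:
    ssh(node, "apt-get install -y corosync-qdevice")

# 3. Register the QDevice once, from any one cluster node.
ssh(PVE_NODES[0], f"pvecm qdevice setup {QNETD_HOST}")

# Afterwards 'pvecm status' on a node should show the QDevice vote.
```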
 
Last edited:
  • Like
Reactions: Zedicus

BobTB

Member
Jul 19, 2019
81
20
8
imho XCP-ng is still the closest to what ESXi does for me. Sadly I cannot get OmniOS/Solaris/illumos to run on it. OmniOS even installs, but crashes on boot.

I am thinking (not) of completely redoing it all and going with some distributed filesystem, so I will have like 6x more points of failure compared to this clean NFS solution with napp-it. o_O

I need to run some Windows VMs, and am thinking of going directly to HashiCorp Nomad, but I don't like the persistent storage solutions for it. I have like 20+ locations where I am running all-in-one: a single server with a storage VM for ZFS that then serves NFS to run the other VMs on. This gives me great flexibility.

I cannot add more hardware just to run ZFS on bare metal in each location.
 

tsteine

Active Member
May 15, 2019
171
83
28
I replaced ESXi with KubeVirt installed on RKE2 on Ubuntu, with Rook Ceph storage. I'm looking at replacing Ceph with 2 standalone active/passive failover NFS nodes, since Ceph performance is abysmal even on NVMe storage with 5 nodes, appropriately scaled PGs on the RADOS block device pool, and a 100GbE network.

I rather like KubeVirt since I can schedule VMs with the Kubernetes scheduler.
 
  • Like
Reactions: BoredSysadmin

BobTB

Member
Jul 19, 2019
81
20
8
Hm, great idea. I was also looking at RKE2 + Rancher. Thank you for the Ceph confirmation; I thought I just didn't know how to set it up on my test machine, it's so incredibly slow.
 
  • Like
Reactions: tsteine

Zedicus

Member
Jul 12, 2018
52
21
8
I have like 20+ locations where I am running all-in-one
why not actually spring for the VMware Essentials tier and just stay on VMware? I understand budgets and all that, but the cost of finding, testing, and replacing a known-good solution will likely be higher than just paying the Broadcom tax.
 

NPS

Active Member
Jan 14, 2021
147
44
28
The guest OS handles NFS completely differently from the situation where it has a local block device with a non-networked filesystem on it. It is always consistent against the server. That is why you can now safely snapshot on the server.
Can you point me somewhere where I can read up on this?
 

unwind-protect

Active Member
Mar 7, 2016
418
156
43
Boston
Can you point me somewhere where I can read up on this?
Well :)

Just joking.

It should just be obvious. The server has an open view of all the files it shares to the NFS client. You could chroot into the tree if you wanted. As a result there is no buffering that would be affected by snapshotting a raw block device, because there is no client-side block device: no filesystem block-device cache and no consistency model sitting in between.

Performance is a different matter of course. And if you use iSCSI this breaks down, because now you have a block device.
 

gea

Well-Known Member
Dec 31, 2010
3,163
1,195
113
DE
If you want to be part of the fight with XCP-ng:
OmniOS provides a bloody installer (it boots, but with remaining problems)

 

BobTB

Member
Jul 19, 2019
81
20
8
well, currently the bloody installer just makes it possible to install in UEFI mode, but then it panics the same way as in BIOS mode. We'll see if further progress happens; in the meantime I will try to run it on some KVM-based hypervisor to see if it works.

I am also looking at SmartOS (Triton, MNX) to run as a hypervisor and then add Nomad on top, which would be great if I can figure out how to import ESXi VMs into SmartOS.

@gea do you think napp-it would work on SmartOS?

 
Last edited: