Hyper-V Two Node Fail Over Cluster Help


chadtn

New Member
Jan 15, 2025
Hoping you guys could point me in the right direction for configuring a Hyper-V failover cluster that can handle live migrations. I think I have everything setup to the point that I just need to bring some shared storage online, but started getting a little overwhelmed with all of the different technologies available.

Background:
I started out with a single Server 2022 Datacenter host running Hyper-V. It's a dual-CPU EPYC 7B13 setup (128 cores total) with 512GB of DDR4. For VM storage I set up a simple four-column S2D storage pool on four 2TB NVMe drives, and a second pool running a two-column mirror on four 12TB spinning drives. So the VMs all run on 8TB of striped NVMe storage, and I've been making VM backups on 24TB of redundant HDD storage. Everything lives behind a virtualized OPNsense firewall, and I've had about a dozen VMs chugging along for the last six months.

I've recently added a second node with the same hardware and software specs as the first. The only real change so far is that I added Mellanox ConnectX-5 CDAT cards and have the hosts directly connected with DAC cables over a 200Gb SET team. The second node hosts another OPNsense firewall, and I've spent the last 2-3 weeks sorting out all of the HA network configuration. I've been able to successfully fail the network over between the virtual firewalls without dropping any connections on the VMs.

Where I'm at right now:
I migrated all of my VMs to a different NVMe drive and removed the old VM storage pool. The cluster has been deployed, and I set up an SMB file share witness on a Raspberry Pi. I've created separate vNICs on the 200Gb SET team for cluster, cluster storage, and live migration traffic. The only warnings from the cluster validation wizard are that I haven't mapped vNICs to physical adapters, and that RDMA is enabled but the vNICs report "unknown" for RDMA technology.
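For reference, the host vNICs described above can be created on the SET switch with something like the following (the switch name and vNIC names are placeholders, not taken from my actual config):

```powershell
# Create a host vNIC on the SET switch for each traffic class.
# "SETswitch" and the vNIC names below are assumed placeholders.
Add-VMNetworkAdapter -ManagementOS -SwitchName "SETswitch" -Name "Cluster"
Add-VMNetworkAdapter -ManagementOS -SwitchName "SETswitch" -Name "Storage"
Add-VMNetworkAdapter -ManagementOS -SwitchName "SETswitch" -Name "LiveMigration"

# Enable RDMA on the new host vNICs; without this the validation report
# tends to show "unknown" for RDMA technology on the vEthernet adapters.
Enable-NetAdapterRdma -Name "vEthernet (Storage)", "vEthernet (LiveMigration)"
```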

Here is where I may have screwed up. I ran "Enable-ClusterStorageSpacesDirect" and didn't notice that it scooped up my redundant HDD storage pool on node 1 and is now presenting it as Clustered Windows Storage. I had intended for that pool to remain standalone. Is it OK to leave it that way, or should I try removing it? Other than that, everything looks good: both nodes show eight NVMe drives (four on each side) as primordial Clustered Windows Storage available disks. What I had originally envisioned was each node striping its four drives and mirroring the result to the other node for the shared storage. I thought this was something I could configure directly through S2D, but then I started reading about nested S2D resiliency, scale-out file servers, and several other options that left me a little overwhelmed. heh.. Could anyone give some advice on how to proceed? I'm not sure whether I'm supposed to create the storage pools for each node in Server Manager and then use Failover Cluster Manager to mirror those striped volumes as a cluster pool, or whether I should be looking at some of the other options.

Thanks!

Chad
 

chadtn

New Member
Jan 15, 2025
Maybe I wasn't as far off as I thought. I just watched a video showing that I was supposed to pass some extra parameters naming the CSV storage pool when I ran "Enable-ClusterStorageSpacesDirect" so that it would show up as a pool in Failover Cluster Manager. I might wait to see if anyone has advice before messing with it any more.

Should I try to roll back "Enable-ClusterStorageSpacesDirect" and then run it again, naming the NVMe CSV storage pool and excluding my old HDD pool? I'm mainly worried about the HDD pool since it has all of my backups on it. The NVMe disks were already empty in anticipation of rolling out the shared storage.
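If it helps anyone following along, the rollback-and-retry I'm describing would look roughly like this. This is only a sketch, and the pool name plus the NVMe filter are my own placeholder assumptions; double-check that the clustered pool really holds no data before disabling, and that the disk filter can't match the HDD backup disks:

```powershell
# Roll back the earlier run. Only safe because the clustered NVMe pool
# is still empty; verify before running this on your own cluster.
Disable-ClusterStorageSpacesDirect

# Re-enable with automatic pooling turned off so S2D doesn't claim the
# standalone HDD backup pool again.
Enable-ClusterStorageSpacesDirect -Autoconfig:$false -Confirm:$false

# Build the cluster pool by hand from the eight NVMe disks only.
# "NVMePool" is a placeholder; the BusType filter is an assumption that
# should be checked against Get-PhysicalDisk output first.
$nvme = Get-PhysicalDisk -CanPool $true | Where-Object { $_.BusType -eq "NVMe" }
New-StoragePool -StorageSubSystemFriendlyName "*Clustered*" `
    -FriendlyName "NVMePool" -PhysicalDisks $nvme
```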

Thanks!

Chad
 

chadtn

New Member
Jan 15, 2025
I was able to successfully create a CSV S2D two-way mirror with four columns. I migrated all of the VMs over to the clustered storage last night and did some testing: I RDP'd into a VM and live migrated it to a different node without any noticeable impact while I was clicking through random web pages in the browser.
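For anyone wanting to reproduce it, a two-way mirrored, four-column CSV on the S2D pool can be carved out with a single `New-Volume` call along these lines (pool name, volume name, and size are placeholders, not my actual values):

```powershell
# Create a two-way mirror with four columns on the S2D pool and add it
# as a Cluster Shared Volume (CSVFS_ReFS does this automatically).
# "NVMePool", "VMStore", and the size are assumed placeholders.
New-Volume -StoragePoolFriendlyName "NVMePool" -FriendlyName "VMStore" `
    -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror `
    -NumberOfColumns 4 -Size 6TB
```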

So far everything appears to be working, but I may need to work on the network settings some more. I didn't map vNICs to physical NICs when I created the SET team, since both 100Gb interfaces are on the same card and the hosts are directly connected. I'm reading that this can affect how SMB3 handles multichannel streams. One thing I noticed during the migrations was that all of the traffic appears to go over the vNIC I created for live migrations, but it caps out at 10Gb/s of throughput. Prior to clustering I was able to push a sustained 130Gb/s over the SET team just by throwing a few extra cores at iperf. Not sure if that was hitting a PCIe barrier, a single-thread CPU limit, or a DDR4 limit, but I was happy enough at the time to move on to something else. heh..
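The vNIC-to-physical mapping I skipped would be done something like this; the physical adapter names are placeholders for whatever `Get-NetAdapter` shows on the actual hosts. If the 10Gb/s ceiling is SMB Direct falling back to TCP, the two Get- checks at the end should make it visible:

```powershell
# Pin each host vNIC to one physical member of the SET team so SMB
# Multichannel can spread flows across both 100Gb ports.
# "SLOT 2 Port 1/2" are assumed placeholder adapter names.
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "Storage" `
    -PhysicalNetAdapterName "SLOT 2 Port 1"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "LiveMigration" `
    -PhysicalNetAdapterName "SLOT 2 Port 2"

# Sanity checks: confirm RDMA-capable connections are actually in use.
# If RDMA shows disabled here, SMB has fallen back to plain TCP.
Get-SmbMultichannelConnection
Get-NetAdapterRdma | Where-Object Enabled
```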

Does anyone know how to specify which vNIC S2D uses for SMB3 traffic? My internet search recommended "Enable-ClusterS2D -S2DPreferredNetwork", but I think that may have been deprecated. From what I've read, it sounds like SMB3 tries to use all pathways instead of a single one. I primarily created the different vNICs to segregate traffic so I could apply QoS settings in the future, but the 200Gb pipeline may not really need it.
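Two mechanisms I've come across for steering this traffic, sketched below with placeholder node/network names; I haven't verified either on this cluster, so treat them as starting points rather than a recipe:

```powershell
# Option 1: constrain SMB traffic toward a given server to specific
# interfaces (run on each node; "NODE2" and the aliases are placeholders).
New-SmbMultichannelConstraint -ServerName "NODE2" `
    -InterfaceAlias "vEthernet (Storage)", "vEthernet (LiveMigration)"

# Option 2: steer cluster/CSV traffic by cluster network role and metric.
# Role 1 = cluster-only (no client access); the lowest metric is preferred.
(Get-ClusterNetwork "Storage").Role = 1
(Get-ClusterNetwork "Storage").Metric = 100
```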

Thanks!

Chad