RDMA (RoCE v2) with Nutanix 7-8 node cluster


BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,053
437
83
Hello, I am seeking advice or consensus on whether we should try to engineer our Nutanix cluster with RDMA.
As it stands now, the nodes are fairly beefy: each node has 2x 6226R CPUs, 1.5TB of RAM, and 12x 3.84TB SAS12 SSDs (no NVMe drives).
The load is a mixed server load: various biz apps, SQL databases, some in-house apps, etc. The vast majority is Windows-based.
Right now the build includes 1x dual-port 25G Intel V710 NIC.
At the last minute I asked if Nutanix supports RDMA. They do, with lots of limitations. Mostly we comply; the one minor change is swapping the Intel NIC for a pair of dual-port Mellanox CX4 cards. The cost difference is minimal.
The ToR switch is already supported.

The question to the experts - is the extra complexity of implementing (and managing) RDMA (plus the extra 1 or 2 25gig ports on the switch) worth it?
What kind of performance benefit (if any) could we expect in this config without changing drives to NVMe?
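
For a rough sense of scale, here is a back-of-envelope sketch in Python comparing the interface ceilings of the drives and the NIC ports in the config described above. It assumes every SSD gets a full single SAS 12Gb/s lane with 8b/10b encoding and no expander oversubscription, and it ignores protocol overhead, replication traffic, and data locality, so treat the numbers as upper bounds rather than predictions. The function and constant names are made up for illustration. It says nothing about latency or CPU overhead, which bandwidth arithmetic cannot capture.

```python
# Back-of-envelope interface ceilings per node; real throughput will be lower.
# Assumptions (not measured): one full SAS-3 lane per drive, 8b/10b encoding,
# no expander oversubscription, no protocol or replication overhead.

SAS12_LINE_RATE_GBIT = 12.0     # SAS 12Gb/s line rate per lane
SAS_ENCODING_EFFICIENCY = 0.8   # 8b/10b encoding: 8 data bits per 10 line bits
DRIVES_PER_NODE = 12            # from the node spec in this post
NIC_PORT_GBIT = 25.0            # one 25GbE port


def gbit_to_gbyte(gbit: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbit / 8.0


def drive_ceiling_gbyte_s(drives: int = DRIVES_PER_NODE) -> float:
    """Aggregate SAS interface ceiling for all drives in one node."""
    per_drive = gbit_to_gbyte(SAS12_LINE_RATE_GBIT * SAS_ENCODING_EFFICIENCY)
    return drives * per_drive


def nic_ceiling_gbyte_s(ports: int) -> float:
    """Aggregate line rate for the given number of 25GbE ports."""
    return ports * gbit_to_gbyte(NIC_PORT_GBIT)


if __name__ == "__main__":
    print(f"12 SAS12 SSDs (interface ceiling): {drive_ceiling_gbyte_s():.2f} GB/s")
    print(f"1x dual-port 25G NIC:              {nic_ceiling_gbyte_s(2):.2f} GB/s")
    print(f"2x dual-port 25G NICs:             {nic_ceiling_gbyte_s(4):.2f} GB/s")
```

Under these assumptions the drive interfaces could in theory outrun even four 25G ports, but that is only the raw bandwidth side of the question.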

P.S. Dear mods: not sure if this is the right sub-forum, so feel free to move it as you see fit.
 

vangoose

Active Member
May 21, 2019
326
104
43
Canada
Are you running AHV or Nutanix on VMware or Hyper-V?

Good luck with AHV if that's the case; I wouldn't touch that in production. I have no experience using RDMA in Nutanix, only with Nutanix on VMware and AHV. Nutanix on VMware is excellent. I don't think RDMA is supported, but it performs very well without it.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,053
437
83
The plan is for ESXi. As for AHV in production: probably not for us yet. What I know is that, for now, it's a non-starter for us due to on-prem Cisco UC (voice), but we did recently speak with two (small) financial companies which are running hybrid clusters with AHV. Go figure. I guess they hate VMware licensing that much.

According to Nutanix, RDMA is officially supported on both AHV and ESXi (not on Hyper-V). They also said that most likely it's not worth it due to complexity issues at scale :( Sad face...

This blogger seems to know what he's doing (unlike me):
His cluster is also more powerful than ours (4x Platinum 28c/56t CPUs and NVMe SSDs).
I guess RDMA only makes sense for really top-end stuff.
 

Drewy

Active Member
Apr 23, 2016
208
56
28
54
Apologies for digging up an old thread, but I'm curious to know if you got RDMA working and, if so, was it worth it?