Hey,
So I've finally taken the plunge into trying to configure some of my cards to run iSCSI on ESXi for some datastores and direct-shares of zvols. Eventually I'd like to do RDMA (I think?), but even iSCSI has been a total PITA compared to stuff like NFS and SMB.
I've had this stuff for at least a year now, but I find acronyms super annoying, so I think a certain avoidance has become second nature. I thought I'd be able to do 56Gb IB when I bought the cards (I have a handful of ConnectX-3 MCX354As flashed to FCBT), but then I found out my upgrade to ESXi 7.0 made that out of the question. So now I'm trying to figure out the 40Gb Ethernet + RDMA links to storage providers like LIO/targetcli, and it's been quite the wellspring of obnoxious minutiae.
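For context, here's roughly what I've been doing on the Linux target side to export a zvol. This is just a sketch of how I understand LIO/targetcli; the zvol path, IQNs, and IP are my own placeholders, and I'm not positive the iSER flag behaves the same on every targetcli version:

Code:
# make a block backstore out of the zvol (path is an example)
targetcli /backstores/block create name=esxi-vol dev=/dev/zvol/tank/esxi-vol
# create an iSCSI target (IQN is made up)
targetcli /iscsi create iqn.2003-01.org.linux-iscsi.nas:esxi-vol
# allow my ESXi host's initiator IQN (also made up)
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.nas:esxi-vol/tpg1/acls create iqn.1998-01.com.vmware:esxi01
# export the backstore as a LUN
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.nas:esxi-vol/tpg1/luns create /backstores/block/esxi-vol
# targetcli-fb may have auto-created a default 0.0.0.0:3260 portal; remove it if so
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.nas:esxi-vol/tpg1/portals delete 0.0.0.0 3260
# listen on the storage network
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.nas:esxi-vol/tpg1/portals create 10.0.0.10 3260
# supposedly this flips the portal from plain iSCSI to iSER
targetcli /iscsi/iqn.2003-01.org.linux-iscsi.nas:esxi-vol/tpg1/portals/10.0.0.10:3260 enable_iser true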
For example, I was reading that to connect a VM to an RDMA endpoint in vSphere, the HCA (host channel adapter; I had to look that up, I thought CX3s were considered NICs, but that would waste a perfect opportunity to introduce another useless acronym) has to be connected to vSphere through a vDS (vSphere Distributed Switch), but the vDS can only have ONE uplink connected (phys HCA port), even though my cards have two? e.g.: https://docs.vmware.com/en/VMware-v...tml#GUID-4A5EBD44-FB1E-4A83-BB47-BBC65181E1C2
"Virtual machines that reside on different ESXi hosts require HCA to use RDMA. You must assign the HCA as an uplink for the vSphere Distributed Switch. PVRDMA does not support NIC teaming. The HCA must be the only uplink on the vSphere Distributed Switch."

Does that mean I can only have one PER HOST, or only one uplink connected to THE ENTIRE vDS?! Is that just during setup, or like, all the time?! It's totally not clear.
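For anyone who wants to check my work: about the only sanity check I've found on the ESXi side is listing the RDMA devices from the ESXi shell, which at least shows which physical uplink each vmrdma device is paired with:

Code:
esxcli rdma device list

Whether that paired uplink then has to be the lone uplink on the vDS is exactly what I can't figure out. Anyway, here's another gem: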
From The Basics of Remote Direct Memory Access (RDMA) in vSphere | VMware:

"RDMA over Converged Ethernet (RoCE) is supported with PVRDMA ... fabric needs to support Priority Flow Control (PFC). PVRDMA supports both RoCE v1 and v2. The difference here is that RoCE v1 supports switched networks only, where RoCE v2 supports routed networks."

If my CX3 HCAs are only RoCE v1 compliant, does that mean I have to make sure my little storage-only VyOS VM (the one I made basically just for running dnsmasq) stays OFF unless the HCAs are using RoCE v2? (v1 = switched network only vs. v2 = routing capable) Is that what that means?
Or can I maybe change my CX3 HCAs to RoCE v2 with this setting: https://kb.vmware.com/s/article/79148
And PVRDMA supports RoCE v2, but IF my cards only support v1, does that mean I'm limited to the capabilities of my card (v1), or could I conceivably do some host-only (internal) networking with PVRDMA unattached to the HCA? Would that even make any difference if it were host-only, anyway...? And do they all have to be set to the same thing? It's all just so unclear.

From Changing RDMA NIC's RoCE Version in iSER Environments:

Code:
esxcli rdma iser params set -a vmhba69 -r 2
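(If I'm reading that KB right, vmhba69 is just their example iSER adapter name; you'd substitute whatever your iSER vmhba enumerates as, and -r 2 forces it to RoCE v2. Whether a CX3 will actually honor v2 there is the part I can't confirm.)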
Oh, looks like I found an answer to that question here: https://blogs.vmware.com/vsphere/2020/10/para-virtual-rdma-support-for-native-endpoints.html
"VMs running on the same ESXi host use memory copy for PVRDMA. This mode does not require ESXi hosts to have an HCA card connected."

So that's cool, but still ... do I connect them to the host through the same networking endpoint? Does even the host-only PVRDMA need to be connected to a vDS?
If I have to set them all ahead of time to be the same version throughout my stack, will I still be able to use them with iSER?
And seriously, though ... WTF is iSER? That's gotta be the lamest acronym ever (and that's saying a lot). Is that still considered "networking", like SCSI over IP? Or does it render the HCA useless for other, more networky stuff while it's running?
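Best I can tell from the docs, iSER is "iSCSI Extensions for RDMA": the same iSCSI protocol on top, but with the data path running over RDMA instead of TCP. If I'm reading the vSphere docs right, you create a separate iSER initiator adapter and bind a vmkernel port to it, something like this (the vmhba and vmk names below are made-up examples, not verified on my hosts):

Code:
# create the iSER initiator (should show up as a new vmhba)
esxcli rdma iser add
# find out what it got called
esxcli iscsi adapter list
# bind a vmkernel port to the new adapter
esxcli iscsi networkportal add -A vmhba67 -n vmk1

Whether the HCA can still carry normal vmkernel traffic while iSER is using it is exactly what I'm trying to find out.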
...I don't even know where to begin with this pile of flaming garbage:
Pre-requisites to enable PVRDMA support for native endpoints
- ESXi host must have PVRDMA namespace support.
- ESXi namespaces should not be confused with the vSphere with Tanzu/Tanzu Kubernetes Grid namespaces.

In releases previous to vSphere 7.0, PVRDMA virtualized public resource identifiers in the underlying hardware to guarantee that a physical resource could be allocated with the same public identifier when a virtual machine resumed operation after vMotion moved it from one physical host to another. To do this, PVRDMA distributed virtual-to-physical resource identifier translations to peers when creating a resource, which added overhead that can be significant when creating large numbers of resources.

PVRDMA namespaces prevent this overhead by letting multiple VMs coexist without coordinating the assignment of identifiers. Each VM is assigned an isolated identifier namespace on the RDMA hardware, such that any VM can select its identifiers within the same range without conflicting with other virtual machines. The physical resource identifier no longer changes after vMotion, so virtual-to-physical identifier translations are no longer necessary.
There's just so much going on here. If anyone has suggestions for someone trying to make this at least SORT OF simple, lemme know!