NEED HELP!
I need help trying to figure out why I can't seem to pass more than 11.9Gb/sec across my switches.
Pic of my setup below.
The basic setup is a pfSense router with a 40Gb Mellanox card in it, connected to an ICX6650.
The ICX6650 is then connected to two ICX6610s using 40G QSFP+ DAC cables.
I also have two ESXi 7 servers, each using 10Gb connections to the switches for most VMs and management. Each ESXi server additionally has a 40Gb Mellanox NIC that is passed through directly to an Ubuntu 21.04 VM. Those Ubuntu VMs have no other NIC configured; each just has its one dedicated 40Gb card.
When I test network throughput between these Ubuntu VMs, the max speed I get is around 11.9Gbps total. I have tried multiple things, including running multiple iperf clients/servers in case a single one ran out of CPU (iperf3 is single-threaded and maxes out a single core). Even using multiple parallel iperf3 instances to get around that, I get the same result.
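For reference, this is roughly how the parallel runs looked (the server IP and port numbers below are placeholders, not my exact values):

```
# On the receiving Ubuntu VM: start several iperf3 listeners, one per port,
# since each iperf3 process is single-threaded.
iperf3 -s -p 5201 &
iperf3 -s -p 5202 &
iperf3 -s -p 5203 &

# On the sending Ubuntu VM: one client per listener, each with a few parallel
# streams. 10.0.0.10 stands in for the other VM's 40Gb interface IP.
iperf3 -c 10.0.0.10 -p 5201 -P 4 -t 30 &
iperf3 -c 10.0.0.10 -p 5202 -P 4 -t 30 &
iperf3 -c 10.0.0.10 -p 5203 -P 4 -t 30 &
wait
```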
As far as I can tell, everything is set up right. All the relevant switch ports show 40Gb, the licenses are configured correctly, and I'm not seeing any bottlenecks anywhere that I can tell. I ran a second test: on one of the Ubuntu VMs I removed the 40Gb NIC from its configuration, took two of the 10Gb NICs on the second ESXi host, connected them to 10Gb ports on the ICX6650, and set them up in a dynamic LACP LAG with active hash-based load balancing. I ran the same test again with multiple iperf instances and got the *exact* same result, a max of 11.9Gb/sec total transfer across the switches. I would have expected that to be upwards of 18-19Gbps.
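One note on why I expected the LAG to beat 10Gb: LACP hashing keeps any single TCP flow on one member link, so the only way past 10Gb is several distinct flows. Each parallel iperf3 connection already gets its own source port, which should be enough for the hash to use both links, but a variant that pins different client ports explicitly (just to rule out unlucky hashing; the IP and port numbers are placeholders) would look like this:

```
# Listeners on the receiving VM, as before.
iperf3 -s -p 5201 &
iperf3 -s -p 5202 &

# Clients bound to different source ports so the LAG hash sees distinct
# src-port/dst-port tuples (needs an iperf3 build with --cport, 3.1 or newer).
# 10.0.0.10 and all port numbers are placeholders.
iperf3 -c 10.0.0.10 -p 5201 --cport 40001 -t 30 &
iperf3 -c 10.0.0.10 -p 5202 --cport 40002 -t 30 &
wait
```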
Actually, the first test I did wasn't even passthrough. I had SR-IOV enabled but just created a 40Gb port group/vSwitch and used VMXNET3 with DirectPath I/O, and that test yielded the same 11.9Gbps throughput. Then I did the LACP LAG test. Finally, I assumed it was something in the VM layer and reconfigured the NICs as passthrough (not using SR-IOV), so traffic isn't even going through any VM layers at this point, and I STILL hit the same 11.9Gb/sec throughput. It's driving me nuts!
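For completeness, the guest-side view of a passed-through card can be sanity-checked like this (the interface name and PCI address are placeholders; use whatever `ip link` and `lspci` report):

```
# Negotiated link speed of the Mellanox NIC inside the Ubuntu guest
# (enp3s0 is a placeholder interface name).
ethtool enp3s0 | grep -i speed

# PCIe link speed/width the passed-through card actually trained at
# (03:00.0 is a placeholder PCI address).
sudo lspci -s 03:00.0 -vv | grep -i lnksta
```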
What am I overlooking that is preventing higher throughput? With a 40Gb NIC to 40Gb switches to a 40Gb NIC, I should be getting something like 30-35Gbps, but I'm not even close to that. I'm not CPU or memory bottlenecked, and nothing I can see in the network path should be limiting it. Should I not be expecting higher throughput across these switches??
