How does (or should) Intel Dynamic LAG work?

daisho

New Member
Apr 25, 2019
I am desperately trying to get more than 10 Gbit/s from my X550-T2 NIC to my ESXi host, which has an X710-T4.
The exact topology:

1) Windows 10 Pro, X550-T2 with both ports teamed using IEEE 802.3ad Dynamic Link Aggregation (according to Intel PROSet Adapter Config Utility)
Settings:
Jumbo Packet MTU size 9014 (also tried the default 1500, but it doesn't change iperf3 performance)
Rx/Tx buffers already at maximum (4096/16384), RSS enabled
Tried turning off Interrupt Moderation Rate (default is Enabled / Adaptive), but it didn't change performance
CPU: Ryzen 3900X

< 2x RJ45 >
2) UniFi Switch XG 6, aggregated both ports (and showing no errors)
< 2x SFP+ >
3) UniFi Switch 16XG, aggregated both ports to XG6, aggregated 4 ports to ESXi (and showing no errors)
< 4x RJ45 >
4) ESXi, X710-T4 with all 4 ports defined as LACP/LAG with mode "Source and Dest. IP and VLAN"

Running iperf3 as server either on an Ubuntu* VM using vmxnet3 on the ESXi host or on my W10 workstation.
Running the client on the other.
* The Ubuntu VMs seem to have the best performance: when testing between VMs (meaning everything is passed directly over the virtual Distributed Switch and never leaves the physical machine via the NIC) I get around 20-30 Gbit/s with them.

The best I get is around 6.16 Gbit/s, and it seems that the team on the Windows workstation is not really aggregating but only doing failover.
I understand that it works that way when using a single connection, but shouldn't it use both links when sending several parallel streams?

Using e.g. the following iperf3 command:
iperf3 -c x.x.x.x -P 10 -i 1 -t 10
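
For context on why parallel streams might still land on one link: a LAG picks the outgoing link per flow by hashing header fields, so the spread depends on which fields go into the hash. Below is a minimal sketch of that idea. It is not the actual Intel, UniFi or ESXi algorithm; the XOR-and-modulo hash, the two IP addresses and the client source ports are made-up assumptions for illustration. Only port 5201 (the iperf3 default) is real.

```python
import ipaddress

IPERF3_PORT = 5201  # iperf3 default server port


def ip_hash_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Toy IP-only hash: every flow between the same two hosts gets the same link."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    return (s ^ d) % n_links


def l4_hash_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 n_links: int) -> int:
    """Toy IP+port hash: flows with different source ports can land on different links."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    return (s ^ d ^ src_port ^ dst_port) % n_links


# Hypothetical addresses and source ports standing in for "-P 10" streams.
SRC, DST = "192.168.1.10", "192.168.1.20"
SRC_PORTS = range(50000, 50010)

ip_links = {ip_hash_link(SRC, DST, 2) for _ in SRC_PORTS}
l4_links = {l4_hash_link(SRC, DST, p, IPERF3_PORT, 2) for p in SRC_PORTS}

print("links used with IP-only hash:", sorted(ip_links))
print("links used with IP+port hash:", sorted(l4_links))
```

In this toy model the IP-only hash puts all 10 streams on a single link, because the source and destination addresses never change, while the hash that includes the TCP ports uses both links. If the real hash on any hop only looks at MAC or IP addresses, a single client/server pair would behave like the first case no matter how many streams -P starts.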



When I run a speed test from another workstation of mine (with a cheap ASUS XG-C100C) to this one, I get exactly the same performance (around 6.16 Gbit/s). In the other direction, the Skylake CPU in that machine is the limit, because iperf3 loads all 4 cores. I also tried it without any switch in between, by connecting a single 10GbE port on my workstation directly to the other workstation: same performance.
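
To put the 6.16 Gbit/s on a single 10GbE link into perspective, here is a rough sketch of the TCP goodput ceiling set by framing overhead alone. The header sizes are the standard ones without TCP options. The IP MTU values of 1500 and 9000 are assumptions: as far as I understand, the driver's 9014 setting includes the 14-byte Ethernet header.

```python
# Per-frame overhead on the wire, in bytes:
# preamble + SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
ETH_OVERHEAD = 8 + 14 + 4 + 12
IP_HEADER = 20   # IPv4 without options
TCP_HEADER = 20  # TCP without options


def tcp_goodput_gbps(link_gbps: float, ip_mtu: int, tcp_options: int = 0) -> float:
    """Upper bound on TCP payload rate for a given link speed and IP MTU."""
    payload = ip_mtu - IP_HEADER - TCP_HEADER - tcp_options
    wire_bytes = ip_mtu + ETH_OVERHEAD
    return link_gbps * payload / wire_bytes


for mtu in (1500, 9000):
    print(f"MTU {mtu}: at most {tcp_goodput_gbps(10.0, mtu):.2f} Gbit/s of payload")
```

This gives roughly 9.5 Gbit/s at MTU 1500 and 9.9 Gbit/s at MTU 9000, so framing overhead by itself does not account for a 6.16 Gbit/s result, and the MTU setting making no difference points to a limit somewhere else.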


a) Is 6 Gbit/s the best I can get out of the X550-T2?
b) Shouldn't the dynamic LAG from the Intel driver work better when running 10 parallel streams via iperf3?
c) Is there any configuration I forgot on ESXi or in my Intel NIC teaming? (Is there no "mode" we can set in the Intel driver, for example hashing based on IP rather than MAC?)