How does (or should) Intel Dynamic LAG work?

daisho · Jun 5, 2020

I am desperately trying to get more than 10 Gbit/s from my X550-T2 NIC to my ESXi host with an X710-T4.
The exact topology:

1) Windows 10 Pro, X550-T2 with both ports teamed using IEEE 802.3ad Dynamic Link Aggregation (according to Intel PROSet Adapter Config Utility)
Settings:
Jumbo-Packets MTU size 9014 (changed it to the default 1500, but that doesn't change iperf3 performance)
Rx/Tx buffers already at maximum (4096/16384), RSS enabled
Tried turning off Interrupt Moderation Rate (default is Enabled / Adaptive), but that didn't change performance
CPU: Ryzen 3900x

< 2x RJ45 >
2) UniFi Switch XG 6, aggregated both ports (and showing no errors)
< 2x SFP+ >
3) UniFi Switch 16XG, aggregated both ports to XG6, aggregated 4 ports to ESXi (and showing no errors)
< 4x RJ45 >
4) ESXi, X710-T4 with all 4 ports defined as LACP/LAG with mode "Source and Dest. IP and VLAN"

Running iperf3 as server either on an Ubuntu* VM using vmxnet3 on the ESXi host or on my W10 workstation.
Running the client on the other.
* The Ubuntu VMs seem to have the best performance: when testing between VMs (meaning everything is passed directly through the virtual Distributed Switch and never really leaves the physical machine via the NIC), I get around 20-30 Gbit/s with them.

The best I get is around 6.16 Gbit/s, and it seems that the team on the Windows workstation is not really aggregating but rather acting as failover.
I understand that it works that way with only a single connection, but shouldn't it use both links when sending several parallel streams?

Using e.g. the following iperf3 command:
iperf3 -c x.x.x.x -P 10 -i 1 -t 10
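
To make the parallel-streams question concrete, here is a minimal sketch of how a hash-based LAG picks a member link per flow. It is only an illustration: the function names, the IP addresses, the source ports and the use of CRC32 as the hash are all assumptions, and the real Intel driver, UniFi switches and ESXi each use their own hash. What it shows is that with a hash over source and destination IP only, all 10 streams of one iperf3 run between the same two hosts land on the same link, while a hash that also includes the TCP ports can spread them.

```python
import zlib

def pick_link(fields, n_links):
    """Map a flow to a link index by hashing the chosen header fields.
    CRC32 is a stand-in; real devices use their own hash function."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % n_links

def links_used(flows, n_links, mode):
    """Return the set of link indices used by the given flows.
    mode "ip": hash on (src IP, dst IP) only.
    mode "ip_port": hash on (src IP, dst IP, src port, dst port)."""
    used = set()
    for src_ip, dst_ip, src_port, dst_port in flows:
        if mode == "ip":
            fields = (src_ip, dst_ip)
        else:
            fields = (src_ip, dst_ip, src_port, dst_port)
        used.add(pick_link(fields, n_links))
    return used

# Hypothetical flows for "iperf3 -P 10": same host pair, 10 different
# source ports, iperf3's default server port 5201.
flows = [("10.0.0.10", "10.0.0.20", 50000 + i, 5201) for i in range(10)]

print("IP-only hash uses links:", links_used(flows, 2, "ip"))
print("IP+port hash uses links:", links_used(flows, 2, "ip_port"))
```

In this model an IP-based mode cannot exceed the speed of one member link between a single pair of hosts, no matter how many streams -P opens; whether that applies to each hop here depends on the hash each device actually uses.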

When I run a speed test from my other workstation to mine (the other way round, the Skylake CPU is the limit, because iperf3 maxes out all 4 cores) with a cheap ASUS XG-C100C, I get exactly the same performance (around 6.16 Gbit/s). I also tried that without any switch in between, by connecting a single 10GbE port on my workstation directly to the other workstation (crossover): same performance.

a) Is 6 Gbit/s the best I can get out of the X550-T2?
b) Shouldn't the dynamic LAG from the Intel driver work better when running 10 parallel streams via iperf3?
c) Is there any configuration I forgot on ESXi or in my Intel NIC teaming? (Is there no "mode" we can set in the Intel driver, for example hashing based on IP rather than MAC?)
