Performance question...
On a ConnectX-3 (non-Pro) under Linux, I run ntttcp or nuttcp with 4 sender streams and 4 receiver streams on the same box. The two ports sit in separate IP (network) namespaces and are looped back over an FDR QSFP+ DAC cable, with the link speed set to 56 Gbps. On an 8259CL CPU on an X11SPL-F board I can barely get over 40 Gbps; over a one-hour run the throughput typically sits around 39-40 Gbps.
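For readers reproducing this kind of test, the namespace loopback can be sketched as follows. This is a minimal sketch, not the exact setup used here: the interface names `enp1s0`/`enp1s0d1`, the namespace names `ns_a`/`ns_b`, the 10.0.0.0/24 addresses and the helper `netns_loopback_commands` are all hypothetical. It only prints standard iproute2 commands (`ip netns add`, `ip link set ... netns`, and `ip -n NS ...`) that move each port into its own namespace, so traffic between the two addresses has to cross the DAC cable instead of the kernel's local loopback.

```python
def netns_loopback_commands(port_a: str, port_b: str, mtu: int = 9000) -> list[str]:
    """Build iproute2 commands that put each port of a dual-port NIC in its own
    network namespace, forcing traffic between them over the physical cable."""
    # Hypothetical namespace names and addresses, one per port.
    plan = [("ns_a", port_a, "10.0.0.1/24"), ("ns_b", port_b, "10.0.0.2/24")]
    cmds = []
    for ns, dev, addr in plan:
        cmds.append(f"ip netns add {ns}")                    # create the namespace
        cmds.append(f"ip link set {dev} netns {ns}")         # move the port into it
        cmds.append(f"ip -n {ns} addr add {addr} dev {dev}")  # address inside the ns
        cmds.append(f"ip -n {ns} link set {dev} mtu {mtu}")   # jumbo frames
        cmds.append(f"ip -n {ns} link set {dev} up")          # bring the link up
    return cmds


if __name__ == "__main__":
    # Hypothetical interface names; substitute whatever names mlx4_en assigned.
    for cmd in netns_loopback_commands("enp1s0", "enp1s0d1"):
        print(cmd)
```

Run the printed lines as root, then start the receiver under `ip netns exec ns_b` and the sender under `ip netns exec ns_a`, pointing the sender at 10.0.0.2.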
What I have tried so far: disabling all CPU sleep states (barely any effect, about 1 Gbps); disabling kernel memory zeroing on allocation and free with the boot parameters init_on_alloc=0 init_on_free=0 (a handful of Gbps more); setting the MTU to 9000 (no change); various sysctl tweaks (no change); and tuning interrupt coalescing with ethtool to adaptive-rx off, rx-usecs 0, tx-frames 64 (a few Gbps more, at best). Tools other than ntttcp and nuttcp, such as iperf and iperf3, do much worse: 3-10 Gbps less in aggregate.
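A quick calculation is consistent with MTU 9000 changing nothing: Ethernet/IPv4/TCP framing alone costs only a few percent of a 56 Gbps link even at MTU 1500, so it cannot account for a ceiling near 40 Gbps. This is a sketch under stated assumptions. It treats the nominal 56 Gbps as the usable MAC data rate (the line encoding of Mellanox's 56 Gbps mode is not modeled), it assumes standard Ethernet per-frame overhead (preamble, SFD, MAC header, FCS, inter-frame gap), and it assumes Linux's default TCP timestamp option (12 bytes). The helper name `tcp_goodput_gbps` is hypothetical.

```python
# Per-frame bytes on the wire that carry no IP packet (standard Ethernet):
# 7 preamble + 1 SFD + 14 MAC header + 4 FCS + 12 inter-frame gap = 38.
ETH_WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12
# IPv4 header (20) + TCP header (20) + TCP timestamp option (12).
# Assumption: timestamps enabled, as is the Linux default.
IP_TCP_HEADERS = 20 + 20 + 12


def tcp_goodput_gbps(link_gbps: float, mtu: int) -> float:
    """Upper bound on TCP payload rate when every frame is full-size at this MTU."""
    payload = mtu - IP_TCP_HEADERS        # TCP payload bytes per frame
    on_wire = mtu + ETH_WIRE_OVERHEAD     # bytes of link time per frame
    return link_gbps * payload / on_wire


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {tcp_goodput_gbps(56.0, mtu):.1f} Gbps")
```

Under these assumptions the framing ceiling is about 52.7 Gbps at MTU 1500 and about 55.4 Gbps at MTU 9000, both well above the observed 39-40 Gbps.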
What limit exactly am I hitting here?