40Gbps Ethernet Mode ConnectX-3 VPI Awful Windows Performance

Layla

Game Engine Developer
Jun 21, 2016
215
177
43
40
This journey started when I noticed that my network, which used to get 36 Gbit/sec throughput, was getting 4.5 Gbit/sec in one direction and 9.8 Gbit/sec in the other. After many, many hours of searching, trying different settings, and finally cross-flashing Mellanox's latest firmware (and trying to force-reinstall drivers), I caved and tried the stupid thing I should have tried earlier: jumbo frames. 9000 MTU -> 16.0 Gbit/sec, an improvement of 6+ Gbit/sec over the 1500 MTU case. Still a very far cry from 36 Gbit/sec.

With 9000MTU:
upload_2017-6-1_3-9-29.png

It's been a long time since I've seen jumbo frames matter, and something seems very wrong here. It clearly seems like it's CPU-bottlenecked, which is amazing, since this is on a watercooled 4.4GHz Haswell-E.
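For a rough sense of why MTU matters when the CPU is the limit, the sketch below estimates how many packets per second the host has to handle at the throughputs mentioned in this post. It assumes every packet is a full MTU and ignores Ethernet/IP/TCP header overhead, so treat the numbers as ballpark only; the function name is made up for illustration.

```python
def packets_per_second(throughput_gbps, mtu_bytes):
    """Approximate packets/sec needed to carry a given throughput.

    Assumption: every packet is exactly one full MTU and header
    overhead is ignored, so this is only a ballpark figure.
    """
    bits_per_packet = mtu_bytes * 8
    return throughput_gbps * 1e9 / bits_per_packet


for mtu in (1500, 9000):
    for gbps in (9.8, 16.0, 36.0):
        pps = packets_per_second(gbps, mtu)
        print(f"MTU {mtu}: {gbps} Gbit/sec needs about {pps:,.0f} packets/sec")
```

At the same throughput, 9000 MTU needs one sixth of the packets that 1500 MTU does, which is consistent with per-packet CPU cost being the bottleneck, though it does not prove it.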

For reference, it gets the same speeds over fiber through a switch and over a DAC directly from one card to another (two PCs in both cases).

iperf3 numbers are also pretty horrible to localhost (not using any NIC at all): 7.x Gbit/sec for TCP. Again, clearly CPU-limited. But I thought that's where the Mellanox TCP offload stuff was supposed to shine?

Before anyone asks, all the cards are in PCIe x8 electrical slots running at 8.0 GT/s (Gen3); this was one of the first things I verified.
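As a sanity check on the slot, here is a minimal sketch of the raw link bandwidth arithmetic. It only accounts for line encoding (128b/130b for Gen3, 8b/10b for Gen2) and ignores PCIe packet and protocol overhead, so real usable bandwidth is somewhat lower. The function name is hypothetical.

```python
def pcie_link_gbps(gt_per_s, lanes, encoding=128 / 130):
    """Link bandwidth in Gbit/sec after line encoding.

    Assumption: TLP/DLLP protocol overhead is ignored, so actual
    usable bandwidth will be lower than this figure.
    """
    return gt_per_s * lanes * encoding


gen3_x8 = pcie_link_gbps(8.0, 8)           # Gen3 uses 128b/130b encoding
gen2_x8 = pcie_link_gbps(5.0, 8, 8 / 10)   # Gen2 uses 8b/10b encoding

print(f"Gen3 x8: {gen3_x8:.1f} Gbit/sec")
print(f"Gen2 x8: {gen2_x8:.1f} Gbit/sec")
```

A Gen3 x8 link has comfortable headroom over 40 Gbit/sec, while a link that negotiated down to Gen2 x8 would top out below 36 Gbit/sec, which is why verifying the negotiated speed is a sensible first step.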

Has anyone ever seen this kind of issue? Latest Mellanox drivers (5.35.12978.0) from 3/8/2017, and latest Mellanox firmware (7000). I should have mentioned this before, but this is on Windows 10 x64 (latest).

upload_2017-6-1_3-8-1.png

Thanks in advance!

P.S. One more note: before jumbo frames, -P N (where N > 1) wasn't helping performance; it was just getting the same throughput broken into more pieces. I just tested with jumbo frames, and it now does make a positive impact, up to 25 Gbit/sec at -P 4, but it still doesn't get performance to 36...
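For anyone who wants to repeat the parallel-stream comparison, the following is a rough sketch of how a -P sweep could be scripted around iperf3's JSON output (-J). The server address is a placeholder, the helper names are made up, and the sample report is a trimmed, invented stand-in for a real one, not a measurement from this thread.

```python
import json


def iperf3_command(server, streams, seconds=10):
    """Build an iperf3 client command line for a given number of parallel streams."""
    return ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"]


def received_gbps(report_json):
    """Extract the receiver-side TCP throughput (Gbit/sec) from an iperf3 -J report."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


# Invented, heavily trimmed sample report used only to exercise the parser.
SAMPLE_REPORT = '{"end": {"sum_received": {"bits_per_second": 25000000000.0}}}'

for streams in (1, 2, 4, 8):
    # 192.0.2.10 is a documentation-only placeholder address.
    print(" ".join(iperf3_command("192.0.2.10", streams)))

print(f"sample report: {received_gbps(SAMPLE_REPORT):.1f} Gbit/sec")
```

In practice each command would be run with something like subprocess.run(..., capture_output=True, text=True) and its stdout passed to received_gbps.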
 

i386

Well-Known Member
Mar 18, 2016
4,221
1,540
113
34
Germany

Layla

upload_2017-6-1_3-26-38.png

Chimney Offload State = disabled... this looks suspicious...
 

Layla

Quote: "What happens if you disable or change the "interrupt moderation" settings in the nic properties?"

BSOD (Driver power state failure) :/

Followed by no change in perf.
 

Layla

Confirmed with netstat -t that everything is InHost and nothing is Offloaded. So Chimney is not working (even though I've now turned it on in both the OS and the driver).

Does anyone else have these adapters who could check to see whether they're getting their TCP connections offloaded to the adapter (netstat -t)?
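To make the check easier to compare across machines, here is a small sketch that tallies the offload state column of netstat -t output. The sample text is invented to resemble the Windows output layout (it is not captured from the machines in this thread), and the function name is hypothetical.

```python
from collections import Counter

# Invented sample that mimics the layout of Windows `netstat -t` output.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State           Offload State

  TCP    10.0.0.1:5201          10.0.0.2:49712         ESTABLISHED     InHost
  TCP    10.0.0.1:5201          10.0.0.2:49713         ESTABLISHED     InHost
  TCP    10.0.0.1:49800         10.0.0.2:445           ESTABLISHED     InHost
"""


def offload_states(netstat_text):
    """Count TCP connections by the value in the last (offload state) column."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Connection rows start with "TCP"; header and blank lines are skipped.
        if len(fields) >= 5 and fields[0] == "TCP":
            counts[fields[-1]] += 1
    return counts


counts = offload_states(SAMPLE)
print(dict(counts))
print("offloaded connections:", counts["Offloaded"])
```

On a real machine the input would come from running netstat -t, for example via subprocess.run(["netstat", "-t"], capture_output=True, text=True).stdout.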