Poor 10GbE performance in Windows

Kei-0070

Member
May 3, 2019
40
18
8
Cardiff
Just ran some basic testing on the three machines that are going to be linked together via 10GbE and found some slightly strange performance issues.

This is the kit that is linked together:

Switch - Juniper EX3300 with Juniper SR transceivers, no VLANs or L3 features enabled, 1500 MTU
PC1 - Windows 10 1803 - Threadripper 1920X / Intel X710 with Intel SR transceiver, Tx/Rx buffers adjusted to 4096
PC2 - Windows 10 1803 - i7-4820K / Intel X710 with Intel SR transceiver, Tx/Rx buffers adjusted to 4096
NAS - Fedora 30 - i5-9600K / Solarflare SFN7122F with Avago SR transceiver, no driver adjustments

Running an iperf server on PC1 with the NAS as client nets ~7.5 Gbit/s. Running the iperf server on the NAS with PC1 as client nets ~3 Gbit/s. Running iperf between PC1 and PC2 results in ~1.5 Gbit/s no matter what I try.

I'm not seeing any dropped packets or errors on the switch monitoring for the 10-gig ports. I've tried disabling flow control and interrupt moderation in the Intel driver, and that makes zero difference. Task Manager doesn't show the CPU getting hammered either, so I'm not sure what's going on here.
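For reference, a quick back-of-the-envelope check of how far each pairing falls short of what a single TCP stream can theoretically reach at 1500 MTU. This is only a sketch: the framing constants are standard Ethernet, and the throughput figures are the ones quoted above.

```python
# Theoretical single-stream TCP goodput on 10GbE at 1500 MTU, and each
# observed result as a fraction of that ceiling.
MTU = 1500                # IP packet size (no jumbo frames)
ETH_HEADER, FCS = 14, 4   # Ethernet header + frame check sequence
PREAMBLE, IFG = 8, 12     # preamble/SFD + inter-frame gap on the wire
IP_TCP = 40               # IPv4 + TCP headers, no options

wire_bytes = MTU + ETH_HEADER + FCS + PREAMBLE + IFG   # 1538 bytes per frame
payload = MTU - IP_TCP                                  # 1460 bytes of TCP payload
max_goodput = 10.0 * payload / wire_bytes               # ~9.49 Gbit/s

print(f"max TCP goodput: {max_goodput:.2f} Gbit/s")
for path, gbps in {"NAS -> PC1": 7.5, "PC1 -> NAS": 3.0, "PC1 <-> PC2": 1.5}.items():
    print(f"{path}: {gbps / max_goodput:.0%} of achievable")
```

So even the best result (NAS to PC1) is only about 80% of what the wire allows, and PC1 to PC2 is around 16%.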
 

Spartacus

Well-Known Member
May 27, 2019
786
326
63
Austin, TX
Which drivers are you using on the Windows machines, and how many parallel tests?
The built-in Windows ones are notorious for getting about half of the 10G speed capability on the Mellanox cards; not sure if it's the same for the Intel ones.
 

i386

Well-Known Member
Mar 18, 2016
3,007
965
113
33
Germany
Use ntttcp between Windows machines for network testing; iperf is not optimized for Windows.
 

Kei-0070

Member
1.9.230.0, downloaded from the Intel site about a month or so ago (the driver date suggests 27/11/2018). There was also a firmware update, which I applied to both cards, but I can't remember the version of that one.

NTttcp performance:
Code:
PS I:\Downloads\NTtcp> ./ntttcp.exe -s -m 8,*,192.168.0.160 -l 128k -a 2 -t 15
Copyright Version 5.33
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0   15.016        95863.612    131072.000
     1   15.454        34596.609    131072.000
     2   15.047        43018.276    131072.000
     3   14.750        13797.966    131072.000
     4   15.016        56984.550    131072.000
     5   15.016        55688.865    131072.000
     6   15.016        54248.269    131072.000
     7   15.016        56686.201    131072.000


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
     6037.750000      15.015       1460.054          402.115


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
             3216.916       2.353     48302.000


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
    28524.875         0.627       52573.427          0.340


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
     4336168           268517           0      0      1.184
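The 402 MB/s total converts to roughly 3.2 to 3.4 Gbit/s, in the same ballpark as the iperf numbers and still far below line rate. A small conversion helper; whether ntttcp's "MB" is binary or decimal is an assumption, so both readings are shown:

```python
# Convert ntttcp's Throughput(MB/s) total to Gbit/s for comparison with iperf.
# Assumption: ntttcp counts binary megabytes (1 MB = 1048576 bytes); the
# decimal interpretation is shown as well.
def mb_s_to_gbit_s(mb_per_s, binary=True):
    bytes_per_mb = 1024 * 1024 if binary else 1_000_000
    return mb_per_s * bytes_per_mb * 8 / 1e9

print(f"{mb_s_to_gbit_s(402.115):.2f} Gbit/s (binary MB)")
print(f"{mb_s_to_gbit_s(402.115, binary=False):.2f} Gbit/s (decimal MB)")
```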
 

Kei-0070

Member
Doing a quick iperf test on the local machine itself, against the loopback address, sees an average of 4.5 Gbit/s. Setting the TCP window size to 2048000 results in some heavy variability, but the best I've seen is 8.9 Gbit/s and the average is around 6.5. This suggests to me that it's a Windows issue.
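The window-size sensitivity fits the usual bandwidth-delay-product arithmetic: a single TCP stream can't exceed window divided by RTT. A sketch, where the 1 ms RTT is a made-up illustration rather than a measured value:

```python
# Single-stream TCP throughput is capped by window / RTT (the
# bandwidth-delay product), which is why a bigger window helps.
def window_for(rate_gbps, rtt_ms):
    """Window (bytes) needed to keep rate_gbps in flight at a given RTT."""
    return rate_gbps * 1e9 / 8 * (rtt_ms / 1e3)

def rate_cap_gbps(window_bytes, rtt_ms):
    """Maximum rate (Gbit/s) a given window allows at a given RTT."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e9

print(window_for(10, 1.0))          # bytes of window needed for 10G at 1 ms RTT
print(rate_cap_gbps(2048000, 1.0))  # ceiling with the 2048000-byte window above
```

With a hypothetical 1 ms RTT the 2048000-byte window allows about 16 Gbit/s, so at that point the window is no longer the limiting factor.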


Doing the same thing on the Linux server results in something like I'd expect.


Running similar tests in ntttcp gives the same results.
Code:
PS I:\Downloads\NTtcp> ./ntttcp.exe -r -m 8,*,192.168.0.2 -l 128k -a 2 -t 15
Copyright Version 5.33
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0   15.000        76458.280     48480.812
     1   15.000        76458.755     48927.487
     2   14.999        76464.043     53761.016
     3   14.999        76463.758     48379.198
     4   15.000        76459.896     54396.665
     5   15.001        76454.229     64123.136
     6   15.000        76458.660     55378.178
     7   15.000        76458.755     51974.087


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
     8960.028477      15.000       1359.733          597.335


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
             4778.682       9.155     71680.228


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
     1103.533       417.425       30830.067         14.941


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
     6909840          6909643           0      0      6.840
PS I:\Downloads\NTtcp> ./ntttcp.exe -r -m 24,*,192.168.0.2 -l 256k -a 2 -t 15
Copyright Version 5.33
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0   15.000        24166.232    112041.449
     1   15.001        24164.716    107655.099
     2   14.993        24161.063    104284.717
     3   15.001        24164.716    111003.224
     4   15.008        24169.690    111544.877
     5   14.991        24164.381    104051.108
     6   14.994        24158.976    102780.116
     7   15.002        24163.105    105483.029
     8   14.999        24168.698    108318.197
     9   15.001        24164.716    112688.154
    10   14.992        24162.865    108146.840
    11   14.999        24167.938    117802.215
    12   15.000        24166.327    110474.637
    13   15.000        24166.327    108251.613
    14   15.001        24164.811    105483.444
    15   14.993        24160.778    102017.701
    16   14.994        24160.973    107432.407
    17   15.001        24164.716    105393.180
    18   15.003        24161.495    103224.355
    19   15.001        24163.955    105901.027
    20   15.001        24164.716    108409.690
    21   14.993        24162.394    102930.405
    22   14.992        24162.770    106807.429
    23   15.001        24163.480    109459.098


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
     8494.292297      15.000       1309.137          566.286


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
             2265.145       7.807     33977.169


DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
     1336.133       339.470      404661.400          1.121


Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
     6803737          6803653           0      0      5.530
PS I:\Downloads\NTtcp> ./ntttcp.exe -r -m 12,*,192.168.0.2 -l 32M -a 2 -t 15
Copyright Version 5.33
Network activity progressing...


Thread  Time(s) Throughput(KB/s) Avg B / Compl
======  ======= ================ =============
     0   14.877        61672.682  22369633.333
     1   14.880        61660.248  22369633.333
     2   14.880        61660.248  22369633.333
     3   14.882        61651.962  22369633.333
     4   14.895        61598.153  18067780.769
     5   14.883        61647.819  22369633.333
     6   14.893        61606.425  22369633.333
     7   14.888        61627.115  22369633.333
     8   14.885        61639.536  22369633.333
     9   14.897        61589.883  22369633.333
    10   14.880        61660.248  22369633.333
    11   14.883        61647.819  22369633.333


#####  Totals:  #####


   Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
    10752.005768      15.000       1381.202          716.800


Throughput(Buffers/s) Cycles/Byte       Buffers
===================== =========== =============
               22.400       5.567       336.000
 

Kei-0070

Member
What baffles me is that Windows itself isn't necessarily the issue, as I've seen plenty of other people achieve near-10 Gbit/s transfers on Windows 10 with similar or even older hardware. The performance I'm seeing on my Threadripper system isn't actually too bad, but it's not great either. The X79 system, on the other hand, is frankly awful; that thing can barely push past gigabit in Windows. I suspect the OS on that system might be "borked", as I have the feeling it was upgraded from an FX-8320/990FX build without reinstalling.

Showing the network adapter info in PowerShell makes me suspicious of the reported PCIe link width.
Code:
PS C:\WINDOWS\system32> Get-NetAdapterHardwareInfo

Name                           Segment Bus Device Function Slot NumaNode PcieLinkSpeed PcieLinkWidth Version
----                           ------- --- ------ -------- ---- -------- ------------- ------------- -------
WiFi                                 0   4      0        0    1        0      2.5 GT/s             1 1.1
10Gbe 1                              0   8      0        0             0      8.0 GT/s             2 1.1
Gigabit Lan                          0   5      0        0    1        0      2.5 GT/s             1 1.1
10Gbe 2                              0   8      0        1             0      8.0 GT/s             2 1.1

This is what I get for the X79 system. Both cards are definitely operating at PCIe 3.0, but the link width is definitely wrong on the Threadripper system (x2 where it should be x8).
Code:
PS C:\WINDOWS\system32> Get-NetAdapterHardwareInfo

Name                           Segment Bus Device Function Slot NumaNode PcieLinkSpeed PcieLinkWidth Version
----                           ------- --- ------ -------- ---- -------- ------------- ------------- -------
Ethernet 2                           0   5      0        1                    8.0 GT/s             8 1.1
Ethernet 3                           0   5      0        0                    8.0 GT/s             8 1.1
Onboard 1Gbe                         0   0     25        0                     Unknown
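Even at the misreported x2, raw PCIe 3.0 bandwidth shouldn't bottleneck a single 10GbE port, though it would be tight with both ports of a dual-port card loaded at once. Back-of-the-envelope, ignoring TLP/protocol overhead:

```python
# Rough usable bandwidth of a PCIe 3.0 link: 8 GT/s per lane with
# 128b/130b encoding. TLP/protocol overhead (a further ~5-15%) is ignored.
def pcie3_gbps(lanes):
    return 8.0 * 128 / 130 * lanes

print(f"x2: {pcie3_gbps(2):.2f} Gbit/s")   # the reported link width
print(f"x8: {pcie3_gbps(8):.2f} Gbit/s")   # the card's actual width
```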
 

Kei-0070

Member
Regarding the PCIe link width issue I spotted: I've now fixed it by swapping the NIC and audio interface around, as both were in x8 slots. Get-NetAdapterHardwareInfo reports the correct link width and speed now, but it didn't make one iota of difference to the throughput. Something else I've just tested is running three separate server/client instances of iperf3, which is suggested for 40G/100G network testing. At work on my old Z800 workstation (dual X5650), one instance on the loopback address tops out at 9.5 Gbit/s, and three instances net similar results per instance, meaning ~30 Gbit/s. I'll need to try this once I'm back at home and see how that fares.
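Adding up the per-instance numbers can be scripted from iperf3's JSON output (`iperf3 -J`). A minimal sketch; the JSON blobs here are synthetic stand-ins, not real captures, and `end.sum_received` is where iperf3's JSON puts the receiver-side total for TCP tests:

```python
import json

# Total receiver-side throughput across several parallel iperf3 runs,
# each captured with `iperf3 -J`.
def total_gbps(json_texts):
    total = 0.0
    for text in json_texts:
        result = json.loads(text)
        total += result["end"]["sum_received"]["bits_per_second"]
    return total / 1e9

# Synthetic stand-ins for three parallel instances at ~9.5 Gbit/s each:
fake_runs = ['{"end": {"sum_received": {"bits_per_second": 9.5e9}}}'] * 3
print(f"{total_gbps(fake_runs):.1f} Gbit/s aggregate")
```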
 

Kei-0070

Member
Just pushed the 1903 update on my Threadripper machine, and iperf performance is now fixed on it. Across 5 instances I saw just shy of 40 Gbit/s, and a single instance gave just over 16 Gbit/s. I'll look into doing the X79 system tomorrow and see if it shows the same improvement.
 