iperf3 speeds not symmetric on multiple adapters in Windows 10?


jtabc

New Member
Jul 31, 2022
I ran `iperf3` between two Windows machines and noticed that the speeds were not symmetric. I tested both with the built-in 1-gigabit Ethernet adapters in each machine and with 10-gigabit Mellanox ConnectX-2 cards installed in both machines.

Here's the relevant output. I address the 10-gigabit cards using IP addresses of `192.168.1.XXX` while I address the 1-gigabit adapters with `192.168.0.XXX`.

Code:
PS E:\myfolder> iperf3.exe -c 192.168.1.220
Connecting to host 192.168.1.220, port 5201
[  4] local 192.168.1.201 port 54596 connected to 192.168.1.220 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   914 MBytes  7.66 Gbits/sec
[  4]   1.00-2.00   sec   909 MBytes  7.63 Gbits/sec
[  4]   2.00-3.00   sec   938 MBytes  7.87 Gbits/sec
[  4]   3.00-4.00   sec   959 MBytes  8.05 Gbits/sec
[  4]   4.00-5.00   sec   956 MBytes  8.02 Gbits/sec
[  4]   5.00-6.00   sec   958 MBytes  8.04 Gbits/sec
[  4]   6.00-7.00   sec   974 MBytes  8.17 Gbits/sec
[  4]   7.00-8.00   sec   995 MBytes  8.34 Gbits/sec
[  4]   8.00-9.00   sec   960 MBytes  8.05 Gbits/sec
[  4]   9.00-10.00  sec   971 MBytes  8.15 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  9.31 GBytes  8.00 Gbits/sec                  sender
[  4]   0.00-10.00  sec  9.31 GBytes  8.00 Gbits/sec                  receiver

iperf Done.
PS E:\myfolder> iperf3.exe -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.220, port 49675
[  5] local 192.168.1.201 port 5201 connected to 192.168.1.220 port 49676
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   428 MBytes  3.59 Gbits/sec
[  5]   1.00-2.00   sec   371 MBytes  3.11 Gbits/sec
[  5]   2.00-3.00   sec   406 MBytes  3.40 Gbits/sec
[  5]   3.00-4.00   sec   423 MBytes  3.55 Gbits/sec
[  5]   4.00-5.00   sec   421 MBytes  3.53 Gbits/sec
[  5]   5.00-6.00   sec   410 MBytes  3.44 Gbits/sec
[  5]   6.00-7.00   sec   403 MBytes  3.38 Gbits/sec
[  5]   7.00-8.00   sec   411 MBytes  3.45 Gbits/sec
[  5]   8.00-9.00   sec   414 MBytes  3.48 Gbits/sec
[  5]   9.00-10.00  sec   414 MBytes  3.48 Gbits/sec
[  5]  10.00-10.04  sec  16.8 MBytes  3.67 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  4.02 GBytes  3.44 Gbits/sec                  receiver
Notice how the transfer from `192.168.1.220` to `192.168.1.201` (bottom) is much slower than the top. Interestingly, I get similar results if I use a single gigabit connection instead:

Code:
PS E:\myfolder> iperf3.exe -c 192.168.0.220
Connecting to host 192.168.0.220, port 5201
[  4] local 192.168.0.201 port 54721 connected to 192.168.0.220 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   113 MBytes   950 Mbits/sec
[  4]   1.00-2.00   sec   113 MBytes   949 Mbits/sec
[  4]   2.00-3.00   sec   113 MBytes   949 Mbits/sec
[  4]   3.00-4.00   sec   113 MBytes   950 Mbits/sec
[  4]   4.00-5.00   sec   113 MBytes   949 Mbits/sec
[  4]   5.00-6.00   sec   113 MBytes   948 Mbits/sec
[  4]   6.00-7.00   sec   113 MBytes   950 Mbits/sec
[  4]   7.00-8.00   sec   113 MBytes   949 Mbits/sec
[  4]   8.00-9.00   sec   113 MBytes   949 Mbits/sec
[  4]   9.00-10.00  sec   113 MBytes   948 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec                  receiver

iperf Done.
PS E:\myfolder> iperf3.exe -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.0.220, port 49677
[  5] local 192.168.0.201 port 5201 connected to 192.168.0.220 port 49678
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  69.5 MBytes   583 Mbits/sec
[  5]   1.00-2.00   sec  72.7 MBytes   609 Mbits/sec
[  5]   2.00-3.00   sec  72.7 MBytes   610 Mbits/sec
[  5]   3.00-4.00   sec  72.7 MBytes   609 Mbits/sec
[  5]   4.00-5.00   sec  72.6 MBytes   609 Mbits/sec
[  5]   5.00-6.00   sec  72.7 MBytes   610 Mbits/sec
[  5]   6.00-7.00   sec  72.7 MBytes   610 Mbits/sec
[  5]   7.00-8.00   sec  72.6 MBytes   609 Mbits/sec
[  5]   8.00-9.00   sec  72.6 MBytes   609 Mbits/sec
[  5]   9.00-10.00  sec  72.6 MBytes   609 Mbits/sec
[  5]  10.00-10.01  sec  1.05 MBytes   606 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.01  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.01  sec   724 MBytes   607 Mbits/sec                  receiver
In both cases, transfers from `192.168.X.220` to `192.168.X.201` are not running at full speed, while transfers in the other direction (nearly) are.

What could be causing the transfer to be slower in one direction and not the other? Could this be a hardware issue? I'll mention that `192.168.X.220` is an "HP Slimline Desktop - 290-p0043w" with a Celeron G4900 CPU running Windows Server 2019, in case that's somehow a bottleneck.

I notice the same performance difference when transferring large files from the SSD on one system to the other.

I'm hoping it's a software issue so it can be fixed, but I'm not sure. Any ideas on what could be the culprit?
 

i386

Well-Known Member
Mar 18, 2016
QUOTE="jtabc, post: 347143, member: 44411"]
Any ideas on what could be the culprit?
[/QUOTE]
iperf is a Linux tool, not optimized for Windows.
Some versions shipped with a less optimized/buggy Cygwin DLL (there are no official binaries; all the Windows files are from third parties).
Use iperf via a Linux live system, or try other software like ntttcp (GitHub - microsoft/ntttcp) for Windows-only environments.
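For example, a minimal ntttcp run looks something like this (flags per the microsoft/ntttcp README; the IP is the receiver's address, so adjust to your setup):

Code:
# on the receiving machine (192.168.1.220 in your setup):
ntttcp.exe -r -m 8,*,192.168.1.220 -t 15

# on the sending machine:
ntttcp.exe -s -m 8,*,192.168.1.220 -t 15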
 

jtabc

New Member
Jul 31, 2022
QUOTE="jtabc, post: 347143, member: 44411"]
Any ideas on what could be the culprit?
Iperf is a Linux Tool, Not optimized for Windows.
Some Versions shipped with a less optimized/Buggy cygwin.dll (there are no official binaries, all the Windows Files are from third Parties).
Use iperf via Linux live Systems or try Other Software Like ntttcp (GitHub - microsoft/ntttcp) for Windows only Environments
[/QUOTE]
I'm not sure if it is an issue with iperf. As I mentioned, I see this transfer speed limitation when I am copying large files too.
 

jtabc

New Member
Jul 31, 2022
[QUOTE]
(single threaded) Explorer copy?
Or multithreaded copy (like robocopy)?
[/QUOTE]
I've only tested it single-threaded. But once again, it's directional: slow from the HP machine, but not slow from the desktop.
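For reference, the multithreaded test I'd try would be something like this (the paths and share name here are made up; /MT:8 = 8 copy threads):

Code:
# /E copies subdirectories, /MT:8 uses 8 copy threads
robocopy E:\testdata \\192.168.0.201\share /E /MT:8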
 

voidstar

New Member
Sep 18, 2022
I wrote ANT a few years ago as an alternative to iperf3. It uses standard C++ (but does depend on the boost library), and should be easily cross-compiled to other platforms.

Pre-built binaries are on the github page:

I'm just curious whether it gives the same/similar results you're seeing with iperf3. Note that v100, v140, v141 are the Visual Studio versions used to compile the binary (and I didn't prepare any 32-bit builds; we're all done with that, right? :D ).

In ANT, I came up with a "beep metric" to try to help indicate when parameters aren't optimal for the system being used, because I was very keen to pinpoint where to lay the blame for a performance bottleneck (the NIC/driver components or the processor itself). There are more notes about it in docs\developer_notes.txt.


EDIT:

i.e. some performance utilities are biased because they include the time to LOAD the data to be sent, and also the time for the receiver to gain access to the full data. To be fair, the work needed to create the data is part of the transfer time, as is the time for the receiver to fully prepare the data so it can start using it. But these "data jobs" happen on the local CPU/bus timeline, not the NIC timeline.


ANT runs with some defaults. Use -h/--help to see the command line arguments.
The "-d" depth argument in ANT essentially corresponds to cores.

One thing I found while testing ANT: the Working Buffer size does matter. Making it even slightly larger than the L1 cache size of the host processor can noticeably degrade performance.

Also, the Windows Explorer copy is slow because of its status reporting; it actually costs quite a bit of performance to report that status. In ANT, you can use --s to change the rate of status updates (or disable them entirely).


Visual Studio, when I was using it at the time, had trouble allocating 16GB or more of contiguous RAM. Maybe it was my OS, or maybe it was a limitation in how the executable is linked. So I capped various command-line arguments at 8GB (when testing in-memory data payloads). In file mode, though, ANT buffers the file a section at a time, so it should be able to transfer a data file of any size. (The time the sender spends loading those portions of the file into RAM shouldn't be counted toward the NIC transfer time results, since that's CPU/bus/driver/drive-media performance, not network performance.)

Funny things happen when the payload size is smaller than the local MTU size (like the metrics for transferring 1 byte).
 

voidstar

New Member
Sep 18, 2022
Unless the hardware (and OS) are exact mirrors of each other, I wouldn't expect the performance to be symmetric.

The data being transferred has to get from the source (SSD), across the bus, into CPU cache, then into RAM, then across the north/south bridge and out to the NIC. It's not just two 10Gbps or 1Gbps NICs connected to each other (or whatever the devices involved are). And there are two perspectives in network performance analysis: should the results include the time for the respective applications (sender/receiver) to package up and fully receive the data payload, or should it purely be the "data queued to NIC and confirmed sent" time? It's a subtle but important difference.

To users, it is the application end-to-end performance that matters: you can't stop the clock when the NIC buffer has finished receiving data if the user can't do anything with the data until it's been re-packaged into a cohesive unit at the application level. But if you're focused purely on network performance (say, cable quality), you may want to disregard the overhead of the hardware and OS getting that data to the NIC.
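If you want the application-level number, you could just time a whole copy end to end, e.g. in PowerShell (paths are hypothetical; this includes the disk reads/writes on both ends, unlike iperf3's in-memory figures):

Code:
# time a full file copy, end to end -- includes the source read and the
# destination write, not just NIC time; paths are hypothetical
Measure-Command { Copy-Item E:\testdata\big.bin \\192.168.1.220\share\ }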



I would guess one of those machines has slower RAM and/or a slower CPU (or a smaller cache), which cuts into how efficiently it feeds (serves) that data -- also relative to iperf3's default settings for the size of the RAM buffers used to tx/rx the data, which interact with the size of the CPU cache.

[ assuming you've also killed as much background processing under win10 on both systems, such that they're fairly uniform on that ]
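You can also poke at iperf3's buffer sizes directly: -l sets the application read/write buffer length and -w the socket buffer/window size (the values below are just examples to experiment with):

Code:
# larger application buffer (-l) and socket buffer (-w); example values
iperf3.exe -c 192.168.1.220 -l 1M -w 512K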
 

LodeRunner

Active Member
Apr 27, 2019
[QUOTE="jtabc"]
What could be causing the transfer to be slower in one direction and not the other? Could this be a hardware issue? ... Any ideas on what could be the culprit?
[/QUOTE]
Have you tried reverse mode (-R) instead of changing which system is the server? I'm just curious. Usually, asymmetry like that indicates a configuration issue of some sort. Gigabit is not particularly taxing, and any reasonably modern CPU and network PHY should be able to do gigabit in both directions with a single iperf3 stream on Windows. That G4900 is only 4-ish years old, but it has only 2 cores, so that could be an issue if the system is heavily loaded when you're running these tests.

Look for Flow Control and Interrupt Moderation settings on the NICs. Flow Control is bad, don't use it (unless you have PFC/DCB). Interrupt Moderation can be situational. I turn it off on my 40Gb cluster interfaces because my priority is throughput at the expense of CPU usage, but it can be beneficial; my workstation example below has it enabled. With Interrupt Moderation disabled, I take a significant sending speed hit from my workstation. Also check for various TX/RX offload settings. If those are missing or disabled, that can also have an impact. Those are at least easy places to start.
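You can dump those per-adapter driver settings from PowerShell instead of clicking through Device Manager ("Ethernet" below is an example adapter name; check Get-NetAdapter for yours):

Code:
# list adapters, then show the driver-level advanced properties
# (Flow Control, Interrupt Moderation, offloads, etc.) for one of them
Get-NetAdapter
Get-NetAdapterAdvancedProperty -Name "Ethernet" | Format-Table DisplayName, DisplayValue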

Performance example:
This is my workstation to a file server VM. The round trip is my WS > 6-unit stack > LACP bundle to another stack > into a VM cluster (over yet another LACP bundle). My workstation is an i7-9700 with an Intel I219-LM NIC.

File server as the iperf3 host:
Code:
C:\iperf>iperf3.exe -c 172.16.0.20
Connecting to host 172.16.0.20, port 5201
[  4] local 172.16.1.52 port 13522 connected to 172.16.0.20 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  98.8 MBytes   828 Mbits/sec
[  4]   1.00-2.00   sec   104 MBytes   875 Mbits/sec
[  4]   2.00-3.00   sec   105 MBytes   878 Mbits/sec
[  4]   3.00-4.00   sec   103 MBytes   867 Mbits/sec
[  4]   4.00-5.00   sec  98.0 MBytes   822 Mbits/sec
[  4]   5.00-6.00   sec   103 MBytes   863 Mbits/sec
[  4]   6.00-7.00   sec   101 MBytes   846 Mbits/sec
[  4]   7.00-8.00   sec  90.4 MBytes   758 Mbits/sec
[  4]   8.00-9.00   sec  87.8 MBytes   736 Mbits/sec
[  4]   9.00-10.00  sec   101 MBytes   844 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   992 MBytes   832 Mbits/sec                  sender
[  4]   0.00-10.00  sec   991 MBytes   831 Mbits/sec                  receiver

iperf Done.

C:\iperf>iperf3.exe -c 172.16.0.20 -R
Connecting to host 172.16.0.20, port 5201
Reverse mode, remote host 172.16.0.20 is sending
[  4] local 172.16.1.52 port 13529 connected to 172.16.0.20 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   931 Mbits/sec
[  4]   1.00-2.00   sec   111 MBytes   933 Mbits/sec
[  4]   2.00-3.00   sec   112 MBytes   936 Mbits/sec
[  4]   3.00-4.00   sec   110 MBytes   919 Mbits/sec
[  4]   4.00-5.00   sec   110 MBytes   920 Mbits/sec
[  4]   5.00-6.00   sec   112 MBytes   941 Mbits/sec
[  4]   6.00-7.00   sec   110 MBytes   923 Mbits/sec
[  4]   7.00-8.00   sec   110 MBytes   921 Mbits/sec
[  4]   8.00-9.00   sec   111 MBytes   935 Mbits/sec
[  4]   9.00-10.00  sec   110 MBytes   920 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.08 GBytes   928 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.08 GBytes   928 Mbits/sec                  receiver

iperf Done.

My workstation as the iPerf3 host:
Code:
C:\iperf>iperf3.exe -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 172.16.0.20, port 59187
[  5] local 172.16.1.52 port 5201 connected to 172.16.0.20 port 59188
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   103 MBytes   868 Mbits/sec
[  5]   1.00-2.00   sec   110 MBytes   920 Mbits/sec
[  5]   2.00-3.00   sec   102 MBytes   858 Mbits/sec
[  5]   3.00-4.00   sec   106 MBytes   891 Mbits/sec
[  5]   4.00-5.00   sec   111 MBytes   932 Mbits/sec
[  5]   5.00-6.00   sec   110 MBytes   924 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   937 Mbits/sec
[  5]   7.00-8.00   sec   111 MBytes   932 Mbits/sec
[  5]   8.00-9.00   sec   109 MBytes   918 Mbits/sec
[  5]   9.00-10.00  sec   111 MBytes   935 Mbits/sec
[  5]  10.00-10.05  sec  5.56 MBytes   947 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.05  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.05  sec  1.07 GBytes   912 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 172.16.0.20, port 59189
[  5] local 172.16.1.52 port 5201 connected to 172.16.0.20 port 59190
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  89.0 MBytes   746 Mbits/sec
[  5]   1.00-2.00   sec  99.0 MBytes   830 Mbits/sec
[  5]   2.00-3.00   sec   103 MBytes   863 Mbits/sec
[  5]   3.00-4.00   sec   104 MBytes   877 Mbits/sec
[  5]   4.00-5.00   sec   103 MBytes   863 Mbits/sec
[  5]   5.00-6.00   sec   104 MBytes   874 Mbits/sec
[  5]   6.00-7.00   sec  97.9 MBytes   821 Mbits/sec
[  5]   7.00-8.00   sec  97.5 MBytes   817 Mbits/sec
[  5]   8.00-9.00   sec   102 MBytes   856 Mbits/sec
[  5]   9.00-10.00  sec   102 MBytes   862 Mbits/sec
[  5]  10.00-10.05  sec  4.75 MBytes   835 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.05  sec  1007 MBytes   841 Mbits/sec                  sender
[  5]   0.00-10.05  sec  0.00 Bytes  0.00 bits/sec                  receiver
You can see consistent performance regardless of which side is the host.
 

dazagrt

Active Member
Mar 1, 2021
I'll bet it's the cards! I've seen this before. Just pop out the cards, switch to an Intel card, and try it.

I've seen it happen with a 2.5Gb Realtek + 2.5Gb Intel as well. It only seems to show up on the server end: the server will push a poor result, but if you give iperf3 a -R, the server end will pull at full speed -- that's with the Realtek at the server end and the Intel at the client end.
 

jtabc

New Member
Jul 31, 2022
[QUOTE="LodeRunner"]
Have you tried reverse mode (-R) instead of changing which system is the server? ... Look for Flow Control and Interrupt Moderation settings on the NICs. ... Also check for various TX/RX offload settings.
[/QUOTE]
Sorry for the late reply. I tried -R reverse mode and got the same results as in my initial post. (I tried all 8 combinations: with and without reverse mode, with the built-in gigabit and with the 10-gig NIC, and with either the HP machine or the workstation acting as the iperf3 server.) In all cases, transfers from `192.168.X.220` (the HP Slimline) to `192.168.X.201` (the workstation) were much slower.

I'm looking into the flow control suggestion right now. According to the documentation I'm referencing, both seem to be set to "Rx & Tx Flow Control is enabled" by default.
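If I end up disabling it, I assume something like this would do it (the exact DisplayName/value strings depend on the driver, so I'd check the Get-NetAdapterAdvancedProperty output first):

Code:
# disable flow control at the driver level; "Ethernet" and the
# DisplayName/DisplayValue strings are driver-dependent examples
Set-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Flow Control" -DisplayValue "Disabled"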

 

jtabc

New Member
Jul 31, 2022
[QUOTE="dazagrt"]
I'll bet it's the cards! I've seen this before. Just pop out the cards, switch to an Intel card, and try it.
[/QUOTE]
As I mentioned in my prior post, -R didn't help. Additionally, I don't see how it can be the cards, considering I'm observing this result both with the 10-gig card and with the built-in 1-gig Ethernet. It must be related to the software running on the server (Windows Server 2019), or it's a hardware (CPU or RAM, I assume) limitation.
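To check the CPU theory, I can watch the Celeron's load while a test runs, something like:

Code:
# sample total CPU usage once per second for 10 seconds while iperf3 runs
Get-Counter '\Processor(_Total)\% Processor Time' -SampleInterval 1 -MaxSamples 10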
 

jtabc

New Member
Jul 31, 2022
[QUOTE="voidstar"]
Unless the hardware (and OS) are exact mirrors of each other, I wouldn't expect the performance to be symmetric. ... I would guess one of those machines has slower RAM and/or a slower CPU (or a smaller cache).
[/QUOTE]
Here is the hardware for each machine. Given the following, would you expect what I'm observing?

HP slimline (file server) ( 192.168.X.220 ):
CPU: Celeron G4900
RAM: 4 GB @ 2666 MHz (according to task manager)

Desktop / Workstation ( 192.168.X.201 ):
CPU: i7 5960x
RAM: 64 GB @ 2133 MHz (according to task manager)
 

LodeRunner

Active Member
Apr 27, 2019
When using iperf3 on gigabit, I would expect symmetric performance, which is why I strongly suspect some combination of Flow Control, Interrupt Moderation, and offload settings is the problem.

10G tuning is harder.
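A single TCP stream often can't fill a 10G link on Windows anyway, so parallel streams (-P) are a common first check before deeper tuning (just a sketch, not a tuning guide):

Code:
# four parallel streams; a single stream is often limited by one CPU core
iperf3.exe -c 192.168.1.220 -P 4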