Strange performance issue connecting 10G and 1G LANs


bolek

New Member
Mar 25, 2023
Hi,

I have two LANs connected to each other: one is 1-gigabit Ethernet (primarily Netgear GS108 unmanaged switches) and the other is 10-gigabit Ethernet (a TP-Link TL-SX105 unmanaged switch over copper cabling). On its own, each LAN delivers the expected throughput between computers on the same LAN, so I don't think I have a switch or cabling issue.

But when I connect the two LANs together, a strange thing happens. Traffic from the 1-gigabit LAN to the 10-gigabit LAN works fine, at the expected 1-gigabit speed. But traffic from the 10-gigabit LAN to the 1-gigabit LAN is very slow, only around 0.2 gigabit/s, even when no other traffic is going on. What could be the cause? Is there a way to avoid it?

I tried swapping cables and/or ports, but none of this made any difference. I also tried enabling jumbo frames, but this made things slightly worse, although it did have the expected benefit on each LAN separately.

Thanks
Bolek
 

DavidWJohnston

Active Member
Sep 30, 2020
How are you measuring the speed? Are your test devices Linux or Windows? Could you try iperf3 if you haven't yet?

It could be an MTU or duplex mismatch. Try double-checking that both sides have jumbo frames disabled, and if you're on Windows, check the MTU on both sides from the command line with: netsh interface ipv4 show subinterface
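On the Linux hosts, a rough equivalent of that netsh check is to read each interface's MTU from sysfs (ip link show reports the same value). A minimal sketch: list_mtus is a hypothetical helper name, and the optional directory argument only exists so it can be pointed somewhere other than /sys/class/net.

```bash
#!/usr/bin/env bash
# Sketch: print "interface mtu" for each network interface on a Linux host.
# list_mtus is a hypothetical helper; DIR defaults to /sys/class/net, where the
# kernel exposes each interface's MTU (ip link show reports the same value).
list_mtus() {
    local dir="${1:-/sys/class/net}" dev
    for dev in "$dir"/*; do
        [ -r "$dev/mtu" ] || continue   # skip entries without an mtu file
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/mtu")"
    done
}
```

Run list_mtus on a host from each LAN and compare the lines for the wired NICs.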

Try hard-setting the speed to 1G Full Duplex instead of auto-detect.

Try setting both sides to 1G, just as a test, and see what happens.
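Since most of the hosts in this thread run Linux, the same hard-set can be done with ethtool instead of the Windows adapter properties. A minimal sketch, assuming root access; eth0 is a placeholder interface name, the function names are hypothetical, and DRY_RUN=1 just prints the commands so they can be checked first.

```bash
#!/usr/bin/env bash
# Sketch: hard-set a Linux NIC to 1G full duplex with ethtool, then undo it.
# "eth0" is a placeholder interface name (check yours with ip link).
# Needs root. Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hard-set 1000 Mb/s full duplex with autonegotiation off.
# Some drivers refuse this, because 1000BASE-T normally relies on autonegotiation.
force_1g() { run ethtool -s "${1:-eth0}" speed 1000 duplex full autoneg off; }

# Gentler alternative: keep autonegotiation on, advertise only 1000baseT/Full.
advertise_1g_only() { run ethtool -s "${1:-eth0}" advertise 0x020; }

# Re-enable autonegotiation afterwards; check with ethtool IFACE that the
# advertised link modes are back to normal.
restore_autoneg() { run ethtool -s "${1:-eth0}" autoneg on; }
```

Check the result with ethtool eth0 (the Speed, Duplex and Auto-negotiation lines). Both ends of a link need matching settings: forcing one side while the other still autonegotiates is a classic way to create the very duplex mismatch being ruled out here.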

Try to find your true MTU by running the ping command over and over, from both sides. In Windows, like this: ping 10.41.1.1 -f -l 1200

The -f sets the Don't Fragment bit, and -l sets the payload length. Keep increasing or decreasing the length until you get a failure (e.g. try 1000, then 9000, 4500, and so on, in a methodical order to find the maximum supported MTU). Note that -l is the ICMP payload size, so the MTU is that value plus 28 bytes of IP and ICMP headers: a 1472-byte payload means a 1500-byte MTU.

The "proper" failure will be a message like "Packet needs fragmentation but DF bit set". If you get a timeout, that means you've successfully sent a packet but did not get one back. This could indicate a problem, and it can show up as really bad performance.

Ideally, there should be a distinct boundary where it fails with the DF error, while just 1 byte less works, with no timeouts.
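On Linux the same search can be scripted with iputils ping, using the flags bolek uses in his reply: -M do sets Don't Fragment and -s sets the payload size. A sketch, assuming iputils ping; probe and find_max_payload are hypothetical names, and the bisection assumes sizes behave monotonically (everything up to the limit works, everything above fails).

```bash
#!/usr/bin/env bash
# Sketch: automate the DF-ping search on Linux (iputils ping).
# -M do sets Don't Fragment, -s is the ICMP payload size, -W 1 waits 1 s.
# probe and find_max_payload are hypothetical helper names.

# probe HOST SIZE: succeeds if one DF ping with SIZE payload bytes gets a reply.
probe() {
    ping -c 1 -W 1 -M do -s "$2" "$1" >/dev/null 2>&1
}

# find_max_payload HOST [LOW] [HIGH]: binary search for the largest payload
# that gets a reply. Assumes LOW succeeds. HIGH defaults to 8972, i.e. a
# 9000-byte jumbo MTU minus 28 bytes of IP and ICMP headers.
find_max_payload() {
    local host="$1" lo="${2:-0}" hi="${3:-8972}" mid
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so lo always moves forward
        if probe "$host" "$mid"; then
            lo=$mid                    # mid fits: the answer is >= mid
        else
            hi=$(( mid - 1 ))          # DF error or timeout: the answer is < mid
        fi
    done
    echo "$lo"
}
```

For example, p=$(find_max_payload vapor 1000 9000) followed by echo $((p + 28)) gives the path MTU; on a plain 1500-byte network the payload should come out at 1472, as in bolek's manual pings. The bisection treats a timeout the same as a DF error, so it finds the boundary but does not flag the odd timeouts described above; once you have the number, ping a few sizes just below it by hand and watch for timeouts.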

If you still can't get it to work properly, use Wireshark to get a packet trace from both sides, if you can.
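For the capture itself, running tcpdump on each host during an iperf3 test is a lightweight alternative to the Wireshark GUI, and the resulting files open in Wireshark. A sketch, assuming root on a Linux host; eth0, trace.pcap and capture_iperf are placeholders. It keeps only the first 128 bytes of each packet (enough for the headers), which keeps the file small and leaves out most of the payload, and it stops after a fixed number of packets.

```bash
#!/usr/bin/env bash
# Sketch: capture packet headers from an iperf3 run on a Linux host.
# capture_iperf, eth0 and trace.pcap are placeholders; 5201 is iperf3's default
# port. Needs root. DRY_RUN=1 prints the command instead of running it.
capture_iperf() {
    local iface="${1:-eth0}" out="${2:-trace.pcap}" count="${3:-20000}"
    # -s 128: keep only the first 128 bytes of each packet (headers, little payload)
    # -c: stop after this many packets; -w: write raw packets for Wireshark
    set -- tcpdump -i "$iface" -s 128 -c "$count" -w "$out" tcp port 5201
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}
```

For example, start capture_iperf eth0 pilot.pcap on the 1G host and the equivalent on the 10G host, run the slow-direction iperf3 test, then open both files in Wireshark.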
 

bolek

New Member
Mar 25, 2023
Hi, thanks for your reply.

It's mostly Linux, though I have one Windows machine on the 10G side, and it shows the same results. Everything points to the direction of the traffic, not to a problem with any particular host: 1G -> 10G seems fine, but 10G -> 1G is too slow by a factor of 4-5x.

Yes, I am already using iperf3, but I also tested HTTP transfers using curl, with similar results.

Right now the MTU is set to 1500 everywhere. In the examples below, host "pilot" is on the 1G side and host "vapor" is on the 10G side. But it does not matter which particular host I use; it only matters whether it is on the 1G LAN or the 10G LAN.

Running on the 1G side, talking to an iperf3 server on the 10G side:
Bash:
[pilot:/tmp] ping -c 1 -s 1472 -M do vapor
PING vapor (192.168.1.8) 1472(1500) bytes of data.
1480 bytes from vapor (192.168.1.8): icmp_seq=1 ttl=64 time=0.351 ms

--- vapor ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.351/0.351/0.351/0.000 ms

[pilot:/tmp] ping -c 1 -s 1473 -M do vapor
PING vapor (192.168.1.8) 1473(1501) bytes of data.
ping: local error: message too long, mtu=1500

--- vapor ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

[pilot:/tmp] iperf3 -c vapor
Connecting to host vapor, port 5201
[  5] local 192.168.1.5 port 41438 connected to 192.168.1.8 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   962 Mbits/sec    0    471 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0    494 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec    0    494 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   938 Mbits/sec    0    516 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec    0    516 KBytes       
[  5]   5.00-6.00   sec   113 MBytes   947 Mbits/sec    0    516 KBytes       
[  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec    0    516 KBytes       
[  5]   7.00-8.00   sec   113 MBytes   947 Mbits/sec    0    516 KBytes       
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec    0    516 KBytes       
[  5]   9.00-10.00  sec   113 MBytes   952 Mbits/sec    7    438 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    7             sender
[  5]   0.00-10.04  sec  1.10 GBytes   938 Mbits/sec                  receiver

iperf Done.

[pilot:/tmp] iperf3 -c vapor -R
Connecting to host vapor, port 5201
Reverse mode, remote host vapor is sending
[  5] local 192.168.1.5 port 46730 connected to 192.168.1.8 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  23.6 MBytes   198 Mbits/sec                  
[  5]   1.00-2.00   sec  28.4 MBytes   238 Mbits/sec                  
[  5]   2.00-3.00   sec  11.6 MBytes  97.2 Mbits/sec                  
[  5]   3.00-4.00   sec  24.9 MBytes   209 Mbits/sec                  
[  5]   4.00-5.00   sec  20.3 MBytes   170 Mbits/sec                  
[  5]   5.00-6.00   sec  29.1 MBytes   244 Mbits/sec                  
[  5]   6.00-7.00   sec  18.4 MBytes   155 Mbits/sec                  
[  5]   7.00-8.00   sec  21.0 MBytes   177 Mbits/sec                  
[  5]   8.00-9.00   sec  14.1 MBytes   119 Mbits/sec                  
[  5]   9.00-10.00  sec  19.6 MBytes   165 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   211 MBytes   177 Mbits/sec  1724             sender
[  5]   0.00-10.00  sec   211 MBytes   177 Mbits/sec                  receiver

iperf Done.
[pilot:/tmp]
And on the 10G side, talking to an iperf3 server on the 1G side:
Bash:
[vapor:/tmp] ping -c 1 -s 1472 -M do pilot
PING pilot (192.168.1.5) 1472(1500) bytes of data.
1480 bytes from pilot (192.168.1.5): icmp_seq=1 ttl=64 time=0.324 ms

--- pilot ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.324/0.324/0.324/0.000 ms

[vapor:/tmp] ping -c 1 -s 1473 -M do pilot
PING pilot (192.168.1.5) 1473(1501) bytes of data.
ping: local error: message too long, mtu=1500

--- pilot ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

[vapor:/tmp] iperf3 -c pilot
Connecting to host pilot, port 5201
[  5] local 192.168.1.8 port 60112 connected to 192.168.1.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  20.1 MBytes   168 Mbits/sec  152   2.83 KBytes       
[  5]   1.00-2.00   sec  18.5 MBytes   155 Mbits/sec  162   9.90 KBytes       
[  5]   2.00-3.00   sec  21.3 MBytes   178 Mbits/sec  135   7.07 KBytes       
[  5]   3.00-4.00   sec  22.7 MBytes   190 Mbits/sec  191   2.83 KBytes       
[  5]   4.00-5.00   sec  20.9 MBytes   175 Mbits/sec  161   5.66 KBytes       
[  5]   5.00-6.00   sec  20.4 MBytes   171 Mbits/sec  185   5.66 KBytes       
[  5]   6.00-7.00   sec  17.9 MBytes   150 Mbits/sec  159   2.83 KBytes       
[  5]   7.00-8.00   sec  19.0 MBytes   159 Mbits/sec  161   2.83 KBytes       
[  5]   8.00-9.00   sec  20.4 MBytes   172 Mbits/sec  173   11.3 KBytes       
[  5]   9.00-10.00  sec  17.0 MBytes   142 Mbits/sec  155   7.07 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   198 MBytes   166 Mbits/sec  1634             sender
[  5]   0.00-10.00  sec   197 MBytes   166 Mbits/sec                  receiver

iperf Done.

[vapor:/tmp] iperf3 -c pilot -R
Connecting to host pilot, port 5201
Reverse mode, remote host pilot is sending
[  5] local 192.168.1.8 port 44652 connected to 192.168.1.5 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   112 MBytes   940 Mbits/sec                  
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   5.00-6.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   6.00-7.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

iperf Done.
[vapor:/tmp]
I also tried disabling auto-negotiation, but it didn't make any difference. Actually, I don't believe it could have, since it only affects the first link (from the host to the nearest switch), and that link works fine. I do get the expected throughput within each LAN separately.
 

DavidWJohnston

Active Member
Sep 30, 2020
Thanks for the data, I think I might have an idea of what's going on. The retransmit (Retr) count is super high from 10G -> 1G.
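To put the Retr column in proportion, the sender summary line can be turned into an approximate retransmission rate. A rough sketch, assuming iperf3's byte units are 1024-based and every segment carried 1448 bytes; retr_rate is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: turn the sender summary line of iperf3's text output (on stdin) into
# an approximate retransmission rate. retr_rate is a hypothetical helper.
# Assumptions: byte units are 1024-based, and every segment carried 1448 bytes
# (1500 MTU - 20 IP - 20 TCP - 12 TCP timestamp option).
retr_rate() {
    awk '$NF == "sender" {
        size = $(NF-5); unit = $(NF-4); retr = $(NF-1)
        if (unit == "GBytes") mult = 1024^3
        else if (unit == "MBytes") mult = 1024^2
        else if (unit == "KBytes") mult = 1024
        else mult = 1
        segs = int(size * mult / 1448)   # approximate number of segments sent
        if (segs > 0)
            printf "retransmits=%d segments~%d rate~%.2f%%\n", retr, segs, 100 * retr / segs
    }'
}
```

Fed the slow 10G -> 1G runs above (iperf3 -c pilot | retr_rate), this works out to roughly 1.1% of segments retransmitted, against about 0.00% for the 1G -> 10G run. That is consistent with the Cwnd column repeatedly collapsing to 2.83 KBytes, i.e. two 1448-byte segments.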

Did you try hard-setting 1G at both ends? What I'm wondering is whether, given the extremely high retransmit count, the host on the 10G side is firing packets way too fast, so many of them are dropped: the switch buffer fills up and the switch either doesn't send pause frames, or possibly sends too many. Basically this problem:

https://www.reddit.com/r/networking/comments/b5uo0p
Since the switches are unmanaged, there's not much we can do if they don't handle this well. Disabling flow control on the hosts, or implementing QoS policies, might help. Try hard-setting 1G at both ends and see what happens. Have a look at the links. If you don't get anywhere, I'll try to dig deeper for you or think of more general workarounds that don't involve host config.
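On the Linux hosts, flow control can be checked and switched off per NIC with ethtool. A sketch, assuming root; eth0 and the function names are placeholders, and DRY_RUN=1 prints the commands instead of running them. paced_test is an extra diagnostic, not a fix: iperf3's --fq-rate (Linux only) paces the sender at the socket level, so if a paced 10G -> 1G run shows far fewer retransmits than an unpaced one, that would support the idea that line-rate bursts from the 10G host are being dropped.

```bash
#!/usr/bin/env bash
# Sketch: inspect and disable Ethernet flow control (PAUSE frames) on a Linux
# host with ethtool, plus a paced iperf3 run as a diagnostic.
# "eth0" is a placeholder; needs root. DRY_RUN=1 prints instead of running.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Show the current pause settings (Autonegotiate / RX / TX).
show_pause() { run ethtool -a "${1:-eth0}"; }

# Stop the NIC from sending or honouring PAUSE frames.
disable_pause() { run ethtool -A "${1:-eth0}" autoneg off rx off tx off; }

# Diagnostic: socket-level pacing of the sender (default 900 Mbit/s).
paced_test() { run iperf3 -c "$1" --fq-rate "${2:-900M}"; }
```

For example, run show_pause on both test hosts first, then compare paced_test pilot from the 10G host with a plain iperf3 -c pilot.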

Hopefully someone else may have ideas as well.
 

bolek

New Member
Mar 25, 2023
P.S. I also ran Wireshark and created a 1-second capture of the iperf3 traffic in the "bad" case. But TBH, I have no clue what to look for :(. The capture is about 20MB compressed, so I don't think it's wise to post it here?
 

DavidWJohnston

Active Member
Sep 30, 2020
I'd be happy to look at your captures, if you're willing to send them to me. Can you upload them somewhere like gdrive or onedrive, then DM me a link? Then you can delete them. They may contain some sensitive information, so just be aware of that.
 

bolek

New Member
Mar 25, 2023
I also tried connecting the 1G host directly to the 10G switch (instead of via the 1G switch), and the problem went away. So I guess the issue is between the two switches? Like you said, since they are unmanaged, there's probably not much that can be done? I will read the links, though.
 

DavidWJohnston

Active Member
Sep 30, 2020
Received - I will have a look.

Wow, that's crazy, I was just typing a message to ask you that question. I think it may be a problem with flow control in the TP-Link's firmware, unfortunately.
 

DavidWJohnston

Active Member
Sep 30, 2020
The packet capture does show quite a few TCP errors. The problem may be that PAUSE frames are interacting in a strange way between the switches; possibly too many are being sent. From what I've read online, this 10G->1G problem is not uncommon.
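For anyone who, like bolek, isn't sure what to look for in a capture, a rough first pass is to count Wireshark's TCP analysis flags with tshark (Wireshark's command-line tool). A sketch; tcp_issue_counts is a hypothetical helper, and the filters are standard Wireshark display-filter fields. Ethertype 0x8808 is Ethernet MAC Control, which carries PAUSE frames, but NICs usually consume PAUSE frames themselves, so a zero there does not mean none were sent.

```bash
#!/usr/bin/env bash
# Sketch: count common TCP trouble markers in a capture file with tshark.
# tcp_issue_counts is a hypothetical helper; the filters are standard Wireshark
# display-filter fields. Ethertype 0x8808 is Ethernet MAC Control (PAUSE frames),
# which NICs usually consume themselves, so a zero there proves little.
tcp_issue_counts() {
    local file="$1" f
    for f in tcp.analysis.retransmission tcp.analysis.fast_retransmission \
             tcp.analysis.duplicate_ack tcp.analysis.lost_segment \
             'eth.type == 0x8808'; do
        # tshark prints one summary line per matching packet, so count lines
        printf '%-34s %s\n' "$f" "$(tshark -r "$file" -Y "$f" | awk 'END { print NR }')"
    done
}
```

Run it on the traces from both ends (e.g. tcp_issue_counts pilot.pcap); retransmissions and duplicate ACKs piling up only in the 10G -> 1G direction would match the iperf3 Retr numbers.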

On your 1G network, would it be possible to isolate just one of the switches with only one host plugged in? So where you have 10G->1G->Other, disconnect "Other" and everything on that single 1G switch that isn't being tested, then try iperf again.

If you have other brands/models of 1G switches to test with (as the direct hop from the 10G switch), that would also be useful.

I think a possible workaround is to replace your first-hop 1G switch with something different/better. Other than that, I'm not sure, unfortunately. I'll let you know if I think of anything else.

Edit: The Netgear has 192K of buffer, and the TP-Link has 2M of buffer. This may play into it.
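To get a feel for why buffer size might matter, one can estimate how long a line-rate burst takes to fill a buffer when it arrives at 10 Gbit/s and drains at 1 Gbit/s: only the 9 Gbit/s excess accumulates. A back-of-the-envelope sketch, assuming the quoted figures are bytes (192 KiB and 2 MiB) and that one congested port can use the whole buffer; fill_us is a hypothetical helper. In bolek's chained setup, the 10G-to-1G step happens at the TP-Link port that connects to the Netgear, since that link runs at 1G.

```bash
#!/usr/bin/env bash
# Sketch: time (microseconds) for a burst to fill a switch buffer when traffic
# arrives at IN_MBPS and leaves at OUT_MBPS (integer Mbit/s).
# fill_us is a hypothetical helper; assumes BYTES is all available to one port.
fill_us() {
    local bytes="$1" in_mbps="$2" out_mbps="$3"
    # bits / (Mbit/s) = microseconds; only the excess rate accumulates
    echo $(( bytes * 8 / (in_mbps - out_mbps) ))
}

fill_us $((192 * 1024)) 10000 1000       # prints 174
fill_us $((2 * 1024 * 1024)) 10000 1000  # prints 1864
```

Under these assumptions, a 1G egress port can absorb somewhere between a fraction of a millisecond and a couple of milliseconds of 10G line-rate burst before it has to drop or pause.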
 

bolek

New Member
Mar 25, 2023
Yeah, after reading some of the stuff that you sent, it seems like this happens to other people too.

I tried isolating a switch for testing like you suggested, but that didn't change anything. Unfortunately, all my 1G switches are old and similar. I do have one non-Netgear switch, but it's the same vintage and class and behaves the same way.

You may be right that replacing them with something better would help, but I am reluctant to spend more money on 1G switches when there is no guarantee. I think I may be better off saving for 10G replacements instead. Based on the experiment, this seems almost certain to fix the problem, even if some hosts stay 1G only. But it's expensive for sure, so I don't know.

Thanks for your help
 

pimposh

hardware pimp
Nov 19, 2022
I would start by replacing the TP-Link with another 10G unit. I had more than three of them and, unfortunately, they all failed within the first few months; ports often hung up for no reason. I'd bet dollars to pennies the 1G switches aren't the issue in your network.
 

DavidWJohnston

Active Member
Sep 30, 2020
Maybe you can borrow a more modern 1G switch of a different model, with a larger buffer, and see what happens. But yeah, no guarantee; it might be a waste of time.

I agree: if you're going to put any moderate amount of money towards it, a managed 10G switch would be an excellent upgrade. Your existing one is probably sellable.

I hope you get it solved, I'm happy to help, good luck!
 

bolek

New Member
Mar 25, 2023
Hmm, I see. My problem is that I need several smaller and fanless switches instead of a bigger one (so far two of them are 10G). I am limited by the existing wiring in the walls. It's either that or no 10G at all :).

The only comparable 10G switch that I know of is the Trendnet TEG-S750. Is it any better than the TP-Link? AFAIK, there aren't any managed switches in this size.
 

pimposh

hardware pimp
Nov 19, 2022
How many 10G clients do you need in total?
You could go with the QSW-M408-4C, although it's not completely silent (though quiet enough to keep in a bedroom).
 

bolek

New Member
Mar 25, 2023
Five, but maybe I can live with four. This is in two locations, so I would need three switches, but maybe only one needs to be managed. The QSW-M408-4C is interesting, though big and not fanless. I will think about it. I assume that since it has both 1G and 10G ports, it would avoid the 10G->1G problem that started this thread?
 

pimposh

hardware pimp
Nov 19, 2022
I can confirm that TCP traffic through this model is fine: no 1G<>10G issues found, running with really long uptimes.