Mixing 5GbE and 10GbE speeds - slower than expected


hmw

Active Member
Apr 29, 2019
Hi,

I am currently having a somewhat disappointing experience:
For background, read https://forums.servethehome.com/ind...i-gigabit-1-2-5-5-10-switch.20851/post-274105

I believe the problem is that the transceiver is 10 Gbit on the switch side while the endpoint side links at n Gbit. It's not TCP/IP but probably the switch itself that can't set the speeds properly. After all, it believes the connection is 10 Gbit. I get the same problems with a data-center-class switch like the ICX 6610 (the switch is capable of handling 4 x 40GbE + 8 x 10GbE + 24 x 1GbE of traffic, so 'buffers' or switching capacity shouldn't be a problem).


The ICX 6610 is connected via SFP+ and an Ipolex transceiver for 10 GbE and a QSFP+ DAC for 40 GbE. The 10 GbE side connects to a Windows 10 box via an OWC AQ107 TB3 adapter. I can set the link speed explicitly to 1/2.5/5/10 GbE.


- Windows 10GbE <-> ESXi 40 GbE gives 9.5 Gbit/s bi-directional

- Setting the TB3/AQ107 to 5GbE gives 5 Gbit/s from Win10 to ESXi and 1.5 Gbit/s reverse

- Setting the TB3/AQ107 to 2.5 GbE or even 1 GbE gives 2.4 Gbit/s and 900 Mbit/s respectively from Win 10 to ESXi, but only 80 Mbit/s reverse in both cases.

But if you test with restricted bandwidth (e.g. iperf3 -b 2.5G or iperf3 -b 1G), you'll get the 2.4 Gbit/s and 900 Mbit/s as expected, and the reverse speeds improve to 1.9 Gbit/s and 800 Mbit/s respectively.

I've also tested with Wiitek transceivers and it makes no difference. I believe that if you want to mix speeds, you need a multi-gigabit switch like the Netgear MS510TX or Buffalo BS-MP2012/2008 - or else you need TB3 adapters everywhere you want the full 10 Gbit/s.
 

meyergru

New Member
Jul 12, 2020
It is correct that my switch does not "see" that the real downlink speed is only 5 Gbit/s. But even if a switch had that knowledge, with a speed mix there will always be drops once the packet buffer has been exhausted.

Have you checked whether your TCP window size stays large or whether it gets reduced because of small buffers? I can find no tech specs regarding the RAM or buffer sizes of the ICX 6610.

Alas, I cannot test your theory any more. As a matter of fact, I had a Buffalo BS-MP2008, but exchanged it for the Cisco. I never had the chance to check 5 Gbit/s because I did not have 5GBASE-T-capable adapters then.

Also, I second the idea that this is probably a problem between the Linux and Windows TCP stacks, but I lack the opportunity to check other combinations.

For now, I have resorted to running 10/10 again, using the native 10GbE-only ports on my Cisco for the Cat 5e house cabling. I had no dropouts today, maybe because I now use the more modern Intel X550 instead of X540 adapters. Speeds are back up to more appropriate levels (faster than my RAID, at least). If no problems occur, I will call it a day.
 

madbrain

Active Member
Jan 5, 2019
But if you test with restricted bandwidth (e.g. iperf3 -b 2.5G or iperf3 -b 1G), you'll get the 2.4 Gbit/s and 900 Mbit/s as expected, and the reverse speeds improve to 1.9 Gbit/s and 800 Mbit/s respectively.
That helps a little bit here, but not to the extent you are saying. Still nowhere close to 2.5 Gbps.
This is with the client NIC forced to 2.5 Gbps in Device Manager and the server at 10G. Both ends are Aquantia NICs, on a TEG-7080ES, which is supposed to be an NBASE-T switch. No transceivers involved.

C:\Users\Julien Pierre\Desktop\iperf3>iperf3.exe -N -c server10g -R
Connecting to host server10g, port 5201
Reverse mode, remote host server10g is sending
[ 4] local 192.168.1.26 port 59666 connected to 192.168.1.27 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 116 MBytes 972 Mbits/sec
[ 4] 1.00-2.00 sec 118 MBytes 987 Mbits/sec
[ 4] 2.00-3.00 sec 124 MBytes 1.04 Gbits/sec
[ 4] 3.00-4.00 sec 127 MBytes 1.06 Gbits/sec
[ 4] 4.00-5.00 sec 118 MBytes 987 Mbits/sec
[ 4] 5.00-6.00 sec 120 MBytes 1.01 Gbits/sec
[ 4] 6.00-7.00 sec 122 MBytes 1.02 Gbits/sec
[ 4] 7.00-8.00 sec 126 MBytes 1.06 Gbits/sec
[ 4] 8.00-9.00 sec 113 MBytes 946 Mbits/sec
[ 4] 9.00-10.00 sec 112 MBytes 941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.17 GBytes 1.00 Gbits/sec 7107 sender
[ 4] 0.00-10.00 sec 1.17 GBytes 1.00 Gbits/sec receiver

iperf Done.

C:\Users\Julien Pierre\Desktop\iperf3>iperf3.exe -N -c server10g -R -b 2.5G
Connecting to host server10g, port 5201
Reverse mode, remote host server10g is sending
[ 4] local 192.168.1.26 port 59696 connected to 192.168.1.27 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 156 MBytes 1.31 Gbits/sec
[ 4] 1.00-2.00 sec 166 MBytes 1.39 Gbits/sec
[ 4] 2.00-3.00 sec 162 MBytes 1.36 Gbits/sec
[ 4] 3.00-4.00 sec 167 MBytes 1.40 Gbits/sec
[ 4] 4.00-5.00 sec 163 MBytes 1.37 Gbits/sec
[ 4] 5.00-6.00 sec 163 MBytes 1.37 Gbits/sec
[ 4] 6.00-7.00 sec 166 MBytes 1.39 Gbits/sec
[ 4] 7.00-8.00 sec 166 MBytes 1.39 Gbits/sec
[ 4] 8.00-9.00 sec 164 MBytes 1.38 Gbits/sec
[ 4] 9.00-10.00 sec 167 MBytes 1.40 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.60 GBytes 1.38 Gbits/sec 664 sender
[ 4] 0.00-10.00 sec 1.60 GBytes 1.38 Gbits/sec receiver
 

hmw

Active Member
Apr 29, 2019
Have you checked whether your TCP window size stays large or whether it gets reduced because of small buffers? I can find no tech specs regarding the RAM or buffer sizes of the ICX 6610.
I tried all the usual TCP and IPv4 tuning routines for Windows, Mac and Linux; they made very little to no difference. A lot of things (interrupt moderation etc.) made things worse.

I have a Netgear MS510TX - will probably try an experiment with that, using a ConnectX-3 and Realtek USB adapters ... but it's a pain to test it out
 

madbrain

Active Member
Jan 5, 2019
FYI, also changed the "Receive buffers" from 512 to the max of 4096 in the Aquantia drivers on the client side. No improvement.
 

meyergru

New Member
Jul 12, 2020
I tried all the usual TCP and IPv4 tuning routines for Windows, Mac and Linux; they made very little to no difference. A lot of things (interrupt moderation etc.) made things worse.
And what was your actual window size? If it was limited, that may well be the same problem. What makes you think the buffer size is not the problem? At 128K and 5 GbE, the handshake rate works out to roughly 5000/s. I don't know whether the Windows TCP stack can handle that, especially considering that in my case, disabling window scaling (effectively limiting the window to 64K) made things worse.
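As a rough back-of-the-envelope check of where that 5000/s figure comes from (illustrative only):

```python
# How often a 128 KiB window's worth of data drains at 5 Gbit/s line rate.
link_rate_bps = 5e9          # 5 Gbit/s
window_bytes = 128 * 1024    # 128 KiB window / buffer

windows_per_second = link_rate_bps / (window_bytes * 8)
print(f"~{windows_per_second:.0f} window-sized bursts per second")   # ~4800, i.e. roughly 5000/s
```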
 

hmw

Active Member
Apr 29, 2019
And what was your actual window size? If it was limited, that may well be the same problem. What makes you think the buffer size is not the problem? At 128K and 5 GbE, the handshake rate works out to roughly 5000/s. I don't know whether the Windows TCP stack can handle that, especially considering that in my case, disabling window scaling (effectively limiting the window to 64K) made things worse.
I've done tests with window scaling with different parameters. I've gone through the entire list of parameters, turning them on and off one at a time. The point is: why would a 10 Gb link speed not cause problems while a 2.5 Gb or 5 Gb link speed causes asymmetrical speeds?

I've also done tests on the ICX 6610 with rate limiting / rate shaping on input and output to clamp the port to 2.5 Gbps, and I still get the same asymmetrical speeds.

At this point I'm like you: I'll buy TB3 10GbE adapters and call it a day ...
 

madbrain

Active Member
Jan 5, 2019
At this point I'm like you: I'll buy TB3 10GbE adapters and call it a day ...
Most computers don't have TB connectors. The only one I have out of 8 PCs in the house that has a TB3 port is my work-issued laptop. And the TB3 is fully used for a docking station with two monitors running at 4K/60. No bandwidth left for 10GBASE-T.
Not to mention the three single-board computers: the ODROID XU4 and N2+ have USB 3, while the Raspberry Pi 3 only has USB 2 (which maxes out at under 300 Mbps over Ethernet, even with a USB 3.0 1 Gbps NIC, sadly).

Also, the OP has Cat 5 cabling that can't support 10 gig. Not really a solution.
 

meyergru

New Member
Jul 12, 2020
I've done tests with window scaling with different parameters. I've gone through the entire list of parameters, turning them on and off one at a time. The point is: why would a 10 Gb link speed not cause problems while a 2.5 Gb or 5 Gb link speed causes asymmetrical speeds?
I explained that already: consider the case where you have a limited buffer of, say, 128K. It does not matter how big the buffer is when you do not mix speeds, as each packet that arrives can immediately be passed on to the receiving end.

On the other hand, when you mix a 10 GbE sender with a 5 GbE receiver, the receiver cannot handle all the packets, so the buffer fills up with not-yet-transmitted packets until it is exhausted, after which packets get dropped.

At that point, TCP has to react by requesting retransmissions of the missing packets, i.e. "handshaking" instead of just "streaming". Only in that situation, where TCP congestion handling kicks in, do you notice how well it copes at high speeds. Normally, over internet lines, the rate at which this handshaking occurs is quite low, so the latency with which TCP reacts does not matter. Over a 5 GbE connection with a 128K window, those handshakes become necessary at a rate of roughly 5000/s. The smaller the buffer, the higher that frequency. With a fixed reaction latency, each gap caused by a handshake has a fixed length, so a higher frequency means the gaps take up a larger share of time relative to the "streaming" phases. That explains why the resulting throughput is lower when the TCP window size is smaller.

The only question is whether there is some natural limit to the TCP window size because of Linux/Windows TCP stack interactions, or whether the problem can be fixed by a larger packet buffer.

You still have not stated what your TCP window size actually is. I suspect it is less than 1 GByte and equal to the actual packet buffer size. I also think it is bigger when you do not mix speeds, because the buffer size does not matter in that case.
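To make the argument concrete, here is a toy model of the "stream, then stall" behaviour; the 0.4 ms recovery gap is only an assumption (borrowed from the RTT figure mentioned elsewhere in this thread), so treat the numbers as illustrative:

```python
# Each window's worth of data is sent at line rate ("streaming"), then a
# fixed recovery gap follows before the next burst ("handshaking").
LINK = 5e9        # 5 Gbit/s line rate
GAP = 0.4e-3      # assumed fixed reaction/recovery latency in seconds

for window_kib in (64, 128, 256, 1024):
    bits = window_kib * 1024 * 8
    stream_time = bits / LINK               # time to send one window at line rate
    goodput = bits / (stream_time + GAP)    # effective rate including the gap
    print(f"{window_kib:4d} KiB window -> ~{goodput/1e9:.2f} Gbit/s")
# The smaller the window, the larger the share of time spent in gaps,
# and the lower the resulting throughput.
```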

Both the OP and I have Cat5E cabling and we’re both running 10GbE NICs.
No, I have old Cat 5. At least with one PC with the Buffalo adapter, 10 GbE cannot be reached; the connection always falls back to 1 GbE, and if I force it to 10 GbE, it does not link at all. I am still checking whether the X550, which initially connects at 10 GbE, stays stable.
 

hmw

Active Member
Apr 29, 2019
I explained that already: consider the case where you have a limited buffer of, say, 128K. It does not matter how big the buffer is when you do not mix speeds, as each packet that arrives can immediately be passed on to the receiving end.

On the other hand, when you mix a 10 GbE sender with a 5 GbE receiver, the receiver cannot handle all the packets, so the buffer fills up with not-yet-transmitted packets until it is exhausted, after which packets get dropped.

At that point, TCP has to react by requesting retransmissions of the missing packets, i.e. "handshaking" instead of just "streaming". Only in that situation, where TCP congestion handling kicks in, do you notice how well it copes at high speeds. Normally, over internet lines, the rate at which this handshaking occurs is quite low, so the latency with which TCP reacts does not matter. Over a 5 GbE connection with a 128K window, those handshakes become necessary at a rate of roughly 5000/s. The smaller the buffer, the higher that frequency. With a fixed reaction latency, each gap caused by a handshake has a fixed length, so a higher frequency means the gaps take up a larger share of time relative to the "streaming" phases. That explains why the resulting throughput is lower when the TCP window size is smaller.

The only question is whether there is some natural limit to the TCP window size because of Linux/Windows TCP stack interactions, or whether the problem can be fixed by a larger packet buffer.

You still have not stated what your TCP window size actually is. I suspect it is less than 1 GByte and equal to the actual packet buffer size. I also think it is bigger when you do not mix speeds, because the buffer size does not matter in that case.



No, I have old Cat 5. At least with one PC with the Buffalo adapter, 10 GbE cannot be reached; the connection always falls back to 1 GbE, and if I force it to 10 GbE, it does not link at all. I am still checking whether the X550, which initially connects at 10 GbE, stays stable.

What specific TCP window size are you referring to?

Modern TCP/IP stacks have autotuning, so the TCP receive window scales automatically via the shift count in the window scale option - the maximum shift is 14, so the maximum window size is (2^16 - 1) x 2^14 = 65535 x 16384 = 1,073,725,440 bytes, i.e. roughly 1 GB. You can enable this in Windows 10 with netsh interface tcp set global autotuninglevel=experimental - this sets the shift to 0xE, i.e. 14.
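A quick sanity check of that arithmetic:

```python
# Maximum TCP receive window with window scaling (RFC 1323/7323).
max_window_field = 2**16 - 1      # 65535, the unscaled 16-bit window field
max_shift = 14                    # largest allowed window scale shift
print(max_window_field << max_shift)   # 1073725440 bytes, just under 1 GiB
```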

I've tried that and there's no change in speeds (and window scaling heuristics are disabled).

Again - you can use iperf to cap the sending bandwidth for testing. If you won't send more than 2.5 Gb/s, it doesn't matter whether the sender's link speed is 5 GbE or 50 GbE. And if what you're describing were the ONLY thing at play, you would see a gradual decrease in performance when you look at iperf results with -b varying from 2.5G to 5G. Hint: the receive speed remains constant at 1.3 Gb/s, which in my case is the limit for a 64k window at an RTT of 0.4 ms.

If you're referring to the TX and RX buffers on the NIC: for non-DMA NICs the driver usually doesn't need to expose RX & TX buffer options; at least on the Realtek USB NICs, I don't see any option for them.

On the TB3 AQ107 10GbE adapters, the RX and TX buffers can be set in the driver. For sizing them: assuming an RTT of 0.4 ms, 5 Gbps implies a buffer of about 244 kbytes is needed, and for 2.5 Gbps it's about 122 kbytes. This means even 256 TX buffers and 512 RX buffers should do the job - and again, increasing these to the maximum of 4096/8148 has no effect.
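The arithmetic behind those numbers, using the 0.4 ms RTT measured here (a back-of-the-envelope check, not a measurement):

```python
rtt = 0.4e-3                          # measured round-trip time, 0.4 ms

# Throughput ceiling of a fixed 64 KiB window (matches the ~1.3 Gb/s above).
window = 64 * 1024
print(f"64 KiB window cap: {window * 8 / rtt / 1e9:.2f} Gbit/s")

# Bandwidth-delay product, i.e. how much must be in flight to fill the link.
for rate in (5e9, 2.5e9):             # 5 GbE and 2.5 GbE
    bdp = rate * rtt / 8
    print(f"{rate / 1e9:.1f} Gbit/s needs ~{bdp / 1024:.0f} KiB in flight")
# ~244 KiB and ~122 KiB respectively.
```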

And the weird thing is that between the 10 GbE and 40 GbE nodes, I get 9.48 Gb/s in both directions. Even with conservative buffer sizes and TCP Window scaling set to 8 instead of 14.

It would seem that at 2.5 GbE and 5 GbE speeds, the TCP stack or the OS doesn't scale the window properly ...
 

meyergru

New Member
Jul 12, 2020
I am referring to the TCP window size that is effectively being negotiated on the line as can be seen in a Wireshark trace:

870495 192.168.10.97 192.168.10.3 TCP 1514 4399 → 18767 [ACK] Seq=1100727273 Ack=1 Win=131072 Len=1448 TSval=18647398 TSecr=1972253487 18767 4399
870496 192.168.10.97 192.168.10.3 TCP 1514 4399 → 18767 [ACK] Seq=1100728721 Ack=1 Win=131072 Len=1448 TSval=18647398 TSecr=1972253487 18767 4399
870497 192.168.10.97 192.168.10.3 TCP 1514 4399 → 18767 [ACK] Seq=1100730169 Ack=1 Win=131072 Len=1448 TSval=18647398 TSecr=1972253487 18767 4399

As you mention, the TCP stacks negotiate that size up to the maximum applicable. So to speak, this measures the effective buffering capacity of the speed-reduction point (i.e. your switch) - provided that is the only limiting factor.

The buffer sizes in the network adapters are not that important, because they can (hopefully) get rid of the packets they receive immediately, at any speed, since no speed translation happens there. An exception would be a slow sender or receiver host, perhaps because of interrupt latencies, so enlarging those hardware parameters can still be beneficial.

If you do not see the speed reduction for speed mixes other than 10/5, that might indicate either a larger buffer allocated to the 40 GbE ports, or that the switch cannot actually differentiate between 5 and 10 GbE downlinks (while it can between 10 and 40 GbE) - knowing the real downlink speed is something my switch cannot do. If it could, it could pause the sender via flow control, which might help avoid dropping packets in the first place.

As I said, it boils down to these possibilities IMHO:

1. Hardware buffer size of the switch too small
2. Limiting factors of TCP implementations between Windows and Linux
3. The switch being unaware of the speed translation (and thus of the packet drops it induces)
 

madbrain

Active Member
Jan 5, 2019
2. Limiting factors of TCP implementations between Windows and Linux
I can rule that out, at least in my case. I did Linux-to-Linux testing over the weekend as well. It helps to have hot-swap 2.5" SATA drive bays attached to bootable controllers on almost all my systems.

When the Linux client NIC is forced down to 2.5 Gbps or 5 Gbps and the server is at 10 Gbps, the same problem occurs: the net result is TCP retransmissions and bandwidth much lower than expected, sometimes less than 1 Gbps. This is the case even when using -b 2.5G in iperf. The problem is definitely not specific to Windows <-> Linux.
 

hmw

Active Member
Apr 29, 2019
@meyergru - I looked at some Wireshark captures and I can see the window size being limited to 212680 (scaling factor 4). This is despite changing parameters on both ends ...
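For what it's worth, if that ~212 KB window applied during the mixed-speed runs and the ~0.4 ms RTT from earlier still held (both are assumptions), the window alone would cap things at roughly 4.3 Gbit/s:

```python
window_bytes = 212680   # window size seen in the Wireshark capture
rtt = 0.4e-3            # RTT assumed from the earlier measurement
print(window_bytes * 8 / rtt / 1e9)   # ~4.25 Gbit/s upper bound
```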
 

meyergru

New Member
Jul 12, 2020
Exactly as expected, so the buffer is not much larger than on my switch.

Changing the parameters can only limit the TCP window size to below the physical buffer size - for example, if you disable window scaling, the size is limited to 64K, which should yield an even lower throughput.

I wonder which (cheap) switches have larger buffers... Is nobody in possession of a Mikrotik CRS328-24P-4S+RM willing to try? It has 512 MByte of RAM, but the specs say nothing about how much of it is dedicated to packet buffers.
 

hmw

Active Member
Apr 29, 2019
Exactly as expected, so the buffer is not much larger than on my switch.

Changing the parameters can only limit the TCP window size to below the physical buffer size - for example, if you disable window scaling, the size is limited to 64K, which should yield an even lower throughput.

I wonder which (cheap) switches have larger buffers... Is nobody in possession of a Mikrotik CRS328-24P-4S+RM willing to try? It has 512 MByte of RAM, but the specs say nothing about how much of it is dedicated to packet buffers.
I don't see any dropped or retransmitted packets on the switch - I don't think it's the switch at all.
 

hmw

Active Member
Apr 29, 2019
Exactly as expected, so the buffer is not much larger than on my switch.

Changing the parameters can only limit the TCP window size to below the physical buffer size - for example, if you disable window scaling, the size is limited to 64K, which should yield an even lower throughput.

I wonder which (cheap) switches have larger buffers... Is nobody in possession of a Mikrotik CRS328-24P-4S+RM willing to try? It has 512 MByte of RAM, but the specs say nothing about how much of it is dedicated to packet buffers.
Switches usually have 1 MB to 4 MB dedicated to packet buffers, not more. The memory used for buffers is very different from normal RAM, i.e. it's usually SRAM. The ICX above has 8 MB. Again, most switches try to operate cut-through, not store-and-forward, hence large buffers don't make sense - they would be wasted. Not to mention that the ICX 6610 is perfectly capable of handling 10 GbE -> 1 GbE or even 40 GbE -> 1 GbE.
 

meyergru

New Member
Jul 12, 2020
As for 10/1, sure thing, as >128K will be enough for that. What was your observed window size for 40/10? I suspect it is larger than the one for 10/5.

Also, again: you cannot cut-through while reducing speeds - in that situation the switch cannot get rid of the packets at the rate it receives them, so it has to store and forward. That is the whole point: yes, I know that under normal (i.e. cut-through) operation small buffers are sufficient. Alas, that is not the case when reducing speeds.

So, if all switches are limited to small buffers (and even more so if they distribute that capacity evenly over all ports), this problem will occur with all of them. The more interesting question is why it does not occur in your 40/10 case. Can you measure the window size there?
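To get a feel for the numbers - purely illustrative, and optimistically assuming the ICX's 8 MB packet buffer were entirely available to one congested egress port:

```python
# How long a packet buffer can absorb a sustained speed mismatch before it
# overflows and drops begin.
buffer_bytes = 8 * 1024 * 1024                # ICX 6610 packet buffer, per the post above
for fast, slow in ((10e9, 5e9), (10e9, 1e9), (40e9, 10e9)):
    deficit = fast - slow                     # rate at which the backlog grows
    t_ms = buffer_bytes * 8 / deficit * 1000
    print(f"{fast/1e9:.0f} -> {slow/1e9:.0f} GbE: buffer full after ~{t_ms:.1f} ms")
```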
 

meyergru

New Member
Jul 12, 2020
Oh, I got the answer from Mikrotik:

MikroTik CRS328 switches are similar to the mentioned Cisco switch, they have independent 1.5MByte switch-chip packet buffer which is not related to RAM. Performance between different speed interfaces is also similar.
 

meyergru

New Member
Jul 12, 2020
In the meantime, I solved the problem another way: I simply exchanged my wall boxes for Cat 6 ones - and voilà, the link speed changed to 10 Gbit/s immediately. Thus, the speed-mixing problem no longer occurs.