>=25G network not performing well with Windows 11


Chriggel

Active Member
Mar 30, 2024
Sure, it's almost all defaults. The only thing I've changed is the initial RTO, which was set to 1000, but I've seen that 3000 is supposedly the standard; once again, I have no explanation for why out-of-the-box installations have these differences. Changing it to 3000 has had no effect so far, but I also didn't expect it to.

Code:
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : default
ECN Capability                      : disabled
RFC 1323 Timestamps                 : allowed
Initial RTO                         : 3000
Receive Segment Coalescing State    : enabled
Non Sack Rtt Resiliency             : disabled
Max SYN Retransmissions             : 4
Fast Open                           : enabled
Fast Open Fallback                  : enabled
HyStart                             : enabled
Proportional Rate Reduction         : enabled
Pacing Profile                      : off
** The autotuninglevel setting displayed above is the result of a group policy that overrides all local settings.

The install is German, so the original output was localized. The second entry is the auto-tuning level, and the ** note at the bottom says that the displayed auto-tuning setting is the result of a group policy which overrides all local settings. That's why I can't change or disable it.
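
For anyone who wants to compare on their own box, this is roughly what I'd run to see the local, group-policy, and effective values side by side, and to change them where no policy is in force. The Get-NetTCPSetting property names here are from memory, so treat this as a sketch rather than gospel.
Code:
# Show local vs. group-policy vs. effective auto-tuning level, plus the initial RTO
Get-NetTCPSetting -SettingName Internet |
    Select-Object SettingName, AutoTuningLevelLocal, AutoTuningLevelGroupPolicy,
                  AutoTuningLevelEffective, InitialRtoMs

# These only stick when no policy overrides the local value
netsh interface tcp set global autotuninglevel=normal
netsh interface tcp set global initialRto=3000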
 

DAVe3283

New Member
Apr 4, 2024
The only differences I see with mine are the Initial RTO (mine is the default 1000) and RFC 1323 Timestamps, which are disabled for me (that appears to be the default on Windows). I wouldn't expect RFC 1323 Timestamps to be the problem; from what I read, if anything, that should make my setup slower.
Code:
PS C:\Users\DAVe3283\Apps\Networking\iperf3> netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : default
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
Initial RTO                         : 1000
Receive Segment Coalescing State    : enabled
Non Sack Rtt Resiliency             : disabled
Max SYN Retransmissions             : 4
Fast Open                           : enabled
Fast Open Fallback                  : enabled
HyStart                             : enabled
Proportional Rate Reduction         : enabled
Pacing Profile                      : off

PS C:\Users\DAVe3283\Apps\Networking\iperf3> .\iperf3.exe -c ThreadReaper -P 8 -M 8960
Connecting to host ThreadReaper, port 5201
<snip>
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[SUM]   0.00-10.00  sec  29.9 GBytes  25.7 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  29.9 GBytes  25.7 Gbits/sec                  receiver

iperf Done.

The really strange part to me is that you have group policy applied despite not being domain joined. Are you logging on using a Microsoft account? I think that can apply some nonsense on Win11. So when you test Win11 Enterprise, make sure you create a local account so no unexpected GPOs get applied.
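
Before reinstalling, it might also be worth checking where that policy is coming from. Something along these lines should show any computer-scope policy being applied (the report path is just an example):
Code:
# Summary of group policy objects applied at computer scope
gpresult /scope computer /r

# Or a full HTML report you can search for the auto-tuning setting
gpresult /h "$env:TEMP\gpreport.html"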
 

jolly

New Member
The really strange part to me is you have group policy applied despite not being domain joined.
This isn't that surprising; you can happily apply group policy settings to a non-domain machine. What is surprising is that it's happening without you doing it.

Check Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies and Computer\HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies
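
If you want to see everything that's actually set under those keys in one go, plain PowerShell is enough; a rough sketch (nothing beyond the built-in registry provider assumed):
Code:
# List every value set under the local machine and current user Policies keys
Get-ChildItem -Recurse 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies',
                       'HKCU:\Software\Microsoft\Windows\CurrentVersion\Policies' |
    ForEach-Object { Get-ItemProperty -Path $_.PSPath }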
 

Chriggel

Active Member
Mar 30, 2024
Just wanted to let all of you know that I appreciate the input from everyone who tried to help, but I've given up. I've decided not to waste any more time on this mess. I'll either use dual boot or maybe get a second machine, and will use Linux for my main desktop.

So, even though it appears to be working for many people, I know that I'm not alone with this problem. If anyone should find this in the future, sadly I can't offer you a solution. I was never able to make any sense of this mess. I don't easily give up, but I've given up this time. Sorry.
 

MountainBofh

Beating my users into submission
Mar 9, 2024
I have some further info regarding Windows 11 and 10Gb networking. It might shed some more light on the issues the OP ran into.

I was working on one of my older test motherboards tonight, an ASRock Z170 Extreme4. It's running an i7-6700K and 16 GB of RAM. I was testing to see how the Intel X520, X710, and E810 cards would behave, performance-wise, if you only give them x4 PCIe lanes. Long story short, they did fine with x4 lanes in the first motherboard tested. I was using iperf3 3.16 for all my tests.
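
For anyone wanting to repeat this kind of test, runs along these lines are enough to spot inconsistency (the hostname is a placeholder, not my actual setup); the point is to compare single-stream, multi-stream, and the reverse direction over a longer interval.
Code:
# Single stream, 8 parallel streams, then reverse direction, 30 seconds each
.\iperf3.exe -c testbox -t 30
.\iperf3.exe -c testbox -t 30 -P 8
.\iperf3.exe -c testbox -t 30 -R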

Initially I was using Debian 12 to run all my tests. After I finished up in Debian, I decided to try Windows 10 LTSC 21H2, Windows Server 2022, and Windows 11 23H2 Pro to see how the various Windows OSes compared. I had previously tested the cards on my more powerful test machine (an i7-9700K on a Z370 motherboard), but the Z370 box has a different PCIe configuration and I wanted to make sure I was only giving the cards x4 lanes.

Windows 10 performed as expected, giving a solid 9.5 Gbit/s in iperf with all the cards. Server 2022 also behaved the same. Windows 11, though...

The results with 11 were all over the place, and no card, not even the vaunted Mellanox 4x, was able to give consistent network transfers. I even moved the cards to the x16 slot to see if that would help, and shockingly it made things even worse!

I wondered if this was some sort of bug with this old board, so I dug out another old test board, an ASRock Z170 Pro4. The Pro4 was even worse! It could barely hold 5 Gbit/s, regardless of what card I tried, what slot I plugged the card into, or even what OS was tested.

Conclusion #1 - I think the server grade cards we're all using really can get PICKY about all sorts of things. I don't think the vendors really test or design for consumer grade systems. Different motherboards, different OSes, different PCIe slots, etc. Change one variable and you can get totally different results.

Conclusion #2 - Windows 11 was the most temperamental OS in terms of network performance. On my Z370 it performs fine, but not on the Z170 boards (either one). Considering how M$ has treated 11 as a non-stop beta, I'm not terribly surprised, though.

Conclusion #3 - The Mellanox 4x's seem to be the most "stable" in terms of network performance. But even they're not immune to buggy motherboards (e.g., my ASRock Pro4) or certain combinations of things (e.g., Win11 + the Z170 Extreme4).

Conclusion #4 - Keep a Linux box with a modern distro handy (or at least a flash drive with a live OS). It makes ruling out weirdness in the Windows kernel, the drivers, or the Windows iperf3 build a lot easier (see the sketch after these conclusions).

Conclusion #5 - Consumer motherboards have unknown bugs or quirks that can cause issues. I never would have expected my Pro4 test board to act the way it did, yet the Extreme4 board behaved just fine.
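
On the Conclusion #4 point, the cross-check is trivial: boot the live environment, pull in iperf3, and run against the same server so the OS is the only thing that changes. The commands below assume a Debian/Ubuntu-based live image and a placeholder hostname, so adjust for whatever you actually boot.
Code:
# In the live Linux environment (Debian/Ubuntu-based assumed)
sudo apt install iperf3

# Act as the server so the Windows box can be tested against a known-good endpoint...
iperf3 -s

# ...or run the client side from Linux to take the Windows iperf3 build out of the picture
iperf3 -c testbox -t 30 -P 8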
 

i386

Well-Known Member
Mar 18, 2016
I don't think the vendors really test or design for consumer grade systems.
I would "invert" that statement and say the consumer hardware manufacturers don't test their systems enough...
(My most recent experience was an ASUS board that wouldn't boot with a >x4 PCIe device (HBA or 25GbE NIC) in the x8 slot and the GPU in the x16 slot. This was later fixed with a BIOS update.)
 

MountainBofh

Beating my users into submission
Mar 9, 2024
Agreed that the consumer vendors do minimal testing on their stuff and let the end users find all the bugs for them. You just hope that a BIOS upgrade will fix it later.

Sometimes even the "big" boys don't test very well. I got a new Supermicro server (based on a H13SAE-MF board) a few weeks ago at work that will not boot with either an E810 or Mellanox 4 card. But its perfectly happy with a X710 or X520. I need to send them a nasty gram and hopefully they can address it with a bios update.
 

Chriggel

Active Member
Mar 30, 2024
I also verified the problems with W11 and Server 2022 on a dual-socket 3647 workstation platform, and it was the same thing. I understand the testing argument, but I'm not sure how a potential bug would cause this, or at least how it would contribute to the problem.

Meanwhile, I've noticed more weird behaviour with W11. It's a mess and broken in many ways.
 

mattlach

Active Member
Aug 1, 2014
I've done a lot of experimenting with fast networking over the years, and I have come to the conclusion that it rarely works the way you'd like.

With NVMe drives, PCIe, RAM and CPU being very fast these days, you'd expect the network interface to be the bottleneck, but that is rarely the case.

The thing is, I couldn't tell you what actually is the bottleneck either.

Connections faster than 10Gbit (25Gbit, 40Gbit, 100Gbit, etc.) usually work pretty well if you saturate them with large numbers of requests from many clients (like one would in a server application) but single connection results (like transferring a file from a client on an otherwise idle connection) seem to top out somewhere between 1.2 and 1.8 GB/s no matter what you do, and even getting that level of performance can be tricky sometimes.

These speeds are obviously far below the available PCIe bandwidth, RAM bandwidth, and local NVMe drive bandwidth, and looking at CPU load, it's rarely pinning a single core or anything like that during file transfer operations, but something is holding it back.

The thing is, the same was true with 10Gbit not that many years ago. Impossible to max out with a single connection, for no apparent reason.

I get the impression it lies in software: some strange behavior resulting from thread locks, wait states, or something like that. As newer, faster network interfaces become more mainstream in clients (as 10Gbit now has), these weird software inefficiencies quietly get cleaned up by developers and we can take advantage of the full bandwidth, or close to it, with a single connection, but if you do anything even slightly more exotic than that, you can't. At least that is my impression.
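
If anyone wants to sanity-check the "no single core is pinned" part on their own setup, watching per-core CPU during a transfer is enough. A rough PowerShell sketch, assuming the standard English counter path:
Code:
# Sample per-core CPU usage once a second for 30 seconds while a transfer is running;
# one core stuck near 100% would point at a single-threaded software bottleneck
Get-Counter -Counter '\Processor(*)\% Processor Time' -SampleInterval 1 -MaxSamples 30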

I've been using 10Gbit networking at home since 2014, when this was highly exotic and required buying decommissioned fiber adapters from server pulls on eBay. In the beginning the experience was pretty so-so, but over time it got better, to the point where for the last few years I have had no problem maxing them out and transferring a file from my local NVMe drive to my NAS at up to 1.2GB/s.

Encouraged by this I thought it might be time for an upgrade. When I saw some used Intel 40Gbit QSFP+ adapters pop up on eBay at a price too good to pass up, I went for it.

At 40Gbit I see a small improvement (I can get to 1.6 - 1.8GB/s and rarely up to 2GB/s depending on what I am transferring), but that's about as high as it gets. Most of the time I don't even get that.

This seems to be a pretty universal experience for anyone who tries exotic, server-oriented fast networking on clients. At least for me, it hasn't mattered whether it is Linux or Windows. I guess we just have to wait for faster networking to become mainstream before we see real single-link performance improvements. Until then, exotic fast networking products only make sense in servers that see large numbers of concurrent connections, which, after all, is what they were designed for, so it kind of makes sense.
 

Tech Junky

Active Member
Oct 26, 2023
At 40Gbit I see a small improvement (I can get to 1.6 - 1.8GB/s and rarely up to 2GB/s depending on what I am transferring), but that's about as high as it gets. Most of the time I don't even get that.
This brings to mind my TB testing.

Certain drives perform better than others, and then it comes down to the OS and whether you're moving data in a single stream or splitting it into multiple streams. It's kind of like bulk transferring things with FileZilla, for example: you have the option to use a single connection or up to 10 at the same time. The other issue is how Windows handles threading with "windowing".

I can do a TB P2P setup and get a link state of 20Gbps between machines, and I typically max out at 1.5GB/s, which isn't bad, considering. Now, if TB didn't reserve bandwidth for video and unlocked the full 40Gbps for data use, it would be better. It's got me waiting for TB5 to see if that changes, since it's supposed to handle 80Gbps for data or up to 120Gbps for video. Unlocking 80Gbps would be significant in terms of cheap networking, or in my case a laptop that doesn't have expansion options. Laptop 10GbE means spending ~$150 on a dongle, and AFAIK there's nothing faster at this point that can be added without some eGPU enclosure and slotting a card inside, which would still get bottlenecked by the TB protocol and isn't very convenient to use portably.