>=25G network not performing well with Windows 11


Chriggel

Member
Mar 30, 2024
I've spent a lot of time over the last 2-3 weeks troubleshooting this, and this is my last attempt. I've already tried several things, but nothing has worked (in fact, it only got worse). If you got a reliable 25G or faster network connection working on Windows 11, please let me know how you did it.

I appreciate everyone reading this and giving any input that might point me in the right direction. This could get lengthy:

I've been running a 10G network since 2011 or so and it always worked well with W7 and W10. I decided it was time to upgrade, so the first thing I did was get a Mikrotik CRS510-8XS-2XQ-IN, which served as a drop-in replacement for my existing 10G switch, and that worked well. It had been running for a couple of months when I decided it was time to upgrade my desktop and, with that, also upgrade to 25G or 40G networking.

Unlike in years prior, I opted for my first consumer hardware system in over a decade, because while getting a new Threadripper 7000 system was tempting, it would also be expensive and fairly wasteful. I chose an Asus TUF X670E-Plus and a Ryzen 9 7950X3D instead, knowing that with the GPU in place, the number of remaining PCIe lanes would be (very) limited.

I wanted to run Linux on it, but reluctantly went with Windows 11 Pro instead because it made things easier at that moment (at least at first...). The slot for the NIC is a PCIe 4.0 x4 slot connected to one of the PROM21 chips of the X670. I still don't know which one, because Asus refused to give me a block diagram and I haven't been bothered to map out the PCIe topology myself yet.

The first NIC I tried was a Mellanox ConnectX-3 40G, connected to the switch via a DAC. It's an HPE 10/40 card flashed with original 40/56 firmware and set to Ethernet mode. Because this is an older generation card, I knew it would not be able to reach 40G in a x4 slot, since PCIe 3.0 x4 isn't 40 Gbit/s, but that's ok. It didn't even get close, managing to transmit barely 8 Gbit/s and receive no more than 4 Gbit/s. I used iperf3, which I understand is a bit wonky on Windows, but real world speeds matched with its results.
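
For anyone who wants to reproduce the test, it's basically plain iperf3 runs along these lines (the address and stream count here are just placeholders):

# on the TrueNAS side
iperf3 -s

# on the Windows side: 30 second run, 4 parallel streams, then the same in reverse (-R) to test receive
iperf3 -c 192.168.1.20 -P 4 -t 30
iperf3 -c 192.168.1.20 -P 4 -t 30 -R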

Even though I thought it was unlikely to be the problem, I also tried a direct connection without the switch to the TrueNAS on the other end, which was also equipped with an identical ConnectX-3, and also tried the NIC in the x16 slot of the desktop, but neither gave me any improvement. For a very short time, I managed to reach ~15 Gbit/s, which to this day I have never been able to reproduce on this machine. I learned that the driver for these cards that ships with Windows 11 (I was surprised one even ships with it) is supposedly not the best, so I installed the latest WinOF for Windows clients as well as the one for Windows servers, and neither changed anything. I also learned that the Mellanox cards aren't necessarily the first choice for TrueNAS (Core, BSD-based) because of the BSD driver, but testing against Linux-based TrueNAS Scale gave the same results.

I also went through the driver settings on the Windows 11 desktop. Interrupt Moderation is still on by default and I have no idea why (it already severely limited my network throughput on my first 10G connections way back in the day), so I turned it off. I also increased the Rx and Tx buffers. No difference.
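
For anyone wanting to do the same from PowerShell instead of the Device Manager GUI, it's roughly this (the adapter name and the exact DisplayName strings depend on the driver, so treat it as a sketch):

# list everything the driver exposes
Get-NetAdapterAdvancedProperty -Name "Ethernet 2"

# disable interrupt moderation and enlarge the ring buffers (names and accepted values vary per driver)
Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Interrupt Moderation" -DisplayValue "Disabled"
Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Receive Buffers" -DisplayValue "4096"
Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Transmit Buffers" -DisplayValue "4096"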

At this point, I was stuck. The performance was worse than with the previous 10G network and I couldn't really narrow down where the problem was. However, I have another TrueNAS Core installed on a different machine with a 10G Intel NIC. It and the other TrueNAS with the ConnectX-3 NIC were able to transfer data between them at 10G, while my Windows desktop wasn't even getting close to 10G to either of these machines. I started to suspect that there was nothing wrong with either of the TrueNAS servers, their drivers or their NICs.

The next thing I did was try an Intel XL710-QDA1 in my desktop, which made things worse. Instead of 8/4 Gbit/s, it was now more like 6-7 Gbit/s transmit and 2-3 Gbit/s receive. I swapped the DAC for optical transceivers and that changed nothing.

Then I set up another Windows 11 Pro machine to test with, on an Asus WS C621E with two CPUs, so plenty of PCIe lanes there, and I gave it one of the ConnectX-3 cards, a ConnectX-4 Lx (HPE 640SFP28 flashed with the latest original Mellanox firmware) and an Intel XL710-QDA1. Nothing; they all performed equally badly and on the same level as the NICs in my desktop. For the 40G cards, I again tried both DAC and optical transceivers and again it made no difference. For the 25G ConnectX-4 Lx, I tried an AOC and optical transceivers, also no difference. At this point, I ran all these tests against the TrueNAS Core which previously had the 10G Intel NIC, now equipped with one of the Intel XL710-QDA1 cards. Since I had nothing to lose, I also tried Windows 11 Pro for Workstations and Windows Server 2022 on that test machine; both performed equally badly.

My desktop then got an Intel E810-XXVDA2 25G NIC. This is the first PCIe 4.0 NIC of the ones mentioned, but it also made no difference.

At that point I was out of hardware to test. I booted a Linux (Mint 21) from USB on my desktop as well as the test machine. The test machine got 23.4 Gbit/s using the ConnectX-4 Lx 25G and my desktop got 23.5 Gbit/s using the E810-XXVDA2, both against the TrueNAS Core with the Intel XL710-QDA1. I didn't test real world performance, but I'm fairly confident that 20+ Gbit/s would be possible.

I'm now also confident in saying that it's Windows 11 which is the problem here. Which brings me back to the start of this post: What's necessary to make Windows 11 do 25G networking? I've seen the article on STH from January 2023 about getting E810 100G NICs to work on Windows 11, which the driver doesn't support officially. After installing it unofficially, is 100G even possible with Windows 11?

I'm open to almost any suggestion. I'm close to building a 2nd desktop to run Windows on, so that I can run Linux on this one. The Windows desktop could then also get the GPU, which would give me fast networking on Linux and a free x16 slot for some NVMe storage. Not the worst idea ever.

Also, don't be gentle. If I'm the fool who went through all this and completely missed the obvious, let me know.
 

Tech Junky

Active Member
Oct 26, 2023
Well, you already figured it out. It's a microsoft problem.

Something that comes to mind is tweaking the TCP windowing and/or MTU sizes.

Unfortunately I can't speak to 25gbps as my laptop would never hit that w/o some Frankenstein type of rigging. However, I can break the barrier with TB between two machines and hit 1.5GB/s moving data back and forth. So, is it possible? Yes. Is it likely? Maybe.
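
If it helps, the current MTU and the global TCP settings are quick to check with the standard netsh queries:

# MTU per interface
netsh interface ipv4 show subinterfaces

# global TCP parameters (receive window autotuning, RSS, etc.)
netsh interface tcp show global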
 

i386

Well-Known Member
Mar 18, 2016
Germany
If you got a reliable 25G or faster network connection working on Windows 11, please let me know how you did it.
I put the cx-4 into a pcie slot and installed the latest driver
I used iperf3, which I understand is a bit wonky on Windows, but real world speeds matched with its results.
The Windows version sucks because it's a tool optimized for Linux APIs/kernel, built on a compatibility layer that aims for support and not necessarily for performance (cygwin), provided by third parties that might use an old code base (usually iperf.fr), and measuring a metric that doesn't represent "desktop|workstation" use cases.
What's necessary to make Windows 11 do 25G networking?
I didn't change any settings in mellanox drivers or windows.
The slot for the NIC is a PCIe 4.0 x4 slot connected to one of the PROM21 chips of the X670.
Do you have a bunch of usb, sata and other devices connected to ports on the backside of the mainboard?
 

Chriggel

Member
Mar 30, 2024
Well, you already figured it out. It's a microsoft problem.

Something that comes to mind is tweaking the TCP windowing and/or MTU sizes.

Unfortunately I can't speak to 25gbps as my laptop would never hit that w/o some Frankenstein type of rigging. However, I can break the barrier with TB between two machines and hit 1.5GB/s moving data back and forth. So, is it possible? Yes. Is it likely? Maybe.
Windows 11 is also supposed to get USB4 support for 80 Gbit/s and it can also handle such data transfers on local drives, for example, so I have no doubt that it can do it in general.

I won't change the MTU size, but I found some interesting information about TCP windows. I'll look at those in detail later and see if I can apply some of that. Thanks!

I put the cx-4 into a pcie slot and installed the latest driver

[...]

I didn't change any settings in mellanox drivers or windows.
And that enabled you to actually transfer 25G over the network? With interrupt moderation still enabled by default, I have never been able to achieve meaningful network speeds, so at the very least that needs to be changed before there's even a faint hope of decent throughput. With the default settings, I was getting transfer speeds of around 400-500 Mbit/s.

The Windows version sucks because it's a tool optimized for Linux APIs/kernel, built on a compatibility layer that aims for support and not necessarily for performance (cygwin), provided by third parties that might use an old code base (usually iperf.fr), and measuring a metric that doesn't represent "desktop|workstation" use cases.
I know, but it's convenient and comes with TrueNAS. Getting ntttcp to work would be much more complicated, and besides, I don't have much hope for the Linux version, let alone the BSD version, of ntttcp. And the speeds iperf3 reported matched the real-world speeds I got while sending files over the network, so for the moment I accept its results as accurate enough. I do keep this in mind though and will always double check its results, especially if they seem weird.

Do you have a bunch of usb, sata and other devices connected to ports on the backside of the mainboard?
I have all four SATA ports populated with SSDs, but there was no traffic during the time of my tests. As for USB, only keyboard and mouse and an audio device, so not many at all and especially no resource intensive ones.
 

i386

Well-Known Member
Mar 18, 2016
Germany
And that enabled you to actually transfer 25G over the network?
On a 40GbE link (and 4 threads) I get 3.5+ GByte/s in ntttcp.
In real-world file transfers I reach about 2.1 GByte/s via robocopy with 16 threads (limited by HDD RAID 6 performance).
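
Roughly what those runs look like, with placeholder IP, thread counts and paths:

# ntttcp: start the receiver first, then the sender; 4 threads mapped to the receiver at 192.168.1.10
ntttcp.exe -r -m 4,*,192.168.1.10 -t 30
ntttcp.exe -s -m 4,*,192.168.1.10 -t 30

# multithreaded SMB copy
robocopy D:\iso \\fileserver\backup /E /MT:16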
 

Stereodude

Active Member
Feb 21, 2016
USA
While not 25 gig, I used ConnectX-3 10 gig SFP+ cards in Windows 10 through a Mikrotik 8 port SFP+ switch to another Windows 10 system with a ConnectX-3 SFP+ and could get the full 10 gig with real-world SMB transfers of >1GB/sec using Windows Explorer. I didn't do any tweaking or tuning.
 

MountainBofh

Active Member
Mar 9, 2024
I'm only running 10Gb Ethernet myself, but between my Debian 12 server and my various Windows boxes, I was getting all sorts of interesting and inconsistent file copy speeds (using a RAM drive as a test). I finally settled on Mellanox 4 cards for all my boxes, and now I get a solid 1 GB/sec.

I did notice that my test bench box, when using an X520 NIC, seemed to give much more reliable (and faster) copy performance in Server 2022 vs Win 10 LTSC. Newer drivers, firmware, etc. did not seem to make a difference.
 

Chriggel

Member
Mar 30, 2024
Ok guys, I think I've almost gotten to the bottom of this, after reading about how TCP works in detail for a few days.

First, I found a really interesting discussion about how bad several network settings in Windows are by default; this is even partly acknowledged by a Microsoft employee. Applying these settings didn't solve my problem though.

And after a bit of back and forth, going through the documentation of TCP settings and so on, I've used this: TCP Throughput Calculator - Tools - Network Portal - SWITCH
This is just a handy implementation of the formulas that are also mentioned and used elsewhere. According to these formulas, you need a TCP window size of about 3 MiB for a 25 Gbit/s connection with an RTT of 1 ms.
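
That's just the bandwidth-delay product; a quick sanity check of the number (a rough PowerShell sketch):

# minimum TCP window needed to keep a 25 Gbit/s link busy at 1 ms RTT
$bandwidthBps = 25e9      # bits per second
$rttSeconds   = 0.001     # round trip time
$windowBytes  = $bandwidthBps * $rttSeconds / 8
"{0:N0} bytes (~{1:N1} MiB)" -f $windowBytes, ($windowBytes / 1MB)
# -> 3,125,000 bytes (~3.0 MiB)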

Windows 11 uses a default TCP window size of 64 kB and supports window scaling; the scale factor is negotiated during the initial TCP handshake and the window should then grow dynamically to whatever is needed. Windows uses a feature called AutoTuning, which can be set to several different levels. According to the documentation, the level "Normal", which is the default, allows a scaling factor of 8, resulting in 64 KiB x 2^8 = 16 MiB. So this should easily be enough.

But of course, it doesn't do that. Here are the essentials of the TCP handshake when establishing a connection to a TrueNAS Scale server with a 40 Gbit/s connection:

SYN:
Window: 65535
[Calculated window size: 65535]
TCP Option - Window scale: 2 (multiply by 4)

SYN ACK:
Window: 64240
[Calculated window size: 64240]
TCP Option - Window scale: 7 (multiply by 128)

ACK:
Window: 53248
[Calculated window size: 212992]
[Window size scaling factor: 4]

Note that the connection is established by my Windows machine, so if I understand this correctly, it asks for a window size of 65535 x 4 = 262,140 bytes, then TrueNAS asks for 64240 x 128 = 8,222,720 bytes, and then they agree on 53248 x 4 = 212,992 bytes.

Bug? Feature? I don't know. According to the documentation, and my understanding of it, it shouldn't do that. The default values should allow Windows to match the suggestion of TrueNAS, as this is still under the upper limit of the default AutoTuning level. I think I've read somewhere that the window scaling factor is set once during the TCP handshake, so once this is set, that's it for the connection. In all subsequent packets, it says factor 4 in the ones coming from Windows and factor 128 in the ones coming from TrueNAS.

So, any ideas? There is one AutoTuning level beyond Normal, and that is Experimental, which allows the window size to grow to 1,073,725,440 bytes, which is just under 1 GiB. But besides the fact that this shouldn't be necessary, I can't set it anyway, because changing this value is restricted by a Group Policy and I haven't found out which one yet.
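
For anyone who wants to check where their own box stands, the relevant state shows up here (standard commands; the exact column names may differ slightly between builds):

# global TCP parameters, including the receive window autotuning level
netsh interface tcp show global

# per-template view, including whether a group policy overrides the local setting
Get-NetTCPSetting | Format-Table SettingName, AutoTuningLevelLocal, AutoTuningLevelGroupPolicy, AutoTuningLevelEffective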
 

Tech Junky

Active Member
Oct 26, 2023
It's all lies. It might work for servers, but clients are doubtful. There might be an override in the NIC driver that forces higher speeds. Dive into the NIC advanced options and see if anything stands out.

I know with some things like WiFi there are options that are off by default but, when enabled, unleash the full speed.
 

Chriggel

Member
Mar 30, 2024
Sadly, this isn't related to the NIC. That's why all NICs, all settings and all drivers produced the same results. It's a problem with the TCP stack, so basically a kernel issue. And Microsoft obviously knows about this: they've implemented TCP window scaling, there's an RFC for it, but it's not working how it's supposed to, at least not for me. When researching this problem, I found some people with similar problems. And then there are people who say it's not a problem and it works out of the box. I guess that means that for some people the TCP window scaling works as intended and for others it doesn't.
 

753951

New Member
Apr 3, 2024
The iperf builds for Windows downloadable from iperf.fr really don't work well with Windows, and they are old. Like 8 years old. I find the ones available at files.budman.pw are much better and give the full bandwidth Windows is capable of.

For example, the iperf.fr builds give me a maximum of 1.6 Gb/s without tweaking the number of threads, etc., while the newer ones give the full 9.54 Gb/s out of the box with the same NIC/driver/OS combo (Intel X550 or X710, Windows 11 and Server 2022). Sorry, I don't have 25 or 40 Gb/s hardware to test.
 

Chriggel

Member
Mar 30, 2024
It's not a problem with iperf3. I understand that it's not well supported on Windows, but that isn't the problem here. The real transfer speeds match exactly the speeds iperf3 reported, so it's accurate in this case. The problem is that Windows decides to use a ~200 kB TCP window size, which can't work for a 25G local connection with a 1 ms RTT.

And in the other direction, receiving data from the server, it's even worse, because I just noticed that all the TCP packets use window sizes of 501 and 1026 bytes. The scaling factor is shown as -1, unknown.
 

MountainBofh

Active Member
Mar 9, 2024
OP, I did the following test, hopefully it'll give you some data points to work with.

Linux box hosting samba - Debian 12.5, Xeon 2680-V4, 192GB ram. Network card is a Mellanox 4 lx. RX and TX buffers set to 2048. No other tweaks applied to the machine. 64GB ram drive was created for this test.

Test box running Windows 11 pro 23H2 is a I7-9700K with 16GB ram. All Windows updates applied, and newest drivers for network card installed. SSD is a WD black SN750. Network card is a Mellanox 4 lx. All other settings were default.

Switch - el cheapo Horaco 8 port SFP+ ( this is a realtek 9303-CG chipset I believe). All machines linked using various cheap SFP+ transceivers and OM3 fiber. I don't have the ability to easily test speeds beyond 10gb at this time, sorry (don't have any SFP28 transceivers at the moment).

Software and test file used: iperf3 3.16 on both machines, and the Windows version is the one from Home • Directory Lister. My test file for Samba performance testing is a 50GB Blu-ray ISO file.

I started off with my Linux box running iperf3 in server mode. I fired up the Windows client and got a consistent 9.5Gb transfer rate (which is what I've seen across my network on many different OSes and network cards).

I then copied the ISO test file from my main Windows box (which is running Win10 LTSC) to the test box. Copy speed was about 800-900MB per sec.

Then I copied the ISO file from the Windows 11 testbox to the ram drive on my Linux box. Speed was a very consistent 1.05GB/sec.


I agree that the Windows network stack sucks in general, but I'm not sure what's going on in your case. I'm sure it's frustrating as hell.
 

blunden

Active Member
Nov 29, 2019
If the issue really is the auto tuning, you can disable it and hardcode TCP window size values last time I checked. Might be worth exploring just to prove or disprove your findings.
 

DAVe3283

New Member
Apr 4, 2024
Boise, ID
I almost wonder if the server/Enterprise editions of Windows have different networking defaults. Like @i386, my experience was pretty much plug-and-play. But I am running Windows 10 Enterprise, not Pro.

Desktop: Gigabyte X670E Aorus Master + 7950X + DDR5-5600, Windows 10 Enterprise 21H2, ConnectX-3 Pro w/WinOF driver
Server: ASRock Rack ROMEDE-2T + 7302P + DDR4-3200, Proxmox VE 8.1, ConnectX-3 Pro w/in-box kernel driver
Link: 10 meter direct attach fiber (MC2207310-010)

Initial testing with iperf3 using default settings saw ~20 Gbps. Setting the MTU to 9014 got that up to ~26Gbps, which is the same speed as I see between VMs on the server, so I think that is the line rate of the vmbr (I run firewalls on Proxmox, which does have an impact at these speeds).
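
If anyone wants to set the jumbo frame size from PowerShell rather than in the adapter GUI, it's usually something like this (the property name and the accepted values depend on the driver, so treat it as a sketch):

# show the current jumbo packet setting
Get-NetAdapterAdvancedProperty -Name "Ethernet 3" -DisplayName "Jumbo Packet"

# set a 9014 byte MTU (the value string varies by driver: "9014", "9014 Bytes", ...)
Set-NetAdapterAdvancedProperty -Name "Ethernet 3" -DisplayName "Jumbo Packet" -DisplayValue "9014"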

I see a very consistent 12 Gbps on iSCSI mounts, but I believe that to be limited by the underlying hard drives serving the block storage or my complete lack of tuning/effort. (I have done literally nothing besides create the iSCSI share in TrueNAS Core and connect Windows to it.)

Given the Gigabyte motherboard I run has a nearly identical PCIe architecture, you should be able to get 25+ Gbps on a ConnectX-3 in PCIe x4 mode, as that is how mine is running.

So my only other suggestion is if you aren't running Windows 11 Enterprise, grab an evaluation copy (Windows 11 Enterprise | Microsoft Evaluation Center) and see if things are better vs. the Pro edition. It wouldn't be the first time Microsoft nerfed the capabilities of lesser editions of Windows for dubious reasons (Memory Limits for Windows and Windows Server Releases - Win32 apps).
 

Tech Junky

Active Member
Oct 26, 2023
Microsoft nerfed the capabilities
Very true. This is why I mentioned windowing. Consumer versions tend to take into account that they won't be running 100GE connections. Not to mention they've gotten lazy with QA and coding. Enterprises tend to run more Linux servers for raw power because of msft. The only thing that keeps msft around is programs that don't get ported to Linux. The money is in msft-based apps because consumers are hooked on GUIs. At least Apple uses a *nix base.
 

Chriggel

Member
Mar 30, 2024
OP, I did the following test, hopefully it'll give you some data points to work with.

[...]

I agree that the Windows network stack sucks in general, but I'm not sure what's going on in your case. I'm sure it's frustrating as hell.
Thank you, I appreciate it! I have to say, I never had many problems with W10 and 10G. But with this new system, I changed to W11 and 25G/40G NICs at the same time, and I have no experience with W10 and 25G/40G, nor with W11 and 10G. Since this connection is now slower than my 10G connection used to be, it might be worth trying a 10G NIC with W11.

I've no idea what this is supposed to be either. It's definitely frustrating as f*ck. I've spent weeks trying the more obvious and/or more likely things, the kinds of causes I've seen or heard of before that could do this, and got nothing out of it. Then I spent the last couple of days understanding how the TCP stack in Windows works, only to finally find what I think is the problem and learn that I can apparently do nothing about it. Or at least I don't know how yet.

If the issue really is the auto tuning, you can disable it and hardcode TCP window size values last time I checked. Might be worth exploring just to prove or disprove your findings.
Honestly, I don't know what to make of it and I'm not really convinced by it either, but at this point it's the best explanation I have that covers everything I've seen. If you've ruled out everything else, then whatever is left must be the reason, no matter how unlikely.

The problem is that the auto tuning, emphasis on "auto", doesn't really accept manual intervention. Yes, I can turn it off. Or at least I could in theory. In practice, I can't at the moment, because it won't let me change the auto tuning level due to a group policy. This system isn't part of a domain, and yet it has group policies.

If I could overcome this somehow, then I could disable it. According to the documentation, this fixes the window size to the default 64k and disables all scaling. But I still can't set the window size myself; at that point it's just locked to 64k. The registry setting TcpWindowSize is legacy and not supported anymore, so setting it doesn't do anything.

If you know of another way to set the window size that works in W11, please let me know!
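
In case it helps someone else stuck at the same point, these are the usual knobs; whether they actually stick seems to depend on exactly the group policy situation described above, so consider this a sketch rather than a guaranteed fix:

# receive window autotuning level (normal is the default, disabled pins the window at 64k)
netsh interface tcp set global autotuninglevel=normal

# PowerShell equivalent; only the *Custom templates are writable, the built-in ones are read-only
Set-NetTCPSetting -SettingName InternetCustom -AutoTuningLevelLocal Normal

# window scaling heuristics are a separate mechanism that can override the autotuning level on some connections
netsh interface tcp show heuristics
netsh interface tcp set heuristics disabled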

I almost wonder if the server/Enterprise editions of Windows have different networking defaults. Like @i386, my experience was pretty much plug-and-play. But I am running Windows 10 Enterprise, not Pro.

[...]

Given the Gigabyte motherboard I run has a nearly identical PCIe architecture, you should be able to get 25+ Gbps on a ConnectX-3 in PCIe x4 mode, as that is how mine is running.

So my only other suggestion is if you aren't running Windows 11 Enterprise, grab an evaluation copy (Windows 11 Enterprise | Microsoft Evaluation Center) and see if things are better vs. the Pro edition. It wouldn't be the first time Microsoft nerfed the capabilities of lesser editions of Windows for dubious reasons (Memory Limits for Windows and Windows Server Releases - Win32 apps).
Indeed, we run very similar setups here. From my test with Linux on that system, I know that the Intel E810-XXVDA2 I'm currently using will do 23.5 Gbit/s with zero tuning in the PCIe x4 slot that's connected to one of the PROM21 chips of the X670E.

The ConnectX-3 is PCIe 3.0, so it would be limited by the slot to about 30 Gbit/s. The E810 is a PCIe 4.0 device which, at x4, is good for over 60 Gbit/s, so this would even be enough bandwidth to saturate both ports.
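
For reference, the per-direction link math behind those numbers (with 128b/130b encoding):

PCIe 3.0 x4: 8 GT/s x 4 lanes x 128/130 ≈ 31.5 Gbit/s
PCIe 4.0 x4: 16 GT/s x 4 lanes x 128/130 ≈ 63 Gbit/s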

Since I've nothing to lose, I will try W11 Enterprise on the other system that I've used for tests, which showed the same issue with a fresh W11 Pro installation. However, it also had the same problem with a Windows Server 2022 installation, and IF they somehow nerf their operating systems for the plebs, you'd think they wouldn't do it with the server version, which is much more likely to need solid network performance.

Very true. This is why I mentioned windowing. Consumer versions tend to take into account that they won't be running 100GE connections. Not to mention they've gotten lazy with QA and coding. Enterprises tend to run more Linux servers for raw power because of msft. The only thing that keeps msft around is programs that don't get ported to Linux. The money is in msft-based apps because consumers are hooked on GUIs. At least Apple uses a *nix base.
The really weird thing is that with the default settings, with TCP window sizes of up to 16MB (which should even be the same for W11 Home, for example), they could do 100GbE. No problem. But it's not working as it should.

And as if that's not enough, the REALLY disturbing thing is that two out-of-the-box installations don't behave the same way for different people. Some people report that it's not a problem, others report the same or a similar bandwidth limit as I have. That's not a coincidence. According to these other forum posts, they never got it to work (or didn't report back on their success). Going by what you can find with Google, the idea of the TCP window size being relevant for fast networking isn't new, but I couldn't find any other post that puts it in the context of W11 to explain a mysterious hard cap of ~7 Gbit/s.

At this point I've already decided that I'm prepared to move this W11 installation over to another system that I have yet to buy the parts for, get new SSDs for this one and run Linux on it, most likely Mint, and do all the network-intensive stuff on that system, while the W11 box can then live with a 1G connection.

Since I'm already so far into this, I can still try a few more things, like the W11 Enterprise, before I do that and effectively give up.

Once again, I appreciate all your comments and attempts to help me out of this sh*thole!
 

Scott Laird

Active Member
Aug 30, 2014
Why do you have a 1ms RTT? That's in a gap: higher than what I'd expect for a LAN, while being very close to WAN territory. On my 10G fiber at home, 8.8.8.8 is around 1.5 ms away, while local systems are more like 200-400us, depending on the stack. The bandwidth-delay product for a 200us link is obviously a lot better than for a 1ms link.

I have a CX6 here (2x100), but something weird is up with the 100G path between it and my file server. Dirt in the fiber? Haven't had time to debug and/or clean. I could probably drop 40G onto a nearby switch, though, without much work. This is with Win 11 Workstation, which is between Pro and Enterprise, and a relatively simple upgrade from Pro. No clue if it changes anything relevant.
 

Tech Junky

Active Member
Oct 26, 2023
Since I'm already so far into this, I can still try a few more things, like the W11 Enterprise, before I do that and effectively give up.
I mean, if you really need Windows for something, you could give up on the speed and just put W11 in VirtualBox and only fire it up as needed.

You're not alone in fighting the Microsoft TCP stack issues, though; people have been fighting with them for decades now. I've been hacking on my Linux box the past few days after a glitch while upgrading some piece of software wreaked havoc on other things as collateral damage, and I've been finding some new things I wasn't aware of that impact other things. When you start doing a post mortem and digging under the hood, you can find some really intriguing things related to "issues" you see from time to time. About the only thing left for me to figure out at the moment is how to smooth out QSV video conversions again. Right now they're jolting between frames when there's movement between scenes, whereas before there was just normal playback. It's as if the program is removing the filler frames when converting from TS > MKV.

But there's this "modemmanager" service that runs to make the 5G modem work and relies on "polkit" to bring up the interface; it spat out an error saying the modem wasn't picked up by the "bus", and then mm was failing with a dependency issue, which usually means some tidbit of software is missing. Well, it wasn't really a piece of mm missing, it was polkit that was missing, which isn't related to mm / 5G but is more of a security mechanism. It never seems to be straightforward figuring some of these things out, though. The good part is you're not insane and it's not a HW issue but an MS issue.
 

Chriggel

Member
Mar 30, 2024
Why do you have a 1ms RTT? That's in a gap: higher than what I'd expect for a LAN, while being very close to WAN territory. On my 10G fiber at home, 8.8.8.8 is around 1.5 ms away, while local systems are more like 200-400us, depending on the stack. The bandwidth-delay product for a 200us link is obviously a lot better than for a 1ms link.

I have a CX6 here (2x100), but something weird is up with the 100G path between it and my file server. Dirt in the fiber? Haven't had time to debug and/or clean. I could probably drop 40G onto a nearby switch, though, without much work. This is with Win 11 Workstation, which is between Pro and Enterprise, and a relatively simple upgrade from Pro. No clue if it changes anything relevant.
To be fair, I should have mentioned that I used 1 ms because I thought at the time that it was the lowest value the calculator accepted, while in reality my RTT is under 1 ms. Only a while later did I realize that it also takes values under 1 ms. Right now, it's a 2 meter LC OM4 cable from my desktop to the switch and then another 2 meter MTP OM4 cable to the server. That's under 1 ms for sure, but I don't know how far under. However, I'm fairly certain that even under ideal circumstances, 200k windows are too small. The 8M that TrueNAS asks for is probably overkill, but then again I think it's more reasonable to go for 8M than 200k. These systems know that they have a 40G and a 25G connection respectively, so it won't hurt to choose a slightly larger window size. A 0.5 ms RTT would still mean that 1.5M windows are needed. Going too big has no meaningful disadvantages as far as I'm aware, but going too small creates the mess I have at the moment.

200k windows at 0.5 ms RTT would limit throughput to ~3.2 Gbit/s, which is roughly half of what I'm getting now (when sending data) and about what I get when receiving data. Transmitting data runs stable at around ~750 MB/s real transfer speed, while receiving data is all over the place between 100 and 300 MB/s on a sequential transfer. The Wireshark capture while receiving data looks like a mess. There might be other issues, but I figured that the 200k windows are my immediate problem, and if I can get this fixed, I'll look into the issues with receiving data if they're still there.
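
That ceiling is just the window size divided by the RTT:

212,992 bytes x 8 / 0.5 ms ≈ 3.4 Gbit/s (≈ 3.2 Gbit/s if you call the window 200,000 bytes)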

I've tested W11 Workstation on my test machine btw, and it showed the exact same problem as W11 Pro and WS2022. When I tested it, I didn't know about the potential problem with the TCP window size, so I didn't check. But given that it also ran into an apparent hard cap at around 7 Gbit/s, it's probably the exact same problem my desktop with W11 Pro has.

The good part is you're not insane...
...yet.
 