10Gb read speeds, not so great write speeds


n00bftw

Member
Jul 16, 2020
United Kingdom
I have two systems, each with an Intel X540-T2 10GbE NIC:

System 1:

  • Threadripper 1900X (AMD)
  • 32GB RAM
  • RAMDisk (for testing purposes)
  • Windows 10
System 2:

  • i9-7900X (Intel)
  • 32GB RAM
  • RAMDisk (for testing purposes)
  • Windows 10
I have enabled jumbo frames along with the rest of the usual tweaks to improve speed.
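
For anyone wanting to double-check the same settings, something like the following from an elevated PowerShell prompt should show them. The adapter name "Ethernet" and the "Jumbo Packet" display name are placeholders; the exact names depend on how the X540 driver registers itself.

# confirm the X540 is linked at 10 Gbps
Get-NetAdapter
# check the current jumbo frame setting on the NIC (display name varies by driver)
Get-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Jumbo Packet"
# check that receive-side scaling is enabled and spread across cores
Get-NetAdapterRss -Name "Ethernet"
# show global TCP settings (autotuning level, offloads, etc.)
netsh int tcp show global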

Threadripper Machine:
Read/write speeds on the Threadripper machine using CrystalDiskMark, tested by mapping a Windows share of the RAMDisk set up on the i9-7900X. Read and write speeds are perfect.
[Screenshot: Threadripper speeds.png]

Intel Machine:
Read/write speeds on the Intel machine using CrystalDiskMark, tested by mapping a Windows share of the RAMDisk set up on the 1900X. Read speeds are great but write speeds are not.
[Screenshot: Intel crystalmark.png]


Read/write speeds should be maxing out here. Before getting these Intel NICs, I had two Mellanox ConnectX-2 SFP+ cards and experienced the same slow write speed. I really do not know what the problem is. The issue is definitely not the cards; as noted above, it has happened on two separate sets of hardware.

There is no switch involved; the two machines are directly connected.

Threadripper Machine's Intel X540 NIC
[Screenshot: Threadripper proset.png]


Intel Machine's X540 NIC
[Screenshot: Intel proset.png]


Both cards seem to be running at PCIe x8.

Am I right in thinking the problem is on the AMD machine, since that is the machine being written to in the Intel machine's test, where the write speed is slow?

Any suggestions would be great, thanks.

Iperf Test

Threadripper machine was client, Intel machine was the server:
[Screenshot: Ran from Threadripper (Intel was the server).png]

Intel machine was client, Threadripper machine was the server:
[Screenshot: Ran from Intel machine (Threadripper was the server).png]
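
For anyone following along, the basic shape of these iperf tests is just an iperf3 server on one machine and a client on the other; the hostname and options below are placeholders rather than the exact commands from the screenshots.

rem on whichever machine is acting as the server
iperf3.exe -s

rem on the client: a single stream, then four parallel streams (add -R to reverse the direction)
iperf3.exe -c <server-hostname> -t 30
iperf3.exe -c <server-hostname> -t 30 -P 4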
 


madbrain

Active Member
Jan 5, 2019
Well, this is a timely message. I have been trying to optimize my network speed for weeks also ...

First, check the power management profile on both Win10 machines. Make sure you are using "high performance" and not "balanced" or "power saver".
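
(If it helps, this is the sort of thing I mean, from an admin command prompt; SCHEME_MIN is the built-in alias Windows uses for the High performance plan.)

rem list the available power plans and show which one is active
powercfg /list
rem switch to the built-in High performance plan
powercfg /setactive SCHEME_MIN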

Use HWiNFO64 to double-check the PCIe link parameters on each side.

Re: iperf tests
- I suggest you add -N to all your iperf3 client tests to disable Nagle.
- Disabling frequent display output can also help; add -i 5 or -i 10 at least. I have noticed this makes a small difference.
- I have attached all the batch files I personally use for testing with iperf3. I had to rename them to .bat.txt since the forum won't allow .bat, so just rename them back to .bat. I run server.bat on each side from the startup folder (Win+R, shell:startup, then drop in a shortcut to server.bat for automatic startup), then run test.bat %hostname%. Rough equivalents are sketched below. There are forks of iperf3 out there that support bidirectional mode, but unfortunately there are no Windows binaries for them, and I haven't gotten around to compiling any myself.
- It shouldn't be necessary to use -P 20 to saturate the network. If you are not saturating with 4 streams, there is a bottleneck somewhere; 20 will probably just induce more locking and give diminishing returns. I test with -P 1 through 4. I don't think it's realistic that any single network-bound application will use more than that, but I could be wrong. You could test -P 1 through 8 given how many threads your systems have.
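
To be clear, these are not the attached files themselves, just a rough sketch of their shape; iperf3's default port 5201 is used and %1 is the hostname you pass in.

rem server.bat (rough equivalent): run an iperf3 server on the default port 5201
iperf3.exe -s

rem test.bat (rough equivalent): call as  test.bat <hostname>
iperf3.exe -N -i 5 -c %1 -P 1
iperf3.exe -N -i 5 -c %1 -P 4
iperf3.exe -N -i 5 -c %1 -P 1 -R
iperf3.exe -N -i 5 -c %1 -P 4 -R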

FYI, I can't ever achieve near-10 Gbps speeds in iperf3 with a single TCP stream with Aquantia NICs in either direction, with or without a switch, with or without jumbo frames. But I still get somewhat faster single-stream speeds than you, even with very old hardware; one box is running an old FX-8120.

If you get a chance, try a different OS, say Ubuntu, on one or both sides. That's easy for me to do, as I have hot-swap drive bays and spare drives in 3 of my 4 machines with 10Gb NICs; it might be harder for you.

I haven't done any testing with a RAM disk, but below is what I'm achieving with a real disk.
The server is Ubuntu 18.10 with a Z170-AR motherboard, an i5-6600K CPU and 32GB RAM. The volume is ZFS with compression on: 6 x 10TB drives (shucked WD100EMAZ) running as RAID-Z2, and 90% full.
The client is Win10 with an MSI X99A Raider motherboard and an i7-5820K CPU, also with 32GB RAM.
The test below goes through a TEG-7080ES switch. No jumbo frames at this point; I disabled them everywhere this week as they were causing issues with IPv6 Path MTU. Jumbo frames in theory should get you that last 4-5% of performance, though of course reality differs from theory. Latency can also be an issue depending on the application protocol, and jumbo frames do help with that. That's where disabling Nagle is helpful in iperf3, and decent throughput can still be achieved without jumbo frames.
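
(Side note: an easy way to confirm whether jumbo frames actually make it end to end on Windows is a don't-fragment ping. 8972 bytes of payload plus 28 bytes of IP/ICMP headers corresponds to a 9000-byte MTU; adjust for whatever MTU the driver really sets.)

rem succeeds only if a 9000-byte MTU works along the whole path (-f = don't fragment)
ping -f -l 8972 <other-machine>
rem standard 1500-byte MTU check for comparison
ping -f -l 1472 <other-machine>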

[Screenshot: crystal-nas-remote.png]
If that makes you feel better, I'm getting even lower write speeds than you are over the network. I can't run CrystalDiskMark locally on Ubuntu, though (or if I could, it would be in a local Windows VM, which would probably be bottlenecked somewhere else).
I can also boot my Z170 box into Win10 thanks to the hot-swap bay, but it won't be able to access the ZFS volume.

Guess it's time for me to install RAM disks on both sides and see what I achieve.
 


madbrain

Active Member
Jan 5, 2019
A couple of things I found very unintuitive: enabling EEE (Energy Efficient Ethernet) actually improves performance in a long test, because the cards run cooler. Not sure if you have that option in your NIC drivers.

Aquantia drivers have some "interrupt moderation" tuning. Not sure if your Intel drivers do.
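
(On Windows you can at least see whether the driver exposes those knobs with something like the following; "Ethernet" is a placeholder for the actual adapter name, and the display names vary by vendor and driver version.)

# list every advanced driver property the NIC exposes; look for EEE / interrupt moderation entries
Get-NetAdapterAdvancedProperty -Name "Ethernet" | Format-Table DisplayName, DisplayValue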

If you have any choice of x8 slots on your motherboard, place the card in the lowest possible slot, since heat rises.

Probably has nothing to do with your issue here, which seems to be OS or filesystem related rather than NIC.
 

n00bftw

Member
Jul 16, 2020
United Kingdom
(quoting madbrain's post above)
Those write speeds are probably right for 6 x 10TB mechanical drives. In my scenario, the RAM disks should be maxing out the 10Gb line on both ends.
I'm only bothered about file transfer speed, to be honest. In that regard everything is fine apart from the write speed on one of the systems, and I just cannot figure it out for the life of me. It must be the OS or the motherboard. But the same OS is running on the machine with correct 10Gb write speeds, ahhhhhhh.
 

madbrain

Active Member
Jan 5, 2019
(quoting n00bftw's reply above)
I think it is the OS. Binaries are optimized for certain CPUs; some instructions perform better on Intel, others on AMD. If you use open-source software, you can recompile everything optimized for your local CPU. No such choice with Windows, unfortunately.
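
(As a concrete example of what I mean, iperf3 itself can be rebuilt on the Linux side with CPU-specific optimizations. This is just an illustrative sketch, not something I have benchmarked.)

# build iperf3 from source, tuned for the local CPU
git clone https://github.com/esnet/iperf.git
cd iperf
./configure CFLAGS="-O3 -march=native"
make
sudo make install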

I have seen some super weird things regarding transfer speeds just with iperf3: massively different results when running single-stream iperf3 on one Intel machine vs one AMD machine. And yes, I account for direction. Maybe you can reproduce this too and take the file system / SMB protocol out of the equation. Let me give you an example.

I have 2 AMD boxes on my network and 2 Intel: FX-8120, Ryzen 2700, 5820K and 6600K respectively, each with identical Aquantia NICs. All are attached to a Trendnet TEG-7080ES, except for the Ryzen, which is in another room and goes through 2 switches (the second switch is a GS110MX). In my experience, a single switch has very little impact compared with a direct connection.

Anyway, here is an example of weirdness, from two machines running Win10.

At HIGGS terminal :

C:\Users\Julien Pierre\Desktop\iperf3>iperf3.exe -N -c bumblebee
Connecting to host bumblebee, port 5201
[ 4] local 192.168.1.26 port 55279 connected to 192.168.1.37 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 656 MBytes 5.50 Gbits/sec
[ 4] 1.00-2.00 sec 681 MBytes 5.71 Gbits/sec
[ 4] 2.00-3.00 sec 748 MBytes 6.28 Gbits/sec
[ 4] 3.00-4.00 sec 739 MBytes 6.20 Gbits/sec
[ 4] 4.00-5.00 sec 719 MBytes 6.03 Gbits/sec
[ 4] 5.00-6.00 sec 747 MBytes 6.26 Gbits/sec
[ 4] 6.00-7.00 sec 754 MBytes 6.33 Gbits/sec
[ 4] 7.00-8.00 sec 683 MBytes 5.73 Gbits/sec
[ 4] 8.00-9.00 sec 453 MBytes 3.80 Gbits/sec
[ 4] 9.00-10.00 sec 368 MBytes 3.09 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 6.40 GBytes 5.49 Gbits/sec sender
[ 4] 0.00-10.00 sec 6.40 GBytes 5.49 Gbits/sec receiver

iperf Done.

C:\Users\Julien Pierre\Desktop\iperf3>

At BUMBLEBEE terminal :

C:\Users\Julien Pierre\Desktop\iperf3>iperf3.exe -N -c higgs -R
Connecting to host higgs, port 5201
Reverse mode, remote host higgs is sending
[ 4] local 192.168.1.37 port 63767 connected to 192.168.1.26 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 581 MBytes 4.88 Gbits/sec
[ 4] 1.00-2.00 sec 564 MBytes 4.73 Gbits/sec
[ 4] 2.00-3.00 sec 570 MBytes 4.78 Gbits/sec
[ 4] 3.00-4.00 sec 580 MBytes 4.86 Gbits/sec
[ 4] 4.00-5.00 sec 586 MBytes 4.91 Gbits/sec
[ 4] 5.00-6.00 sec 577 MBytes 4.84 Gbits/sec
[ 4] 6.00-7.00 sec 612 MBytes 5.13 Gbits/sec
[ 4] 7.00-8.00 sec 594 MBytes 4.98 Gbits/sec
[ 4] 8.00-9.00 sec 594 MBytes 4.98 Gbits/sec
[ 4] 9.00-10.00 sec 461 MBytes 3.87 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 5.58 GBytes 4.80 Gbits/sec sender
[ 4] 0.00-10.00 sec 5.58 GBytes 4.80 Gbits/sec receiver

iperf Done.

Both tests are the exact same direction between the two machines: the first test is HIGGS sending to BUMBLEBEE, the second is BUMBLEBEE receiving from HIGGS. HIGGS is the Intel 5820K; BUMBLEBEE is the AMD FX-8120. Both are overclocked, the Intel has hyperthreading disabled, and both are on the TEG-7080ES switch. There is a 0.7 Gbps difference. I have seen much weirder than that, though, when switching between driver versions and different OSes.
 

madbrain

Active Member
Jan 5, 2019
I'm still confused which machine has the problem, the Intel or AMD system? :/. I'm easily confused haha
Unfortunately, you may need more than 2 machines to figure that out :-(

But this sounds like a software issue to me, not hardware.
 

n00bftw

Member
Jul 16, 2020
United Kingdom
(quoting madbrain's post above)
I might just get rid of the Threadripper setup; it's really starting to peeve me off. I have also had several nightmares with RAID playing up on it. It's more trouble than it's worth.
 

madbrain

Active Member
Jan 5, 2019
(quoting n00bftw's reply above)
Can you reduce the number of cores in the BIOS and give that a try? It may reduce lock contention.
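
(Another cheap experiment before touching the BIOS, assuming your iperf3 build supports it: the -A flag pins iperf3 to a given core, which at least tells you whether core placement is part of the problem. The core numbers below are arbitrary examples.)

rem pin the client to core 2 locally and the server process to core 2 remotely (-A client_core,server_core)
iperf3.exe -N -c <server-hostname> -A 2,2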
 

n00bftw

Member
Jul 16, 2020
United Kingdom
(quoting madbrain's reply above)
I'm currently building a Xeon server to put FreeNAS on. Before I do that, I will install Windows 10 on it and test against both machines to see if the write issue disappears, or to find out exactly which machine has the issue.
 

madbrain

Active Member
Jan 5, 2019
(quoting n00bftw's reply above)
I highly suggest you install at least one front-bay SATA hot-swap caddy. You may need Windows just to install firmware updates once in a while, so it comes in handy to have a Windows SSD. Also, I image the Ubuntu OS drive to another system from time to time; I boot Windows for that too and run Acronis. I still haven't figured out a good OS backup/restore solution I can run on Linux.
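
(For what it's worth, until you find a proper tool, a crude stand-in is a raw block image taken from a live USB. This is purely illustrative; the device name and mount point are placeholders, and the source disk must not be the one you are booted from.)

# raw image of the OS disk to a file on another drive, run from a live environment
sudo dd if=/dev/sda of=/mnt/backup/ubuntu-os.img bs=4M status=progress conv=fsync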
 

n00bftw

Member
Jul 16, 2020
United Kingdom
So, I got a drive, stuck FreeNAS on it and powered up the Threadripper system.

Results from the previous Windows 10 system, ‘Threadripper as server’.

[Screenshot: test.png]


Results on the same Threadripper system, but using FreeNas.
[Screenshot: Untitled.png]


So yes, it is indeed the OS/config or some such that is causing the issue. I might try a fresh install of Windows; if that fails, I will get rid of the Threadripper system entirely and replace it with another Intel machine.

Why is it showing a single 4.81 Gbits/sec run, when the final overall is 9.47?

Bidirectional test with a lot more data; these are the speeds I expected to see under Windows.
[Screenshot: more data.png]
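
(For reference, if both ends are running iperf3 3.7 or newer, which is easier to arrange on the FreeBSD/Linux side than on Windows, a bidirectional run can be done in one go; the hostname and duration here are placeholders.)

# send and receive simultaneously in a single test
iperf3 -c <other-machine> --bidir -t 60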
 


pod

New Member
Mar 31, 2020
(quoting n00bftw's post above)
Why is it showing a single 4.81 Gbits/sec run, when the final overall is 9.47?

Didn't stay at a Holiday Inn, but I'd say that's a rounding error at most. Only 168K probably took all of 150 microseconds, and because 168K is a small percentage of the data, it should not have changed the results in a meaningful way.
 