Low 10GbE speeds between direct-attached servers


manxam

Active Member
Jul 25, 2015
Hi, I have a Windows Server 2012 R2 installation on a Dell R710 with an Intel AF DA NIC in an x8 slot.
I also have an OmniOS installation on a Supermicro X8DTH with an Intel X520-DA2.

These devices are directly connected on a 10.10.10.0/24 subnet using Twinax. Both are configured for jumbo frames [MTU 9014 (Windows) and MTU 9000 (OmniOS)].
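For reference, the jumbo-frame settings were applied roughly as follows (a sketch; the link/adapter names are examples, and on OmniOS the IP interface may need to be unplumbed before the link MTU will change):
Code:
# OmniOS: raise the link MTU to 9000 (ixgbe0 is an example link name)
dladm set-linkprop -p mtu=9000 ixgbe0

# Windows (PowerShell): set the driver's jumbo packet size to 9014
# ("Ethernet 2" is an example adapter name; *JumboPacket is the standard INF keyword)
Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -RegistryKeyword "*JumboPacket" -RegistryValue 9014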

Performing an iperf3 test (iperf3 -c 10.10.10.10 -i 1 -t 10) using Windows as the server:
Code:
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   462 MBytes  3.87 Gbits/sec
[  5]   1.00-2.00   sec   471 MBytes  3.95 Gbits/sec
[  5]   2.00-3.00   sec   432 MBytes  3.62 Gbits/sec
[  5]   3.00-4.00   sec   465 MBytes  3.90 Gbits/sec
[  5]   4.00-5.00   sec   455 MBytes  3.82 Gbits/sec
[  5]   5.00-6.00   sec   426 MBytes  3.58 Gbits/sec
[  5]   6.00-7.00   sec   442 MBytes  3.70 Gbits/sec
[  5]   7.00-8.00   sec   424 MBytes  3.56 Gbits/sec
[  5]   8.00-9.00   sec   424 MBytes  3.55 Gbits/sec
[  5]   9.00-10.00  sec   462 MBytes  3.87 Gbits/sec
[  5]  10.00-10.01  sec  4.02 MBytes  4.22 Gbits/sec
Performing an iperf3 using OmniOS as the server:
Code:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   619 MBytes  5.19 Gbits/sec
[  4]   1.00-2.00   sec   679 MBytes  5.70 Gbits/sec
[  4]   2.00-3.00   sec   672 MBytes  5.63 Gbits/sec
[  4]   3.00-4.00   sec   671 MBytes  5.63 Gbits/sec
[  4]   4.00-5.00   sec   677 MBytes  5.68 Gbits/sec
[  4]   5.00-6.00   sec   671 MBytes  5.63 Gbits/sec
[  4]   6.00-7.00   sec   674 MBytes  5.65 Gbits/sec
[  4]   7.00-8.00   sec   672 MBytes  5.64 Gbits/sec
[  4]   8.00-9.00   sec   685 MBytes  5.74 Gbits/sec
[  4]   9.00-10.00  sec   655 MBytes  5.49 Gbits/sec
Can someone please explain why I'm seeing considerably less than 10Gb performance, and why there's so much difference between using Windows and OmniOS as the sender/receiver?

Thanks in advance for any advice/suggestions!
-M
 

ttabbal

Active Member
Mar 10, 2016
Try without the large MTU. Some systems don't like that. And shouldn't they be the same?
 

azev

Well-Known Member
Jan 18, 2013
I'm having a similar result with a different motherboard with the same setup (dual IOH). To be honest, right now I'm pointing the finger at slow PCIe bandwidth on this mobo, and planning to upgrade the system to a Sandy Bridge setup.

Anyway, you can try running the test multi-threaded, with -P 8 for example, and see if that gets you a better result.

I was able to get a better result by pinning the NIC's NUMA node config to the first CPU.
However, the total bandwidth I can get per IOH chip seems to be limited to only 10Gbps.
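For example, something like this for the multi-stream test (10.10.10.10 taken from the first post):
Code:
# server side
iperf3 -s

# client side: 8 parallel streams, 1-second reports, 10-second run
iperf3 -c 10.10.10.10 -P 8 -i 1 -t 10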
 

manxam

Active Member
Jul 25, 2015
Thanks guys. Changing the MTU to 1500 on both made no difference (@ttabbal: the Windows driver includes the 14-byte Ethernet header in its jumbo packet setting, which makes it 9014, whereas most *nix OSes don't include the header when setting the MTU, so it's 9000 -- both are the same true value AFAIK).

Adding -P 8 and using the OmniOS box (X8DTH, single E5530 + 32GB + X520) as the server nets me 9.18Gb/s as the [SUM]. Using the Windows box (R710, dual E5570 + 64GB + AF DA) as the server nets me 5.34Gb/s as the [SUM].

CPU usage on both servers only increases slightly under testing, so they're not CPU bound.

I'm thoroughly confused...
 

ttabbal

Active Member
Mar 10, 2016
Very strange. I get 9.7Gb/s on iperf between FreeNAS and Linux with default settings and 1500 MTU. Maybe try running the client and server locally? It might expose an issue with the IP stack.
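Something like this on one box takes the NIC and cable out of the picture entirely (a quick sketch):
Code:
# terminal 1: server
iperf3 -s

# terminal 2: client against the loopback address
iperf3 -c 127.0.0.1 -i 1 -t 10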

I don't have a Windows machine here to test with. Maybe try booting a live Linux and running it against OmniOS? If it works well, you at least know it's a software problem and on which end.

I'd offer to test more, but I've discovered that I managed to damage the fiber installing it. sigh....
 

manxam

Active Member
Jul 25, 2015
@ttabbal: Your suggestion is a good idea. As they're dual-port cards, I'll just loop each NIC in each server and run an iperf between the ports. This way (I hope) I'll be able to determine whether it's server-to-server or a specific card/server that's the issue.
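Roughly this, for anyone curious (a sketch; the addresses and port assignments are made up, and since both addresses live on the same host the IP stack may short-circuit the traffic, so take the numbers with a grain of salt):
Code:
# cable the two ports of the card together and give each its own address, e.g.
# port 1 = 10.20.20.1, port 2 = 10.20.20.2, then:
iperf3 -s -B 10.20.20.1                         # server bound to port 1
iperf3 -c 10.20.20.1 -B 10.20.20.2 -i 1 -t 10   # client sourced from port 2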

Expect to hear back from me with.. "wtf is going on??" :)
 

manxam

Active Member
Jul 25, 2015
And here it is.. Strangely I'm seeing the following on the OmniOS X520-DA2 box:
Code:
[SUM]   0.00-10.00  sec  33.3 GBytes  28.6 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  33.3 GBytes  28.6 Gbits/sec                  receiver
And on the R710 AF DA box:
Code:
[ SUM]   0.00-10.00  sec   462 MBytes  3.87 Gbits/sec
So, why am I seeing almost 30Gbps on a looped 10Gb card on one box and 3.87Gbps on the other?
Having said that, it's obvious that the R710 with the AF DA card is the bottleneck, but I'm not certain why.
I've tried the built-in 2012 R2 drivers as well as the most recent Intel drivers, and the results are the same.
 

ttabbal

Active Member
Mar 10, 2016
I suspect the 30Gbps is the card or OS bypassing the physical layer, so you're basically testing DMA speed. Why the other machine is getting such poor speeds is a good question. You've tried drivers; maybe try swapping NICs?

Could the card be in a slot with a low link speed? x1 electrical?
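On the Windows box, the in-box NetAdapter module can report what the card actually negotiated (a sketch; exact property names can vary a little by OS version):
Code:
# lists per-adapter PCIe link speed/width and NUMA node as negotiated
Get-NetAdapterHardwareInfo | Format-List *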
 

azev

Well-Known Member
Jan 18, 2013
For some reason, running the iperf server and client locally on the same Windows box I'm getting a similar result, where I get full 10Gb saturation on multiple NICs running concurrently.
 

gea

Well-Known Member
Dec 31, 2010
The OmniOS default TCP settings are optimized for 1G and low RAM usage.
Have you increased the buffer settings, e.g. to

max_buf=4097152 tcp
send_buf=2048576 tcp
recv_buf=2048576 tcp

I set this as part of my default tuning options in napp-it, which gives me up to
1000 MB/s not only at the IP level but also at the SMB service level.

On the Windows side, disabling interrupt throttling is a very performance-sensitive setting.
http://napp-it.de/doc/downloads/performance_smb2.pdf
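On current OmniOS those are ipadm TCP properties, and on Windows the throttling setting lives in the NIC driver's advanced properties. Roughly like this (a sketch; the adapter name is an example and the exact display name depends on the Intel driver version):
Code:
# OmniOS / illumos: raise the TCP buffer limits mentioned above
ipadm set-prop -p max_buf=4097152 tcp
ipadm set-prop -p send_buf=2048576 tcp
ipadm set-prop -p recv_buf=2048576 tcp
ipadm show-prop tcp        # verify the new values

# Windows (PowerShell): turn off interrupt moderation on the Intel NIC
Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Interrupt Moderation" -DisplayValue "Disabled"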
 

manxam

Active Member
Jul 25, 2015
Thank you all! After changing the settings as per @gea I'm getting "acceptable" performance: ~10Gb with OmniOS as the server and ~5Gb with Windows. I'm going to chalk the low 10Gb performance on Windows up to the older AF DA card. I've confirmed it's in an x8 slot but I can't pin down any limitation. Disabling interrupt moderation increased speed by a little over 1Gb/s which, while still less than 10Gb, is faster than my disk subsystem can handle.

I suspect i'll just order another X520 in the near future...
 

james23

Active Member
Nov 18, 2014
(Below are all iperf2 and iperf3 numbers.)
I'm battling this same issue, exclusive to Windows hosts (2012 R2), both as guests on ESXi 6.5u2 and on bare-metal Windows 2012 R2.

No problems on the same host when using an Ubuntu guest; it gets 9.7Gbit to a bare-metal FreeNAS 11.2 server. But on anything Windows 2012 R2 I can't get above 4.5-6Gbit.

One change that gave me a ~1Gbit boost was to uninstall the "QoS Packet Scheduler" (on network adapter -> properties). (This got me to ~6-6.5Gbit in either direction.)

MTU 1500 or 9000 makes no difference for me (and I verified MTU 9k via don't-fragment -s 8972 pings). Using multiple streams (-P 4) gets me up to ~9Gbit on Windows, but single-stream is what I'm after (as iSCSI and SMB transfers max out at ~550 MB/s from ARC or all-flash).
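For reference, the Windows-side equivalents look roughly like this (a sketch; the adapter name and the 10.10.10.10 address are just examples, and the PowerShell line is the per-adapter equivalent of unchecking the QoS Packet Scheduler box rather than a full uninstall):
Code:
# Windows: verify a 9000-byte path MTU (8972 payload + 28 bytes of IP/ICMP headers)
ping -f -l 8972 10.10.10.10

# unbind the QoS Packet Scheduler (ms_pacer) from one adapter via PowerShell
Disable-NetAdapterBinding -Name "Ethernet 2" -ComponentID ms_pacer

# multi-stream iperf3 run (-P 4) against the same server
iperf3 -c 10.10.10.10 -P 4 -t 10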
 

manxam

Active Member
Jul 25, 2015
What adapter are you using? If using an Intel adapter, download the latest drivers from Intel (that has the advanced settings configuration) and disable interrupt throttling.

james23

Active Member
Nov 18, 2014
manxam said:
What adapter are you using? If using an Intel adapter, download the latest drivers from Intel (that has the advanced settings configuration) and disable interrupt throttling.

Thanks, should I do this for a VMware Windows guest? (That's what I'm using: ESXi 6.5 guests, connecting to a bare-metal FN host.)
The FN box has a Chelsio T520. The VMware host (with the Windows VMs) has both a Chelsio T520 and the 2x onboard Intel X540 10G NICs (I was seeing the same ~5Gbit speeds with both).

An update on this: I randomly tried a Windows 10 guest VM, and got a HUGE performance boost (for SMB mainly). Now when using SMB to send to or receive from the FN box, I get close to maxing 10Gbit (and I'm watching to be sure I'm not just reading from ARC). This is of course with sync=disabled and a RAID0/stripe of 4x HGST SAS3 SSDs, but I needed to see that this type of speed is possible before moving forward. So a big part of this is that Win 10 / Server 2016 is much better optimized than 2012 R2 when it comes to 10Gbit. Also, with Win 10, MTU 9k vs 1500 is still good for ~5% improvement, but clearly it was mainly an OS/Windows issue (not even an ESXi issue, as my bare-metal 2012 R2 system wouldn't go much above 5Gbit with SMB).

(The above is with the normal VMware VMXNET3 driver; when using SR-IOV passthrough for the NIC, performance was about the same.)

Interesting thing: even with Win 10, iperf2 and iperf3 to the FN box are still ~5Gbit (unless I use -P 6, just like on the 2012 R2 guests).
Thanks
 

datacounts

New Member
Dec 2, 2020
I just wanted to give this page a bump in the ol' Google results. Running Intel 10Gb X520 cards in Proxmox and TrueNAS, everything gets near line speed, but after moving to a Windows 10 OS the speed would drop to ~5G or so. Disabling interrupt moderation got the speeds back up to near 10G. Thanks for this post, james23.