10GbE Performance issues


whitey

Moderator
Jun 30, 2014
Imma go out on a limb here and guess that he didn't have jumbo frames turned up on the VLAN or globally on the switch.
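If he wants to rule that out quickly, an end-to-end don't-fragment ping shows whether 9000-byte frames actually survive every hop; a minimal sketch from a Linux box (the interface name and target IP are placeholders):
Code:
# Confirm the local interface MTU first; look for "mtu 9000" in the output
ip link show eth0

# 8972-byte ICMP payload + 28 bytes of headers = 9000 bytes on the wire.
# -M do sets the don't-fragment bit, so this fails loudly if any hop is still at 1500.
ping -c 4 -M do -s 8972 10.0.0.2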
 

BackupProphet

Well-Known Member
Jul 2, 2014
Stavanger, Norway
olavgg.com
Actually, you don't need jumbo frames with a hardware-accelerated network card; the performance increase is marginal at most. My suggestion is to stick with the normal 1500 MTU unless you have a really good reason not to.
 

whitey

Moderator
Jun 30, 2014
I have clearly seen on my 10G network that without jumbo frames I can push maybe 6Gbps; once I turn on jumbo I get line rate. With GigE, yeah, there's not much point in using jumbo, but on 10G it is fairly dramatic (to the tune of a 25-30% throughput increase in my testing).

Can anyone else confirm similar results or am I just making this sh|t up? heh :-D

2cents. YMMV

EDIT: Proof is in the pudding. Here's a comparison, at least on my network, with non-jumbo and jumbo enabled, so you can't tell me that jumbo does nothing or that the speed increase is anywhere close to 'marginal'.

This is VM to VM on different hosts; the hosts have Intel X520 adapters and the VMs have vmxnet3 virtual NICs, forcing traffic up over the EX3300 and the aforementioned infrastructure, so this is about as representative of 'real-world' as you can get. I recently had this debate with a colleague of mine who touted that jumbo was mostly worthless, or introduced more complexity than the benefits gained; then he saw these results and quickly retracted.
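For anyone trying to reproduce this, jumbo has to be enabled at every hop or the path quietly falls back to fragmenting or dropping. A rough sketch of what that looks like in a setup like mine; vSwitch names, interface names, and ports are placeholders, and the exact Junos/ESXi syntax may vary by version:
Code:
# Juniper EX3300: raise the port MTU to cover a 9000-byte payload plus headers
set interfaces ge-0/0/10 mtu 9216

# ESXi host: bump the standard vSwitch carrying the VM traffic
esxcli network vswitch standard set --vswitch-name vSwitch1 --mtu 9000

# Inside each guest (vmxnet3 NIC)
ifconfig eth0 mtu 9000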
 

Attachments: iperf comparison screenshots (non-jumbo vs. jumbo)


Rain

Active Member
May 13, 2013
iperf is sending a single stream (meaning a single thread). On a 5520 you will bottleneck a single core of the CPU long before you reach 10GbE throughput. You've got to get all 4 cores (or even all 8 threads) active before you'll see 10GbE on the wire.
For what it's worth:
Code:
CPU: Intel(R) Xeon(R) CPU  E5520  @ 2.27GHz (2275.82-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5

(0:4) test1:/sysprog/terry# iperf -c test2
------------------------------------------------------------
Client connecting to test2, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  3] local 10.20.30.40 port 26252 connected with 10.20.30.41 port 5001
[ ID] Interval  Transfer  Bandwidth
[  3]  0.0-10.2 sec  11.7 GBytes  9.89 Gbits/sec
Hardware is dual E5520, network card is X540-T1. FreeBSD 8.4.
Yeah, the CPU should be more than capable. See my posts in this thread: https://forums.servethehome.com/ind...mance-looking-for-tweaks-hints-whatever.7031/. Specifically, post #12, where I tested sequential Samba4 performance on an L5520 underclocked to 1.6GHz with only 2 cores enabled.

You should definitely be getting 10Gb/s in a single-threaded iperf test, no question about that. Are you testing with ESXi or with Windows 2012 R2 installed on the bare metal? If Windows, make sure you've got the most up-to-date Intel NIC drivers installed; that'll probably make a big difference, especially if you're using the generic drivers built into the OS. I would also try booting into Linux on both boxes and running iperf again, just to double-check.

Also, what board do you have these cards plugged into?
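And as a quick way to separate a per-core bottleneck from a link or driver problem: a parallel run should scale where a single stream stalls. A minimal sketch, assuming iperf2 on both boxes (hostnames are placeholders):
Code:
# On the receiving box
iperf -s

# On the sending box: four parallel streams, 20-second run, print the MSS at the end
iperf -c test2 -P 4 -t 20 -m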
 

alex1002

Member
Apr 9, 2013
This is the Dell motherboard in these servers. They are PCIe x8.

Running Windows on bare metal, with the latest Intel drivers.

The Dell switch has the MTU set to 9216 on all ports, which is what they recommend.

I was so excited for 10GbE and to really improve my backup performance. :( Seems I am stuck at 1.6Gbps. I ordered new LC-to-LC cables from Amazon; we'll see if those make any difference. I will also test using my friend's Dell transceivers and see if it makes a difference.
 

alex1002

Member
Apr 9, 2013
Is this one of those things where you need to set the max switch device MTU to something like 9216? It's different for various vendors but it's worth looking into. It seems like you have some HEAVY fragmentation/re-transmissions going on; I bet a replayed Wireshark session would look... umm... interesting :-D
Does it look better now?
Code:
netsh int ip show int

Idx  Met         MTU  State      Name
---  ---  ----------  ---------  ---------------------------
  1   50  4294967295  connected  Loopback Pseudo-Interface 1
 21    5        9000  connected  A3 10GBE

What should I look for in Wireshark?
 

BackupProphet

Well-Known Member
Jul 2, 2014
Stavanger, Norway
olavgg.com
I have clearly seen on my 10G network that without jumbo frames I can push maybe 6Gbps; once I turn on jumbo I get line rate. With GigE, yeah, there's not much point in using jumbo, but on 10G it is fairly dramatic (to the tune of a 25-30% throughput increase in my testing).

Can anyone else confirm similar results or am I just making this sh|t up? heh :-D

2cents. YMMV

EDIT: Proof is in the pudding. Here's a comparison, at least on my network, with non-jumbo and jumbo enabled, so you can't tell me that jumbo does nothing or that the speed increase is anywhere close to 'marginal'.

This is VM to VM on different hosts; the hosts have Intel X520 adapters and the VMs have vmxnet3 virtual NICs, forcing traffic up over the EX3300 and the aforementioned infrastructure, so this is about as representative of 'real-world' as you can get. I recently had this debate with a colleague of mine who touted that jumbo was mostly worthless, or introduced more complexity than the benefits gained; then he saw these results and quickly retracted.
That is most likely because you are using Windows. It is a sad, well-known fact that the TCP implementation in Windows has serious performance/latency issues. Linux is much better, while FreeBSD is the king (maybe Solaris is even better, I don't know much about it).

This is what I get from two old servers with dual-core AMD X2 5600+ CPUs running Ubuntu 14.04:

Code:
meanwhile@mw-dev1:~$ iperf -c 10.2.2.10 -t 20 -m -P3 -i 4
------------------------------------------------------------
Client connecting to 10.2.2.10, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  5] local 10.2.2.20 port 53256 connected with 10.2.2.10 port 5001
[  3] local 10.2.2.20 port 53254 connected with 10.2.2.10 port 5001
[  4] local 10.2.2.20 port 53255 connected with 10.2.2.10 port 5001
[ ID] Interval  Transfer  Bandwidth
[  3]  0.0- 4.0 sec  1.43 GBytes  3.07 Gbits/sec
[  4]  0.0- 4.0 sec  1.31 GBytes  2.81 Gbits/sec
[  5]  0.0- 4.0 sec  1.55 GBytes  3.33 Gbits/sec
[SUM]  0.0- 4.0 sec  4.29 GBytes  9.21 Gbits/sec
[  5]  4.0- 8.0 sec  1.68 GBytes  3.62 Gbits/sec
[  3]  4.0- 8.0 sec  1.34 GBytes  2.89 Gbits/sec
[  4]  4.0- 8.0 sec  1.36 GBytes  2.91 Gbits/sec
[SUM]  4.0- 8.0 sec  4.38 GBytes  9.41 Gbits/sec
[  5]  8.0-12.0 sec  1.60 GBytes  3.44 Gbits/sec
[  3]  8.0-12.0 sec  1.42 GBytes  3.04 Gbits/sec
[  4]  8.0-12.0 sec  1.36 GBytes  2.92 Gbits/sec
[SUM]  8.0-12.0 sec  4.38 GBytes  9.40 Gbits/sec
[  3] 12.0-16.0 sec  1.33 GBytes  2.85 Gbits/sec
[  4] 12.0-16.0 sec  1.35 GBytes  2.90 Gbits/sec
[  5] 12.0-16.0 sec  1.71 GBytes  3.67 Gbits/sec
[SUM] 12.0-16.0 sec  4.38 GBytes  9.41 Gbits/sec
[  5] 16.0-20.0 sec  1.70 GBytes  3.65 Gbits/sec
[  5]  0.0-20.0 sec  8.25 GBytes  3.54 Gbits/sec
[  5] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[  3]  0.0-20.0 sec  6.85 GBytes  2.94 Gbits/sec
[  3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[  4]  0.0-20.0 sec  6.72 GBytes  2.89 Gbits/sec
[  4] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[SUM]  0.0-20.0 sec  21.8 GBytes  9.37 Gbits/sec
And with jumbo frames
Code:
meanwhile@mw-dev1:~$ iperf -c 10.2.2.10 -t 20 -m -P3 -i 4
------------------------------------------------------------
Client connecting to 10.2.2.10, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  5] local 10.2.2.20 port 53259 connected with 10.2.2.10 port 5001
[  4] local 10.2.2.20 port 53258 connected with 10.2.2.10 port 5001
[  3] local 10.2.2.20 port 53257 connected with 10.2.2.10 port 5001
[ ID] Interval  Transfer  Bandwidth
[  4]  0.0- 4.0 sec  1.34 GBytes  2.87 Gbits/sec
[  3]  0.0- 4.0 sec  1.31 GBytes  2.81 Gbits/sec
[  5]  0.0- 4.0 sec  1.87 GBytes  4.01 Gbits/sec
[SUM]  0.0- 4.0 sec  4.52 GBytes  9.70 Gbits/sec
[  5]  4.0- 8.0 sec  1.83 GBytes  3.92 Gbits/sec
[  4]  4.0- 8.0 sec  1.37 GBytes  2.95 Gbits/sec
[  3]  4.0- 8.0 sec  1.40 GBytes  3.00 Gbits/sec
[SUM]  4.0- 8.0 sec  4.60 GBytes  9.87 Gbits/sec
[  5]  8.0-12.0 sec  1.97 GBytes  4.23 Gbits/sec
[  4]  8.0-12.0 sec  1.33 GBytes  2.85 Gbits/sec
[  3]  8.0-12.0 sec  1.30 GBytes  2.80 Gbits/sec
[SUM]  8.0-12.0 sec  4.60 GBytes  9.88 Gbits/sec
[  5] 12.0-16.0 sec  1.93 GBytes  4.15 Gbits/sec
[  4] 12.0-16.0 sec  1.33 GBytes  2.86 Gbits/sec
[  3] 12.0-16.0 sec  1.34 GBytes  2.88 Gbits/sec
[SUM] 12.0-16.0 sec  4.60 GBytes  9.89 Gbits/sec
[  3]  0.0-20.0 sec  6.74 GBytes  2.89 Gbits/sec
[  3] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)
[  5] 16.0-20.0 sec  1.82 GBytes  3.90 Gbits/sec
[  5]  0.0-20.0 sec  9.41 GBytes  4.04 Gbits/sec
[  5] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)
[  4]  0.0-20.0 sec  6.77 GBytes  2.91 Gbits/sec
[  4] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)
[SUM]  0.0-20.0 sec  22.9 GBytes  9.84 Gbits/sec
This is with the default values set by Ubuntu; I have not changed anything with sysctl.
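For anyone who does want to experiment, these are the usual sysctl knobs people raise for 10GbE TCP throughput on Linux. The values below are only illustrative, not a recommendation; as the numbers above show, the Ubuntu defaults were already good enough here:
Code:
# Maximum socket buffer sizes the kernel will allow (bytes)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# TCP autotuning min/default/max receive and send buffers (bytes)
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"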
 

whitey

Moderator
Jun 30, 2014
That is most likely because you are using Windows. It is a sad, well-known fact that the TCP implementation in Windows has serious performance/latency issues. Linux is much better, while FreeBSD is the king (maybe Solaris is even better, I don't know much about it).
That's ALMOST laughable. I almost NEVER use Windows at all on my LAN unless absolutely necessary (15+ years as a Unix/Linux systems engineer across all three proprietary Unixes as well as just about every Linux flavor you can think of, and some BSD here and there). Of the 30+ VMs I run day in and day out, maybe a small handful, 3-4, are Windows.

My tests were conducted using two Ubuntu 14.04.3 LTS servers, good sir. Everything default (hell, no VMware Tools or open-vm-tools even)... all I did was set up the physical/virtual switching infrastructure for jumbo, run one test with the default Ubuntu 1500 MTU, then execute 'ifconfig eth0 mtu 9000' to test jumbo and re-run the test, so yeah... about as vanilla as you can get.
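Spelled out as commands, the A/B test was simply this (a minimal sketch; the interface name and server IP are placeholders):
Code:
# Run 1: default 1500 MTU
iperf -c 10.2.2.10 -t 20 -m

# Switch the test NIC to jumbo on both ends, then run 2
ifconfig eth0 mtu 9000
iperf -c 10.2.2.10 -t 20 -m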
 

push3r

Member
Feb 19, 2015
alex1002, have you tried connecting the two servers directly, eliminating the switch from the picture, like namike suggested?

Also, since you are using Windows 2012 R2 servers, look into the TCP receive window auto-tuning feature and use Wireshark to check whether it's working properly. You can even set the TCP window auto-tuning to "experimental" so it can scale to something really big for 10Gb, although if there isn't much latency (don't assume; look for it in Wireshark), the "normal" setting should be fine. See the reference link below.
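For reference, the auto-tuning level can be checked and changed from an elevated command prompt; a minimal sketch:
Code:
REM Show the current global TCP settings, including the auto-tuning level
netsh int tcp show global

REM Raise the receive window auto-tuning level (revert with autotuninglevel=normal)
netsh int tcp set global autotuninglevel=experimental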

Use Wireshark to troubleshoot. You can see:
1) the 3-way handshake conversation advertising TCP window size, MTU, window scaling, maximum segment size (MSS), etc.
2) fragmentation, re-transmissions, etc.

Maybe Wireshark will catch something obvious, e.g. a bad optical cable or NICs showing lots of CRC errors, fragmentation, etc.?

I would start without jumbo frames first, just to troubleshoot.

You can tweak the TCP auto-tuning using the guide here:
Windows 8, 10, 2012 Server TCP/IP Tweaks

There are lots of tutorials out there on using Wireshark to check network performance.
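A few display filters make those checks quick (standard Wireshark filter syntax; apply them to a capture taken during an iperf run):
Code:
# Retransmissions and duplicate ACKs point to loss somewhere on the path
tcp.analysis.retransmission || tcp.analysis.duplicate_ack

# A receive window that keeps hitting zero points at the receiving host
tcp.analysis.zero_window

# IP fragmentation, which you should not see at all on a clean 1500 or jumbo path
ip.flags.mf == 1 || ip.frag_offset > 0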
 

push3r

Member
Feb 19, 2015
I've been trying to troubleshoot my IPsec site-to-site VPN throughput due to WAN latency and found the SpeedGuide.net tool "SG TCP Optimizer" really amazing for quickly tweaking your TCP settings.

Set the slider to 100+ Mbps and "Optimal", and you can see that it will set your TCP window auto-tuning to "experimental" along with other settings like Receive-Side Scaling (RSS), congestion control, etc. All of these have an impact on TCP throughput on the LAN as well.
 

alex1002

Member
Apr 9, 2013
I've been trying to troubleshoot my IPsec site-to-site VPN throughput due to WAN latency and found the SpeedGuide.net tool "SG TCP Optimizer" really amazing for quickly tweaking your TCP settings.

Set the slider to 100+ Mbps and "Optimal", and you can see that it will set your TCP window auto-tuning to "experimental" along with other settings like Receive-Side Scaling (RSS), congestion control, etc. All of these have an impact on TCP throughput on the LAN as well.
I tried this. I am starting to think it's the cables or the transceivers.
 

alex1002

Member
Apr 9, 2013
GREAT NEWS!!
I am the biggest IDIOT ever. I switched the card in the server from the PCIe x8 slot to the PCIe x16 slot, and now look at the difference. Issue fixed.
Code:
iperf3.exe -c amaran3
Connecting to host amaran3, port 5201
[ 4] local 192.168.0.85 port 49903 connected to 192.168.0.41 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 1.08 GBytes 9.31 Gbits/sec
[ 4] 1.00-2.00 sec 1.09 GBytes 9.35 Gbits/sec
[ 4] 2.00-3.00 sec 1.08 GBytes 9.32 Gbits/sec
[ 4] 3.00-4.00 sec 1.09 GBytes 9.39 Gbits/sec
[ 4] 4.00-5.00 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 5.00-6.00 sec 1.09 GBytes 9.39 Gbits/sec
[ 4] 6.00-7.00 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 7.00-8.00 sec 1.10 GBytes 9.43 Gbits/sec
[ 4] 8.00-9.00 sec 1.10 GBytes 9.43 Gbits/sec
[ 4] 9.00-10.00 sec 1.09 GBytes 9.39 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 10.9 GBytes 9.37 Gbits/sec sender
[ 4] 0.00-10.00 sec 10.9 GBytes 9.37 Gbits/sec receiver

iperf Done.

Thank you
 

PigLover

Moderator
Jan 26, 2011
Your 10GbE card is an x8 card, so that shouldn't matter - unless the x8 slot is only x4 electrical (which is actually likely on many server motherboards).

What MB are you using?
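One way to see what the slot actually negotiated is from a Linux live environment; a minimal sketch (the PCI address is a placeholder; Windows reportedly exposes similar info via PowerShell's Get-NetAdapterHardwareInfo, though I haven't checked the exact fields):
Code:
# Find the Intel 10GbE adapter's PCI address
lspci | grep -i ethernet

# LnkCap is what the card supports, LnkSta is what the slot actually negotiated;
# a link that trained narrower or slower than LnkCap will cap throughput hard.
sudo lspci -vv -s 04:00.0 | grep -E "LnkCap|LnkSta"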
 

alex1002

Member
Apr 9, 2013
Your 10GbE card is an x8 card, so that shouldn't matter - unless the x8 slot is only x4 electrical (which is actually likely on many server motherboards).

What MB are you using?
According to the Dell website, the x8 slot is electrically x8 as well, and the x16 is also x16. Not sure why this happened?
 

push3r

Member
Feb 19, 2015
GREAT NEWS!!
I am the biggest IDIOT ever. I switched the card in the server from the PCIe x8 slot to the PCIe x16 slot, and now look at the difference. Issue fixed.
Code:
iperf3.exe -c amaran3
Connecting to host amaran3, port 5201
[ 4] local 192.168.0.85 port 49903 connected to 192.168.0.41 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 1.08 GBytes 9.31 Gbits/sec
[ 4] 1.00-2.00 sec 1.09 GBytes 9.35 Gbits/sec
[ 4] 2.00-3.00 sec 1.08 GBytes 9.32 Gbits/sec
[ 4] 3.00-4.00 sec 1.09 GBytes 9.39 Gbits/sec
[ 4] 4.00-5.00 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 5.00-6.00 sec 1.09 GBytes 9.39 Gbits/sec
[ 4] 6.00-7.00 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 7.00-8.00 sec 1.10 GBytes 9.43 Gbits/sec
[ 4] 8.00-9.00 sec 1.10 GBytes 9.43 Gbits/sec
[ 4] 9.00-10.00 sec 1.09 GBytes 9.39 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 10.9 GBytes 9.37 Gbits/sec sender
[ 4] 0.00-10.00 sec 10.9 GBytes 9.37 Gbits/sec receiver

iperf Done.

Thank you
NICE! Surprisingly, no one suggested that. :) Well, the x8 slot on your MB should have worked in the first place. That's bizarre. What's the MB on your boxes?
 

alex1002

Member
Apr 9, 2013
NICE! Surprisingly, no one suggested that. :) Well, the x8 slot on your MB should have worked in the first place. That's bizarre. What's the MB on your boxes?
According to Dell when I called them, they confirmed the x8 slot should work fine with the Intel 10GbE cards. I am starting to think the riser card has limited bandwidth. I moved this card to the x16 slot on the other riser card.

Right now I love 10GbE and want more of it. What cards can people recommend on the affordable side?
 

push3r

Member
Feb 19, 2015
The Mellanox ConnectX-2 is super cheap, ~$15-$20. But it's a bit dated and doesn't have Remote Direct Memory Access (RDMA). Also, when buying these cards, be sure you get the correct profile for the available slots in your server chassis, low profile or regular profile; finding the low-profile or regular-profile brackets on their own is nearly impossible.

With RDMA you can truly test Windows SMB Direct, SMB Multi-Channel, etc. Check this link. http://www.mellanox.com/page/microsoft_based_solutions
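If you do end up with an RDMA-capable card, these are the usual checks on the Windows side (built-in PowerShell cmdlets from the NetAdapter and SMB modules; run after the share is mounted):
Code:
# RDMA capability and status per NIC
Get-NetAdapterRdma

# Whether SMB sees the interface as RSS/RDMA capable, and which connections
# are actually running multichannel
Get-SmbClientNetworkInterface
Get-SmbMultichannelConnection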

You might try a ConnectX-3, but I think those are for 40GbE. Time to update your "newly obsolete" switch. LOL
 

alex1002

Member
Apr 9, 2013
The Mellanox ConnectX-2 is super cheap, ~$15-$20. But it's a bit dated and doesn't have Remote Direct Memory Access (RDMA). Also, when buying these cards, be sure you get the correct profile for the available slots in your server chassis, low profile or regular profile; finding the low-profile bracket on its own is nearly impossible.

With RDMA you can truly test Windows SMB Direct, SMB Multi-Channel, etc. Check this link. http://www.mellanox.com/page/microsoft_based_solutions

You might try a ConnectX-3, but I think those are for 40GbE. Time to update your "newly obsolete" switch. LOL
Do you have any eBay links for a good switch I can upgrade to?