10GbE Performance Issues


alex1002

Member
Apr 9, 2013
519
19
18
The setup
I am pleased to announce that I finally got the "poor man's" 10GbE solution going, using the following parts:
Dell PowerConnect 6224 24-Port Gigabit Layer 3 Switch w/ Rack Ears "WARRANTY"
with
Dell PowerConnect FJ727 10GBe Fiber Module XFP 6224 6248 w/ 1x FTLX8511D3 GBIC
and
NEW SEALED DELL FORCE10 FTLX8511D3-FC GP-XFP-1S MFGR4 XFP 10GB SR transceiver
going into
Intel 10 GbE XF SR 2 Port Server Adapter PCIe Full Height EXPX9502FXSRGP5
Now two of my ESXi servers are on a 10GbE link, on the cheap. I plan to stack another Dell 6224 and use the same method to connect another two servers at 10Gb.

Unit Interface Mode Mode Status Speed (Gb/s)
---- ---------------- ---------- ---------- ------------ ------------
1 xg1 Ethernet Ethernet Link Up 10
1 xg2 Ethernet Ethernet Link Up 10

Thank you @push3r
Amazing Advice!

My only issue now is how to optimize it; advice is very welcome. Each server has RAID 10 with 8 SSDs. I am only getting 185 MB/s when testing. Jumbo frames are enabled.

I did try different optimization settings found in the Intel/Microsoft guides. I know my RAID can push over 550 MB/s. I also created RAM drives on both servers and tested with those, but no luck.
Copying a 7.5GB file from the RAID drive D:\ (NTFS, 8TB) to the 8GB RAM drive: 488 MB/s. Copying from RAM disk to RAM disk on the other server: 176 MB/s.
I checked both servers and both are fast copying from RAID to the RAM drive.
Both servers are Windows 2012 R2 with the latest Intel drivers.

Could it be my LC to LC cables?
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,516
5,808
113
Have you tried iperf or similar? It's good to use something like that to work out whether the problem is the link versus the storage subsystem.
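For example, something like this (assuming iperf3 is installed on both Windows boxes; substitute the receiver's 10GbE address):
Code:
REM On the receiving server: start iperf3 in server mode
iperf3 -s

REM On the sending server: run a 10-second TCP test against the receiver
iperf3 -c 192.168.0.85
If that gets close to line rate, look at the storage/SMB side; if it is slow there too, it's the network.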
 

alex1002

Member
Apr 9, 2013
519
19
18
Connecting to host 192.168.0.85, port 5201
[ 4] local 192.168.0.177 port 50098 connected to 192.168.0.85 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 1.00-2.00 sec 170 MBytes 1.43 Gbits/sec
[ 4] 2.00-3.00 sec 191 MBytes 1.60 Gbits/sec
[ 4] 3.00-4.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 4.00-5.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 5.00-6.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 6.00-7.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 7.00-8.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 8.00-9.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 9.00-10.00 sec 192 MBytes 1.61 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 1.86 GBytes 1.59 Gbits/sec sender
[ 4] 0.00-10.00 sec 1.86 GBytes 1.59 Gbits/sec receiver

iperf Done.
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Well, that doesn't look good on a 10G network :-( Are jumbo frames enabled end to end, and what are the two (src/dest) systems you are testing with... physical or virtual?
 

alex1002

Member
Apr 9, 2013
519
19
18
The machines are physical. Both sides have RAID 10, dual Xeons (16 logical cores per box), and 48GB of RAM.
 

namike

Member
Sep 2, 2014
70
18
8
43
Since it is only 2 physical boxes I'd plug my servers in directly to each other and eliminate the switch.

I'd also start out without jumbo frame support.

If you want to test if jumbo frames are working, you can do a ping with a large MTU to ensure jumbo frames are enabled end to end.

How to test if 9000 MTU/Jumbo Frames are working - Blah, Cloud.

That article explains it. In your case, since you are using Windows, use "ping -f -l 9000". The -f switch is important, as it tells ping not to fragment the transmission.
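One caveat, assuming a 9000-byte interface MTU: the -l value is only the ICMP payload, and the IP + ICMP headers add 28 bytes, so the largest payload that fits unfragmented is 8972. Something like this should get clean replies, while -l 9000 will report that the packet needs to be fragmented:
Code:
REM 8972 payload + 20-byte IP header + 8-byte ICMP header = 9000 bytes, matching a 9000 MTU
ping -f -l 8972 192.168.0.85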
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
I have that exact Intel XF adapter (single port only), and the most I was able to push was roughly 6Gbps on that older-gen 10G silicon using iperf. My X520-DA2s push line rate in iperf/jperf, so if I were a betting man I'd say it's switch/transceiver/optics (networking) related.
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
What is the exact iperf command you are running? What size packets, how many threads, etc?

What is the CPU?
 

alex1002

Member
Apr 9, 2013
519
19
18
What is the exact iperf command you are running? What size packets, how many threads, etc?

What is the CPU?
I did a simple file copy at first. I then ran iperf using just the simple -c for now.

Both sides are Dell R710s:
dual E5520 (4c/8t each)
48GB DDR3
PERC H700 RAID controller
6× 15K Cheetah drives in RAID 10
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
iperf is sending a single stream (meaning a single thread). On a 5520 you will bottleneck a single core of the CPU long before you reach 10GbE throughput. You've got to get all 4 cores (or even all 8 threads) active before you'll see 10GbE on the wire.

Try increasing the number of parallel streams in the test. Use "-P 4" with iperf to use all four cores, or "-P 8" to use all 8 threads concurrently, and see what happens.

Also, try increasing the TCP window size ("-w 4000").
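As a sketch, that could look like this (iperf3 syntax; the window value here is just an illustration, e.g. 1M for a 1MB window):
Code:
REM 8 parallel streams with an enlarged TCP window, 10-second run
iperf3 -c 192.168.0.85 -P 8 -w 1M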
 

alex1002

Member
Apr 9, 2013
519
19
18
Results are even worse:
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 220 MBytes 185 Mbits/sec sender
[ 4] 0.00-10.00 sec 220 MBytes 185 Mbits/sec receiver
[ 6] 0.00-10.00 sec 220 MBytes 185 Mbits/sec sender
[ 6] 0.00-10.00 sec 220 MBytes 185 Mbits/sec receiver
[ 8] 0.00-10.00 sec 2.88 MBytes 2.41 Mbits/sec sender
[ 8] 0.00-10.00 sec 2.64 MBytes 2.21 Mbits/sec receiver
[ 10] 0.00-10.00 sec 3.00 MBytes 2.52 Mbits/sec sender
[ 10] 0.00-10.00 sec 2.88 MBytes 2.42 Mbits/sec receiver
[ 12] 0.00-10.00 sec 219 MBytes 184 Mbits/sec sender
[ 12] 0.00-10.00 sec 219 MBytes 184 Mbits/sec receiver
[ 14] 0.00-10.00 sec 220 MBytes 184 Mbits/sec sender
[ 14] 0.00-10.00 sec 219 MBytes 184 Mbits/sec receiver
[ 16] 0.00-10.00 sec 439 MBytes 368 Mbits/sec sender
[ 16] 0.00-10.00 sec 439 MBytes 368 Mbits/sec receiver
[ 18] 0.00-10.00 sec 441 MBytes 370 Mbits/sec sender
[ 18] 0.00-10.00 sec 441 MBytes 370 Mbits/sec receiver
[SUM] 0.00-10.00 sec 1.72 GBytes 1.48 Gbits/sec sender
[SUM] 0.00-10.00 sec 1.72 GBytes 1.48 Gbits/sec receiver
 

alex1002

Member
Apr 9, 2013
519
19
18
C:\>ping -f -l 9000 192.168.0.177

Pinging 192.168.0.177 with 9000 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Ping statistics for 192.168.0.177:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

C:\>netsh int ip show int
The following helper DLL cannot be loaded: WCNNETSH.DLL.

Idx Met MTU State Name
--- ---------- ---------- ------------ ---------------------------
1 50 4294967295 connected Loopback Pseudo-Interface 1
16 5 9000 connected REPO-1 10GBE
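For reference, the interface MTU can also be set from an elevated prompt using the interface name shown by netsh above; this is a generic Windows command rather than anything adapter-specific, and store=persistent keeps it across reboots:
Code:
netsh interface ipv4 set subinterface "REPO-1 10GBE" mtu=9000 store=persistent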
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Is this one of those things where you need to set the max switch MTU to something like 9216? It's different for various vendors, but it's worth looking into. It seems like you have some heavy fragmentation/re-transmission going on; I bet a replayed Wireshark session would look... umm... interesting :-D
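If memory serves, on the PowerConnect 62xx series that is a per-interface setting, something along these lines from the switch CLI (commands from memory, so double-check against the 6224 CLI reference):
Code:
console# configure
console(config)# interface ethernet 1/xg1
console(config-if)# mtu 9216
console(config-if)# exit
console(config)# interface ethernet 1/xg2
console(config-if)# mtu 9216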
 

alex1002

Member
Apr 9, 2013
519
19
18
C:\Users\administrator>ping -f -l 9014 192.168.0.177

Pinging 192.168.0.177 with 9014 bytes of data:
Reply from 192.168.0.177: bytes=9014 time<1ms TTL=128
Reply from 192.168.0.177: bytes=9014 time<1ms TTL=128
Reply from 192.168.0.177: bytes=9014 time<1ms TTL=128
Reply from 192.168.0.177: bytes=9014 time<1ms TTL=128

Ping statistics for 192.168.0.177:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\administrator>ping -f -l 9216 192.168.0.177

Pinging 192.168.0.177 with 9216 bytes of data:
Reply from 192.168.0.177: bytes=9216 time<1ms TTL=128
Reply from 192.168.0.177: bytes=9216 time<1ms TTL=128
Reply from 192.168.0.177: bytes=9216 time<1ms TTL=128
Reply from 192.168.0.177: bytes=9216 time<1ms TTL=128

Ping statistics for 192.168.0.177:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
What did you change to get the jumbo frame pings working? Now that you've got jumbo frames working, how does iperf look?
 

Terry Kennedy

Well-Known Member
Jun 25, 2015
1,142
594
113
New York City
www.glaver.org
iperf is sending a single stream (meaning a single thread). On a 5520 you will bottleneck a single core of the CPU long before you reach 10GbE throughput. You've got to get all 4 cores (or even all 8 threads) active before you'll see 10GbE on the wire.
For what it's worth:
Code:
CPU: Intel(R) Xeon(R) CPU  E5520  @ 2.27GHz (2275.82-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5

(0:4) test1:/sysprog/terry# iperf -c test2
------------------------------------------------------------
Client connecting to test2, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  3] local 10.20.30.40 port 26252 connected with 10.20.30.41 port 5001
[ ID] Interval  Transfer  Bandwidth
[  3]  0.0-10.2 sec  11.7 GBytes  9.89 Gbits/sec
Hardware is dual E5520, network card is X540-T1. FreeBSD 8.4.