Low 10GbE speeds between direct-attached servers

Discussion in 'Networking' started by manxam, Apr 21, 2016.

  1. manxam

    manxam Active Member

    Joined:
    Jul 25, 2015
    Messages:
    226
    Likes Received:
    47
    Hi, I have a Windows Server 2012 R2 installation on a Dell R710 with an Intel AF DA NIC in an x8 slot.
    I also have an OmniOS installation on a Supermicro X8DTH with an Intel X520-DA2.

    These devices are directly connected on a 10.10.10.0/24 subnet using Twinax. Both are configured for jumbo frames [MTU 9014 (Windows) and MTU 9000 (OmniOS)].

    Performing an iperf3 test (iperf3 -c 10.10.10.10 -i 1 -t 10) using Windows as the server:
    Code:
    [ ID] Interval           Transfer     Bandwidth
    [  5]   0.00-1.00   sec   462 MBytes  3.87 Gbits/sec
    [  5]   1.00-2.00   sec   471 MBytes  3.95 Gbits/sec
    [  5]   2.00-3.00   sec   432 MBytes  3.62 Gbits/sec
    [  5]   3.00-4.00   sec   465 MBytes  3.90 Gbits/sec
    [  5]   4.00-5.00   sec   455 MBytes  3.82 Gbits/sec
    [  5]   5.00-6.00   sec   426 MBytes  3.58 Gbits/sec
    [  5]   6.00-7.00   sec   442 MBytes  3.70 Gbits/sec
    [  5]   7.00-8.00   sec   424 MBytes  3.56 Gbits/sec
    [  5]   8.00-9.00   sec   424 MBytes  3.55 Gbits/sec
    [  5]   9.00-10.00  sec   462 MBytes  3.87 Gbits/sec
    [  5]  10.00-10.01  sec  4.02 MBytes  4.22 Gbits/sec
    Performing an iperf3 using OmniOS as the server:
    Code:
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec   619 MBytes  5.19 Gbits/sec
    [  4]   1.00-2.00   sec   679 MBytes  5.70 Gbits/sec
    [  4]   2.00-3.00   sec   672 MBytes  5.63 Gbits/sec
    [  4]   3.00-4.00   sec   671 MBytes  5.63 Gbits/sec
    [  4]   4.00-5.00   sec   677 MBytes  5.68 Gbits/sec
    [  4]   5.00-6.00   sec   671 MBytes  5.63 Gbits/sec
    [  4]   6.00-7.00   sec   674 MBytes  5.65 Gbits/sec
    [  4]   7.00-8.00   sec   672 MBytes  5.64 Gbits/sec
    [  4]   8.00-9.00   sec   685 MBytes  5.74 Gbits/sec
    [  4]   9.00-10.00  sec   655 MBytes  5.49 Gbits/sec
    Can someone please explain why I'm seeing considerably less than 10Gb/s and why there's so much difference between using Windows and OmniOS as the sender/receiver?

    Thanks in advance for any advice/suggestions!
    -M
     
    #1
  2. ttabbal

    ttabbal Active Member

    Joined:
    Mar 10, 2016
    Messages:
    723
    Likes Received:
    193
    Try without the large MTU. Some systems don't like that. And shouldn't they be the same?
     
    #2
  3. azev

    azev Active Member

    Joined:
    Jan 18, 2013
    Messages:
    613
    Likes Received:
    155
    I am having a similar result on a different motherboard with the same setup (dual IOH). To be honest, right now I'm attributing the issue to slow PCI-E bandwidth on this motherboard, and I'm planning to upgrade the system to a Sandy Bridge setup.

    Anyway, you can try running the test multi-threaded, with -P 8 for example, and see if that gets you a better result.
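    For example, something like this (a minimal sketch, using the 10.10.10.10 address from the first post):
    Code:
    # receiving box
    iperf3 -s
    # sending box: 8 parallel streams, 1-second reports, 10 seconds
    iperf3 -c 10.10.10.10 -P 8 -i 1 -t 10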

    I was able to get a better result by pinning the NIC's NUMA node configuration to the 1st CPU.
    However, the total bandwidth I can get per IOH chip seems to be limited to only 10Gbps.
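    For reference, one way to do this kind of pinning on Windows is through the adapter's RSS settings; a rough PowerShell sketch (the adapter name "10G NIC" is just a placeholder, check Get-NetAdapter for the real one):
    Code:
    # show the current RSS / NUMA placement for the adapter
    Get-NetAdapterRss -Name "10G NIC"
    # pin the RSS processors to the first NUMA node, starting at CPU 0
    Set-NetAdapterRss -Name "10G NIC" -Profile NUMAStatic -BaseProcessorNumber 0 -MaxProcessors 4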
     
    #3
  4. manxam

    manxam Active Member

    Joined:
    Jul 25, 2015
    Messages:
    226
    Likes Received:
    47
    Thanks guys. Changing the MTU to 1500 on both made no difference. (@ttabbal: Windows includes the Ethernet header size in the MTU, which makes it 9014, whereas most *nix OSes don't include the header when setting the MTU, so it's 9000 -- both are the same true value AFAIK.)
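    For anyone following along, the effective MTU on each side can be double-checked with something like this (ixgbe0 is just an example interface name):
    Code:
    # Windows: show the IP MTU per interface
    netsh interface ipv4 show subinterfaces

    # OmniOS: show the link MTU
    dladm show-linkprop -p mtu ixgbe0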

    Adding the -P 8 using the OmniOS box (X8DTH, single E5530 + 32GB + X520) as the server nets me 9.18Gb/s as [SUM]. Using the Windows box (R710, dual E5570 + 64GB + AF DA) as the server nets me 5.34Gb/s as the [SUM].

    Both servers' CPU usage only increases slightly under testing, so they're not CPU bound.

    I'm thoroughly confused...
     
    #4
  5. ttabbal

    ttabbal Active Member

    Joined:
    Mar 10, 2016
    Messages:
    723
    Likes Received:
    193
    Very strange. I get 9.7Gb/s with iperf between FreeNAS and Linux with default settings and 1500 MTU. Maybe try running the client and server locally? It might expose an issue with the IP stack.
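    A quick local run would be something like this (both ends on the same box):
    Code:
    # terminal 1: server
    iperf3 -s
    # terminal 2: client against the loopback address
    iperf3 -c 127.0.0.1 -i 1 -t 10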

    I don't have a Windows machine here to test with. Maybe try booting a live Linux and testing it against OmniOS? If it works well, you at least know it's a software problem and on which end.

    I'd offer to test more, but I've discovered that I managed to damage the fiber installing it. sigh....
     
    #5
    Last edited: Apr 22, 2016
  6. manxam

    manxam Active Member

    Joined:
    Jul 25, 2015
    Messages:
    226
    Likes Received:
    47
    @ttabbal: Your suggestion is a good idea. As they're dual-port cards, I'll just loop each NIC in each server and run iperf between the ports. This way (I hope) I'll be able to determine whether it's server-to-server or a specific card/server that's the issue.
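    Rough plan for the OmniOS side (the ixgbe0/ixgbe1 names and the 10.10.20.x addresses are just placeholders), keeping in mind the OS may short-circuit locally-addressed traffic in the IP stack rather than pushing it over the wire:
    Code:
    # give each port of the looped card its own address
    ipadm create-addr -T static -a 10.10.20.1/24 ixgbe0/loop
    ipadm create-addr -T static -a 10.10.20.2/24 ixgbe1/loop
    # server bound to port 0, client bound to port 1
    iperf3 -s -B 10.10.20.1 &
    iperf3 -c 10.10.20.1 -B 10.10.20.2 -P 8 -i 1 -t 10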

    Expect to hear back from me with.. "wtf is going on??" :)
     
    #6
  7. manxam

    manxam Active Member

    Joined:
    Jul 25, 2015
    Messages:
    226
    Likes Received:
    47
    And here it is.. Strangely I'm seeing the following on the OmniOS X520-DA2 box:
    Code:
    [SUM]   0.00-10.00  sec  33.3 GBytes  28.6 Gbits/sec                  sender
    [SUM]   0.00-10.00  sec  33.3 GBytes  28.6 Gbits/sec                  receiver
    And on the R710 AF DA box:
    Code:
    [ SUM]   0.00-10.00  sec   462 MBytes  3.87 Gbits/sec
    So why am I seeing almost 30Gbps on a looped 10Gb card on one box and only 3.87Gbps on the other?
    Having said that, it's obvious that the R710 with the AF DA card is the bottleneck, but I'm not certain why.
    I've tried the in-built 2012 R2 drivers as well as the most recent Intel drivers and the results are the same.
     
    #7
  8. ttabbal

    ttabbal Active Member

    Joined:
    Mar 10, 2016
    Messages:
    723
    Likes Received:
    193
    I suspect the 30Gbps is the card or OS bypassing the physical layer, so you're basically testing DMA speed. Why the other machine is getting such poor speeds is a good question. You've tried drivers; maybe try swapping NICs?

    Could the card be in a slot with low speed? 1x electrical?
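    If it helps, one quick way to check the negotiated slot width is from a live Linux boot on the R710 (as suggested above), something like:
    Code:
    # run as root; 8086 is Intel's PCI vendor ID, LnkSta shows the negotiated width/speed
    lspci -vv -d 8086: | grep -E "Ethernet controller|LnkSta"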
     
    #8
  9. azev

    azev Active Member

    Joined:
    Jan 18, 2013
    Messages:
    613
    Likes Received:
    155
    For some reason, running the iperf server/client locally on the same Windows box I am getting a similar result, where I get full 10Gb saturation on multiple NICs running concurrently.
     
    #9
  10. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,244
    Likes Received:
    743
    The OmniOS default TCP settings are optimized for 1G and low RAM usage.
    Have you increased the buffer settings, e.g. to

    max_buf=4097152 tcp
    send_buf=2048576 tcp
    recv_buf=2048576 tcp

    I set this as part of my default tuning options in napp-it, which gives me up to
    1000 MB/s not only at the IP level but also at the SMB service level.
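    On OmniOS these are applied with ipadm, e.g.:
    Code:
    ipadm set-prop -p max_buf=4097152 tcp
    ipadm set-prop -p send_buf=2048576 tcp
    ipadm set-prop -p recv_buf=2048576 tcp
    # verify
    ipadm show-prop -p max_buf,send_buf,recv_buf tcp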

    On the Windows side, disabling interrupt throttling (interrupt moderation) is a very performance-sensitive setting.
    http://napp-it.de/doc/downloads/performance_smb2.pdf
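    For example, with the in-box PowerShell cmdlets (the adapter name is a placeholder and the exact registry keyword can vary by driver, so check what Get-NetAdapterAdvancedProperty lists first):
    Code:
    # list the advanced properties the driver exposes
    Get-NetAdapterAdvancedProperty -Name "10G NIC"
    # disable interrupt moderation (0 = disabled, 1 = enabled)
    Set-NetAdapterAdvancedProperty -Name "10G NIC" -RegistryKeyword "*InterruptModeration" -RegistryValue 0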
     
    #10
  11. manxam

    manxam Active Member

    Joined:
    Jul 25, 2015
    Messages:
    226
    Likes Received:
    47
    Thank you all! After changing the settings as per @gea I'm getting "acceptable" performance: 10Gb -> OmniOS and 5Gb -> Windows. I'm going to chalk the low 10Gb performance on Windows up to the older AF DA card. I've confirmed it's in an x8 slot but I can't determine any other limitation. Disabling interrupt moderation increased speed by a little over 1Gb/s which, while still less than 10Gb, is faster than my disk subsystem can handle.

    I suspect I'll just order another X520 in the near future...
     
    #11
  12. james23

    james23 Active Member

    Joined:
    Nov 18, 2014
    Messages:
    401
    Likes Received:
    67
    (Below are all iperf2 and iperf3 numbers.)
    I'm battling this same issue, exclusively on Windows hosts (2012 R2), both as guests on ESXi 6.5u2 and as bare-metal Win 2012 R2.

    No problems on the same host when using an Ubuntu guest, getting 9.7Gbit to a bare-metal FreeNAS 11.2 server, but on anything Win 2012 R2 I can't get above 4.5-6Gbit.

    One change that gave me a 1Gbit boost was to click uninstall on the "QoS Packet Scheduler" (on network adapter -> properties). (This got me to ~6-6.5Gbit in either direction.)

    MTU 1500 or 9000 does not affect me (and I verified MTU 9k via don't-fragment pings with -s 8972). Using multiple streams with -P 4 gets me up to ~9Gbit on Windows, but a single stream is what I'm after (as iSCSI and SMB transfers max out at ~550 MB/s from ARC or all-flash).
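    For reference, the don't-fragment pings look roughly like this (8972 = 9000 minus the 20-byte IP and 8-byte ICMP headers; 10.10.10.10 is just the example address from earlier in the thread):
    Code:
    # Windows
    ping -f -l 8972 10.10.10.10
    # FreeBSD (FreeNAS side)
    ping -D -s 8972 10.10.10.10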
     
    #12
  13. manxam

    manxam Active Member

    Joined:
    Jul 25, 2015
    Messages:
    226
    Likes Received:
    47
    #13
  14. james23

    james23 Active Member

    Joined:
    Nov 18, 2014
    Messages:
    401
    Likes Received:
    67
    Thanks, should I do this for a VMware Windows guest? (That's what I'm using: ESXi 6.5 guests connecting to a bare-metal FreeNAS host.)
    The FN box has a Chelsio T520. The VMware host (with the Windows VMs) has both a Chelsio T520 and the 2x onboard Intel X540 10G NICs (I was seeing the same ~5Gbit speeds with both).

    I can update with this: I randomly tried a Windows 10 guest VM and saw a HUGE performance boost (for SMB mainly). Now when using SMB to send to or receive from the FN box, I get close to maxing 10Gbit (and I'm watching to be sure I'm not using ARC). This is of course with sync=disabled and a RAID 0 stripe of 4x HGST SAS3 SSDs, but I needed to see that this type of speed is possible before moving forward. So a big part of this is that Win 10 / Server 2016 is much more optimized than 2012 R2 when it comes to 10Gbit. Also, with Win 10, MTU 9k vs 1500 is still good for ~5% improvement, but clearly it was mainly an OS/Windows issue (not even an ESXi issue, as my bare-metal 2012 R2 system wouldn't go much above 5Gbit with SMB).

    (The above is with the normal VMware VMXNET3 driver; when using SR-IOV passthrough for the NIC, performance was about the same.)

    Interesting thing: even with Win 10, iperf 2 and 3 to the FN box are still ~5Gbit (unless I use -P 6, just as is the case on the 2012 R2 guests).
    Thanks
     
    #14