[SOLVED] Mellanox ConnectX-3 can't get 40G, only 10G

Discussion in 'Networking' started by BackupProphet, Nov 16, 2018.

  1. 40gorbust

    40gorbust New Member

    Joined:
    Jan 12, 2019
    Messages:
    25
    Likes Received:
    2
    slow-scp.png


    But then the real world kicks in and we see 50 MB/sec download and upload over the 40G link. :eek:

    Back to the drawing board.
     
    #41
  2. RageBone

    RageBone Active Member

    Joined:
    Jul 11, 2017
    Messages:
    230
    Likes Received:
    45
    Interesting!

    My little home setup is able to hit about 37 Gbit with 4 iperf connections pretty consistently.
    An E5-1650 v3 on the client side and an E5-2628L v4 (12c, 1.7 GHz) on the NAS/server side.
    And all that over a 10m QDR DAC cable with everything in ETH mode.
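    For reference, the multi-connection test is roughly this (iperf3 here; the IP and stream count are just examples for my setup):

        # server / NAS side
        iperf3 -s
        # client side: four parallel TCP streams for 30 seconds
        iperf3 -c 192.168.40.1 -P 4 -t 30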

    Though I couldn't test "real world", since the fastest storage currently installed is two WD Reds. FreeNAS caching goes up to 800 MB/s until it has to flush that to disk, lol.

    As far as I have read, the HPC people seem to put two cards into their machines, one per CPU, so traffic never has to cross the QPI link between the CPUs.
    So having the card on CPU 1 seems like a good idea. Another option might be to pin the software that drives the NIC to the same CPU, but I have no clue how that would be done, or whether it's even possible. It should be, I think.
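    Rough sketch of what I mean, assuming numactl is installed and the interface is called ens2 (adjust the names; the node number comes from sysfs):

        # which NUMA node is the NIC attached to? (-1 means unknown / single node)
        cat /sys/class/net/ens2/device/numa_node
        # run the benchmark pinned to that node's cores and memory (node 0 here)
        numactl --cpunodebind=0 --membind=0 iperf3 -c 192.168.40.1 -P 4
        # if MLNX_OFED is installed, it also ships a helper to steer the card's IRQs to one node
        set_irq_affinity_bynode.sh 0 ens2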

    Gonna have to take a look at my setup this evening.
     
    #42
  3. 40gorbust

    40gorbust New Member

    Joined:
    Jan 12, 2019
    Messages:
    25
    Likes Received:
    2
    I think the newer CPUs are more efficient with I/O than the older E5-2450 v1s we use. On the other hand, we paid less than $100 each second hand and I saw 37 Gbit, so I'm far from complaining.

    Now we're trying to understand why we see 50 MB/sec on a ramdisk-to-ramdisk SCP copy over a 40 Gbit connection :)

    Don't forget that Linux caches writes (or can) and definitely caches reads. So if you copy the same file twice, the second time your sending machine should read from cache and send at max speed, and if the file is smaller than your cache (e.g. < 800 MB) it should transfer at line speed, right?
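    One way we can make the cache effect visible (assuming the file sits on a normal filesystem; a tmpfs ramdisk lives in the page cache anyway, so dropping caches won't touch it; paths and IPs are just examples):

        # flush dirty pages and drop the page cache for a cold-read baseline
        sync
        echo 3 | sudo tee /proc/sys/vm/drop_caches
        # first copy = cold read from disk, second copy = served from cache
        scp /data/test.bin user@192.168.40.2:/tmp/
        scp /data/test.bin user@192.168.40.2:/tmp/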
     
    #43
  4. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,659
    Likes Received:
    400
    Have you tried other protocols? I read somewhere that scp has static flow control buffers that can become bottlenecks.
     
    #44
  5. RageBone

    RageBone Active Member

    Joined:
    Jul 11, 2017
    Messages:
    230
    Likes Received:
    45
    I think there were some issues with SCP itself; at least I remember multiple threads on the Mellanox forum, here and a few other places where SCP was performing poorly.
    Of course I could be wrong, what with the reuse of acronyms, way too many protocols, and me simply remembering wrong : )
    Sadly I can't remember there being a clear and easy fix, though.

    Yes, Linux and my FreeNAS do cache; that's why I got 800 MB/s into the cache, with long pauses in between when it flushed to disk.
    The protocol was SMB, hopefully SMB Direct, but there isn't a way to verify that on Linux, at least not that I know of.
    SMB3_02 if I'm correct.


    I do have a Sandy Bridge rig at hand, so if I find the time I can try to replicate your situation on that platform.
    2620 v1s or 2648L v1s are available.
     
    #45
  6. EffrafaxOfWug

    EffrafaxOfWug Radioactive Member

    Joined:
    Feb 12, 2015
    Messages:
    1,062
    Likes Received:
    353
    Yup I've seen scp give terrible throughput performance on all sorts of connections (even when not bottlenecked by CPU). Are you able to use vanilla protocols or plain iperf for testing? Failing that, dd or cp over an ssh tunnel should still perform better than scp does.
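    Something along these lines takes scp out of the picture entirely (IPs and port are examples, and the -l/-p syntax depends on which netcat variant you have):

        # raw TCP, no encryption -- receiver:
        nc -l -p 5001 > /dev/null
        # sender: push 10 GB of zeros through the link
        dd if=/dev/zero bs=1M count=10240 | nc 192.168.40.2 5001

        # same idea through plain ssh (tests ssh/cipher speed without scp's own buffering)
        dd if=/dev/zero bs=1M count=10240 | ssh user@192.168.40.2 'dd of=/dev/null bs=1M'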
     
    #46
  7. 40gorbust

    40gorbust New Member

    Joined:
    Jan 12, 2019
    Messages:
    25
    Likes Received:
    2
    To reply to you all regarding scp (the tool): we were happy to get 37 Gbit at the end of the day with iperf, then learned around 20:00 (on Friday) that SCP only managed 50 MB/sec from server to server, with just a DAC between them (no switch), ramdisk to ramdisk, so we called it a day and left that task for Monday.

    I have now read more than once that scp isn't the best tool to test maximum single-connection speed. We'll test with wget, probably set up a Samba daemon on one server and a client on the other, and then one of the actual goals: an iSCSI target and initiator. To take the disks out of the equation we'll try to create the iSCSI volume on a ramdisk; even 10 GBytes should be enough for some basic testing.
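    For the wget part, something like this is what we have in mind (sizes, names and IPs are examples; note that python3's built-in http.server is single-threaded and may itself become the bottleneck well below 40 Gbit, so a proper web server would give a higher ceiling):

        # server: 10 GB tmpfs ramdisk with an 8 GB test file, served over plain HTTP
        mkdir -p /mnt/ram && mount -t tmpfs -o size=10G tmpfs /mnt/ram
        dd if=/dev/urandom of=/mnt/ram/test.bin bs=1M count=8192
        cd /mnt/ram && python3 -m http.server 8080

        # client: single-stream HTTP download, discarded so no disk is involved
        wget -O /dev/null http://192.168.40.1:8080/test.bin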

    Questions:

    1. The cable and mode are 40 Gbit at "4x QSFP"; I know the ConnectX-3 card can do 56 Gbit "FDR". Does FDR need a different cable? We have a fairly simple "QSFP+ 40 Gbit" cable with no useful markings and just know it can carry 37 Gbit.

    2. The PDF manual of the CX354A-FCBT mentions FDR and 56 Gbit, but when we tested with the mode set to IB, the maximum link rate we ever saw was 40 Gbit at 4x QSFP speed, never FDR or 56 Gbit. (How we check the negotiated rate and port type is sketched after question 5.)

    3. After ordering the two ConnectX-3 cards I found a $200 Mellanox 36-port 40G InfiniBand switch, an IS5024 (see attached PDF), on eBay and preemptively ordered it, trusting that "it would work". Will this switch work with the cards in ETH mode, or do the cards have to be in IB mode, using IPoIB with the correspondingly much lower transfer rates (the best we ever saw was 19 Gbit)? I need to wait up to 3 more weeks before I can play with it (it has to cross half the world to get here), so I'm hoping for some positive experiences from others with (Mellanox InfiniBand) switches. I know it's going to be loud, but since it's a lab setup we'll probably remove the small 40 mm fans, drill some holes and put some ordinary 12 cm fans under the cover plate; all in the name of science!

    4. What would be a good end-to-end tool to test real-world file/data transfer between two 40 Gbit equipped servers, besides copying files inside an SSH tunnel, which is a bit overkill since it's going to be hard to snoop on a DAC running directly between the two servers? Of course we could dig FTP up from its grave; I expect the SMB protocol to fail miserably at 40 Gbit transfers, and NFS we have never needed. iSCSI is the goal, so we'll set it up, but that takes a bit more time than a simple command-line iperf or scp run.

    5. Does anyone have good suggestions for a specific type or family of (Mellanox) cards that are easy to use and work well, plus compatible switches? We're talking three-digit investments here; I got the dual-port 40 Gbit card for $130, which was at the upper end of what I wanted to pay for an uncertain experiment, especially considering that we have a dozen 10 Gbit cards that work FINE at exactly 10 Gbit with cheap DAC cables and our SFP+ switch, require no configuration at all and cost around $25 each. If our servers weren't short on PCIe slots I'd have tried putting 4 x 10 Gbit in a server and using Linux Ethernet bonding (which works fine on 1 Gbit links); easy peasy and very flexible. Just a mess of cables.
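    (For question 2: this is roughly how we check what the link negotiated and which port type the card is set to. mlxconfig comes with the Mellanox MFT package; the interface name and /dev/mst path are examples, "mst status" lists the real one, and a port-type change only takes effect after a reboot.)

        # Ethernet mode: negotiated speed
        ethtool eth2 | grep -i speed
        # InfiniBand mode: active link rate (40 = 4x QDR, 56 = 4x FDR)
        ibstat

        # query / switch the ConnectX-3 port type (1 = IB, 2 = ETH)
        mst start
        mlxconfig -d /dev/mst/mt4099_pci_cr0 query | grep LINK_TYPE
        mlxconfig -d /dev/mst/mt4099_pci_cr0 set LINK_TYPE_P1=2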

    Cheers!
     

    Attached Files:

    #47
  8. RageBone

    RageBone Active Member

    Joined:
    Jul 11, 2017
    Messages:
    230
    Likes Received:
    45
    1: I think 40GbE is in spec, and 56GbE only in combination with a Mellanox Ethernet switch. I hope not, though; I have no experience there.
    Except my own setup, but that is a QDR cable, so I don't complain about getting less than 40 Gbit/s.

    4: In theory, SMB Direct. It should be in current Samba versions, but since my setup talks SMB3_02 and Windows 10 with Direct needs 3_11, I'm not sure SMB Direct works as it should. The Samba part is also still in beta as far as I know. FreeNAS has experimental support for it, though, in case that's of interest.

    So, in the future, it'll hopefully be SMB Direct.

    Currently, it's probably iSCSI with RDMA, aka iSER. iSER supports InfiniBand and I think Ethernet too, haven't tried that though. iSER over IB was pretty awesome.
    With targetcli and iscsiadm it is actually rather easy to create ramdisks and share or connect to them; there's a rough sketch at the end of this post.

    NFS can also do RDMA, but no experience with that.

    5: OEM HP / Sun "CX354A-FCBT"
    Sometimes they can be had for about $30. There are a few threads on here about those; give 'em a read.

    I'd advise against ConnectX-2 cards, since SMB Direct support is not a thing for those.
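    The ramdisk target sketch I mentioned under 4, written down from memory, so double-check the exact syntax (the IQN and IPs are made up, and the enable_iser step of course needs an RDMA-capable link):

        # target side (LIO via targetcli): 8 GB ramdisk backstore exported as one LUN
        targetcli /backstores/ramdisk create name=rd0 size=8G
        targetcli /iscsi create iqn.2019-01.lab.test:ramtarget
        targetcli /iscsi/iqn.2019-01.lab.test:ramtarget/tpg1/luns create /backstores/ramdisk/rd0
        targetcli /iscsi/iqn.2019-01.lab.test:ramtarget/tpg1 set attribute generate_node_acls=1 demo_mode_write_protect=0
        # optional: switch the portal to iSER so the data path uses RDMA
        targetcli /iscsi/iqn.2019-01.lab.test:ramtarget/tpg1/portals/0.0.0.0:3260 enable_iser boolean=true
        targetcli saveconfig

        # initiator side
        iscsiadm -m discovery -t sendtargets -p 192.168.40.1
        iscsiadm -m node -T iqn.2019-01.lab.test:ramtarget -p 192.168.40.1 -I iser --login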
     
    #48
  9. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,659
    Likes Received:
    400
    Yes, it requires FDR14-rated QSFP cables; "normal" 40 Gbit Ethernet uses QSFP+. The FDR14 transceivers built into those DACs run at a higher clock rate than QSFP+ cables.
     
    #49
  10. 40gorbust

    40gorbust New Member

    Joined:
    Jan 12, 2019
    Messages:
    25
    Likes Received:
    2
    Would normal IPoIB benefit from a 56G link vs a 40G link, or would it max out at the usual "lower" speed, e.g. 19/20/21 Gbit?
     
    #50
  11. RageBone

    RageBone Active Member

    Joined:
    Jul 11, 2017
    Messages:
    230
    Likes Received:
    45
    iSER uses IPoIB for addressing, but will not be limited by IPoIB speed, because the data path is RDMA.
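    If you want to check that a session really runs over iSER and not plain TCP over IPoIB, this should show it (from memory):

        # "Iface Transport: iser" means the data path is RDMA, not TCP over IPoIB
        iscsiadm -m session -P 1 | grep -i transport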
     
    #51
  12. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,659
    Likes Received:
    400
    Yes, if your application can max out 40 Gbit/s.
     
    #52
  13. 40gorbust

    40gorbust New Member

    Joined:
    Jan 12, 2019
    Messages:
    25
    Likes Received:
    2
    #53