[SOLVED] Mellanox ConnectX-3 can't get 40G, only 10G

Discussion in 'Networking' started by BackupProphet, Nov 16, 2018.

  1. svtkobra7

    svtkobra7 Active Member

    Joined:
    Jan 2, 2017
    Messages:
    219
    Likes Received:
    37
    YEAHHH BUDDDYYY ... that's the money shot right there! ESXi <=> ESXi via direct 40G = iperf shows 33G :):):)

    Curiously, though, I can't get above 14G FreeNAS <=> FreeNAS on those same hosts.


    I picked up ~+1G on the 10G port (FreeNAS to FreeNAS), so I'm going to assume that was the result of properly enabling jumbo frames. MTU was set to 9000 in ESXi and FreeNAS and globally enabled on the ICX-6450 (i.e. "jumbo"), BUT NOT on the VLAN interface. The following command did the trick and now a Plex jail works again (yay for not getting yelled at any longer); a quick end-to-end check is sketched after the snippet.

    Code:
    interface ve200
    ip mtu 9000
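
    In case it helps anyone else chasing the same thing, this is roughly how I sanity-check that 9000-byte frames actually pass end to end (the vmk1 interface name is just a placeholder for whichever vmkernel port you're testing, and 10.2.0.42 is simply the far host from the test below):
    Code:
    # ESXi side: 8972-byte payload + 28 bytes of IP/ICMP header = 9000; -d sets don't-fragment
    vmkping -d -s 8972 -I vmk1 10.2.0.42

    # FreeNAS/FreeBSD side: -D sets the don't-fragment bit
    ping -D -s 8972 10.2.0.42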
    Prior testing didn't include host-to-host iperf, but since FreeNAS <=> FreeNAS nudged closer to 10G on the other port (which I would attribute to the config change), it's curious that this much, much improved result doesn't even touch the switch ... o_O

    Code:
    [root@ESXi-01:/opt/iperf/bin] ./iperf -c 10.2.0.42 -w 1M -P 8
    ------------------------------------------------------------
    Client connecting to 10.2.0.42, TCP port 5001
    TCP window size: 1.01 MByte (WARNING: requested 1.00 MByte)
    ------------------------------------------------------------
    [ 10] local 10.2.0.41 port 51426 connected with 10.2.0.42 port 5001
    [  8] local 10.2.0.41 port 21429 connected with 10.2.0.42 port 5001
    [  7] local 10.2.0.41 port 28275 connected with 10.2.0.42 port 5001
    [  6] local 10.2.0.41 port 10155 connected with 10.2.0.42 port 5001
    [ 11] local 10.2.0.41 port 48648 connected with 10.2.0.42 port 5001
    [  5] local 10.2.0.41 port 62808 connected with 10.2.0.42 port 5001
    [  4] local 10.2.0.41 port 19689 connected with 10.2.0.42 port 5001
    [  3] local 10.2.0.41 port 57074 connected with 10.2.0.42 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [ 10]  0.0-10.0 sec  4.82 GBytes  4.14 Gbits/sec
    [  8]  0.0-10.0 sec  4.80 GBytes  4.13 Gbits/sec
    [  7]  0.0-10.0 sec  4.85 GBytes  4.17 Gbits/sec
    [  6]  0.0-10.0 sec  4.83 GBytes  4.15 Gbits/sec
    [ 11]  0.0-10.0 sec  4.81 GBytes  4.14 Gbits/sec
    [  5]  0.0-10.0 sec  4.81 GBytes  4.13 Gbits/sec
    [  4]  0.0-10.0 sec  4.87 GBytes  4.18 Gbits/sec
    [  3]  0.0-10.0 sec  4.85 GBytes  4.17 Gbits/sec
    [SUM]  0.0-10.0 sec  38.6 GBytes  33.2 Gbits/sec
     
    #21
    fohdeesha and RageBone like this.
  2. 40gorbust

    40gorbust New Member

    Joined:
    Saturday
    Messages:
    6
    Likes Received:
    0
    Gents, after playing for a whole day with a Mellanox CX354A-FCBT I learned a ton, but got stuck with iperf topping out at exactly 10-11 Gbit.
    No matter how many threads (-P <number>) I add, it stays at that number. This is between two dual-Xeon E5 machines; not the fastest on the block, but not loaded at all. TCP/IP works, ping works, and iperf of course, but file transfers are stuck at EXACTLY 133 MB/sec even after many repeated SCP copies between the two machines.

    The card reports a 40 Gbit link. We tried changing to ETH, then changed back to VPI; the cards are obviously working, using a 3-meter QSFP+ DAC cable (sold as 40 Gbit QDR/FDR).

    So while the link is 40 Gbit and the speed is super stable (it can run for many minutes and always shows 10-11 Gbit/sec in iperf), we're stuck on why it doesn't do more. We're running plain Linux, no VMware, no hypervisors, nothing else going on on the servers. The cards are in Gen 2 and Gen 3 x8 electrical slots (so nothing is limited by sharing with a chipset or by being an actual x4 slot), so that cannot explain why it won't go higher than 10-11 Gbit.

    We have several older Mellanox 10 Gbit SFP+ cards and those have been running fine for nearly two years now, connected to a normal 1 Gbit + 10 Gbit SFP+ switch.

    Am I doing something wrong? The cards came with FW 2.42.5000 out of the box, so we didn't have to flash them. We turned the servers off and connected the cable in various orders (cable first, then power on the servers; servers on first, then connect the cable; etc.).

    One server is an Intel S2600CP with two E5-2650's (IIRC) and the other an Intel SC5520HC with two X5660's.
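
    For reference, the sort of quick check we still need to run on the Linux side to rule out the obvious (the PCI address and interface name below are placeholders for whatever lspci / ip link actually report for the Mellanox card):
    Code:
    # negotiated PCIe link speed/width for the ConnectX-3 (want "Speed 5GT/s" or "8GT/s", "Width x8")
    lspci -vv -s 04:00.0 | grep -i lnksta

    # link speed the driver negotiated on the port (ETH mode)
    ethtool eth2 | grep -i speed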
     
    #22
  3. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,489
    Likes Received:
    337
    What OS do you use? My guess is that you're using iperf3 and Windows. (iperf3 doesn't work that well on Windows platforms.)
    If you have Windows-only hosts you could try other benchmarks (I prefer ntttcp).

    Did you get 133 MByte/s with the SFP+ cards too?
    How did you measure the 133 MByte/s?
    How does your storage look on the sending and receiving host?
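
    On the ntttcp suggestion above: if I remember the syntax right, a minimal run with the Windows build looks something like this (the IP is the receiving interface's address, and 8 threads / 15 seconds are just example values):
    Code:
    REM receiver
    ntttcp.exe -r -m 8,*,192.168.1.10 -t 15

    REM sender (points at the same receiver IP)
    ntttcp.exe -s -m 8,*,192.168.1.10 -t 15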
     
    #23
  4. 40gorbust

    40gorbust New Member

    Joined:
    Saturday
    Messages:
    6
    Likes Received:
    0
    OS: Linux to Linux
    SFP+ cards: those get 100-120 MB/sec, but they're not connected in any way to the new 40G cards
    Measure: SCP gave a nice report after sending a single file as big as a few GB (2-4)
    Storage: we created 10 GB ramdisks (tmpfs) on both servers to eliminate storage bottlenecks
    Creating a new file on the ramdisk takes literally a second for multiple GBs.

    I wonder if you can only get 40G with a minimum of 4 threads / parallel file transfers, or if it is possible to get close to the 40 Gbit speed with a single file transfer in a single thread? Thanks for your time!
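
    One way to take SSH out of the picture entirely (SCP is a single stream and can be pinned by the cipher on one core) would be to push a file from one tmpfs ramdisk to the other through plain netcat; the paths, port and IP below are just examples, and depending on the netcat variant the listener may need -l -p 5001 instead of -l 5001:
    Code:
    # on the receiving server
    nc -l 5001 > /mnt/ramdisk/out.bin

    # on the sending server (file already sitting in the tmpfs ramdisk)
    dd if=/mnt/ramdisk/test.bin bs=1M | nc <receiver-ip> 5001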
     
    #24
  5. svtkobra7

    svtkobra7 Active Member

    Joined:
    Jan 2, 2017
    Messages:
    219
    Likes Received:
    37
    For clarity (having just gone through something similar), "threads" and "parallel" are not interchangeable here.
    • iperf or iperf3? From my understanding, using the -P switch does not multi-thread, at least with iperf3, as confirmed here (a common workaround is sketched after the help output below):
    • iperf3 at 40Gbps and above
    Code:
    iperf -h   -P, --parallel  #        number of parallel client threads to run
    iperf3 -h  -P, --parallel  #        number of parallel client streams to run
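
    Since iperf3's -P streams all live in one process, the workaround that page suggests (as I recall) is running several iperf3 server/client pairs on different ports and adding the results up; a rough sketch, with ports and the IP purely as examples:
    Code:
    # server side: one listener per port
    iperf3 -s -p 5201 &
    iperf3 -s -p 5202 &
    iperf3 -s -p 5203 &

    # client side: one process per port, all in parallel
    iperf3 -c 10.2.0.42 -p 5201 -t 30 &
    iperf3 -c 10.2.0.42 -p 5202 -t 30 &
    iperf3 -c 10.2.0.42 -p 5203 -t 30 &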
     
    #25
  6. 40gorbust

    40gorbust New Member

    Joined:
    Saturday
    Messages:
    6
    Likes Received:
    0
    OK thanks, will keep an eye on that. I think we used iperf (the servers are at the office and it's the weekend now; time to relax... ehh, search forums for answers), not iperf3.

    Should a 40G/56G card be able to transfer near its maximum of 40/56 Gbit between two servers over a DAC cable on one thread/connection? So nothing parallel?
     
    #26
  7. svtkobra7

    svtkobra7 Active Member

    Joined:
    Jan 2, 2017
    Messages:
    219
    Likes Received:
    37
    • A wise man once told me, "anything is possible" ;) but at the default window size and -P 1 (as is also the default), I'd be impressed if so.
      • You will likely want to increase that window size (-w 1M for example) and -P (-P 4 for example) to find the highest bitrate.
    • I can only speak from personal experience, and limited experience relative to others here, but:
      • PCIe v2 x8 = 32 Gbps usable, so no, you won't hit the card's theoretical max through that pipe (a quick check is worked out after the tables below).
      • Also, the base clock on the E5-2650 v1 = 2.0 GHz, which may not be fast enough to max it out.
        • About the best I was able to hit using an E5-2680 v2 & E5-2690 v2 was posted immediately before your reply (33 Gbps).
        • To be fair, my result may not be CPU bound, as that test was run before I did a fair bit of tuning / learning, but the fact that it was performed between two hosts running ESXi 6.7, where MLNX's OFED isn't supported, may have hurt the results. It's quite possible that testing between two Linux clients (as you are) could reach 40 Gbps (I'd have to defer to other, more knowledgeable members here). But similar to you, that test was via DAC and not over a switch.
    • For me personally, 40 Gbps is well beyond my actual bottleneck so I was happy to easily achieve near line rate for port 1 @ 10 Gbps (switched via SFP+) and port 2 @ 20 - 30 Gbps (direct via QSFP DAC, reserved for FreeNAS <=> FreeNAS replication).
    Code:
      PCIe Per lane (each direction):
         v1.x:  250 MB/s ( 2.5 GT/s)
         v2.x:  500 MB/s ( 5 GT/s)
         v3.0:  985 MB/s ( 8 GT/s)
         v4.0: 1969 MB/s (16 GT/s)
    Code:
    PCIe v2 x8  =  4   GB/s ( 40 GT/s)
    PCIe v2 x16 =  8   GB/s ( 80 GT/s) 
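
    As a rough cross-check of those numbers (PCIe 1.x/2.x use 8b/10b encoding, so 20% of the raw transfer rate is line-code overhead; 3.0 moves to 128b/130b):
    Code:
    PCIe v2 lane: 5 GT/s x 8/10    =  4 Gbit/s  =  500 MB/s per lane, each direction
    PCIe v2 x8  : 8 x 4 Gbit/s     = 32 Gbit/s  (~4 GB/s)   each direction
    PCIe v3 x8  : 8 x ~7.88 Gbit/s = ~63 Gbit/s (~7.9 GB/s) each direction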
     
    #27
  8. 40gorbust

    40gorbust New Member

    Joined:
    Saturday
    Messages:
    6
    Likes Received:
    0
    Thanks, even x4 speed on Gen 2 should already be 2 GB/sec, so enough to hit 20 Gbit or more. I'm aware the CPUs aren't the fastest in the world, and I'll play a bit with affinity to try to run the driver on the second CPU. Now that I think of it, I'm not sure whether the card is in a slot tied to CPU 1 (out of 2), so the driver may be running on one CPU while the card sits in a slot handled by the other.
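
    One way to check that from Linux without opening the case (eth2 is a placeholder for whatever the ConnectX-3 port is actually called on these boxes):
    Code:
    # NUMA node the NIC's slot hangs off (-1 means the platform exposes no NUMA info)
    cat /sys/class/net/eth2/device/numa_node

    # which cores belong to which node
    lscpu | grep -i numa

    # then pin the benchmark to cores on the same node as the card, e.g. node 0
    numactl --cpunodebind=0 --membind=0 iperf -c <server-ip> -P 4 -w 1M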

    There are a lot of things we're still not sure about: why VPI or ETH (with a DAC), and whether the speed of a DAC cable degrades over distance (because of errors/retransmissions or higher latency compared to fiber; think xDSL lines that become less performant over distance).

    I bought a switch on eBay (Mellanox MIS5024, unmanaged, 36 ports of 40 Gbit for $200) and that will be the final setup, so even if a server-to-server connection can't be optimal, I hope a connection through that switch will be 30+ Gbit. I'll know in a week or three.

    I found out later, after all the tests, that the two ports on the CX354A are actually not 100% the same regarding VPI and ETH and auto-sensing. Will pay attention to that as well. Maybe it helps.

    So the brief question is: what is the main difference between VPI and ETH mode if you have a DAC cable? And later, if you have an InfiniBand-only switch, does that require the card to be in VPI (IB) mode because the switch doesn't support ETH mode?

    Sorry for the noob questions. I've been dealing with networks for 20 years but Infiniband is new to me :)

    I bought the cards because they were cheap for me ($135 apiece), the DAC cable was just $21 new, and the switch $200, which sounded like a very good deal at $5.50 per 40 Gbit port :)
     
    #28
  9. svtkobra7

    svtkobra7 Active Member

    Joined:
    Jan 2, 2017
    Messages:
    219
    Likes Received:
    37
    • 2 GB/s (gigabytes per second) ≠ 20 Gbps (gigabits per second), i.e. 2 GB/s × 8 = 16 Gbps < 20 Gbps.
    • (One of us needs to check our math, I hope it's me).
    • Essentially you are asking why is fiber superior to copper over distance?
      • Frequency: (1) light generally beats copper, and (2) over copper, inductance and capacitance increase with distance, reducing the usable frequency.
      • Fiber is not impacted by EMI.
      • But this question is out of scope (educational in nature), since you have a 3 m DAC.
    As to the rest, I'll leave it to someone better educated (I'm a noob as well, no IT background here). I'm not educated on IB whatsoever (barely more on 40G ETH), but I want to say it only has advantages at bitrates much higher than you are seeing at the moment, and I further want to say the consensus is that the additional benefit isn't worth the additional complexity (but that may be confirmation bias at play). After burning the firmware, I set the cards to ETH (2) and haven't looked back =>
    Code:
    ./mlxconfig -d [mt4099_pci_cr0] set LINK_TYPE_P1=2 LINK_TYPE_P2=2
    , where [mt4099_pci_cr0] = the pci device name returned by
    Code:
    ./mlxfwmanager --query
    but I'm "playing" with a limited tool kit and can't even use OFED, etc. (limitations you aren't subject to).
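
    If it's useful to anyone following along, the mode the card is currently in can be read back with the same tool (same placeholder device name as above; IIRC on ConnectX-3 the values are 1 = IB, 2 = ETH, 3 = VPI/auto-sense, and a set only takes effect after a reboot):
    Code:
    ./mlxconfig -d [mt4099_pci_cr0] query | grep LINK_TYPE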
     
    #29
  10. i386

    i386 Well-Known Member

    Joined:
    Mar 18, 2016
    Messages:
    1,489
    Likes Received:
    337
    I'm pretty sure that the cards can do that; the question is whether the operating system and the applications can handle a 40 Gbit link with a single thread.

    2 cores @ 2.2 GHz should be able to saturate a 40 Gbit/s link (Pentium D1508):
    [attached screenshot: Unbenannt.JPG]
     
    #30
    40gorbust likes this.
  11. 40gorbust

    40gorbust New Member

    Joined:
    Saturday
    Messages:
    6
    Likes Received:
    0
    OK, you got me on the 2 GB/s == 20 Gbit thing; I was off by a factor of 1.25. I was just lazily using 1:10, thinking of a bit of overhead and such.

    About VPI (IB?) vs ETH I'm still a bit unclear: which one is better for the card, besides the fact that the card can apparently do 56 Gbit in VPI (IB?) mode vs 40G max in ETH mode?

    Also, someone wrote that "IPoIB" emulation would yield lower results. Basically I just want to be able to transfer files quickly from one server to another: one will be a workhorse with plenty of GPUs for machine learning and the other will be a fileserver with a lot of HBAs and probably PCIe cards with NVMe SSDs, a tape drive for backups (LTO-5), and a bunch of normal HDDs for medium-term cheap storage.

    I'm just used to using TCP/IP to connect servers together (ignoring IPX from back in the day and such), so I don't "need" Ethernet; I'm fine if I can ping over VPI/IB, whatever is needed for that, with low CPU usage if possible and high throughput.

    If this experiment with the 40G cards works, I might try to get my hands on a 100G Mellanox card, but that one was around $300, so a bit expensive times two for just an experiment without knowing whether it would work. Forums are great. I loved reading this thread and learning so much in just an hour.

    Are there Ethernet 40Gb switches with QSFP+ connectors that require the alternative ETH mode on the Mellanox card? What does everyone think of VPI/IB mode vs ETH mode; which one is 'better' or 'faster' or 'more optimized'? Sorry for the noob questions, but I learned that it's better to ask questions than to pretend you're smart but aren't :)
     
    #31
  12. 40gorbust

    40gorbust New Member

    Joined:
    Saturday
    Messages:
    6
    Likes Received:
    0
    Impressive speeds for that CPU (usage). This also helps me not to give up if we cannot reach (near) 40 Gbit on these two servers. Thanks!
     
    #32