[SOLVED] Mellanox ConnectX-3 can't get 40G, only 10G

40gorbust

New Member
Jan 12, 2019
25
2
3
slow-scp.png


But... the real world kicks in, and we see 50 MB/sec download and upload over the 40G link. :eek:

Back to the drawing board.
 

RageBone

Active Member
Jul 11, 2017
428
109
43
Interesting!

My little home setup is able to hit about 37 Gbit/s with 4 iperf connections pretty consistently.
That's an E5-1650 v3 on the client side and an E5-2628L v4 (12 cores, 1.7 GHz) on the NAS / server side.
And all that over a 10m QDR DAC cable with everything in ETH mode.

Though, I couldn't test "real world" since the fastest storage currently installed is two WD Reds. FreeNAS caching goes up to 800MB until it has to write that off to disk, lol.

As far as I have read, the HPC people seem to put two cards into their machines, one for each CPU, so traffic does not have to cross the QPI link between CPUs.
So having it on CPU 1 seems like a good idea. Another option might be to set which CPU the software for the NIC runs on, but I have no clue how that would be done, or if it is possible. It should be, I think.
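On Linux, one way to keep the software on the CPU nearest the card is to read the NIC's NUMA node from sysfs and restrict the process to that node's cores. A minimal sketch, assuming a Linux host; the function names are made up for illustration and the interface name in the usage comment is hypothetical. It only pins the calling process (and any child it launches, such as an iperf run), not the NIC's interrupt handling.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-5,12-17" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nic_local_cpus(iface):
    """Return the CPUs on the NUMA node that the NIC's PCIe slot is attached to."""
    node = int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())
    if node < 0:
        # -1 means the kernel reports no NUMA locality; keep the current CPU set.
        return sorted(os.sched_getaffinity(0))
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    return parse_cpulist(cpulist)


def pin_to_nic(iface):
    """Restrict the current process to the NIC-local CPUs and return them."""
    cpus = nic_local_cpus(iface)
    os.sched_setaffinity(0, cpus)
    return cpus


# Usage (hypothetical interface name):
#   pin_to_nic("enp1s0")
```

Child processes inherit the affinity mask, so a benchmark started from the same script after `pin_to_nic` stays on the NIC-local cores.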

Gonna have to take a look at my setup this evening.
 

40gorbust

I think the newer CPUs are more efficient with I/O than the older E5-2450 v1s we use. On the other hand, we paid less than $100 each second hand, and I saw 37 Gbit, so I'm far from complaining.

Now we're trying to understand why we see 50 MB/sec from ramdisk to ramdisk SCP copy over a 40 Gbit connection :)
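For scale, the numbers in this thread can be put side by side with a little arithmetic. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits); the function names are made up for illustration.

```python
def mbytes_to_gbits(mb_per_s):
    """Convert a rate in MB/s to Gbit/s (decimal units)."""
    return mb_per_s * 8 / 1000


def link_utilisation(mb_per_s, link_gbit):
    """Fraction of the nominal link rate that a given MB/s transfer uses."""
    return mbytes_to_gbits(mb_per_s) / link_gbit


scp_gbit = mbytes_to_gbits(50)        # the scp copy: about 0.4 Gbit/s
scp_share = link_utilisation(50, 40)  # about 1% of a 40 Gbit link
iperf_mb = 37 * 1000 / 8              # the 37 Gbit iperf result: 4625 MB/s

print(f"scp:   {scp_gbit:.1f} Gbit/s ({scp_share:.0%} of the link)")
print(f"iperf: {iperf_mb:.0f} MB/s")
```

So the scp copy runs at roughly a hundredth of what iperf showed the same link can carry, which points at the tool or the hosts rather than the cable.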

Don't forget Linux caches writes (or can) and definitely caches reads. So if you copy the same file twice, the second time your sending machine should send at max speed, and if the file is smaller than your cache (e.g. < 800 MB) it should transfer at line speed, right?
 

i386

Well-Known Member
Mar 18, 2016
2,658
774
113
32
Germany
Have you tried other protocols? I read somewhere that scp has static flow control buffers that could become bottlenecks.
 

RageBone

I think there were some issues with SCP itself; at least I think I remember multiple threads on the Mellanox forum, here, and a few other places where SCP was performing poorly.
Of course I could be wrong, what with the fricken reuse of acronyms, way too many protocols, and me simply remembering wrong : )
Sadly, I can't remember there being a clear and easy fix though.

Yes, Linux and my FreeNAS do cache; that's why I got 800MB/s into the cache, with long pauses in between when it wrote stuff off to disk.
The protocol was SMB, hopefully SMB Direct, but there isn't a way to know that for sure on Linux, at least that I know of.
SMB3_02 if I'm correct.


I do have a Sandy Bridge rig at hand, so if I find the time, I can try to replicate your situation on that platform.
2620 v1s or 2648L v1s are available.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,395
499
83
Yup I've seen scp give terrible throughput performance on all sorts of connections (even when not bottlenecked by CPU). Are you able to use vanilla protocols or plain iperf for testing? Failing that, dd or cp over an ssh tunnel should still perform better than scp does.
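To illustrate what a "vanilla" test looks like with ssh out of the picture, here is a minimal sketch of a plain TCP sender and receiver. It runs both ends over loopback so it is self-contained; the function names and chunk size are illustrative choices. A single Python stream will not saturate a 40G link, so treat it as a demonstration of the idea, with iperf remaining the better measuring tool.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send/recv call (arbitrary choice)


def _sink(server_sock, result):
    """Accept one connection and count the bytes received until the peer closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def measure_tcp_throughput(total_bytes, host="127.0.0.1"):
    """Send total_bytes of zeros over one TCP stream; return (bytes received, Gbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()

    return result["bytes"], result["bytes"] * 8 / elapsed / 1e9


received, gbit = measure_tcp_throughput(64 * CHUNK)
print(f"received {received} bytes at {gbit:.2f} Gbit/s")
```

To test a real link, the receiving half would run on one server and the sending half on the other, using the address bound to the 40G interface.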
 

40gorbust

To reply to you all regarding scp (the tool): we were happy to get 37 Gbit with iPerf at the end of the day. Around 20:00 (on Friday) we learned that SCP managed just 50 MB/sec from server to server, ramdisk to ramdisk, with just a DAC between them and no switch, so we called it a day and left that task for Monday.

I have now read more than once that scp isn't the best way to test the maximum single-connection speed. We'll test with wget, probably set up a Samba daemon on one server and a client on the other, and, as one of the goals, an iSCSI target and initiator. To keep the disks from having any influence, we'll try to create an iSCSI volume on a ramdisk. Even one as small as 10 GBytes should be fine for some basic testing.

Questions:

1. The cable and mode are 40 Gbit with "4 x QSFP". I know the ConnectX-3 card can do 56 Gbit "FDR"; does FDR need a different cable? We have a fairly simple "QSFP+ 40 Gbit" cable with no useful markings and just know it can handle 37 Gbit.

2. The PDF manual of the CX354A-FCBT mentions FDR and 56 Gbit, but when we tested with the mode set to IB, the maximum connection rate we ever saw was 40 Gbit at 4 x QSFP speed, not FDR or 56 Gbit.

3. After ordering the 2 ConnectX-3 cards I found a $200 Mellanox 36 port 40G InfiniBand switch, the IS5024 (see attached PDF), on eBay and preemptively ordered it, trusting that "it would work". Will this switch work with the cards in ETH mode, or should the cards be in IB mode and use "ipoib", with the correspondingly much lower transfer rates (the best we ever saw was 19 Gbit)? I need to wait up to 3 more weeks before I can play with it (it has to cross half the world to get here), so I am hoping for some positive experiences from others with (Mellanox InfiniBand) switches. I know it's going to be loud, but because it's a lab setup we'll probably remove the small 40mm fans, drill some holes, and add some ordinary 12cm fans under the cover plate; all in the name of science!

4. What would be a good end-to-end tool to test real-world file/data transfer between two 40 Gbit equipped servers, besides copying files inside an SSH tunnel? Encryption is a bit overkill, since it's going to be hard to snoop on the cable connection between the 2 servers. Of course we can try to dig up FTP from its grave; I expect the SMB protocol to fail miserably at 40 Gbit transfers, and NFS we never needed. iSCSI is the goal, so we'll set it up, but that takes a bit more time than a simple command-line iperf or scp line.

5. Does anyone have good suggestions for a specific type or family of (Mellanox) cards that are easy to use and work well, and for compatible switches? We're talking three-digit investments here. I got the dual port 40 Gbit card for $130, which was on the upper end of what I wanted to pay for an uncertain experiment, especially considering that we have a dozen 10 Gbit cards that work FINE at exactly 10 Gbit speeds with cheap DAC cables and our SFP+ switch, require no configuration at all, and cost around $25 each. If our servers weren't short of PCIe slots I'd have tried to put 4 x 10 Gbit in a server and use Linux Ethernet bonding (which works fine on 1 Gbit links); easy peasy and very flexible. Just a mess of cables.
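As background to questions 1 to 3, the nominal "40" and "56" figures follow from the per-lane signalling rate, the lane count, and the line encoding. A minimal sketch using the standard InfiniBand figures (QDR: 10 Gbit/s per lane with 8b/10b encoding; FDR10: 10.3125 Gbit/s with 64b/66b; FDR: 14.0625 Gbit/s with 64b/66b); the table and function names are made up for illustration.

```python
from fractions import Fraction

# name: (per-lane signalling rate in Gbit/s, payload bits, total bits per code block)
RATES = {
    "QDR": (Fraction(10), 8, 10),
    "FDR10": (Fraction(165, 16), 64, 66),  # 10.3125 Gbit/s per lane
    "FDR": (Fraction(225, 16), 64, 66),    # 14.0625 Gbit/s per lane
}


def data_rate_gbit(name, lanes=4):
    """Usable data rate of a link after line-encoding overhead, in Gbit/s."""
    lane_rate, payload, total = RATES[name]
    return float(lane_rate * lanes * payload / total)


for name in RATES:
    print(f"4x {name}: {data_rate_gbit(name):.2f} Gbit/s of data")
```

By this arithmetic a 4x QDR InfiniBand link carries 32 Gbit/s of data, FDR10 carries 40 Gbit/s (the same lane rate and encoding that 40 Gbit Ethernet uses), and FDR carries about 54.5 Gbit/s. It does not by itself account for the 19 Gbit seen with ipoib, which is a separate software-path question.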

Cheers!
 


RageBone

1: I think 40GbE is the spec, and 56GbE only works in combination with a Mellanox Ethernet switch. I hope not though; no experience.
Except for my setup, but that uses a QDR cable, so I don't complain about less than 40Gbit/s.

4: In theory, SMB Direct. It should be in the current Samba versions, but since my setup talks SMB3_02 and Windows 10 with Direct needs 3_11, I'm not sure SMB Direct works as it should. The Samba part is also still in beta as far as I know. FreeNAS has experimental support for it though, in case that's of interest.

So, in the Future, it'll hopefully be SMB-Direct.

Currently, it's probably iSCSI with RDMA, aka iSER. iSER supports InfiniBand and, I think, Ethernet; I haven't tried that though. iSER over IB was pretty awesome.
With targetctl and iscsiadm, it is actually rather easy to create Ramdisks and share or connect to those.

NFS can also do RDMA, but no experience with that.

5: OEM HP / SUN "CX354A-FCBT"
There are a few threads on here about those; give 'em a read. Sometimes they can be had for about $30.

I'd advise against CX2 cards, since SMB Direct support is not a thing for those.
 

40gorbust

Yes, it requires QSFP14 cables; "normal" 40 Gbit Ethernet uses QSFP+. The QSFP14 transceivers built into DACs run at a higher clock than those in QSFP+ cables.
Would normal ipoib benefit from a 56G link vs a 40G link, or would it max out at the regular "lower" speed, e.g. 19/20/21 Gbit?
 

Babbage

New Member
Feb 17, 2021
10
2
3
So I have just recently installed HP 649281-B21 into a Linux/Debian and FreeBSD server, direct link, no switch.

I followed the flashing tutorial here: https://forums.servethehome.com/ind...x-3-to-arista-7050-no-link.18369/#post-178015

And I have the following cable: Mellanox MFS4R12CB-003 Infiniband Cables

But I only get 10G, not 40G :(
Is this a firmware config error or did I get the wrong QSFP cable?

EDIT: Flash tutorial is outdated.
Use this https://forums.servethehome.com/ind...net-dual-port-qsfp-adapter.20525/#post-198015
Long time lurker - 1st time poster...

Hello All, just wanted to share my experience for this. I just purchased (2) HP 649281-B21 ConnectX3 cards from ebay for $40 (both)
They had old firmware on them. So I found this page:
Chose FreeBSD which gave me this link to the firmware update utility -
I just did a wget http://www.mellanox.com/downloads/firmware/mlxup/4.15.2/SFX/fbsd10_64/mlxup
Then made mlxup executable (chmod +x mlxup)
Then ran it to see I had older HP firmware on it.

Found this HP Firmware - Drivers & Software - HPE Support Center.
Downloaded it, unzip etc to a temp folder - 2.36.5000 - This is not as new as someone else did on STH.

Then it's just './mlxup -D temp' -- Mellanox update util looks in that folder and asks if I want to update both cards.
Which I did, then did a reboot.

Bought this cable on ebay for another $20 -
IBM 1m Mellanox QSFP Passive Copper FDR10 InfiniBand Cable 90Y3810 ZZ

I had a QSFP DAC cable going from card 1 port 1 to card 2 port 2, and it reported the link as up and connected, but DID NOT show me the link speed.
It wasn't until I bound an IP address to the ConnectX3 cards that I could see link speed.

ifconfig output:
mlxen3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:02:c9:36:15:11
inet 10.10.10.2 netmask 0xffffff00 broadcast 10.10.10.255
media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
status: active
nd6 options=1<PERFORMNUD>


AND...

root@truenas[~]# ./mlxup
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX3
Part Number: 649281-B21_Bx
Description: HP IB 4X FDR CX-3 PCI-e G3 Dual Port HCA
PSID: HP_0280210019
PCI Device Name: pci0:1:0:0
Port1 MAC: 0002c93614b0
Port2 MAC: 0002c93614b1
Versions: Current Available
FW 2.36.5000 N/A
CLP 8025 N/A
PXE 3.4.0718 N/A

Status: No matching image found

Device #2:
----------

Device Type: ConnectX3
Part Number: 649281-B21_Bx
Description: HP IB 4X FDR CX-3 PCI-e G3 Dual Port HCA
PSID: HP_0280210019
PCI Device Name: pci0:2:0:0
Port1 MAC: 0002c9361510
Port2 MAC: 0002c9361511
Versions: Current Available
FW 2.36.5000 N/A
CLP 8025 N/A
PXE 3.4.0718 N/A

Status: No matching image found




I flashed and updated both cards in FreeBSD 12 - going to pull one out and put it in my ESXi Server <-- 40Gbe -->FreeNAS 12U2

I did see the other writeup about changing the manufacturer to Mellanox and being able to run an even newer Firmware, but I didn't want to brick my cards...

Just wanted to share. Thanks for the great people and great articles.
 

Scollen

New Member
Jul 31, 2021
1
1
1
Hello All,

First time poster..

Just wanted to share my experience with getting 40G to show in iperf between a Debian jump box and my TrueNAS server. I ran into similar issues to those others described above. I was only able to push about 11-13Gbps despite numerous setting tweaks, MTU and BIOS changes (which only increased speeds by 0.1-0.3Gbps each). I followed the Mellanox / Intel tunable settings and several forums, turned on 4G decoding in the BIOS, and set an MTU of 9000.

My setup was using a mix of HP 649281-B21 (I updated the firmware per https://forums.servethehome.com/ind...net-dual-port-qsfp-adapter.20525/#post-198015, found on eBay for $35) and Intel XL710-QDA1 (found on eBay for $65)

In the end I made two errors that were my bottleneck, both involving PCIe lanes.
My jump box motherboard splits bandwidth between slots 1 and 3, which held back ~4Gbps worth. (I wasn't even thinking about having these speeds and equipment when I bought the board 4 years ago.)
My TrueNAS server had 6 available x8 slots. I was so distracted trying to space out the cards that I didn't notice I had installed the NIC in the only PCIe x2 slot on the board; moving it to an x8 slot released the remaining 22Gbps.
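The slot mix-up lines up with simple PCIe arithmetic: per-lane transfer rate times encoding efficiency times lane count. A minimal sketch using the standard per-generation figures (Gen1 2.5 GT/s and Gen2 5 GT/s with 8b/10b, Gen3 8 GT/s with 128b/130b). That the slots in question are Gen3 is an assumption, since the post does not say, and the names are made up for illustration.

```python
# generation: (transfer rate per lane in GT/s, payload bits, total bits per code block)
PCIE_GEN = {
    1: (2.5, 8, 10),
    2: (5.0, 8, 10),
    3: (8.0, 128, 130),
}


def pcie_gbit(gen, lanes):
    """Raw link bandwidth in Gbit/s, before packet and protocol overhead."""
    rate, payload, total = PCIE_GEN[gen]
    return rate * payload / total * lanes


for lanes in (2, 4, 8):
    print(f"Gen3 x{lanes}: {pcie_gbit(3, lanes):.2f} Gbit/s")
print(f"Gen2 x8: {pcie_gbit(2, 8):.2f} Gbit/s")
```

A Gen3 x2 slot tops out near 15.75 Gbit/s before protocol overhead, which is consistent with the 11-13Gbps ceiling described above, while a Gen3 x8 slot has roughly 63 Gbit/s to work with. Note that a Gen2 x8 slot (32 Gbit/s) would also fall short of 40G.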

This was a rather simple mistake that I made in the rush of installation, but it still cost me several hours of researching and tweaking, trying to figure out if FreeBSD was sticking it to me or if I had faulty hardware, etc. I've had a good laugh about it now and am able to pull 40Gbps. Hopefully this helps someone.
 