[SOLVED] Slow speeds between two ConnectX-2 machines

rubylaser

Active Member
Jan 4, 2013
842
229
43
Michigan, USA
Hello,

I have two machines connected to each other with Mellanox ConnectX-2 Ethernet cards. They are connected via a 10m passive twinax cable (I need this length to reach from my office on the ground floor to my server in the basement). The basement server is running Ubuntu 14.04.4 Server on an i5-4590 with 32GB of RAM, and the office workstation is running Windows 10 64-bit Professional on an E5-2670 v1 with 64GB of RAM. Here are a few tests with iperf3 between the two machines (I tried turning the Windows firewall off temporarily to see if that helped, but it made no difference).




As you can see, it is SLOW (the bottom picture is the best it has performed; sometimes it's in the single-digit MB/s). If I skip iperf and just use Samba to pull a big ISO from my server, the gigabit connection is flat and completely saturated, whereas the 10GbE link only manages about 35MB/s and is very sporadic.

I'm new to this, so I'm leaning towards this being the fault of the LONG passive SFP+ cable, so please let me know how I should troubleshoot this, or what I should replace the cable with (is an active cable enough or do I need to try fiber, if so, what do I buy?)

Thanks!
 

rubylaser

Do you have them in a PCIe slots with a x8 electrical connection to them?
Thanks for the idea, but yes, one is in a PCIe x16 slot on i5-4590 and one is in a PCIe x8 slot on the e5-2670 system, so that shouldn't be the issue.

*Edit: I just looked, and my i5-4590 system is in an ASRock B85M Pro4, where the second PCIe x16 slot only runs at x4. Luckily, I have a 2P E5-2670 build on the way to replace this system, so maybe that will solve the problem.

I assumed this was caused by the long passive cable, as I read after the fact that for runs longer than 6 meters it's better to go with fiber or an active cable. Can anyone else weigh in on the cable issue while I wait until Thursday for my new system to show up?
 

rubylaser

Did you install the Mellanox drivers, or are you using the drivers built into Win10? Install WinOF from this page if you haven't: http://www.mellanox.com/page/products_dyn?product_family=32 (WinOF > 5.10 > Windows Client > 10)

When I was testing ConnectX-2 cards with Win10 I saw very similar performance to yours prior to installing drivers.
I'm using the 4.91 drivers that are built into Windows 10. Thanks, I'll try those tonight :)
 

rubylaser

Update: I installed the new driver and my iperf speeds are now above 3Gb/sec. I just tried a transfer via Samba from my server and saw 265MB/s, so that improved it a lot. I hope it will improve further once I can get the card into an x8 electrical PCIe slot, because I'm still only at about 1/3 of what I should see via iperf. If you have any more ideas, or think the SFP+ cable may also be a contributor, please let me know (and also what the best fix is).

The weird thing with iperf3 is that if I make the Linux box (the i5-4590) the server, I see the 3+ Gb/s speeds. If the Windows machine (the E5-2670 v1) is the server, I only get 692 Mb/s. The Windows box as the server is consistently MUCH slower than the other direction?!?!
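One way to narrow down an asymmetry like this: swapping which box runs the server changes two things at once (traffic direction and which OS is listening). iperf3's `-R` flag reverses the direction while keeping the same server, so both directions can be measured against one server instance. Below is a rough Python sketch of that idea, assuming `iperf3` is on the PATH and a server is already running on the other host. The function names and the example address are made up for illustration, and the JSON fields read here are the ones iperf3 reports for a TCP test.

```python
import json
import subprocess


def received_gbps(result):
    """Receiver-side throughput in Gb/s from a parsed iperf3 JSON report (TCP test)."""
    return result["end"]["sum_received"]["bits_per_second"] / 1e9


def mbytes_to_gbps(mb_per_s):
    """Convert a file-copy rate in MB/s to Gb/s so it can be compared with iperf numbers."""
    return mb_per_s * 8 / 1000


def run_iperf3(server, reverse=False, seconds=10):
    """Run an iperf3 client test against `server` and return the parsed JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # reverse mode: the server sends, the client receives
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def compare_directions(server):
    """Measure both traffic directions against the same server instance."""
    return {
        "client_to_server": received_gbps(run_iperf3(server)),
        "server_to_client": received_gbps(run_iperf3(server, reverse=True)),
    }


# Example (hypothetical address, needs a running `iperf3 -s` on that host):
# print(compare_directions("192.168.1.10"))
```

If one direction stays slow no matter which machine is the server, that points at the receive or transmit path of one host rather than at the server role itself. `mbytes_to_gbps(265)` puts the 265MB/s Samba copy on the same scale as the iperf figures.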

Thanks!
 

Rain

Active Member
May 13, 2013
240
81
28
Tweak the Receive & Transmit Buffers on the Windows machine (Linux should be fine): Right Click adapter in Network and Sharing Center > Properties > Configure... > ... Set them to the maximum. Jumbo Frames shouldn't be necessary to max out a 10Gbps iperf, don't worry about that yet.

There is also a way to tell the Mellanox drivers your expected workload (or something like that, I forgot what it's called and I don't have these cards plugged into Windows machines anymore -- someone else can chime in) in the network card configuration as well. Toy around with that too. If I'm recalling correctly, I simply set mine to "single port" or something similar to that.
 

PigLover

Moderator
Jan 26, 2011
2,954
1,260
113
Do you have them in a PCIe slots with a x8 electrical connection to them? I've read they don't work well with less.
I've done more than just "hear" this. First-hand testing. The Mellanox cards (CX2/CX3) get REALLY unhappy working in slots with less than x8. This is true even though a PCIe 2.0+ x4 slot should support single-port 10GbE with no issues. On x4 electrical my experience is similar to what @rubylaser is seeing - iperf at 2-3Gbps max.

When you get the cards on x8 slots my money says you will see iperf at 9.5+ Gbps without much tuning required.
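For reference, the arithmetic behind "x4 should be enough on paper" can be sketched as follows. The per-lane rates and line encodings are the published PCIe figures; packet and protocol overhead is not modelled, so real-world throughput is somewhat lower than these numbers. `link_gbps` is just an illustrative helper name. Note this only shows the theoretical headroom; it does not explain why these cards underperform on x4 in practice.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    "1.x": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_gbps(generation, lanes):
    """Data rate of a PCIe link in Gb/s, one direction, after line encoding only."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * efficiency * lanes


for gen, lanes in [("2.0", 1), ("2.0", 4), ("2.0", 8), ("3.0", 4)]:
    print(f"PCIe {gen} x{lanes}: {link_gbps(gen, lanes):.1f} Gb/s before protocol overhead")
```

A PCIe 2.0 x4 link works out to 16 Gb/s per direction, comfortably above a single 10GbE port, while a PCIe 2.0 x1 link tops out at 4 Gb/s before overhead.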
 

Stereodude

Active Member
Feb 21, 2016
412
65
28
USA
I've done more than just "hear" this. First-hand testing. The Mellanox cards (CX2/CX3) get REALLY unhappy working in slots with less than x8. This is true even though a PCIe 2.0+ x4 slot should support single-port 10GbE with no issues. On x4 electrical my experience is similar to what @rubylaser is seeing - iperf at 2-3Gbps max.

When you get the cards on x8 slots my money says you will see iperf at 9.5+ Gbps without much tuning required.
According to the Mellanox documentation the CX3 on PCIe 3.0 only "needs" (uses) 4 lanes. On 2.0 it "needs" (uses) 8. That's one of the reasons why I bought one. I haven't used it or tested it yet though.
 

PigLover

You may be right - I never did test a CX3 on a PCIe 3.0 x4 slot, only PCIe 2.0. In any case, the OP's motherboard also appears to be PCIe 2.0 x4 (in an x16 slot).
 

rubylaser

Thanks for the input. I'll hold off on tweaking anything further until the new parts show up on Thursday. Then, I'll have two 2011 systems with multiple PCIe x8 slots on both sides :) thanks for all the help everyone!
 

BackupProphet

Well-Known Member
Jul 2, 2014
796
284
63
Stavanger, Norway
kingmakers.no
I've tried these cards on Windows. I get similar performance, somewhere between 1 Gbps and 4 Gbps. It varies a lot, and file transfers seem to be faster than iperf.
With Linux or FreeBSD the story is different. Max performance, no tuning needed at all.
 

izx

Active Member
Jan 17, 2016
113
38
28
36
Can anyone else weigh in on the cable issue while I wait until Thursday for my new system to show up?
IIRC, the "official" limit for passive DAC is 7m @ 24AWG. 10m @ 24 AWG should be OK, but I don't see any specs for the Belkin cable. Perhaps consider this original Molex cable for not much more? Active DAC will be OK at 10m, fiber isn't necessary.
 

rubylaser

IIRC, the "official" limit for passive DAC is 7m @ 24AWG. 10m @ 24 AWG should be OK, but I don't see any specs for the Belkin cable. Perhaps consider this original Molex cable for not much more? Active DAC will be OK at 10m, fiber isn't necessary.
Thanks for the advice, but I don't see a link. I'm interested, so please supply it.
 

ehfortin

Member
Nov 1, 2015
56
5
8
49
I've done similar testing with the same ConnectX-2 in an HP ML310e, in what I believed was an appropriate PCIe slot, and I'm also seeing about 3.25 Gbps from an OmniOS VM to an OmniOS VM on another identical server. Everything is under ESXi 6 with VMXNET3 NICs. I tried from Ubuntu VM to Ubuntu VM and got the same performance. I've not tried physical to physical yet, as I have to shut down my VMware cluster and boot a live Linux that has iperf to do the testing. That's my next test.

I have a 1m DAC and I'm going through a 10 Gbps switch. I tried removing the switch and connecting back to back and got the same results, so for now the switch doesn't seem to be the bottleneck.

Will report once physical testing is done. However, if there are any tweaks known to help with VMware, I'm all ears.
 

groove

Member
Sep 21, 2011
80
26
18
I'm getting similar performance numbers on ConnectX-2 even on 40 Gbps InfiniBand. This is between VMware and Solaris 11.3. I even tried between Solaris 11.3 and Windows 2012 R2 and saw similar results.
 

ehfortin

I was suspicious that I could have an issue with PCIe speed as it was reported that 3.25 Gbps usually means the PCIe slot you are using is not at the minimum speed required to get 10 Gbps. I downloaded the quickspecs and read again. At the beginning of the document, they tell you that there are PCIe 2.0 4X, PCIe 2.0 8X, PCIe 3.0 8X and PCIe 3.0 16X. Well, they refer to the connector width. Looking for the bus width, I discovered that one of the PCIe 2.0 8X is actually a 1X (same for the 4X). So, I moved the NIC to the PCIe 3.0 16X (the 8X was already used by a SAS HBA) and I finally got the 10 Gbps between two hosts having each an OmniOS VM for the test.

So, make sure you are getting the bus width (or bus speed) in the documentation as it may fool you into using an incorrect PCIe slot.
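On a Linux host, the negotiated link can also be checked at runtime instead of relying on the documentation. The kernel exposes `current_link_width`, `max_link_width`, `current_link_speed` and `max_link_speed` for each PCI device under sysfs. Here is a small Python sketch that reads them; the helper names and the device address in the example are placeholders (the real address can be found with `lspci`), and this does not apply to ESXi or Windows.

```python
from pathlib import Path


def read_link_status(device_dir):
    """Return negotiated and maximum PCIe link width/speed for one sysfs device directory."""
    status = {}
    for attr in ("current_link_width", "max_link_width",
                 "current_link_speed", "max_link_speed"):
        path = Path(device_dir) / attr
        # Attributes are plain text files; report None if one is missing.
        status[attr] = path.read_text().strip() if path.exists() else None
    return status


def is_width_limited(status):
    """True when the link negotiated fewer lanes than the card reports it supports."""
    current, maximum = status["current_link_width"], status["max_link_width"]
    return current is not None and maximum is not None and int(current) < int(maximum)


# Example (placeholder PCI address):
# status = read_link_status("/sys/bus/pci/devices/0000:01:00.0")
# print(status, "-> width limited:", is_width_limited(status))
```

A card that supports x8 but reports a current width of 4 or 1 is sitting in a slot that is narrower electrically than its connector suggests.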

Thank you to @PigLover and @Stereodude for pointing me in the right direction.
 

Rain

I'm getting similar performance numbers on ConnectX-2 even on Infiniband 40GBPS. This is between VMWare and Solaris 11.3. I even tried between Solaris 11.3 and Windows 2012 R2 and saw similar results.
I too experienced poor performance with Infiniband cards. I believe IPoIB is less performant if your single-core performance is not very good; especially in VMware. When doing VMotions and other network intensive tasks (copies) over IPoIB, one CPU core/thread on the ESXi box would get pegged and performance wasn't that good.

I assume this is because IPoIB isn't hardware offloaded in the same way (the CPU has to handle the entire IP stack). I'm sure with sufficiently fast CPUs it performs great, but for lower-power parts 10GbE/40GbE cards that support hardware offloading are a must.
 

rubylaser

Well, something went horribly wrong. I just set up my second 2670 system and put the other card in an x8 slot, and the iperf speeds between my hosts are now about 3.5 MB/s (terrible). This has really been a failed experiment so far :(