10GbE Card not exceeding 800Mbps


Tolahouse

New Member
Oct 25, 2013
1
0
0
Guys, I need some help here.

I have two Mellanox ConnectX-2 EN cards (MNPH29D-XTR) in two different Windows 2012 systems. The cards are connected back to back with a Proline Twinax DAC cable. I have not been able to break 1Gb speeds between the cards.

I have loaded the latest drivers from Mellanox (MLNX_VPI_WinOF-4_40_0) and the latest firmware (2_9_1000) and played around with the TCP/IP settings for the cards, but the issue still persists.

Could anybody give me some pointers or gotchas to resolve this issue?

Thanks
 

Aluminum

Active Member
Sep 7, 2012
431
46
28
Guys, I need some help here.

I have two Mellanox ConnectX-2 EN cards (MNPH29D-XTR) in two different Windows 2012 systems. The cards are connected back to back with a Proline Twinax DAC cable. I have not been able to break 1Gb speeds between the cards.

I have loaded the latest drivers from Mellanox (MLNX_VPI_WinOF-4_40_0) and the latest firmware (2_9_1000) and played around with the TCP/IP settings for the cards, but the issue still persists.

Could anybody give me some pointers or gotchas to resolve this issue?

Thanks
Have you tested your disk I/O to see if it can break 800? The cards may not be the weakest link anymore.
 

dba

Moderator
Feb 20, 2012
1,477
184
63
San Francisco Bay Area, California, USA
Guys, I need some help here.

I have two Mellanox ConnectX-2 EN cards (MNPH29D-XTR) in two different Windows 2012 systems. The cards are connected back to back with a Proline Twinax DAC cable. I have not been able to break 1Gb speeds between the cards.

I have loaded the latest drivers from Mellanox (MLNX_VPI_WinOF-4_40_0) and the latest firmware (2_9_1000) and played around with the TCP/IP settings for the cards, but the issue still persists.

Could anybody give me some pointers or gotchas to resolve this issue?

Thanks
Maybe you can describe how you are benchmarking your speed?

Also, use PowerShell to check to see if you have Network Direct and RDMA enabled and working - Google for details or see Custom firmware enables Windows 2012 IPoIB RDMA for Mellanox OEM Infiniband Cards - www.openida.com
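For example, on Server 2012 the inbox cmdlets below should show whether Network Direct/RDMA is enabled on the adapters and whether SMB sees an RDMA-capable interface (adapter names will obviously differ on your boxes):

# global Network Direct switch
Get-NetOffloadGlobalSetting | Select NetworkDirect
# per-adapter RDMA state
Get-NetAdapterRdma
# what SMB sees - RDMA-capable interfaces and live multichannel connections
Get-SmbClientNetworkInterface
Get-SmbMultichannelConnection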
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
You might also want to check to be sure the cards actually linked at 10G. Open the device driver properties page. I think the Mellanox driver has an "information" tab that should show the link speed quite clearly. If one or more of the pairs in the DAC cable are broken it will fall back to 1GbE.
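On Server 2012 you can also read the negotiated link speed straight from PowerShell (inbox NetAdapter module, nothing Mellanox-specific):

Get-NetAdapter | Format-Table Name, InterfaceDescription, LinkSpeed

If the Mellanox port reports 1 Gbps instead of 10 Gbps, suspect the cable.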
 

jpasint

New Member
Oct 20, 2013
9
0
1
You might also want to check to be sure the cards actually linked at 10G. Open the device driver properties page. I think the Mellanox driver has an "information" tab that should show the link speed quite clearly. If one or more of the pairs in the DAC cable are broken it will fall back to 1GbE.
But he is already getting 800 MB/s so for sure he is not at 1Gb.

I was able to get around 850MB/s with my 10GbE but I was limited by hard drive I/O.

Joe

EDIT

My bad, I misunderstood. I thought he said he was getting 800 MB/s over the wire.
 
Last edited:

mrkrad

Well-Known Member
Oct 13, 2012
1,244
52
48
First, test using a RAM disk (the StarWind one is free!).

Then check your IRQs (manually separate them), and power settings (MAX MAX MAX).

Disable flow control!!

Disable IPv6!
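Roughly, for the Windows side - a sketch only; the adapter name below is a placeholder and the "Flow Control" display name depends on the driver:

# High Performance power plan
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
# flow control off (check Get-NetAdapterAdvancedProperty for the exact property name)
Set-NetAdapterAdvancedProperty -Name "10GbE" -DisplayName "Flow Control" -DisplayValue "Disabled"
# unbind IPv6 from the 10GbE adapter
Disable-NetAdapterBinding -Name "10GbE" -ComponentID ms_tcpip6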
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
then check your IRQs (manually separate them), and power settings (MAX MAX MAX)

Disable flow control!!
Don't do this if you want RDMA. If they are ConnectX-2/3 cards and both ends are Server 2012 / 2012 R2, then disabling flow control also disables RoCE (RDMA over Converged Ethernet). In all other cases - not Mellanox cards and/or either end is desktop Windows 7/8 - RDMA won't work anyway, and disabling flow control remains good advice.

Disable IPv6!
This is probably not as good an idea as it used to be. IPv6 is getting more and more "real", and Windows Server handles native v6 pretty well these days. I do agree that you should disable the tunneled v6 pseudo-adapters that Windows will install (ISATAP and/or Teredo). From a command prompt running as administrator: "netsh interface teredo set state disabled" and "netsh interface isatap set state disabled".
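To confirm the tunnels are actually gone afterwards (standard netsh, nothing exotic):

netsh interface teredo show state
netsh interface isatap show state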
 

cactus

Moderator
Jan 25, 2011
830
75
28
CA
Maybe you can describe how you are benchmarking your speed?

Also, use PowerShell to check to see if you have Network Direct and RDMA enabled and working - Google for details or see Custom firmware enables Windows 2012 IPoIB RDMA for Mellanox OEM Infiniband Cards - www.openida.com
He has ConnectX-2 EN cards. Do you get RDMA when in EN mode? RoCE maybe?

The thought made me laugh a little: running RoCE on an ASIC that does native RDMA. I get that they are for two different markets.
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
He has ConnectX-2 EN cards. Do you get RDMA when in EN mode? RoCE maybe?

The thought made me laugh a little: running RoCE on an ASIC that does native RDMA. I get that they are for two different markets.
Yup. With ConnectX-2/3 cards running in EN mode they support RoCE. When Server 2012 sees a RoCE-capable adapter it tries to use it. But if you disable Ethernet flow control, the Mellanox card disables RoCE.

Note that even though the card supports native RDMA, you need the Layer-2 encapsulation of RoCE in order to get it through your Ethernet switch.
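If you want to verify the port hasn't quietly dropped flow control (and RoCE with it), the advanced-property cmdlet will show it - the adapter name here is a placeholder and the exact display name depends on the Mellanox driver:

Get-NetAdapterAdvancedProperty -Name "10GbE" -DisplayName "Flow Control"

Pair that with the RDMA checks dba mentioned earlier to confirm RoCE is actually in play.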
 

dba

Moderator
Feb 20, 2012
1,477
184
63
San Francisco Bay Area, California, USA
He has ConnectX-2 EN cards. Do you get RDMA when in EN mode? RoCE maybe?

The thought made me laugh a little: running RoCE on an ASIC that does native RDMA. I get that they are for two different markets.
I use 32Gbit IPoIB (IP over Infiniband) in Windows 2012 and I do get RDMA. I haven't done much with the cards in EN mode, but I assume that you would also get RDMA.
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
I use 32Gbit IPoIB (IP over Infiniband) in Windows 2012 and I do get RDMA. I haven't done much with the cards in EN mode, but I assume that you would also get RDMA.
IPoIB uses native Infiniband transport. So in that mode the ASIC gets RDMA by skipping the IP stack and just sending RDMA packets over the infiniband wire. Your devices are either direct connected using QSFP cables or connected via an IB switch.

In EN mode the card is using Ethernet link layer. With the Ethernet link layer the card assumes it is connected to an Ethernet switch for its layer-2. With an Ethernet switch in the path there is no way to use native IB for the layer-2 - so the only way to get RDMA is via RoCE.
 

33_viper_33

Member
Aug 3, 2013
204
3
18
I have the HP version of the ConnectX-2 VPI cards running in Infiniband mode. I just finished up my first round of tests and am disappointed in the results. I started off with 2 ESXi nodes on my C6100, running a Windows 7 VM on each. I set up the cards as virtual switches in ESXi and gave each VM a VMXNET3 NIC. I used pfSense to provide an IP address but will likely just set a static one in the future for testing. Pushing 10Gb from ramdisk to ramdisk between the Windows 7 VMs, I only saw 300MB/s steady, peaking at 400MB/s occasionally.

Please note, I have done no optimization as of yet and am not sure where to start. My first thought was IRQs, but after reading the above, I guess that is not a good idea. I'm no expert with ESXi, so please share any knowledge you may have. I'm going to try passthrough next, just to ensure the cards and cable are good, and hope to see better performance. The link state in ESXi says 40000, so I'm assuming the cards and cable are OK. If passthrough fixes the problem, I will teach myself SR-IOV and give the VMs virtual Infiniband controllers. This is not optimal for me since I had hoped to bridge the Infiniband and GbE networks.

The way I pictured this setup in my head used one of two options. The first was putting 1 GbE and 1 Infiniband NIC on a single virtual switch. The second was to let pfSense act as the bridge and keep the two networks separate in ESXi. I've done this for a while with my 10GbE cards with success. The second option is preferable for HA, but I assume the first would be better from a performance standpoint. I briefly tried the first option without success: the ESXi management interface and VMs all became inaccessible. I'm sure I'm missing a setting somewhere, and possibly ESXi is favoring the faster link. Advice appreciated.

Overall, I'm disappointed because I was pushing 400MB/s from VM to VM on a single ESXi box using SSDs and the onboard SATA II controller. With ramdisks and Infiniband interconnects, I thought I would push much more, but seem to be falling short instead. I am close to maxing out my available RAM on one of the two boxes, which may be part of the issue. I was considering robbing RAM from one of the other nodes to bump both test nodes up to 32GB+ per node.
 

33_viper_33

Member
Aug 3, 2013
204
3
18
My satellite connection was down last night, and I received a message that maintenance will be conducted, causing connectivity to be intermittent. The best news: this will likely be the case for the next week, which will slow the pace of my testing.

I did rob RAM from another node, making 2 nodes with 32GB. Each ESXi host has 10GB of spare RAM. Each VM has 20GB, with 15GB used for the ramdisk. I did notice a slight improvement. I'm sustaining 360MB/s now and peaking at 450MB/s. Still not the numbers I'm hoping for.

Next test: PCIe passthrough. I just need to download the drivers, which is proving difficult with the satellite issues.
 

mrkrad

Well-Known Member
Oct 13, 2012
1,244
52
48
You are doing linear ATTO benchmark tests to an SMB-shared RAM disk?? (The StarWind ramdisk is free.)

IRQ - important to separate the network card; it is going to get 5 million packets per second at 3-4 gigabit.

Power management - kill it. MAX MAX MAX: performance, noise, etc.

Make sure the card is in a slot that meets its PCIe requirements.

You have to have multiple MACs or VLANs to get max throughput with ESXi. It won't even use multiple queues with a single 2008 R2 box with one VMXNET3. ESXi is designed to have many VMs running at once, so one VM getting full network bandwidth is not important. At all.
 

dba

Moderator
Feb 20, 2012
1,477
184
63
San Francisco Bay Area, California, USA
...IRQ - important to separate the network card...
mrkrad, one of your common themes is the need to separate IRQs. I'm not a skeptic, in fact it seems like good advice prima facie, but I wonder if you can share any experiences or benchmarks or references? I have heard this advice from you so many times that I'm tempted to start sharing it with others, which I probably shouldn't do without learning more first.
 

zer0sum

Well-Known Member
Mar 8, 2013
849
474
63
I have the HP version of the ConnectX-2 VPI cards running in Infiniband mode. I just finished up my first round of tests and am disappointed in the results. I started off with 2 ESXi nodes on my C6100, running a Windows 7 VM on each. I set up the cards as virtual switches in ESXi and gave each VM a VMXNET3 NIC. I used pfSense to provide an IP address but will likely just set a static one in the future for testing. Pushing 10Gb from ramdisk to ramdisk between the Windows 7 VMs, I only saw 300MB/s steady, peaking at 400MB/s occasionally.

Please note, I have done no optimization as of yet and am not sure where to start. My first thought was IRQs, but after reading the above, I guess that is not a good idea. I'm no expert with ESXi, so please share any knowledge you may have. I'm going to try passthrough next, just to ensure the cards and cable are good, and hope to see better performance. The link state in ESXi says 40000, so I'm assuming the cards and cable are OK. If passthrough fixes the problem, I will teach myself SR-IOV and give the VMs virtual Infiniband controllers. This is not optimal for me since I had hoped to bridge the Infiniband and GbE networks.

The way I pictured this setup in my head used one of two options. The first was putting 1 GbE and 1 Infiniband NIC on a single virtual switch. The second was to let pfSense act as the bridge and keep the two networks separate in ESXi. I've done this for a while with my 10GbE cards with success. The second option is preferable for HA, but I assume the first would be better from a performance standpoint. I briefly tried the first option without success: the ESXi management interface and VMs all became inaccessible. I'm sure I'm missing a setting somewhere, and possibly ESXi is favoring the faster link. Advice appreciated.

Overall, I'm disappointed because I was pushing 400MB/s from VM to VM on a single ESXi box using SSDs and the onboard SATA II controller. With ramdisks and Infiniband interconnects, I thought I would push much more, but seem to be falling short instead. I am close to maxing out my available RAM on one of the two boxes, which may be part of the issue. I was considering robbing RAM from one of the other nodes to bump both test nodes up to 32GB+ per node.
You really need to get rid of the vSwitches. Their performance sucks no matter what you do.
If your hardware supports it, you should look at getting SR-IOV working, as you will be able to provide very close to 10Gbps line speed to each virtual machine.
This guide is for Intel, but it will give you an idea of what SR-IOV is.

Unfortunately, I'm not familiar with Mellanox/Infiniband, but I do believe they support SR-IOV as well.
A quick Google search reveals some interesting stuff - SR-IOV on ESXi 5.1 | Mellanox Interconnect Community
 

33_viper_33

Member
Aug 3, 2013
204
3
18
Zer0sum,

I appreciate the suggestion and had the same thought, and started on it last week. Unfortunately, I'm not having much luck. I used this guide "http://forums.servethehome.com/networking/2551-infiniband-questions.html" to install the drivers and OpenSM. I have the 1.8.2.0 driver installed and have successfully configured vSwitches. Unfortunately, SR-IOV is getting the better of me. I also looked through the article linked above but found no answers.

I enabled SR-IOV, VT-d, and VT-x in the BIOS. I then PuTTY'd into ESXi and tried the following commands:
~ # esxcli system module parameters set --module mlx4_ib --parameter-string=max_vfs=20,20
Received the following: Unable to set module parameters the following params are invalid: max_vfs
~ # esxcfg-module mlx4_ib -s max_vfs=20,20
Received the following: Unable to set module parameters the following params are invalid: max_vfs
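Next I plan to list which parameters the installed modules actually expose, to see whether max_vfs even exists in this driver build (standard esxcli calls; the module names are just the ones I expect to be loaded):

esxcli system module list | grep mlx4
esxcli system module parameters list -m mlx4_core
esxcli system module parameters list -m mlx4_ib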

I also tried the IPoIB and core portions of the driver with no luck. I'm going to do passthrough in the next couple of days for testing, but will ultimately need SR-IOV to work. Work has been very busy lately, which is making testing slow. If I can get this working, I would like to do a quick writeup to help others avoid some of my errors.

Any suggestions on SR-IOV?
 
Last edited:

33_viper_33

Member
Aug 3, 2013
204
3
18
Well, my frustration is mounting... I just enabled PCI passthrough and attempted to install the driver. The driver has now crashed not one, but two Windows 7 VMs. With that, I'm calling it a night. I will try again in the next couple of days.
 

mrkrad

Well-Known Member
Oct 13, 2012
1,244
52
48
mrkrad, one of your common themes is the need to separate IRQs. I'm not a skeptic, in fact it seems like good advice prima facie, but I wonder if you can share any experiences or benchmarks or references? I have heard this advice from you so many times that I'm tempted to start sharing it with others, which I probably shouldn't do without learning more first.
Man, if every card/chipset worked properly I'd agree. But they don't - inside the VM, physical, or in between (SR-IOV).

It might be completely antiquated, but I suspect a server with an E5520 might be old enough to still have some legacy gear (USB ports, etc.).

You could just call me old and crazy, but my habits seem to help, and when I forget them, they come back to bite me. I do not have any fancy new PCIe 3.0 stuff!