Infiniband help

jibesh101

New Member
Dec 8, 2012
Texas
Hello everyone,

I have two Mellanox MHGH28-XTC adapters (firmware 2.9.1000) connected directly with InfiniBand CX4 cables. They work fine as 10GbE adapters, but if I switch them to IB mode, the status shows "Network cable unplugged". Both systems run Windows Server 2012, and I'm using the Windows drivers provided by Mellanox.

Does anyone have experience with these, or any suggestions? Thanks.
 

cactus

Moderator
Jan 25, 2011
CA
I assume MHGH28 means the dual-port ConnectX DDR card. Also, when you say 10GbE, are you referring to EN mode? Are you running a subnet manager when using them in InfiniBand mode? Post your vstat output.
 

jibesh101

New Member
Dec 8, 2012
Texas
Yes, MHGH28-XTC (I mistyped it). The controller gives options for IB, Eth, or Auto: IB = IPoIB, Eth = 10GbE, Auto = auto-detection.

I'm not familiar with subnet managers (this might be the problem).

Vstat output:

C:\Program Files\Mellanox\WinMFT>vstat

hca_idx=0
uplink={BUS=PCI_E Gen1, SPEED=2.5 Gbps, WIDTH=x8, CAPS=2.5*x8}
MSI-X={ENABLED=1, SUPPORTED=128, GRANTED=26, ALL_MASKED=N}
vendor_id=0x02c9
vendor_part_id=25418
hw_ver=0xa0
fw_ver=2.09.1000
PSID=MT_04A0140005
node_guid=0002:c903:0002:8a10
num_phys_ports=2
port=1
port_guid=0202:c9ff:fe02:8a11
port_state=PORT_ACTIVE (4)
link_speed=2.50 Gbps
link_width=4x (2)
rate=10.00 Gbps
port_phys_state=LINK_UP (5)
active_speed=2.50 Gbps
sm_lid=0x0000
port_lid=0x0000
port_lmc=0x0
transport=RoCE
max_mtu=2048 (4)

port=2
port_guid=0202:c9ff:fe02:8a12
port_state=PORT_ACTIVE (4)
link_speed=2.50 Gbps
link_width=4x (2)
rate=10.00 Gbps
port_phys_state=LINK_UP (5)
active_speed=2.50 Gbps
sm_lid=0x0000
port_lid=0x0000
port_lmc=0x0
transport=RoCE
max_mtu=2048 (4)
 

cactus

Moderator
Jan 25, 2011
CA
That looks like vstat output from EN mode: the rate is limited to 10 Gbps and the transport is RoCE (RDMA over Converged Ethernet).
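The check above (RoCE transport means EN mode) can be automated. Below is a minimal Python sketch, assuming vstat prints one key=value pair per line exactly as in the output posted earlier; the function names are hypothetical. The port_state numbers are the standard InfiniBand port states (1 = down, 2 = initialize, 4 = active). A port that is physically up but stuck in state 2 is the usual sign that no subnet manager is running.

```python
import re

# Assumption: vstat output format as posted in this thread (key=value per line,
# each port's block starting with "port=N").


def parse_vstat(text):
    """Split vstat output into one dict of key=value pairs per port."""
    ports = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == "port":
            current = {"port": value}
            ports.append(current)
        elif current is not None:
            current[key] = value  # lines before the first port are ignored
    return ports


def diagnose(port):
    """Rough per-port verdict based on transport and IB port state."""
    state = int(re.search(r"\((\d+)\)", port["port_state"]).group(1))
    if port.get("transport") == "RoCE":
        return "Ethernet (EN) mode"
    if state == 1:
        return "IB port down: no physical link"
    if state == 2:
        return "IB link up but stuck in INIT: no subnet manager has configured the port"
    if state == 4 and port.get("port_lid") != "0x0000":
        return "IB mode, active with an SM-assigned LID"
    return "IB mode, port state %d" % state
```

Feeding it the vstat output above should report "Ethernet (EN) mode" for both ports.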

For an IB network to work, you need a subnet manager running on one of the nodes. The one that comes with WinOF is OpenSM. You can run it from cmd or set it up as a service. When I tried 2012 with QDR cards, I had to run it from cmd because it was not showing up in the services list. You will need to figure out how to get it to bind to both ports, or just run two instances. dba would be a better resource for Windows help; I don't have Windows running on anything at the moment.
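One way to cover both ports is to run one OpenSM instance per port, since each direct cable is its own subnet. Here is a minimal Python sketch that only builds and launches the command lines. It assumes the WinOF opensm.exe is on PATH and accepts the same -g (bind to a local port GUID) and -f (log file) options as the OFED opensm; confirm with opensm --help on your build. The helper names and log file names are hypothetical. The GUIDs in the vstat output above were captured in EN mode, so re-run vstat in IB mode and use the port_guid values it reports there.

```python
import subprocess

# Assumption: opensm accepts "-g <0x-prefixed port GUID>" and "-f <log file>"
# as the OFED opensm does; verify on your WinOF version.


def vstat_guid_to_hex(guid):
    """Turn vstat's '0202:c9ff:fe02:8a11' notation into '0x0202c9fffe028a11'."""
    digits = guid.replace(":", "").lower()
    if len(digits) != 16:
        raise ValueError("expected a 64-bit GUID, got %r" % guid)
    int(digits, 16)  # raises ValueError if the string is not hex
    return "0x" + digits


def opensm_commands(port_guids):
    """Build one argv list per port, each with its own (hypothetical) log file."""
    return [
        ["opensm", "-g", vstat_guid_to_hex(g), "-f", "opensm-port%d.log" % i]
        for i, g in enumerate(port_guids, start=1)
    ]


def start_all(port_guids):
    """Launch every instance; Popen returns immediately and each SM keeps running."""
    return [subprocess.Popen(cmd) for cmd in opensm_commands(port_guids)]
```

Separate log files keep the two instances from writing to the same log; whether they also need separate cache or config paths depends on the build.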
 

jmarg

New Member
Feb 28, 2013
Hey, I work at Mellanox if you need additional help. jeffm (at) mellanox (dot) com.
 

renderfarmer

Member
Feb 22, 2013
New Jersey
I have a similar problem, except I tried hooking up these same cards to my Cisco Topspin90 and get no lights on either the HCA or the switch end.

When I interconnect two HCAs, they both light up just fine, though I don't know whether they revert to EN mode or not. This is on Win2012.

Is it that the MHGH28-XTC can't work at 10 Gbps IB, or is the switch so old it just can't recognize the MHGH28-XTC?
 

cactus

Moderator
Jan 25, 2011
CA
I have connected ConnectX DDR cards to SDR cards and a Topspin 120. Did you try point-to-point?
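On the 10 Gbps question: both ends of an IB link train to the fastest lane speed and widest width they have in common, so a DDR card connected to SDR gear should come up as 4x SDR. The arithmetic is sketched below in Python; the function name is hypothetical, the table covers only the generations mentioned in this thread, and the 80% data-rate factor comes from the 8b/10b encoding that SDR, DDR and QDR use.

```python
# Per-lane signaling rate in Gbit/s for the InfiniBand generations in this thread.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def negotiated_link(gen_a, gen_b, width_a=4, width_b=4):
    """Return (signaling_gbps, data_gbps) for a link between two ports.

    Both ends settle on the fastest lane speed and widest width they share,
    i.e. the minimum of each pair.
    """
    lane = min(LANE_GBPS[gen_a], LANE_GBPS[gen_b])
    width = min(width_a, width_b)
    signaling = lane * width
    data = signaling * 8 / 10  # 8b/10b: 8 data bits per 10 bits on the wire
    return signaling, data


# ConnectX DDR (MHGH28) to an SDR switch or HCA:
print(negotiated_link("DDR", "SDR"))  # (10.0, 8.0)
```

The 2.50 Gbps × 4x = 10.00 Gbps figures in the vstat output above follow the same speed-times-width pattern.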
 

renderfarmer

Member
Feb 22, 2013
New Jersey
Point to point meaning one server directly to another, bypassing the switch? Yes, I did. Both ConnectX cards lit up. I have no idea why the switch won't light up when I connect them. It has the last released firmware (2.8.0).

The only reason I got these ConnectX cards is that my InfiniHost III cards are giving me all sorts of problems in WinServ2012; for instance, the latest drivers won't install, and I have to use the old WinServ2008 drivers for the cards to work at all.

Not sure if this is related, but when I create a Hyper-V VM of Win2012 running inside Win2012, I can't get the VM to share the IB connection, regardless of the options or settings I try for the Hyper-V virtual switch.
 

cactus

Moderator
Jan 25, 2011
CA
I believe the InfiniHost III cards are being phased out. Your problem sharing the IPoIB connection could be because Windows is using L2 bridging, which may not be supported. This weekend I'll try to find some time to play with Windows.
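Some background on why bridging IPoIB is awkward: IPoIB does not put Ethernet frames on the wire. Under RFC 4391 an IPoIB link-layer address is 20 octets (one reserved/flags octet, a 24-bit queue pair number and a 16-byte GID) rather than a 6-byte MAC, so anything that forwards by Ethernet MAC depends on the driver translating between the two. That is a plausible, not confirmed, reason the Hyper-V switch won't share the link. The Python sketch below only packs that address layout; the function name and the QPN and GID values are made up for illustration.

```python
ETH_MAC_LEN = 6  # bytes in an Ethernet MAC address


def ipoib_hw_addr(qpn, gid):
    """Pack an IPoIB link-layer address as laid out in RFC 4391:
    1 reserved/flags octet, 3-octet queue pair number, 16-octet GID."""
    if not 0 <= qpn < 1 << 24:
        raise ValueError("QPN must fit in 24 bits")
    if len(gid) != 16:
        raise ValueError("GID must be 16 bytes")
    return bytes([0]) + qpn.to_bytes(3, "big") + gid


# Hypothetical values: link-local GID prefix fe80::/64 plus a made-up port GUID.
gid = bytes.fromhex("fe80000000000000" + "0002c90300028a11")
addr = ipoib_hw_addr(0x000048, gid)
print(len(addr), "bytes vs", ETH_MAC_LEN)  # 20 bytes vs 6
```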
 

renderfarmer

Member
Feb 22, 2013
New Jersey
Thanks, cactus. I had high hopes for this machine as a dual-purpose file server + compute node running in a VM. If bridging the connection won't work, I'll have to come up with something else. Thanks for taking a crack at it.
 

renderfarmer

Member
Feb 22, 2013
New Jersey
Mystery solved: one of the ConnectX cards appears to be defective. I bought two, a dual-port and a single-port. The single-port one works just fine connected to my Topspin90. The dual-port one shows as cable unplugged when connected to the switch, and I get some sort of firmware flash error message on the very last panel when installing the drivers...