I recently got a bunch of MCX314A-like cards (actually MCX384A-BCAA Dell C6220 mezzanine cards, but they're basically identical inside to the MCX314A: they take the Mellanox firmware and work absolutely fine with it at 40GbE; all devices come up, network tests work fine, the correct speed is shown, etc.).
However, as I have no 40GbE switches, only FDR and QDR InfiniBand switches, and my other machines have InfiniBand HCAs, I decided to try flashing InfiniBand MCX354A firmware onto the cards using mstflint.
The flashing took a few tries, as there actually seem to be two different MCX354A firmwares: one for ConnectX-3 and one for ConnectX-3 Pro. When I try to flash the non-Pro version of the MCX354A firmware, it complains about an incompatible device, and a block burn bricks the card, so I have to short the recovery pins and flash the Pro version back.
But the ConnectX-3 Pro MCX354A InfiniBand firmware goes in fine. I used fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin from the NVIDIA/Mellanox site (Firmware for ConnectX®-3 Pro IB/VPI) and flashed it with:
Code:
mstflint -nofs --ignore_dev_data --use_image_guids --use_image_ps --allow_psid_change -d 82:00.0 -i fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin b
After a reboot, the card comes up and reports itself as the Pro version of the MCX354A, VPI and everything.
Code:
root@rdellnew# mstflint -d 82:00.0 q
Image type: FS2
FW Version: 2.42.5000
FW Release Date: 5.9.2017
Product Version: 02.42.50.00
Rom Info: type=PXE version=3.4.752 devid=4103
Device ID: 4103
Description: Node Port1 Port2 Sys image
GUIDs: 0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050
MACs: 0002c9000001 0002c9000002
VSD: n/a
PSID: MT_1090111019
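To compare several flashed cards quickly, the fields of the `mstflint q` output above can be pulled out with a few lines of Python. This is a minimal sketch, assuming the plain `Key: value` layout shown above; `parse_mstflint_query` and `device_family` are hypothetical helpers, not part of mstflint. The Device ID 4103 is 0x1007, the PCI device ID of ConnectX-3 Pro, whereas plain ConnectX-3 is 4099 (0x1003), which is consistent with mstflint rejecting the non-Pro MCX354A image as incompatible.

```python
# PCI device IDs of the two ConnectX-3 families, in decimal as mstflint prints them.
DEVICE_FAMILIES = {
    4099: "ConnectX-3",      # 0x1003
    4103: "ConnectX-3 Pro",  # 0x1007
}


def parse_mstflint_query(text):
    """Turn the 'Key:   value' lines of `mstflint -d <dev> q` into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only and keep the rest of the line as the value.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def device_family(fields):
    """Map the numeric 'Device ID' field to a ConnectX-3 family name."""
    dev_id = int(fields["Device ID"])
    return DEVICE_FAMILIES.get(dev_id, f"unknown ({dev_id})")


if __name__ == "__main__":
    # A few lines copied from the query output above.
    sample = "FW Version: 2.42.5000\nDevice ID: 4103\nPSID: MT_1090111019\n"
    fields = parse_mstflint_query(sample)
    print(fields["PSID"], device_family(fields))  # MT_1090111019 ConnectX-3 Pro
```

Feeding it the text captured from `mstflint -d 82:00.0 q` on each card makes it easy to compare FW Version, PSID and GUIDs across the whole batch.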
(In the output below, LINK_TYPE_P1 and LINK_TYPE_P2 are set to IB from my debugging attempts, but VPI is an available option.)
Code:
root@rdellnew1# mstconfig -d 82:00.0 q
Device #1:
----------
Device type: ConnectX3Pro
PCI device: 82:00.0
Configurations: Next Boot
SRIOV_EN True(1)
NUM_OF_VFS 8
LINK_TYPE_P1 IB(1)
LINK_TYPE_P2 IB(1)
LOG_BAR_SIZE 3
BOOT_PKEY_P1 0
BOOT_PKEY_P2 0
BOOT_OPTION_ROM_EN_P1 False(0)
BOOT_VLAN_EN_P1 False(0)
BOOT_RETRY_CNT_P1 0
LEGACY_BOOT_PROTOCOL_P1 PXE(1)
BOOT_VLAN_P1 1
BOOT_OPTION_ROM_EN_P2 False(0)
BOOT_VLAN_EN_P2 False(0)
BOOT_RETRY_CNT_P2 0
LEGACY_BOOT_PROTOCOL_P2 PXE(1)
BOOT_VLAN_P2 1
IP_VER_P1 IPv4(0)
IP_VER_P2 IPv4(0)
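When toggling LINK_TYPE_P1/LINK_TYPE_P2 between IB and VPI during debugging, it can help to check the configuration from a script rather than by eye. This is a minimal Python sketch, assuming the `NAME   LABEL(code)` row format shown above; `parse_mstconfig` and `link_types` are hypothetical helper names. As the `Next Boot` header indicates, these are the values the card will use after the next reboot.

```python
import re

# One "NAME   value" row of `mstconfig -d <dev> q`, e.g. "LINK_TYPE_P1   IB(1)".
ROW = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s+(\S+)\s*$")
# A value written as a label plus its numeric code, e.g. "IB(1)" or "True(1)".
LABELLED = re.compile(r"^(.*)\((\d+)\)$")


def parse_mstconfig(text):
    """Return {param: (label, code)} for every configuration row in the text."""
    params = {}
    for line in text.splitlines():
        row = ROW.match(line)
        if not row:
            continue  # skips headers such as "Device #1:" or "PCI device: 82:00.0"
        name, raw = row.groups()
        labelled = LABELLED.match(raw)
        if labelled:
            params[name] = (labelled.group(1), int(labelled.group(2)))
        else:
            # Plain numeric rows such as "NUM_OF_VFS 8" have no separate label.
            params[name] = (raw, int(raw) if raw.isdigit() else None)
    return params


def link_types(params):
    """Return the configured link type per port, e.g. {"P1": "IB", "P2": "IB"}."""
    return {
        name.rsplit("_", 1)[1]: label
        for name, (label, _code) in params.items()
        if name.startswith("LINK_TYPE_")
    }
```

Running `link_types(parse_mstconfig(output))` on the output above should give `{"P1": "IB", "P2": "IB"}`.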
Code:
root@rdellnew1# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 0
Node GUID: 0x0002c9000100d050
System image GUID: 0x0002c9000100d050
Port 1:
State: Down
Physical state: Polling
Rate: 10
Base lid: 6
LMC: 0
SM lid: 6
Capability mask: 0x0251486a
Port GUID: 0x0002c9000100d051
Link layer: InfiniBand
Port 2:
State: Down
Physical state: Polling
Rate: 10
Base lid: 7
LMC: 0
SM lid: 6
Capability mask: 0x02514868
Port GUID: 0x0002c9000100d052
Link layer: InfiniBand
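With two ports and several cables and peers to try, a small parser makes it easier to see which ports are stuck. This is a minimal Python sketch, assuming the `Port N:` / `Field: value` layout of the `ibstat` output above; `parse_ibstat` and `stuck_ports` are hypothetical helpers. It flags ports whose logical state is Down while the physical state is Polling, which is the symptom here: the physical layer keeps trying to bring up a link with the peer but never succeeds.

```python
import re


def parse_ibstat(text):
    """Return {port_number: {field: value}} from the `ibstat` output of one CA."""
    ports = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"Port (\d+):", stripped)
        if header:
            # Start collecting fields for a new port.
            current = ports.setdefault(int(header.group(1)), {})
            continue
        if current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return ports


def stuck_ports(ports):
    """Port numbers whose logical state is Down while the PHY is still Polling."""
    return sorted(
        n for n, fields in ports.items()
        if fields.get("State") == "Down" and fields.get("Physical state") == "Polling"
    )
```

Fields that appear before the first `Port N:` line (such as `Number of ports`) are ignored, since they describe the CA rather than a port.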
So everything looks OK. The mlx4_core and mlx4_ib drivers also seem happy in dmesg, even with debug_level=5 and so on. As far as I've been able to figure out, the flashed MCX314A/MCX384A is indistinguishable from an MCX354A with these tools.
Except the problem is that it won't connect to InfiniBand switches or other HCAs. I tried two different switches (Mellanox SX6005 and IS5022) as well as two different Mellanox HCAs (ConnectX-2 MHQH19 and Connect-IB MCB194), but it never connects. It just stays in the "Down" state, with the physical state stuck at "Polling".
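To watch the ports while moving the cable between switches and HCAs, the same state can be read directly from sysfs instead of re-running ibstat. This is a minimal Python sketch, assuming the standard Linux RDMA sysfs layout `/sys/class/infiniband/<device>/ports/<n>/state` and `phys_state` (values such as `1: DOWN` and `2: Polling`) and the device name `mlx4_0` from the ibstat output; `parse_sysfs_state` and `read_port_states` are hypothetical helper names.

```python
from pathlib import Path


def parse_sysfs_state(raw):
    """Split a sysfs value such as "2: Polling" into (2, "Polling")."""
    code, _, name = raw.strip().partition(":")
    return int(code), name.strip()


def read_port_states(ca="mlx4_0", root="/sys/class/infiniband"):
    """Return {port: (state, phys_state)} for every port of one HCA.

    Reads <root>/<ca>/ports/<n>/state and .../phys_state; `root` is a parameter
    only so the function can be pointed at a copy of the tree for testing.
    """
    ports_dir = Path(root) / ca / "ports"
    states = {}
    for port_dir in sorted(ports_dir.iterdir(), key=lambda p: int(p.name)):
        _, state = parse_sysfs_state((port_dir / "state").read_text())
        _, phys = parse_sysfs_state((port_dir / "phys_state").read_text())
        states[int(port_dir.name)] = (state, phys)
    return states
```

Calling `read_port_states()` once a second (for example with `time.sleep(1)` in a loop) while plugging into the SX6005, the IS5022 or one of the other HCAs shows whether a port ever leaves `Polling`.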
The weird thing is, if I connect the two ports of the flashed card to each other with the QSFP cable, both ports light up and the link comes up. After starting OpenSM, the ports go to Active and report a 40 Gb/s connection (and it's definitely reported as InfiniBand, not Ethernet; it's set to IB mode in mstconfig, so no Ethernet devices even show up for it).
Benchmark tools like ib_read_bw work: I can send data, and it goes through at a nice 3900 MB/s with the transmit LEDs flashing as they should, etc. But only ever between the two ports of the card itself. Plug it into another HCA or switch, and nothing...
I've spent a few evenings on this, and it's driving me crazy. Does anyone have any ideas?
I have a bunch of these cards, so if anyone can fix this problem and has a Dell C6100/C6220/C6320 system they'd like 40GbE or InfiniBand for, I can send a couple of MCX384 cards as a reward, for the cost of postage only (we can work out the cheapest shipping; I'm located in Europe).