ConnectX-3 Pro MCX314A (MCX384A-BCAA) cards flashed to MCX354A Pro Infiniband/VPI firmware don't connect to other Infiniband cards or switches

Frobbit

I recently got a bunch of MCX314A-like cards. They are actually MCX384A-BCAA Dell C6220 mezzanine cards, but internally they're basically identical to the MCX314A: they take the Mellanox MCX314A firmware and work absolutely fine with it at 40GbE (all devices come up, network tests pass, the correct speed is reported, etc.).

However, as I have no 40GbE switches, only FDR and QDR InfiniBand switches, and my other machines have InfiniBand HCAs, I decided to try flashing InfiniBand MCX354A firmware onto the cards using mstflint.

The flashing took a few tries, as there seem to be two different MCX354A firmwares, one for ConnectX-3 and one for ConnectX-3 Pro. When I try to flash the non-Pro MCX354A firmware, mstflint complains about a non-compatible device, and forcing it with a block burn bricks the card, so I have to short the recovery pins and flash the Pro version back.

But the ConnectX-3 Pro MCX354A InfiniBand firmware goes in fine. I used fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin from the NVIDIA/Mellanox site (Firmware for ConnectX®-3 Pro IB/VPI) and flashed it with:

Code:
mstflint -nofs --ignore_dev_data --use_image_guids --use_image_ps --allow_psid_change -d82:00.0 -i fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin b
After a reboot the card comes up and reports as the Pro version of the MCX354A, with VPI and everything.

Code:
root@rdellnew# mstflint -d 82:00.0 q
Image type:          FS2
FW Version:          2.42.5000
FW Release Date:     5.9.2017
Product Version:     02.42.50.00
Rom Info:            type=PXE version=3.4.752 devid=4103
Device ID:           4103
Description:         Node             Port1            Port2            Sys image
GUIDs:               0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050
MACs:                                     0002c9000001     0002c9000002
VSD:                 n/a
PSID:                MT_1090111019

(Below, LINK_TYPE_P1 and LINK_TYPE_P2 are set to IB from my debugging attempts, but VPI is an available option.)

Code:
root@rdellnew1# mstconfig -d 82:00.0 q

Device #1:
----------

Device type:    ConnectX3Pro
PCI device:     82:00.0 

Configurations:                              Next Boot
         SRIOV_EN                            True(1) 
         NUM_OF_VFS                          8       
         LINK_TYPE_P1                        IB(1)   
         LINK_TYPE_P2                        IB(1)   
         LOG_BAR_SIZE                        3       
         BOOT_PKEY_P1                        0       
         BOOT_PKEY_P2                        0       
         BOOT_OPTION_ROM_EN_P1               False(0)
         BOOT_VLAN_EN_P1                     False(0)
         BOOT_RETRY_CNT_P1                   0       
         LEGACY_BOOT_PROTOCOL_P1             PXE(1) 
         BOOT_VLAN_P1                        1       
         BOOT_OPTION_ROM_EN_P2               False(0)
         BOOT_VLAN_EN_P2                     False(0)
         BOOT_RETRY_CNT_P2                   0       
         LEGACY_BOOT_PROTOCOL_P2             PXE(1) 
         BOOT_VLAN_P2                        1       
         IP_VER_P1                           IPv4(0) 
         IP_VER_P2                           IPv4(0)
Code:
root@rdellnew1# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 0
        Node GUID: 0x0002c9000100d050
        System image GUID: 0x0002c9000100d050
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 6
                LMC: 0
                SM lid: 6
                Capability mask: 0x0251486a
                Port GUID: 0x0002c9000100d051
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 7
                LMC: 0
                SM lid: 6
                Capability mask: 0x02514868
                Port GUID: 0x0002c9000100d052
                Link layer: InfiniBand
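
A side note on the two capability masks in the ibstat output above: they differ in a single bit. The sketch below just XORs the two values copied from that output. The only outside fact it relies on is that bit 1 of the InfiniBand PortInfo CapabilityMask is IsSM (a subnet manager is registered on that port); the helper name is made up for illustration.

```python
# Capability masks copied verbatim from the ibstat output above.
PORT1_MASK = 0x0251486A
PORT2_MASK = 0x02514868

# Bit 1 of the PortInfo CapabilityMask is IsSM.
IS_SM = 1 << 1


def differing_bits(a, b):
    """Return the bit positions at which two 32-bit masks differ."""
    diff = a ^ b
    return [bit for bit in range(32) if diff & (1 << bit)]


if __name__ == "__main__":
    print("differing bits:", differing_bits(PORT1_MASK, PORT2_MASK))
    print("port 1 IsSM:", bool(PORT1_MASK & IS_SM))
    print("port 2 IsSM:", bool(PORT2_MASK & IS_SM))
```

So the mask difference most likely only reflects that an SM had been bound to port 1, not a difference in what the two ports can do.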

So everything looks OK. The mlx4_core and mlx4_ib drivers also seem happy in dmesg, even with debug_level=5. As far as I've been able to tell with these tools, the flashed MCX314A/MCX384A is indistinguishable from an MCX354A.

The problem is that it won't connect to InfiniBand switches or other HCAs. I tried two different switches (Mellanox SX6005 and IS5022) as well as two different Mellanox HCAs (ConnectX-2 MHQH19 and Connect-IB MCB194), but it never connects. The ports just stay in State: Down, Physical state: Polling.

The weird thing is that if I put the QSFP cable between the two ports of the flashed card, both ports light up and get a link. After starting OpenSM, the ports go to Active and report a 40Gbps connection (and it's definitely InfiniBand, not Ethernet: the card is set to IB mode in mstconfig, so no Ethernet devices even show up for it).

Benchmark tools like ib_read_bw work; I can send data and it goes through at a nice 3900MB/s with the transmit LEDs flashing as they should. But only ever between the two ports of the card itself. Plug it into another HCA or a switch, and nothing...
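
As a rough sanity check on that 3900MB/s figure, here is a back-of-the-envelope calculation. It assumes the standard per-lane signalling rates and line encodings (QDR: 10 Gb/s with 8b/10b, FDR: 14.0625 Gb/s with 64b/66b) and ignores all protocol overhead, so treat the results as upper bounds rather than expected benchmark numbers.

```python
def payload_mbytes_per_s(lanes, gbit_per_lane, payload_bits, coded_bits):
    """Upper bound on payload throughput in MB/s for a given link."""
    raw_gbit = lanes * gbit_per_lane                     # signalling rate
    data_gbit = raw_gbit * payload_bits / coded_bits     # after line encoding
    return data_gbit * 1000 / 8                          # Gb/s -> MB/s


QDR_4X = payload_mbytes_per_s(4, 10.0, 8, 10)        # 4x QDR, 8b/10b
FDR_4X = payload_mbytes_per_s(4, 14.0625, 64, 66)    # 4x FDR, 64b/66b

if __name__ == "__main__":
    measured = 3900.0  # MB/s, the ib_read_bw figure quoted above
    print(f"4x QDR ceiling: {QDR_4X:.0f} MB/s")
    print(f"4x FDR ceiling: {FDR_4X:.0f} MB/s")
    print(f"measured / QDR ceiling: {measured / QDR_4X:.1%}")
```

The measured number sits just under the 4x QDR ceiling and well below the FDR one, which fits a loopback link that came up at QDR (40Gbps) rather than FDR.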

Spent a few evenings on this, and it's driving me crazy. Anyone have any ideas?

I have a bunch of these cards, so if anyone can fix this problem and has a Dell C6100/C6220/C6320 system they'd like 40GbE or Infiniband for, I can send a couple of MCX384 cards as a reward for postage cost only :) (we can work out the cheapest shipping, I'm located in Europe)
 

Frobbit

Yeah, true, I was kind of expecting some issues based on those and some similar threads.

I'm fairly sure that, since they function fully as an MCX314A with standard Mellanox firmware (as far as I have been able to see), the problem should be more or less equivalent to getting a normal (non-mezzanine) Mellanox MCX314A to work as an MCX354A InfiniBand adapter.

It's possible that this just can't be done due to some hardware limitation (the MCX314A is listed as Ethernet only), but I thought I'd cracked it once I found the MCX354A Pro FW route. As they actually came up as VPI and seem to fully identify as InfiniBand with the InfiniBand utils, it seems so close...

"Have you backed up the old firmware by any chance (mstflint -d 82:00.0 ri backup.bin)? Can you dump the FW config (mstflint -i backup.bin dc)?"

Yes, here it is:

Code:
;; Generated automatically by iniprep tool on Tue Jul 30 12:35:34 IDT 2013 from ./cx3pro_dell_baldur_2P_mezz_40g.prs
;; PRS  FILE FOR DELL BALDUR 40Ge 2P
;; $Id$



[PS_INFO]
Name = 03CYRK
Description = MCX384A-BCCA ConnectX-3 Pro Eth; dual port QSFP; 40Gb/s Mezz card for Dell

[ADAPTER]
PSID = DEL1110001023
pcie_gen2_speed_supported = true
pcie_gen3_speed_supported = true
adapter_dev_id = 0x1007
silicon_rev = 0x00
vdd_change_to_1_offset = 4
config_pca9536 = true

gpio_mode1 = 0x08002001
gpio_mode0 = 0x0580401e
gpio_default_val = 0x0928600f
gpio_pull_up = 0xfbabaf0f
gpio_pull_enable = 0xfffba01f
receiver_detect_en = true

pca9536_dir =      0x0
pca9536_init_val = 0x3
pca9536_polarity = 0x0

[HCA]
hca_header_device_id = 0x1007
hca_header_subsystem_id = 0x0010
hca_header_class_code = 0x020000
dpdp_en = false
eth_xfi_en = true
mdio_en_port1 = 0
blinking_log_led_mode=1
pcie_tx_polarity = 0x7
port1_default_phy_type=2
port2_default_phy_type=2
gpio_mapping_ncsi = true
power_save_enable = 1
slow_clock_enable = 0
hi_lo_speed_leds_supported = 1

[IB]
mlpn_en_port0 = true
mlpn_en_port1 = true
num_of_ports = Two_Ports
phy_type_port2 = XFI
phy_type_port1 = XFI
ext_phy_board_port2 = FALCON
ext_phy_board_port1 = FALCON
do_sense = false
gen_guids_from_mac = true
ref_clk_to_use = 1

read_cable_params_port1_en = true
read_cable_params_port2_en = true

backplane_connected = false
new_gpio_scheme_en = true
kr_training_enable_port1 = true
kr_training_enable_port2 = true
port1_802_3ap_cr4_enable = true
port2_802_3ap_cr4_enable = true
port2_802_3ap_cr4_ability = true
port1_802_3ap_cr4_ability = true

center_mix90phase = true
eye_open_machine_measure_time =0xd

;;Logic lane to Serdes mapping
port_swap_en = true

tx_logic_0_serdes = 0
tx_logic_1_serdes = 1
tx_logic_2_serdes = 2
tx_logic_3_serdes = 3
rx_logic_0_serdes = 3
rx_logic_1_serdes = 2
rx_logic_2_serdes = 1
rx_logic_3_serdes = 0

tx_logic_4_serdes = 4
tx_logic_5_serdes = 5
tx_logic_6_serdes = 6
tx_logic_7_serdes = 7
rx_logic_4_serdes = 7
rx_logic_5_serdes = 6
rx_logic_6_serdes = 5
rx_logic_7_serdes = 4

eth_tx_lane_polarity_port1 = 0xf
eth_rx_lane_polarity_port1 = 0xf
eth_tx_lane_polarity_port2 = 0xf
eth_rx_lane_polarity_port2 = 0xf

; start of '#include "include_QSFP_serdes_prams.h"'

;;Serdes parameters
nego_rx9_ffe_tap0=84
nego_rx9_ffe_tap1=164
nego_rx9_ffe_tap2=251
nego_rx9_ffe_tap3=132
nego_rx9_ffe_tap4=140

nego_rx15_ffe_tap3 = 140
nego_rx15_ffe_tap1 = 140

nego_rx10_ffe_tap3 = 140
nego_rx10_ffe_tap1 = 140

nego_rx8_ffe_tap3 = 140
nego_rx8_ffe_tap1 = 140

force_rx0_slicer_ind_en = 0x0
force_rx0_slicer1_enable = 0x0
force_rx0_slicer2_enable = 0x0
force_rx0_ffe_tap0 = 0xff
force_rx0_ffe_tap1 = 0x80
force_rx0_ffe_tap2 = 0x80
force_rx0_ffe_tap3 = 0x80
force_rx0_ffe_tap4 = 0x80

force_tx0_ob_preemp_pre = 0x40
force_tx0_ob_preemp_post = 0x0
force_tx0_ob_preemp_main = 0x7f
force_tx0_preemp = 0x0
force_tx0_pre_polarity = 0x1
force_tx0_post_polarity = 0x1
force_tx0_main_polarity = 0x0

force_rx1_slicer_ind_en = 0x0
force_rx1_slicer1_enable = 0x0
force_rx1_slicer2_enable = 0x0
force_rx1_ffe_tap0 = 0xff
force_rx1_ffe_tap1 = 0x80
force_rx1_ffe_tap2 = 0x80
force_rx1_ffe_tap3 = 0x80
force_rx1_ffe_tap4 = 0x80

force_tx1_ob_preemp_pre = 0x40
force_tx1_ob_preemp_post = 0x0
force_tx1_ob_preemp_main = 0x7f
force_tx1_preemp = 0x0
force_tx1_pre_polarity = 0x1
force_tx1_post_polarity = 0x1
force_tx1_main_polarity = 0x0

force_rx2_slicer_ind_en = 0xeb
force_rx2_slicer1_enable = 0x0
force_rx2_slicer2_enable = 0x0
force_rx2_ffe_tap0 = 0x64
force_rx2_ffe_tap1 = 0x80
force_rx2_ffe_tap2 = 0xde
force_rx2_ffe_tap3 = 0x80
force_rx2_ffe_tap4 = 0x46

force_tx2_ob_preemp_pre = 0x30
force_tx2_ob_preemp_post = 0x0
force_tx2_ob_preemp_main = 0x7f
force_tx2_preemp = 0x0
force_tx2_pre_polarity = 0x1
force_tx2_post_polarity = 0x1
force_tx2_main_polarity = 0x0

force_rx3_slicer_ind_en = 0xff
force_rx3_slicer1_enable = 0x8
force_rx3_slicer2_enable = 0x8
force_rx3_ffe_tap0 = 0x6c
force_rx3_ffe_tap1 = 0x80
force_rx3_ffe_tap2 = 0xff
force_rx3_ffe_tap3 = 0x80
force_rx3_ffe_tap4 = 0x80

force_tx3_ob_preemp_pre = 0xc
force_tx3_ob_preemp_post = 0x7f
force_tx3_ob_preemp_main = 0x45
force_tx3_preemp = 0x0
force_tx3_pre_polarity = 0x1
force_tx3_post_polarity = 0x0
force_tx3_main_polarity = 0x1
force_tx3_ob_bias = 0xa

auto_ddr_tx_options = 2
auto_ddr_rx_options = 1

auto_qdr_tx_options = 6
auto_qdr_rx_options = 7

preset_tx_fdr_set12_ob_preemp_pre = 17
preset_tx_fdr_set12_ob_preemp_post = 0
preset_tx_fdr_set12_ob_preemp_main=25
preset_tx_fdr_set12_preemp = 0
preset_tx_fdr_set12_pre_polarity = 1
preset_tx_fdr_set12_post_polarity = 1
preset_tx_fdr_set12_main_polarity = 0
preset_tx_fdr_set12_ob_bias = 5

preset_tx_fdr_set13_ob_preemp_main =40
preset_tx_fdr_set13_ob_preemp_pre = 28
preset_tx_fdr_set13_ob_preemp_post = 0
preset_tx_fdr_set13_preemp = 0
preset_tx_fdr_set13_pre_polarity = 1
preset_tx_fdr_set13_post_polarity = 1
preset_tx_fdr_set13_main_polarity = 0
preset_tx_fdr_set13_ob_bias = 5

preset_tx_fdr_set14_ob_preemp_main = 35
preset_tx_fdr_set14_ob_preemp_pre = 25
preset_tx_fdr_set14_ob_preemp_post = 0
preset_tx_fdr_set14_preemp = 0
preset_tx_fdr_set14_pre_polarity = 1
preset_tx_fdr_set14_post_polarity = 1
preset_tx_fdr_set14_main_polarity = 0
preset_tx_fdr_set14_ob_bias = 5

preset_tx_fdr_set15_ob_preemp_main = 30
preset_tx_fdr_set15_ob_preemp_pre = 20
preset_tx_fdr_set15_ob_preemp_post = 0
preset_tx_fdr_set15_preemp = 0
preset_tx_fdr_set15_pre_polarity = 1
preset_tx_fdr_set15_post_polarity = 1
preset_tx_fdr_set15_main_polarity = 0
preset_tx_fdr_set15_ob_bias = 5

preset_tx_mask = 0xfffe

aba_mask0_start = 0
aba_mask0_end   = 3
aba_mask0 = 0x1000
aba_mask1_start = 4
aba_mask1_end   = 5
aba_mask1 = 0x8000
aba_mask2_start = 6
aba_mask2_end   = 10
aba_mask2 = 0x4000
aba_mask3_start = 11
aba_mask3_end   = 16
aba_mask3 = 0x2000

; ABA 40GE
aba_tx2_ob_preemp_pre = 20
aba_tx2_ob_preemp_main = 42
aba_tx2_ob_preemp_post = 8
aba_tx2_ob_bias = 8
aba_tx2_pre_polarity = 1
aba_tx2_post_polarity = 1
aba_tx2_main_polarity = 0

;;3m
aba_tx3_ob_preemp_pre = 22
aba_tx3_ob_preemp_main = 42
aba_tx3_ob_preemp_post = 5
aba_tx3_ob_bias = 8
aba_tx3_pre_polarity = 1
aba_tx3_post_polarity = 1
aba_tx3_main_polarity = 0

aba_tx4_ob_preemp_pre = 26
aba_tx4_ob_preemp_main = 42
aba_tx4_ob_preemp_post = 3
aba_tx4_ob_bias = 8
aba_tx4_pre_polarity = 1
aba_tx4_post_polarity = 1
aba_tx4_main_polarity = 0

aba_tx5_ob_preemp_pre = 60
aba_tx5_ob_preemp_main = 90
aba_tx5_ob_preemp_post = 8
aba_tx5_ob_bias = 8
aba_tx5_pre_polarity = 1
aba_tx5_post_polarity = 1
aba_tx5_main_polarity = 0

aba_tx6_ob_preemp_pre = 80
aba_tx6_ob_preemp_main = 110
aba_tx6_ob_preemp_post = 10
aba_tx6_ob_bias = 8
aba_tx6_pre_polarity = 1
aba_tx6_post_polarity = 1
aba_tx6_main_polarity = 0

aba_tx7_ob_preemp_pre = 75
aba_tx7_ob_preemp_main = 110
aba_tx7_ob_preemp_post = 15
aba_tx7_ob_bias = 8
aba_tx7_pre_polarity = 1
aba_tx7_post_polarity = 1
aba_tx7_main_polarity = 0

aba_fdr_tx16_ob_preemp_pre = 17
aba_fdr_tx16_ob_preemp_post = 0
aba_fdr_tx16_ob_preemp_main=25
aba_fdr_tx16_preemp = 0
aba_fdr_tx16_pre_polarity = 1
aba_fdr_tx16_post_polarity = 1
aba_fdr_tx16_main_polarity = 0
aba_fdr_tx16_ob_bias = 5

aba_fdr_tx17_ob_preemp_main =40
aba_fdr_tx17_ob_preemp_pre = 28
aba_fdr_tx17_ob_preemp_post = 0
aba_fdr_tx17_preemp = 0
aba_fdr_tx17_pre_polarity = 1
aba_fdr_tx17_post_polarity = 1
aba_fdr_tx17_main_polarity = 0
aba_fdr_tx17_ob_bias = 5

aba_fdr_tx18_ob_preemp_main = 35
aba_fdr_tx18_ob_preemp_pre = 25
aba_fdr_tx18_ob_preemp_post = 0
aba_fdr_tx18_preemp = 0
aba_fdr_tx18_pre_polarity = 1
aba_fdr_tx18_post_polarity = 1
aba_fdr_tx18_main_polarity = 0
aba_fdr_tx18_ob_bias = 5

aba_fdr_tx19_ob_preemp_main = 30
aba_fdr_tx19_ob_preemp_pre = 20
aba_fdr_tx19_ob_preemp_post = 0
aba_fdr_tx19_preemp = 0
aba_fdr_tx19_pre_polarity = 1
aba_fdr_tx19_post_polarity = 1
aba_fdr_tx19_main_polarity = 0
aba_fdr_tx19_ob_bias = 5

aba_index0_start = 0
aba_index0_end   = 3
aba_index0 = 0
aba_index1_start = 4
aba_index1_end   = 5
aba_index1 = 3
aba_index2_start = 6
aba_index2_end   = 9
aba_index2 = 2
aba_index3_start = 10
aba_index3_end   = 16
aba_index3 = 1

aba_rx2_slicer_ind_en = 0xeb
aba_rx2_slicer1_enable = 0x0
aba_rx2_slicer2_enable = 0x0
aba_rx2_ffe_tap0 = 0x80
aba_rx2_ffe_tap1 = 0x68
aba_rx2_ffe_tap2 = 0xd7
aba_rx2_ffe_tap3 = 0x80
aba_rx2_ffe_tap4 = 0x5a

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; SFP+ section. all QSFP can be converted to SFP+ using QSA adapter.;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; ETH connected to third party device
aba_non_mlpn_tx8_ob_preemp_pre = 5
aba_non_mlpn_tx8_ob_preemp_post = 0
aba_non_mlpn_tx8_ob_preemp_main = 65
aba_non_mlpn_tx8_ob_bias = 8
aba_non_mlpn_tx8_pre_polarity = 1
aba_non_mlpn_tx8_post_polarity = 1
aba_non_mlpn_tx8_main_polarity = 0
aba_non_mlpn_tx8_preemp = 0


; end of '#include "include_QSFP_serdes_prams.h"'

[PLL]
lbist_en  = 0
lbist_shift_freq  = 3
flash_div = 0x3
lbist_array_bypass = 1
lbist_pat_cnt_lsb = 0x2
core_f = 60
core_r = 14
core_od = 2
en_427_mhz = true
cx3_gen3_def_ffe_tap2 = 201

[FW]
log_flashdev_size = 21
log_flash_sector_size = 2
 

necr

Diff to MCX354A-FCCT:

Code:
--- F:\Downloads\fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin\dell.txt
+++ F:\Downloads\fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin\mlx.txt
@@ -1,79 +1,85 @@
-;; Generated automatically by iniprep tool on Tue Jul 30 12:35:34 IDT 2013 from ./cx3pro_dell_baldur_2P_mezz_40g.prs
-;; PRS  FILE FOR DELL BALDUR 40Ge 2P
+;; Generated automatically by iniprep tool on Tue Sep 05 140513 IDT 2017 from .cx3pro_MCX354A_fdr_09v.prs
+;
+;; PRS  FILE FOR FALCON
;; $Id$



[PS_INFO]
-Name = 03CYRK
-Description = MCX384A-BCCA ConnectX-3 Pro Eth; dual port QSFP; 40Gb/s Mezz card for Dell
+Name = MCX354A-FCC_Ax
+Description = ConnectX-3 Pro VPI adapter card; dual-port QSFP; FDR IB (56Gbs) and 40GigE;PCIe3.0 x8 8GTs;RoHS R6
+PRS_name    = cx3pro_MCX354A_fdr_09v.prs

[ADAPTER]
-PSID = DEL1110001023
+PSID = MT_1090111019
pcie_gen2_speed_supported = true
pcie_gen3_speed_supported = true
adapter_dev_id = 0x1007
silicon_rev = 0x00
-vdd_change_to_1_offset = 4
-config_pca9536 = true
-
-gpio_mode1 = 0x08002001
-gpio_mode0 = 0x0580401e
-gpio_default_val = 0x0928600f
-gpio_pull_up = 0xfbabaf0f
-gpio_pull_enable = 0xfffba01f
+vdd_change_to_1_offset = 7
+
+gpio_mode1       = 0x08000001
+gpio_mode0       = 0x04d042fe
+gpio_default_val = 0x0f787f1f
+gpio_pull_up     = 0xff2baf1f
+gpio_pull_enable = 0xfbabbfef
+
receiver_detect_en = true

-pca9536_dir =      0x0
-pca9536_init_val = 0x3
-pca9536_polarity = 0x0
+nv_cfg_en = true
+
+nv_config_sectors = 2

[HCA]
+hca_header_subsystem_id = 0x0003
hca_header_device_id = 0x1007
-hca_header_subsystem_id = 0x0010
-hca_header_class_code = 0x020000
-dpdp_en = false
+dpdp_en = true
eth_xfi_en = true
mdio_en_port1 = 0
-blinking_log_led_mode=1
-pcie_tx_polarity = 0x7
-port1_default_phy_type=2
-port2_default_phy_type=2
-gpio_mapping_ncsi = true
-power_save_enable = 1
-slow_clock_enable = 0
-hi_lo_speed_leds_supported = 1

[IB]
mlpn_en_port0 = true
mlpn_en_port1 = true
-num_of_ports = Two_Ports
+phy_type_port1 = XFI
phy_type_port2 = XFI
-phy_type_port1 = XFI
+module_power_level_supported_port0 = 5
+module_power_level_supported_port1 = 5
+
+ext_phy_board_port1 = FALCON
ext_phy_board_port2 = FALCON
-ext_phy_board_port1 = FALCON
-do_sense = false
-gen_guids_from_mac = true
-ref_clk_to_use = 1
-
+
+new_gpio_scheme_en = true
read_cable_params_port1_en = true
read_cable_params_port2_en = true
-
-backplane_connected = false
-new_gpio_scheme_en = true
-kr_training_enable_port1 = true
-kr_training_enable_port2 = true
+delta_from_edges_offst = 25
+
+spec1_3_fdr10_ib_support_port0 = true
+spec1_3_fdr10_ib_support_port1 = true
+spec1_3_fdr14_ib_support_port0 = true
+spec1_3_fdr14_ib_support_port1 = true
+cx3_spec1_3_ib_support_port0 = true
+cx3_spec1_3_ib_support_port1 = true
+cx3_spec1_2_ib_support_port0 = true
+cx3_spec1_2_ib_support_port1 = true
+mellanox_qdr_ib_support = true
+mellanox_ddr_ib_support = true
+
+port1_802_3ap_56kr4_ability = true
+port2_802_3ap_56kr4_ability = true
+
port1_802_3ap_cr4_enable = true
port2_802_3ap_cr4_enable = true
+port1_802_3ap_cr4_ability = true
port2_802_3ap_cr4_ability = true
-port1_802_3ap_cr4_ability = true
+
+port1_802_3ap_kr4_enable = true
+port2_802_3ap_kr4_enable = true
+port1_802_3ap_kr4_ability = true
+port2_802_3ap_kr4_ability = true

center_mix90phase = true
-eye_open_machine_measure_time =0xd

;;Logic lane to Serdes mapping
-port_swap_en = true
-
tx_logic_0_serdes = 0
tx_logic_1_serdes = 1
tx_logic_2_serdes = 2
@@ -93,13 +99,29 @@
rx_logic_7_serdes = 4

eth_tx_lane_polarity_port1 = 0xf
-eth_rx_lane_polarity_port1 = 0xf
eth_tx_lane_polarity_port2 = 0xf
+eth_rx_lane_polarity_port1 = 0x0
eth_rx_lane_polarity_port2 = 0xf
-
-; start of '#include "include_QSFP_serdes_prams.h"'
+tx_lane_polarity_port1 = 0xf
+tx_lane_polarity_port2 = 0xf
+
+; start of '#include include_QSFP_serdes_prams_bental.h'

;;Serdes parameters
+port0_nego_fdr_mask_en = 0xfffc
+port1_nego_fdr_mask_en = 0xfffc
+port0_nego_fdr10_mask_en = 0xfffc
+port1_nego_fdr10_mask_en = 0xfffc
+
+nego_rx4_slicer_ind_en = 255
+nego_rx4_slicer1_enable = 8
+nego_rx4_slicer2_enable = 8
+nego_rx4_ffe_tap0 = 94
+nego_rx4_ffe_tap1 = 134
+nego_rx4_ffe_tap2 = 245
+nego_rx4_ffe_tap3 = 135
+nego_rx4_ffe_tap4 = 171
+
nego_rx9_ffe_tap0=84
nego_rx9_ffe_tap1=164
nego_rx9_ffe_tap2=251
@@ -131,23 +153,6 @@
force_tx0_pre_polarity = 0x1
force_tx0_post_polarity = 0x1
force_tx0_main_polarity = 0x0
-
-force_rx1_slicer_ind_en = 0x0
-force_rx1_slicer1_enable = 0x0
-force_rx1_slicer2_enable = 0x0
-force_rx1_ffe_tap0 = 0xff
-force_rx1_ffe_tap1 = 0x80
-force_rx1_ffe_tap2 = 0x80
-force_rx1_ffe_tap3 = 0x80
-force_rx1_ffe_tap4 = 0x80
-
-force_tx1_ob_preemp_pre = 0x40
-force_tx1_ob_preemp_post = 0x0
-force_tx1_ob_preemp_main = 0x7f
-force_tx1_preemp = 0x0
-force_tx1_pre_polarity = 0x1
-force_tx1_post_polarity = 0x1
-force_tx1_main_polarity = 0x0

force_rx2_slicer_ind_en = 0xeb
force_rx2_slicer1_enable = 0x0
@@ -300,32 +305,32 @@
aba_fdr_tx16_main_polarity = 0
aba_fdr_tx16_ob_bias = 5

-aba_fdr_tx17_ob_preemp_main =40
-aba_fdr_tx17_ob_preemp_pre = 28
+aba_fdr_tx17_ob_preemp_main =46
+aba_fdr_tx17_ob_preemp_pre = 32
aba_fdr_tx17_ob_preemp_post = 0
aba_fdr_tx17_preemp = 0
aba_fdr_tx17_pre_polarity = 1
aba_fdr_tx17_post_polarity = 1
aba_fdr_tx17_main_polarity = 0
-aba_fdr_tx17_ob_bias = 5
-
-aba_fdr_tx18_ob_preemp_main = 35
-aba_fdr_tx18_ob_preemp_pre = 25
+aba_fdr_tx17_ob_bias = 3
+
+aba_fdr_tx18_ob_preemp_main = 50
+aba_fdr_tx18_ob_preemp_pre = 32
aba_fdr_tx18_ob_preemp_post = 0
aba_fdr_tx18_preemp = 0
aba_fdr_tx18_pre_polarity = 1
aba_fdr_tx18_post_polarity = 1
aba_fdr_tx18_main_polarity = 0
-aba_fdr_tx18_ob_bias = 5
-
-aba_fdr_tx19_ob_preemp_main = 30
-aba_fdr_tx19_ob_preemp_pre = 20
+aba_fdr_tx18_ob_bias = 3
+
+aba_fdr_tx19_ob_preemp_main = 60
+aba_fdr_tx19_ob_preemp_pre = 30
aba_fdr_tx19_ob_preemp_post = 0
aba_fdr_tx19_preemp = 0
aba_fdr_tx19_pre_polarity = 1
aba_fdr_tx19_post_polarity = 1
aba_fdr_tx19_main_polarity = 0
-aba_fdr_tx19_ob_bias = 5
+aba_fdr_tx19_ob_bias = 3

aba_index0_start = 0
aba_index0_end   = 3
@@ -364,7 +369,16 @@
aba_non_mlpn_tx8_preemp = 0


-; end of '#include "include_QSFP_serdes_prams.h"'
+nego_eth_rx12_slicer_ind_en = 0xff
+nego_eth_rx12_slicer1_enable= 0x8
+nego_eth_rx12_slicer2_enable= 0x8
+nego_eth_rx12_ffe_tap0=241
+nego_eth_rx12_ffe_tap1=128
+nego_eth_rx12_ffe_tap2=61
+nego_eth_rx12_ffe_tap3=99
+nego_eth_rx12_ffe_tap4=128
+
+; end of '#include include_QSFP_serdes_prams_bental.h'

[PLL]
lbist_en  = 0
@@ -376,8 +390,8 @@
core_r = 14
core_od = 2
en_427_mhz = true
-cx3_gen3_def_ffe_tap2 = 201

[FW]
+flash_has_suspend_resume = 0
log_flashdev_size = 21
log_flash_sector_size = 2
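
A plain text diff like the one above also flags lines that merely moved (for example phy_type_port1 and phy_type_port2 swapping places). A key-by-key comparison avoids that noise. The following is a minimal sketch, assuming the dumped configs are simple `[section]` / `key = value` text with `;` comments; the function names and the two short excerpts are only for illustration, with values taken from the dumps in this thread.

```python
def parse_ini(text):
    """Map (section, key) -> value, skipping blank lines and ';' comments."""
    section = ""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[(section, key.strip())] = value.strip()
    return values


def compare(old, new):
    """Return (removed, added, changed) lists of (section, key) tuples."""
    removed = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return removed, added, changed


# Short excerpts only; in practice read the full dumped files instead.
DELL = """
[ADAPTER]
PSID = DEL1110001023
vdd_change_to_1_offset = 4
[HCA]
dpdp_en = false
[IB]
phy_type_port2 = XFI
phy_type_port1 = XFI
kr_training_enable_port1 = true
"""

MLX = """
[ADAPTER]
PSID = MT_1090111019
vdd_change_to_1_offset = 7
[HCA]
dpdp_en = true
[IB]
phy_type_port1 = XFI
phy_type_port2 = XFI
mellanox_qdr_ib_support = true
"""

if __name__ == "__main__":
    removed, added, changed = compare(parse_ini(DELL), parse_ini(MLX))
    print("only in Dell:", removed)
    print("only in Mellanox:", added)
    print("different value:", changed)
```
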
From what I see, Dell has specifically cut out the IB modes in their 2P ETH version:
Code:
+spec1_3_fdr10_ib_support_port0 = true
+spec1_3_fdr10_ib_support_port1 = true
+spec1_3_fdr14_ib_support_port0 = true
+spec1_3_fdr14_ib_support_port1 = true
+cx3_spec1_3_ib_support_port0 = true
+cx3_spec1_3_ib_support_port1 = true
+cx3_spec1_2_ib_support_port0 = true
+cx3_spec1_2_ib_support_port1 = true
+mellanox_qdr_ib_support = true
+mellanox_ddr_ib_support = true
One way would be to align the SerDes values with a known-working Mellanox VPI config; the other (my preferred way) would be to add the IB capabilities to the Dell FW config. You'd need to either learn how to edit the FW sections, re-burn and reset the adapter (BeTeP scripts), or find the base .mlx file and generate a .bin using mlxburn. I don't have the base .mlx file for the Pro.
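
The text-editing half of the second option could be sketched roughly like this. It assumes the dumped config is plain `[section]` / `key = value` text, and the function name is made up. It only edits the text: it does not re-pack or burn anything, and whether the firmware honours these keys on this board is not something the script can tell you.

```python
# Keys present in the Mellanox VPI config but missing from the Dell one.
IB_KEYS = [
    "spec1_3_fdr10_ib_support_port0",
    "spec1_3_fdr10_ib_support_port1",
    "spec1_3_fdr14_ib_support_port0",
    "spec1_3_fdr14_ib_support_port1",
    "cx3_spec1_3_ib_support_port0",
    "cx3_spec1_3_ib_support_port1",
    "cx3_spec1_2_ib_support_port0",
    "cx3_spec1_2_ib_support_port1",
    "mellanox_qdr_ib_support",
    "mellanox_ddr_ib_support",
]


def add_ib_support(ini_text):
    """Append any missing IB_KEYS (set to true) at the end of the [IB] section."""
    lines = ini_text.splitlines()
    start = None
    end = len(lines)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if start is None:
                if stripped == "[IB]":
                    start = i
            else:
                end = i  # first section header after [IB]
                break
    if start is None:
        raise ValueError("no [IB] section found")
    present = {
        line.split("=", 1)[0].strip()
        for line in lines[start + 1:end]
        if "=" in line and not line.lstrip().startswith(";")
    }
    new_lines = [f"{key} = true" for key in IB_KEYS if key not in present]
    return "\n".join(lines[:end] + new_lines + lines[end:]) + "\n"


if __name__ == "__main__":
    sample = "[HCA]\ndpdp_en = false\n[IB]\nmlpn_en_port0 = true\n[PLL]\ncore_f = 60\n"
    print(add_ib_support(sample))
```

Running it twice on the same text adds nothing the second time, so it is safe to re-run while experimenting.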
 

Frobbit

Ok, thanks, this sounds promising! I've tried a couple of things so far, no luck yet but this gives many more options to try :)

First I tried adding the IB-related lines above (spec1_3_fdr10 etc.) to the .ini extracted from the Dell mezzanine image, and then updating the mezzanine image with it:

Code:
python3 mft-scripts/fs2_update_ini.py dell_mcx384_bcaa.hacked.img dell_mcx384_bcaa.hacked.ini
After flashing it and rebooting, the device comes up and shows an mlx4_0 InfiniBand device, but the ports are still in Link layer: Ethernet:

Code:
root@rdellnew1# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.30.4212
        Hardware version: 0
        Node GUID: 0x0002c90300000001
        System image GUID: 0x0002c90300000001
        Port 1:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x0202c9fffe000001
                Link layer: Ethernet
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x0202c9fffe000002
                Link layer: Ethernet
root@rdellnew1# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.30.4212
        node_guid:                      0002:c903:0000:0001
        sys_image_guid:                 0002:c903:0000:0001
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       DEL1110001023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
So I tried to switch it to VPI or IB with mstconfig, but got:

Code:
root@rdellnew1# mstconfig -d 82:00.0 q

-E- Failed to open device: 82:00.0. Unsupported FW (version 2.31.5000 or above required for CX3/PRO)
So the Dell mezzanine firmware (2.30.4212) is too old for mstconfig...

Unfortunately it turns out there aren't any newer versions of the Dell firmware available online. As you mentioned earlier, this mezzanine is a bit problematic. People have spent time searching for newer firmware but couldn't find any, e.g.:

https://forums.servethehome.com/ind...ing-mellanox-mezzanine-cards-for-c6320.29245/

(As an aside, this is actually why I originally was flashing to MCX314A FW earlier, since 2.42.5000 is available for that... )

So my next step was to use the fs2_update_ini.py script to put the modified Dell mezzanine .ini into the MCX314A firmware with:

Code:
python3 mft-scripts/fs2_update_ini.py fw-ConnectX3Pro-rel-2_42_5000-MCX314A-BCC_Ax-FlexBoot-3.4.752.hacked.bin dell_mcx384_bcaa.hacked.ini
After flashing that, it comes up again and now the link layer shows InfiniBand:


Code:
root@rdellnew1# ibstat

CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 0
        Node GUID: 0x0002c90300000001
        System image GUID: 0x0002c90300000001
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0x0002c90300000001
                Link layer: InfiniBand

        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0x0002c90300000002
                Link layer: InfiniBand
However, the issue is the same as earlier: it does not connect when plugged into a switch and stays in Down + Polling.
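
When re-flashing and re-testing repeatedly, it can help to check for this stuck state automatically instead of eyeballing ibstat each time. This is a minimal sketch that parses text in the layout shown above. It assumes the `Port N:` / `Key: value` layout stays as pasted here, and the function names are made up. In real use the text would come from running ibstat; the sample is trimmed from the output in this post.

```python
import re


def parse_ibstat(text):
    """Return {port_number: {field: value}} from ibstat-style text."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Port (\d+):$", line)
        if match:
            current = int(match.group(1))
            ports[current] = {}
            continue
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            ports[current][key.strip()] = value.strip()
    return ports


def stuck_polling(ports):
    """List ports that are Down with physical state Polling."""
    return [
        number
        for number, fields in sorted(ports.items())
        if fields.get("State") == "Down"
        and fields.get("Physical state") == "Polling"
    ]


SAMPLE = """\
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Port GUID: 0x0002c90300000001
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Port GUID: 0x0002c90300000002
                Link layer: InfiniBand
"""

if __name__ == "__main__":
    print("ports stuck in Polling:", stuck_polling(parse_ibstat(SAMPLE)))
```
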

Plugging the QSFP between the two ports of the Mezzanine, it lights up green and as earlier everything looks great:

Code:
root@rdellnew1# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 0
        Node GUID: 0x0002c9000100d050
        System image GUID: 0x0002c9000100d050
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0x0002c9000100d051
                Link layer: InfiniBand
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0x0002c9000100d052
                Link layer: InfiniBand
Code:
root@rdellnew1# opensm &

-------------------------------------------------
OpenSM 3.3.20
Command Line Arguments:
Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 3.3.20

Using default GUID 0x2c9000100d051
Entering DISCOVERING state

Entering MASTER state


root@rdellnew1# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 0
        Node GUID: 0x0002c9000100d050
        System image GUID: 0x0002c9000100d050
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 6
                LMC: 0
                SM lid: 6
                Capability mask: 0x0251486a
                Port GUID: 0x0002c9000100d051
                Link layer: InfiniBand
I also tried using fs2_update_ini.py to set the .ini into the MCX354A Pro (fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752) bin, but I see the same behaviour after reboot.

It seems the .ini-modifying approach is still the most promising, but as I'm not familiar with the internals of this stuff I'm not sure what exactly I should be looking at next. After a bit of playing around, kind of blindly, this is what I have changed so far:

Code:
1[&@:~/.../ib/mell/fwhack]164$ diff dell_mcx384_bcaa.hacked.ini dell_mcx384_bcaa.ini

33,34c33

< ;; HACK - MODIFIED - FROM hca_header_subsystem_id = 0x0010
< hca_header_subsystem_id = 0x0006
---
> hca_header_subsystem_id = 0x0010
41,42c40,41
< ;; HACK - DISABLED - port1_default_phy_type=2
< ;; HACK - DISABLED - port2_default_phy_type=2
---
> port1_default_phy_type=2
> port2_default_phy_type=2
58,60c57
< ;; HACK - This is 0 on MCX314A
< ;; HACK - MODIFIED - FROM ref_clk_to_use = 1
< ref_clk_to_use = 0
---
> ref_clk_to_use = 1
67,81c64,65
< ;; HACK - DISABLED - kr_training_enable_port1 = true
< ;; HACK - DISABLED - kr_training_enable_port2 = true
<
< cx3_spec1_3_ib_support_port0 = true
< cx3_spec1_3_ib_support_port1 = true
< cx3_spec1_2_ib_support_port0 = true
< cx3_spec1_2_ib_support_port1 = true
< spec1_3_fdr14_ib_support_port0 = true
< spec1_3_fdr14_ib_support_port1 = true
< spec1_3_fdr10_ib_support_port0 = true
< spec1_3_fdr10_ib_support_port1 = true
< mellanox_ddr_ib_support        = true
< mellanox_qdr_ib_support        = true
---
> kr_training_enable_port1 = true
> kr_training_enable_port2 = true
91c75
< ;; HACK - DISABLED - port_swap_en = true
Any ideas what could be the most fruitful approach to try next? I've tried googling for some specs and explanation of the various parameters in the .ini files but couldn't find anything...
 

necr

After plugging the cable into the switch, have you tried forcing the lowest speed, SDR (2.5 Gb/s per lane), and 1x width? You can do this on the switch port from an active host in the IB subnet with sudo ibportstate. You can also force this on the NIC without a link (CX3 Pro).

If only there were a good way to measure the values of each RX/TX pair easily...
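
For reference, the ibportstate calls for this could be assembled roughly as below. This is a sketch from memory of the infiniband-diags man page, so double-check it against your installed version: the `speed`, `width` and `reset` operations and the value encodings (speed 1 = 2.5 Gb/s, width 1 = 1x) are assumptions, and the LID and port number are hypothetical placeholders for the switch port the card is cabled to. The script only prints the commands; it does not run them.

```python
import shlex

SPEED_SDR = 1  # assumed encoding: 1 = 2.5 Gb/s per lane
WIDTH_1X = 1   # assumed encoding: 1 = 1x


def force_sdr_1x(lid, port):
    """Build ibportstate command lines that pin a port to SDR, 1x."""
    target = [str(lid), str(port)]
    return [
        ["ibportstate", *target, "speed", str(SPEED_SDR)],
        ["ibportstate", *target, "width", str(WIDTH_1X)],
        # A port reset is assumed to be needed for the new settings to apply.
        ["ibportstate", *target, "reset"],
    ]


if __name__ == "__main__":
    # Hypothetical switch LID 2, switch port 5.
    for command in force_sdr_1x(lid=2, port=5):
        print("sudo", shlex.join(command))
```
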
 

Frobbit

Hi, sorry for the delay in replying... The idea of forcing the port speed to SDR sounds good. I will try that when I have the system set up for testing again, hopefully soon.
 

msl

@Frobbit did you ever get this working? I have 4 of these cards in my machine and I am about to start using them to create a Ceph cluster. It'd be nice to run them in IB mode!