Mellanox ConnectX-5 Infiniband + LR4 Transceivers = 10GBit/s SDR?!

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
1,343
819
113
Hey,

I have a single-mode LC link that needs to transport 40G Infiniband. That means 40GBASE-LR4 transceivers are my only option; I cannot use 56G MTP SR or 40G SR transceivers.

But now for the issue:
I am using ConnectX-5 cards (dual port) with the latest firmware, 16.32.1010, and I just can't get 40G IB to work.

When I plug in 40G LR4 transceivers, the card always negotiates 10G SDR.

I have tried different vendors and different codings (fs.com, Finisar) and even bought genuine Mellanox LR4 transceivers (MC2210511-LR4) out of pure despair... But no luck.
MC2210511-LR4:
Code:
Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0170
        base lid:        0xffff
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand

Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0171
        base lid:        0xffff
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand
-> 10G in IB mode, 40G in Ethernet mode

However, this is not a general limitation.
If I plug in a DAC, both ports link at EDR:
Code:
Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0170
        base lid:        0xffff
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      InfiniBand

Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0171
        base lid:        0xffff
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      InfiniBand
I am seeing the same limitation with some 100G transceivers:
Kaiam XQX5170 (100GBase-CWDM4) -> No connection at all in IB mode, 100G in Ethernet
Code:
Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0170
        base lid:        0xffff
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand

Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0171
        base lid:        0xffff
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand
fs.com 100GBase-CWDM4 -> 10G in IB Mode, 100G in Ethernet
Code:
Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0170
        base lid:        0xffff
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand

Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0171
        base lid:        0xffff
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand
Any ideas on how to fix this?
It also bugs me that apparently standard 100G transceivers do not work with Infiniband...
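
To compare the status dumps above without eyeballing them, here is a minimal sketch of a parser for this kind of `ibstatus` text. It only assumes the layout shown in the outputs above; the function names (`parse_ibstatus`, `slow_ports`) are my own and not part of any Mellanox tool.

```python
import re

def parse_ibstatus(text):
    """Parse ibstatus-style text into {(device, port): {field: value}}."""
    ports = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"Infiniband device '([^']+)' port (\d+) status:", stripped)
        if header:
            current = (header.group(1), int(header.group(2)))
            ports[current] = {}
        elif current is not None and ":" in stripped:
            # Split on the first colon only, so GIDs and "2: INIT" stay intact.
            key, _, value = stripped.partition(":")
            ports[current][key.strip()] = value.strip()
    return ports

def slow_ports(text, expected_gbps):
    """Return (device, port, rate) for ports that are LinkUp below expected_gbps."""
    result = []
    for (dev, port), fields in parse_ibstatus(text).items():
        rate = float(fields["rate"].split()[0])  # "10 Gb/sec (4X SDR)" -> 10.0
        if "LinkUp" in fields.get("phys state", "") and rate < expected_gbps:
            result.append((dev, port, fields["rate"]))
    return result
```

Feeding it the LR4 dump with `expected_gbps=40` should flag both ports, while the DAC dump should come back empty.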
 

necr

Active Member
Dec 27, 2017
156
48
28
124
Do you have CX5 on both sides? Is there a switch somewhere? Do you have OEM or Mellanox cards (flint -d mlx5_0 q)?
Are you running OpenSM with default settings? Have you tried setting higher speed with "sudo ibportstate"? Does mlxcables (part of mft) give any output?
 

NablaSquaredG

necr said: Do you have CX5 on both sides?
Loopback to card itself, as it is a Dual Port Card

necr said: Is there a switch somewhere?
No. But as far as I remember, same issue when connecting to an SX6036

necr said: Are you running OpenSM with default settings?
No OpenSM running. Just looking for the physical link

necr said: Have you tried setting higher speed with "sudo ibportstate"?
Code:
sudo ibportstate 3 1 speed 7 espeed 30 fdr10 0 reset
This forces 40G QDR with a DAC, but with the LR4 transceivers the port still only links at SDR.
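
For reference, this is how I read the `speed` and `espeed` arguments in that command: they are bitmasks written to PortInfo:LinkSpeedEnabled and PortInfo:LinkSpeedExtEnabled. The sketch below decodes them; the bit meanings are taken from my understanding of the ibportstate man page and the opensm.conf comments, so treat the tables as an assumption and check them against your installed version.

```python
# Assumed bit meanings for PortInfo:LinkSpeedEnabled (per-lane signaling rate).
LINK_SPEED_BITS = {1: "2.5 Gb/s (SDR)", 2: "5.0 Gb/s (DDR)", 4: "10.0 Gb/s (QDR)"}
# Assumed bit meanings for PortInfo:LinkSpeedExtEnabled.
LINK_SPEED_EXT_BITS = {1: "14.0625 Gb/s (FDR)", 2: "25.78125 Gb/s (EDR)"}

def decode_speed(value):
    """Decode the 'speed' argument; 15 means 'whatever the port supports'."""
    if value == 15:
        return ["all supported (LinkSpeedSupported)"]
    return [name for bit, name in LINK_SPEED_BITS.items() if value & bit]

def decode_espeed(value):
    """Decode the 'espeed' argument; 30 disables extended speeds, 31 enables all."""
    if value == 30:
        return []
    if value == 31:
        return ["all supported (LinkSpeedExtSupported)"]
    return [name for bit, name in LINK_SPEED_EXT_BITS.items() if value & bit]

print(decode_speed(7))    # SDR, DDR and QDR enabled
print(decode_espeed(30))  # no extended (FDR/EDR) speeds
```

So `speed 7 espeed 30` should leave SDR, DDR and QDR enabled and switch off FDR and EDR, with `fdr10 0` additionally disabling the Mellanox FDR10 extension.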

necr said: Does mlxcables (part of mft) give any output?
Code:
sudo mst cable add && sudo mlxcables
-I- Added 2 cable devices ..
Querying Cables ....

Cable #1:
---------
Cable name    : 4e:00.0_cable_0
>> No FW data to show
-------- Cable EEPROM --------
Identifier        : QSFP+ (0dh)
Technology    : 1310 nm DFB (40h)
Compliance        : 40GBASE-LR4
Wavelength    : 1310 nm
OUI               : 0x0002c9
Vendor            : Mellanox       
Serial number     : DM231500490     
Part number       : MC2210511-LR4   
Revision          : A2
Temperature [c]   : 22 [-10..80]
Digital Diagnostic Monitoring : YES
Length    [m]     : 0 m


Cable #2:
---------
Cable name    : 4e:00.1_cable_1
>> No FW data to show
-------- Cable EEPROM --------
Identifier        : QSFP+ (0dh)
Technology    : 1310 nm DFB (40h)
Compliance        : 40GBASE-LR4
Wavelength    : 1310 nm
OUI               : 0x0002c9
Vendor            : Mellanox       
Serial number     : DM381500129     
Part number       : MC2210511-LR4   
Revision          : A2
Temperature [c]   : 26 [-10..80]
Digital Diagnostic Monitoring : YES
Length    [m]     : 0 m
mlxlink output from yesterday gives some info (captured when FDR10 was not disabled in ibportstate):
Code:
Troubleshooting Info
--------------------
Status Opcode                   : 25
Group Opcode                    : PHY FW
Recommendation                  : FDR10 speed is not supported on non-Mellanox cable.

Tool Information
----------------
Firmware Version                : 16.32.1010
MFT Version                     : mft 4.18.0-106

Module Info
-----------
Identifier                      : QSFP+
Compliance                      : N/A
Cable Technology                : 1310 nm DFB
Cable Type                      : Optical Module (separated)
OUI                             : Mellanox
Vendor Name                     : Mellanox
Vendor Part Number              : MC2210511-LR4
Vendor Serial Number            : DM381500129
Rev                             : A2
Wavelength [nm]                 : 1310
Transfer Distance [m]           : 0
Attenuation (5g,7g,12g) [dB]    : N/A
FW Version                      : 255.255.65535
Digital Diagnostic Monitoring   : Yes
Power Class                     : 3.5 W max
CDR RX                          : N/A
CDR TX                          : N/A
LOS Alarm                       : N/A
Temperature [C]                 : 21 [-5..78]
Voltage [mV]                    : 3301.8 [2970..3630]
Bias Current [mA]               : 32.904,32.826,39.232,30.880 [8..105]
Rx Power Current [dBm]          : 0,0,0,0 [-18..4]
Tx Power Current [dBm]          : 2,2,2,2 [-12..6]
"non-Mellanox cable" - Eh... Genuine Mellanox transceivers.

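The optics themselves look healthy as far as the diagnostics go: every monitored value in the mlxlink output sits inside its bracketed range. A minimal sketch of that check, assuming lines shaped like `Name : v1,v2,... [lo..hi]` as in the dump above (the helper name `out_of_range` is hypothetical):

```python
import re

NUM = r"-?\d+(?:\.\d+)?"
# Matches e.g. "Bias Current [mA]  : 32.904,32.826 [8..105]"
DDM_LINE = re.compile(
    rf"^(?P<name>[^:]+?)\s*:\s*(?P<values>{NUM}(?:,{NUM})*)\s*"
    rf"\[(?P<lo>{NUM})\.\.(?P<hi>{NUM})\]\s*$"
)

def out_of_range(text):
    """Return (field, lane_index, value) for every reading outside its range."""
    problems = []
    for line in text.splitlines():
        match = DDM_LINE.match(line.strip())
        if not match:
            continue  # lines without a [lo..hi] range are ignored
        lo, hi = float(match["lo"]), float(match["hi"])
        for lane, raw in enumerate(match["values"].split(",")):
            value = float(raw)
            if not lo <= value <= hi:
                problems.append((match["name"], lane, value))
    return problems
```

An empty list means no reading is outside its limits, which points away from a marginal optical signal and towards negotiation.
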
I have dumped the transceiver EEPROM and it seems like NO Infiniband bits are set in the Mellanox transceiver!
Code:
Type                                                 : SFF-8636
                Identifier [128]                                     : 0x0d (QSFP+)
                Extended Identifier [129]                            : 0xc0
                Extended Identifier Description                      : 3.5 W max. power consumption
                                                                     : No CLEI code present
                                                                     : No CDR in TX, No CDR in RX
                Connector [130]                                      : 0x07 (LC)
                Transceiver Codes [131-138]                          : 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                Transceiver Type                                     : 40G Ethernet: 40G Base-LR4
                Encoding [139]                                       : 0x05 (64B/66B)
                BR, Nominal [140]                                    : 10300
                Rate Identifier [141]                                : 0x00
                Length (SMF) [142]                                   : 10000
                Length (OM3 50um) [143]                              : 0
                Length (OM2 50um) [144]                              : 0
                Length (OM1 62.5um) [145]                            : 0
                Length (Copper or Active cable, OM4 50um [2m]) [146] : 0
                Device Technology [147]                              : Transmitter not tunable
                                                                     : Pin Detector
                                                                     : No wavelength control
                Transmitter Technology [147]                         : 1310 nm DFB
                Vendor [148-163]                                     : Mellanox
                Extended Module (Infiniband Speeds) [164]            : None!
                Vendor OUI [165-167]                                 : 0:2:c9
                Vendor PN [168-183]                                  : MC2210511-LR4
                Vendor Rev [184-185]                                 : A2
                Link Codes [192]                                     : Reserved or unknown (0x00)
                Options [193-195]                                    : 0x01 0x0c 0xd8
                Vendor SN [196-211]                                  : DM381500129
                Date Code [212-219]                                  : 2015-09-18
Decoder works correctly, as it reports
Code:
Extended Module (Infiniband Speeds) [164]            : Infiniband: SDR
                                                                     : Infiniband: DDR
                                                                     : Infiniband: QDR
                                                                     : Infiniband: FDR
for Mellanox FDR cables...
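
For anyone who wants to check their own modules, here is a minimal sketch of how the two relevant bytes decode. The bit assignments are my reading of SFF-8636 page 00h (byte 131 for Ethernet compliance, byte 164 for Infiniband speeds); treat the tables as an assumption and verify them against the spec.

```python
# Byte 131: 10/40G Ethernet compliance codes (assumed bit layout per SFF-8636).
ETH_COMPLIANCE_BITS = {
    0: "40G Active Cable (XLPPI)",
    1: "40GBASE-LR4",
    2: "40GBASE-SR4",
    3: "40GBASE-CR4",
    4: "10GBASE-SR",
    5: "10GBASE-LR",
    6: "10GBASE-LRM",
    7: "Extended (see byte 192)",
}

# Byte 164: extended module code, Infiniband speeds (assumed bit layout).
IB_SPEED_BITS = {0: "SDR", 1: "DDR", 2: "QDR", 3: "FDR", 4: "EDR", 5: "HDR"}

def decode_bits(value, table):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]

# Values from the MC2210511-LR4 dump above.
print(decode_bits(0x02, ETH_COMPLIANCE_BITS))  # Ethernet compliance
print(decode_bits(0x00, IB_SPEED_BITS))        # Infiniband speeds: nothing advertised
```

With byte 164 at 0x00 the module advertises no Infiniband speed at all, whereas a cable reporting SDR, DDR, QDR and FDR would carry 0x0f there.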
 

necr

NablaSquaredG said: Loopback to card itself, as it is a Dual Port Card
It would be best to run at least card-to-card; no one would test this loopback mode thoroughly (I wouldn't).

NablaSquaredG said: No OpenSM running. Just looking for the physical link
Best practice is to run OpenSM always. Without it, your debugging can also take longer.

NablaSquaredG said: sudo ibportstate 3 1 speed 7 espeed 30 fdr10 0 reset
I believe you're shooting yourself in the foot here. In Infiniband, 40G Ethernet is bitwise equal to a special mode called FDR10, which is a "special extension" to the Infiniband standard. You want to enforce this extension; without it you would only link at either QDR or FDR.
My working configs for OpenSM (enforces compatibility for all nodes):

force_link_speed 15
# 30: Disable extended link speeds
force_link_speed_ext 31

Overall I believe you're on the right track: your xceiver is FDR10 capable, you just need to enforce this speed.
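
To put numbers on the "bitwise equal" point, here is a small worked calculation of per-lane signaling rate and line encoding for each mode. The rates and encodings are the commonly quoted figures (8b/10b for SDR/DDR/QDR, 64b/66b from FDR10 upwards), listed here as my understanding and not taken from this thread's outputs.

```python
from fractions import Fraction

# mode: (per-lane signaling rate in Gb/s, encoding efficiency)
MODES = {
    "SDR":   (Fraction("2.5"),      Fraction(8, 10)),
    "DDR":   (Fraction("5"),        Fraction(8, 10)),
    "QDR":   (Fraction("10"),       Fraction(8, 10)),
    "FDR10": (Fraction("10.3125"),  Fraction(64, 66)),
    "FDR":   (Fraction("14.0625"),  Fraction(64, 66)),
    "EDR":   (Fraction("25.78125"), Fraction(64, 66)),
    "40GbE": (Fraction("10.3125"),  Fraction(64, 66)),
}

def data_rate_gbps(mode, lanes=4):
    """Usable data rate of a link: lanes * signaling rate * encoding efficiency."""
    signaling, efficiency = MODES[mode]
    return lanes * signaling * efficiency

for mode in MODES:
    print(f"{mode:6s} 4x data rate: {float(data_rate_gbps(mode)):.2f} Gb/s")
```

FDR10 and 40GbE share the same 10.3125 Gb/s lanes with 64b/66b encoding, which matches the nominal bit rate of 10300 in the transceiver EEPROM, while QDR runs at 10.0 Gb/s with 8b/10b. A module built strictly for the 40GbE lane rate is therefore not guaranteed to lock at QDR.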
 

NablaSquaredG

necr said: Would be best to run at least card-to-card, no one would test this loopback mode thoroughly (I wouldn't)
I doubt this will change anything, but I can try.

necr said: Best practice is to run OpenSM always. Without it, your debugging can also take longer.
For issues on the physical layer? I don't really see the point, as an OpenSM configuration can actually introduce new issues (such as a wrong configuration limiting the speed).

necr said: I believe you're shooting yourself in the foot here. In Infiniband, 40G Ethernet bitwise equals a special mode called FDR10, which is a "special extension" to the Infiniband standard. You want to enforce this extension, without it you would only link at either QDR or FDR. My working configs for OpenSM (enforces compatibility for all nodes):
Yes, this is all true, but the transceiver should also be QDR capable.

It doesn't make a difference whether I enable fdr10 or not:

Code:
sudo ibportstate 3 1 speed 7 espeed 30 fdr10 1 reset
And after it comes up
Code:
Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0170
        base lid:        0x3
        sm lid:          0x3
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand

Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0c42:a103:006c:0171
        base lid:        0x4
        sm lid:          0x3
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X SDR)
        link_layer:      InfiniBand
(The same command works correctly on a DAC cable and limits EDR to FDR10, so this is not a command issue.)
 

necr

NablaSquaredG said: Yes, this is all true, but the transceiver should also be QDR capable
Some xceivers are: a multimode EDR one linked at 100, 56, 40 and 32 (QDR), posted here a while ago. And some xceivers are not multi-rate; don't expect a 10G LAN type xceiver to link at 5G or 2.5G.

The fact that it comes up at the default 2.5G per lane (SDR) points to a negotiation problem. Have you tried forcing both ends of the link, i.e. your 3 1 and 4 1?

Code:
sudo ibportstate 4 1 speed 7 espeed 30 fdr10 1 reset
sudo ibportstate 3 1 speed 7 espeed 30 fdr10 1 reset