Mellanox/NVIDIA ConnectX-7 FW update


Civiloid

Member
Jan 15, 2024
36
22
8
what are you using to test small frame sizes?
Cisco TRex (with some custom patches, but that is a long story) and, for now, just a loopback QSFP28 100G DAC (I should get QSFP112 later).

I have a bunch of Mellanox cards, and on the CX6-DX I can easily get about 110 Mpps TX per port and 80 Mpps RX per port. On this one I get 120 Mpps TX per port, but total RX is 60 Mpps combined across both ports, and once it crosses that level one of the ports locks up and stops receiving or transmitting anything. According to the changelog, that was one of the issues fixed in newer firmware (at some point in 2023).
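
For context on the Mpps figures above, here is a rough sketch of the theoretical packet rate of an Ethernet link. It assumes the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap); the function name `line_rate_pps` is made up for illustration, and the frame size used in the actual test is not stated in the thread, so 64 bytes is only an assumption.

```shell
# Theoretical maximum packets per second on an Ethernet link.
# Each frame occupies frame_bytes + 20 bytes on the wire:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
line_rate_pps() {
    gbps=$1          # link speed in Gbit/s, e.g. 100
    frame_bytes=$2   # frame size including FCS, e.g. 64
    echo $(( gbps * 1000000000 / ((frame_bytes + 20) * 8) ))
}

line_rate_pps 100 64   # 148809523 (about 148.8 Mpps)
line_rate_pps 200 64   # 297619047 (about 297.6 Mpps)
```

With 64-byte frames, a 100G port tops out at about 148.8 Mpps, which gives a yardstick for the per-port numbers quoted here.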

What are your system specs?
2x Xeon 8490H ES E0, 16x16GB RAM, Debian 12, OFED 24.01-0.3.3.1, and Cisco TRex from the master git branch with a few patches (basically I have my own fork that bundles DPDK 23.11 plus some patches to make it work on higher-core-count machines with sub-NUMA clustering enabled).

I also have a backup machine, a slightly overclocked Xeon W5-3435X with 8x16GB RAM and the same OS and software.

What does the output of lspci -vv look like for the pci id of the cx7 nic?
Code:
# lspci -vv -s 98:00.0
98:00.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
    Subsystem: Mellanox Technologies MT2910 Family [ConnectX-7]
    Physical Slot: 1
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 16
    NUMA node: 4
    IOMMU group: 74
    Region 0: Memory at 20bff4000000 (64-bit, prefetchable) [size=32M]
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75W
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 512 bytes, MaxReadReq 4096 bytes
        DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 32GT/s, Width x16, ASPM not supported
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 32GT/s, Width x16
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
             10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn+
        LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
        LnkCtl2: Target Link Speed: 32GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
             EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [48] Vital Product Data
        End
    Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 04, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap:    MFVC- ACS-, Next Function: 1
        ARICtl:    MFVC- ACS-, Function Group: 0
    Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap:    Migration- 10BitTagReq- Interrupt Message Number: 000
        IOVCtl:    Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
        IOVSta:    Migration-
        Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 2, stride: 1, Device ID: 101e
        Supported Page Size: 000007ff, System Page Size: 00000001
        Region 0: Memory at 000020bffa000000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [1c0 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [230 v1] Access Control Services
        ACSCap:    SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl:    SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Capabilities: [320 v1] Lane Margining at the Receiver <?>
    Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
    Capabilities: [3b0 v1] Extended Capability ID 0x2a
    Capabilities: [420 v1] Data Link Feature <?>
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core
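
The thing to look for in output like the above is whether the negotiated link (LnkSta) matches what the card advertises (LnkCap). A minimal sketch of automating that comparison follows; `check_pcie_link` is a hypothetical helper, it only looks at the Speed and Width fields, and it reads `lspci -vv` text on stdin so it can be tried without the hardware.

```shell
# Reads `lspci -vv` output on stdin and compares negotiated link speed/width
# (LnkSta) against the advertised capability (LnkCap).
# Prints "ok", "degraded", or "unknown", followed by the extracted values.
check_pcie_link() {
    awk '
        # Return the first substring of line matching re, or "?" if none.
        function field(line, re) {
            if (match(line, re)) return substr(line, RSTART, RLENGTH)
            return "?"
        }
        /LnkCap:/ { cap = field($0, "Speed [0-9.]+GT/s") " " field($0, "Width x[0-9]+") }
        /LnkSta:/ { sta = field($0, "Speed [0-9.]+GT/s") " " field($0, "Width x[0-9]+") }
        END {
            if (cap == "" || sta == "") { print "unknown"; exit }
            status = (cap == sta) ? "ok" : "degraded"
            print status ": cap=[" cap "] sta=[" sta "]"
        }
    '
}

# Demo using the two relevant lines from the output above.
printf 'LnkCap:\tPort #0, Speed 32GT/s, Width x16, ASPM not supported\nLnkSta:\tSpeed 32GT/s, Width x16\n' \
    | check_pcie_link
```

On a live system the same function could be fed with `lspci -vv -s 98:00.0 | check_pcie_link` (run as root so the capability list is fully decoded).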
 

jpmomo

Active Member
Aug 12, 2018
531
192
43
The good news is that the link capacity and, more importantly, the link status look correct: 32GT/s and x16.
Also, the QSFP112 CX7 is a lot easier to use than the OSFP version.

You should be using a qsfp56 dac as each port will use the full capacity (200G).
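
As a back-of-the-envelope check that the host link is not the limit for two 200G ports, here is a sketch of the raw PCIe bandwidth calculation. It assumes 128b/130b line encoding (used by 8, 16, and 32 GT/s links) and ignores TLP/DLLP protocol overhead, so real usable throughput is lower; `pcie_raw_mbps` is a name invented for this example.

```shell
# Raw PCIe link bandwidth in Mbit/s, assuming 128b/130b encoding.
# Protocol overhead (TLP headers, DLLPs, flow control) is NOT subtracted.
pcie_raw_mbps() {
    gts=$1     # per-lane rate in GT/s, e.g. 32
    lanes=$2   # link width, e.g. 16
    echo $(( gts * 1000 * lanes * 128 / 130 ))
}

pcie_raw_mbps 32 16   # 504123 (about 504 Gbit/s raw)
```

Roughly 504 Gbit/s raw against 2 x 200G = 400 Gbit/s of port capacity leaves some headroom, at least before protocol overhead is taken into account.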

What is your eventual use case for your setup?

Can you also let us know additional info on the changelog that describes the low performance with small pkt sizes?
 

Civiloid

The good news is that the link capacity and, more importantly, the link status look correct: 32GT/s and x16.
Yup, that I checked.

You should be using a qsfp56 dac as each port will use the full capacity (200G).
So far I'm way below even 100G port speed on RX. But I will eventually get a few DACs.
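
To put the "way below 100G" remark in numbers, here is a small sketch converting a packet rate into on-the-wire bandwidth. The 64-byte frame size is an assumption (the thread only says "small frame sizes"), and `wire_mbps` is a hypothetical helper name.

```shell
# Convert a packet rate in Mpps into on-the-wire bandwidth in Mbit/s.
# Adds 20 bytes per frame for preamble, start-of-frame delimiter and
# inter-frame gap.
wire_mbps() {
    mpps=$1          # packet rate in millions of packets per second
    frame_bytes=$2   # frame size including FCS
    echo $(( mpps * (frame_bytes + 20) * 8 ))
}

wire_mbps 60 64   # 40320 (about 40.3 Gbit/s)
```

Under that assumption, the 60 Mpps combined RX ceiling corresponds to only about 40 Gbit/s on the wire.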


What is your eventual use case for your setup?
A dev machine for developing a packet generator, as I'm very unhappy with TRex. I wanted to have different generations of different cards (I already have CX4, CX5 and CX6, BlueField-2 and Intel 710), and eventually I want to run regression tests.

Can you also let us know additional info on the changelog that describes the low performance with small pkt sizes?
If you go to NVIDIA's website, you'll see that all firmware versions below 28.39.1002 were retracted "due to critical issue". And the 28.39.1002 changelog has A LOT of interesting stuff:
  • Description: Fixed an issue that led to packet drops on lossless fabric due to an Rx buffer overflow.
  • Description: Fixed a HW bug that resulted in transaction loss that when cache replacement transaction occurs in parallel to code transcoding.
  • Description: Fixed a statics issue that caused the i2c access to module to lock and stuck the switch. --- that is probably what I have.

And there were more fixes like that in between:
  • Description: When connecting a ConnectX-7 adapter card to ConnectX-7 adapter card and one side is configured to RM Loopback, and the port is toggled, link flap maybe experienced.
  • Description: Fixed inconsistent TCP performance when sending multiple streams

And so on. That seems rather important to me, and it might be related to my exact use case.



What is interesting to me is that the initial firmware version for CX7 is 28.33.2028, but what I got is 28.33.0751, which is obviously older than that.
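
The version comparison can be made explicit with a small sketch. It assumes the three dot-separated fields compare numerically, left to right, and that leading zeros (as in `0751`) carry no meaning; `fw_cmp` is a hypothetical helper, not part of any NVIDIA tool.

```shell
# Compare two dotted firmware versions field by field, numerically.
# Prints whether the first version is "older", "newer", or "same"
# relative to the second. Missing fields are treated as 0.
fw_cmp() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0   # "+ 0" forces numeric comparison, so 0751 -> 751
            yi = y[i] + 0
            if (xi < yi) { print "older"; exit }
            if (xi > yi) { print "newer"; exit }
        }
        print "same"
    }'
}

fw_cmp 28.33.0751 28.33.2028   # older
fw_cmp 28.33.0751 28.39.1002   # older
```

Under those assumptions, 28.33.0751 sorts before both the stated initial release and the 28.39.1002 cutoff.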



Meanwhile, I've tried to force-flash the card with mstflint, with no luck.
My idea was that a commit from a few years ago removed the possibility of flashing encrypted firmware onto an unencrypted device, so I restored the reflashing code to its state before that commit. The result, however, was that I temporarily bricked my card: it still treated the FW as unencrypted and refused to start, so I flashed the original one back. I needed to do
Code:
mstflint -d ${PCI_ID} -i ${FW} -nofs --ignore_dev_data burn
to make it work again.
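
When experimenting like this, it can help to record what the card reports before and after each burn attempt. The sketch below pulls a single field out of `mstflint ... query` output. The `Label:   value` layout and the `FW Version` label are assumptions about that output, `fw_field` is a hypothetical helper, and the sample text in the demo (including the PSID placeholder) is made up rather than captured from a real card.

```shell
# Extract one field from `mstflint -d <device> query` output read on stdin.
# Assumes lines of the form "Label:   value" (an assumption, see lead-in).
fw_field() {
    # $1 = exact field label, e.g. "FW Version"
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Demo on made-up sample text; on a real system something like
#   mstflint -d "${PCI_ID}" query | fw_field "FW Version"
# would be used instead.
printf 'FW Version:            28.33.0751\nPSID:                  EXAMPLE_PSID\n' \
    | fw_field "FW Version"
```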
 

Civiloid

CleanShot 2024-03-23 at 20.05.55.png

That is what I get when I try to run a TRex test. The behavior is extremely weird: I'm not getting proper performance in a single-port configuration either:
CleanShot 2024-03-23 at 20.08.49.png

Just for comparison, this is CX6 (queue_full events are unfortunately normal at line rate on CX6; it can't do more than ~220 Mpps TX and 160 Mpps RX):
CleanShot 2024-03-23 at 20.10.20.png

CleanShot 2024-03-23 at 20.11.58.png
 

Civiloid

please try with a qsfp56 dac and let us know the results.
I will probably have QSFP56 DACs in a few weeks, unless they're delayed for any reason. But honestly, I don't expect them to behave much better. Considering that DPDK might require certain firmware in the future, unless there is a way to make it work with encrypted firmware, I'll just resell it.

Btw, I saw your posts in the CX6 thread, where you had some success with mtusb-1. By any chance, have you tried the same approach with CX7?