Chelsio: "could not connect to FW, error -6"; the amazing self-bricking NIC

Discussion in 'Networking' started by arglebargle, Sep 11, 2018.

  1. arglebargle

    arglebargle H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈

    Joined:
    Jul 15, 2018
    Messages:
    244
    Likes Received:
    75
    Last night I booted a new Chelsio T4, and the Chelsio driver package on the system auto-updated the card's firmware during boot. This morning, after another reboot, the card is dead:

    Code:
    root@ted:~# dmesg | grep cxgb4
    [    3.937944] cxgb4 0000:01:00.0: enabling device (0000 -> 0002)
    [    3.938325] cxgb4 0000:01:00.0: Could not fetch port params
    [    3.938416] cxgb4 0000:01:00.1: enabling device (0000 -> 0002)
    [    3.938627] cxgb4 0000:01:00.1: Could not fetch port params
    [    3.938705] cxgb4 0000:01:00.2: enabling device (0000 -> 0002)
    [    3.938925] cxgb4 0000:01:00.2: Could not fetch port params
    [    3.939002] cxgb4 0000:01:00.3: enabling device (0000 -> 0002)
    [    3.939216] cxgb4 0000:01:00.3: Could not fetch port params
    [    3.939294] cxgb4 0000:01:00.4: enabling device (0000 -> 0002)
    [    3.939706] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
    [    3.939783] cxgb4 0000:01:00.4: could not connect to FW, error -6
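
    For anyone triaging a similar log, here is a rough sketch (not a Chelsio tool) that groups cxgb4 dmesg lines by PCI function and picks out the functions reporting the "could not connect to FW" failure. The function names and the regex are assumptions written for this log format only; the sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [    3.939783] cxgb4 0000:01:00.4: could not connect to FW, error -6
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+cxgb4\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)

def group_cxgb4_messages(dmesg_text):
    """Return {pci_function: [message, ...]} for cxgb4 lines in dmesg output."""
    grouped = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            grouped[m.group("bdf")].append(m.group("msg"))
    return dict(grouped)

def fw_connect_failures(grouped):
    """List the PCI functions whose messages include the FW connect failure."""
    return sorted(
        bdf for bdf, msgs in grouped.items()
        if any("could not connect to FW" in msg for msg in msgs)
    )

# Sample lines taken from the dmesg output in this post.
sample = """\
[    3.938325] cxgb4 0000:01:00.0: Could not fetch port params
[    3.939706] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
[    3.939783] cxgb4 0000:01:00.4: could not connect to FW, error -6
"""

grouped = group_cxgb4_messages(sample)
print(fw_connect_failures(grouped))
```

    On the log above this singles out function 0000:01:00.4, the only one that gets as far as trying to talk to the firmware.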
    
    Code:
    root@ted:~# lspci -tv
    -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Complex
              +-00.2  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) I/O Memory Management Unit
              +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 Graphics]
              +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri HDMI/DP Audio Controller
              +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
              +-02.1-[01]--+-00.0  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
              |            +-00.1  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
              |            +-00.2  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
              |            +-00.3  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
              |            +-00.4  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
              |            +-00.5  Chelsio Communications Inc T420-CR Unified Wire Storage Controller
              |            +-00.6  Chelsio Communications Inc T420-CR Unified Wire Storage Controller
              |            \-00.7  Chelsio Communications Inc Device 0000
              +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
              +-03.2-[02]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
              +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
              +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
              +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
              +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
              +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
              +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
              +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
              +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
              +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
              +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
              +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
              +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
              +-14.4-[03]--
              +-18.0  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 0
              +-18.1  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 1
              +-18.2  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 2
              +-18.3  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 3
              +-18.4  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 4
              \-18.5  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 5
    
    I've made a little progress: unloading and reloading the driver gets the card up in debug mode:

    Code:
    [  390.981065] pps_core: LinuxPPS API ver. 1 registered
    [  390.981068] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
    [  390.982759] PTP clock support registered
    [  391.021623] Chelsio T4/T5/T6 Non-Offload Network Driver - version 3.8.0.2
    [  391.022950] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
    [  391.023033] cxgb4 0000:01:00.4: Firmware failed to return Configuration Space register 16, err = 6
    [  391.023042] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
    [  391.023106] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
    [  391.023169] cxgb4 0000:01:00.4: could not connect to FW, error 6
    [  391.023218] cxgb4 0000:01:00.4: Adapter initialization failed, error 6.  Continuing in debug mode
    [  391.031812] DMA buffer at bus address 0x2cb840000, virtual 0xffff8c8dcb840000
    [  391.031851] cxgb4 0000:01:00.4: Chelsio T420-LL-CR rev 2
    [  391.031854] cxgb4 0000:01:00.4: S/N: -, P/N: 110114640C0
    [  391.031856] cxgb4 0000:01:00.4: No firmware loaded
    [  391.031858] cxgb4 0000:01:00.4: No bootstrap loaded
    [  391.031860] cxgb4 0000:01:00.4: No TP Microcode loaded
    [  391.031862] cxgb4 0000:01:00.4: No Expansion ROM loaded
    [  391.031865] cxgb4 0000:01:00.4: Serial Configuration version: 0x0
    [  391.031867] cxgb4 0000:01:00.4: VPD version: 0x0
    [  391.031871] cxgb4 0000:01:00.4: Configuration: NIC , non-Offload capable
    [  391.031873] eth0: Chelsio T420-LL-CR (eth0) BASE-Fiber_XFI
    [  391.036091] cxgb4 0000:01:00.4 enp1s0f4: renamed from eth0
    
    and later:

    Code:
    [  478.022438] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
    
    I've tried manually reloading the firmware:

    Code:
    root@ted:/lib/firmware/cxgb4# cxgbtool enp1s0f4 loadcfg t4-config.txt
    root@ted:/lib/firmware/cxgb4# cxgbtool enp1s0f4 loadfw t4fw-1.20.8.0.bin
    root@ted:/lib/firmware/cxgb4# ethtool -i enp1s0f4
    driver: cxgb4
    version: 3.8.0.2
    firmware-version: 1.20.8.0, TP 0.0.0.0
    expansion-rom-version:
    bus-info: 0000:01:00.4
    supports-statistics: yes
    supports-test: no
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: no
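
    A small sketch for checking that output in a script rather than by eye. It parses `ethtool -i` text into a dict and splits the firmware-version field into its firmware and TP parts. The helper names are made up for illustration, and the "X, TP Y" layout is assumed from the output above rather than from any documented format.

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines from `ethtool -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like 0000:01:00.4 survive.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def split_firmware_version(field):
    """Split a field like '1.20.8.0, TP 0.0.0.0' into (firmware, tp)."""
    fw, _, rest = field.partition(",")
    rest = rest.strip()
    tp = rest[3:].strip() if rest.startswith("TP ") else None
    return fw.strip(), tp

# Lines copied from the ethtool output in this post.
sample = """\
driver: cxgb4
version: 3.8.0.2
firmware-version: 1.20.8.0, TP 0.0.0.0
expansion-rom-version:
bus-info: 0000:01:00.4
"""

info = parse_ethtool_info(sample)
fw, tp = split_firmware_version(info["firmware-version"])
print(fw, tp)
```

    Here the firmware part reads 1.20.8.0 after the manual load, but the TP part is still 0.0.0.0, which lines up with the "No TP Microcode loaded" message in the debug-mode log.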
    
    But on reboot the card is still dead.

    Anyone know how to fix this? I've been waiting about a week for Chelsio support to return my emails on another subject, so I doubt there's much hope of help from them.

    Edit: I have a second functional card here, though it's a T420-CR, not a T420-LL-CR. If there's any way to dump what I need from the second card, I'm happy to give it a try.

    Edit 2: Chelsio support actually wrote back and requested a dump from the card. That was several days ago, and I'm waiting on further communication.
     
    #1
    Last edited: Sep 19, 2018 at 8:23 PM
  2. arglebargle

    Final word from Chelsio support is that the card had some kind of hardware failure during firmware update. Because the card is EOL they're unable to debug any further.

    I want to stress that there was no evidence of failure during the firmware update. Everything appeared to complete normally, and the firmware and driver on the machine are from the current Chelsio driver package. Also, all of this happened without any user intervention at all: on load, the kernel driver automatically flashes the installed firmware if it's later than the firmware on the card. AFAIK there's no way to defer or prevent this.
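
    To make the auto-flash behaviour described above concrete, here is a simplified sketch of a "flash if the host firmware is newer" check. It is not the cxgb4 driver's actual logic, which I have not read; `version_from_filename` and `should_flash` are hypothetical names, and the card-side version in the example is a placeholder, since the version the card shipped with was never recorded.

```python
import re

def parse_version(version):
    """Turn a dotted version such as '1.20.8.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def version_from_filename(name):
    """Extract the version from a name like 't4fw-1.20.8.0.bin' (assumed pattern)."""
    m = re.fullmatch(r"t4fw-(\d+(?:\.\d+){3})\.bin", name)
    return m.group(1) if m else None

def should_flash(card_fw, host_fw):
    """Return True when the firmware on the host is later than the card's."""
    return parse_version(host_fw) > parse_version(card_fw)

host_fw = version_from_filename("t4fw-1.20.8.0.bin")  # file name from this thread
card_fw = "1.19.1.0"  # placeholder: the card's original version is unknown

print(should_flash(card_fw, host_fw))
```

    Comparing tuples of integers instead of raw strings matters here: as strings, "1.9.0.0" would sort after "1.20.8.0".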

    Anyway, if anyone else encounters the same error message, it indicates that the card is bricked.
     
    #2
    Last edited: Sep 19, 2018 at 8:51 PM
  3. Foray

    Foray Member

    Joined:
    May 22, 2016
    Messages:
    30
    Likes Received:
    7
    Is SR-IOV on?
     
    #3
  4. arglebargle

    Enabled in the BIOS, but not presently configured or used under Linux.
     
    #4