Chelsio: "could not connect to FW, error -6"; the amazing self-bricking NIC


arglebargle

H̸̖̅ȩ̸̐l̷̦͋l̴̰̈ỏ̶̱ ̸̢͋W̵͖̌ò̴͚r̴͇̀l̵̼͗d̷͕̈
Jul 15, 2018
Last night I booted a new Chelsio T4, and the Chelsio driver package on the system auto-updated the card's firmware during boot. This morning, after another reboot, the card is dead:

Code:
root@ted:~# dmesg | grep cxgb4
[    3.937944] cxgb4 0000:01:00.0: enabling device (0000 -> 0002)
[    3.938325] cxgb4 0000:01:00.0: Could not fetch port params
[    3.938416] cxgb4 0000:01:00.1: enabling device (0000 -> 0002)
[    3.938627] cxgb4 0000:01:00.1: Could not fetch port params
[    3.938705] cxgb4 0000:01:00.2: enabling device (0000 -> 0002)
[    3.938925] cxgb4 0000:01:00.2: Could not fetch port params
[    3.939002] cxgb4 0000:01:00.3: enabling device (0000 -> 0002)
[    3.939216] cxgb4 0000:01:00.3: Could not fetch port params
[    3.939294] cxgb4 0000:01:00.4: enabling device (0000 -> 0002)
[    3.939706] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
[    3.939783] cxgb4 0000:01:00.4: could not connect to FW, error -6
Code:
root@ted:~# lspci -tv
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Complex
          +-00.2  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) I/O Memory Management Unit
          +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 Graphics]
          +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri HDMI/DP Audio Controller
          +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
          +-02.1-[01]--+-00.0  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
          |            +-00.1  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
          |            +-00.2  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
          |            +-00.3  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
          |            +-00.4  Chelsio Communications Inc T420-CR Unified Wire Ethernet Controller
          |            +-00.5  Chelsio Communications Inc T420-CR Unified Wire Storage Controller
          |            +-00.6  Chelsio Communications Inc T420-CR Unified Wire Storage Controller
          |            \-00.7  Chelsio Communications Inc Device 0000
          +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
          +-03.2-[02]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
          +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
          +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
          +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
          +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
          +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
          +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
          +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
          +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
          +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
          +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
          +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
          +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
          +-14.4-[03]--
          +-18.0  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 0
          +-18.1  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 1
          +-18.2  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 2
          +-18.3  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 3
          +-18.4  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 4
          \-18.5  Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 5
I've made a little progress. Unloading and reloading the driver brings the card up in debug mode:

Code:
[  390.981065] pps_core: LinuxPPS API ver. 1 registered
[  390.981068] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[  390.982759] PTP clock support registered
[  391.021623] Chelsio T4/T5/T6 Non-Offload Network Driver - version 3.8.0.2
[  391.022950] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
[  391.023033] cxgb4 0000:01:00.4: Firmware failed to return Configuration Space register 16, err = 6
[  391.023042] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
[  391.023106] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
[  391.023169] cxgb4 0000:01:00.4: could not connect to FW, error 6
[  391.023218] cxgb4 0000:01:00.4: Adapter initialization failed, error 6.  Continuing in debug mode
[  391.031812] DMA buffer at bus address 0x2cb840000, virtual 0xffff8c8dcb840000
[  391.031851] cxgb4 0000:01:00.4: Chelsio T420-LL-CR rev 2
[  391.031854] cxgb4 0000:01:00.4: S/N: -, P/N: 110114640C0
[  391.031856] cxgb4 0000:01:00.4: No firmware loaded
[  391.031858] cxgb4 0000:01:00.4: No bootstrap loaded
[  391.031860] cxgb4 0000:01:00.4: No TP Microcode loaded
[  391.031862] cxgb4 0000:01:00.4: No Expansion ROM loaded
[  391.031865] cxgb4 0000:01:00.4: Serial Configuration version: 0x0
[  391.031867] cxgb4 0000:01:00.4: VPD version: 0x0
[  391.031871] cxgb4 0000:01:00.4: Configuration: NIC , non-Offload capable
[  391.031873] eth0: Chelsio T420-LL-CR (eth0) BASE-Fiber_XFI
[  391.036091] cxgb4 0000:01:00.4 enp1s0f4: renamed from eth0
And later:

Code:
[  478.022438] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
I've tried manually reloading the firmware:

Code:
root@ted:/lib/firmware/cxgb4# cxgbtool enp1s0f4 loadcfg t4-config.txt
root@ted:/lib/firmware/cxgb4# cxgbtool enp1s0f4 loadfw t4fw-1.20.8.0.bin
root@ted:/lib/firmware/cxgb4# ethtool -i enp1s0f4
driver: cxgb4
version: 3.8.0.2
firmware-version: 1.20.8.0, TP 0.0.0.0
expansion-rom-version:
bus-info: 0000:01:00.4
supports-statistics: yes
supports-test: no
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
But on reboot the card is still dead.
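
For anyone who wants to check the same fields from a script, here is a minimal Python sketch that parses the `ethtool -i` output shown above. The function names are hypothetical helpers of mine, not part of ethtool or the Chelsio tools, and I'm assuming the "firmware-version" field always has the "FW, TP x.x.x.x" shape seen here.

```python
# Sample copied from the ethtool -i output in this post.
ETHTOOL_SAMPLE = """\
driver: cxgb4
version: 3.8.0.2
firmware-version: 1.20.8.0, TP 0.0.0.0
expansion-rom-version:
bus-info: 0000:01:00.4
"""


def parse_ethtool_info(text):
    """Turn 'key: value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so PCI addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def split_firmware_version(field):
    """Split 'FW, TP x.x.x.x' into (firmware, tp_microcode).

    Assumption: the field looks like the one in this post. If there is
    no ', TP ' part, the second element is None.
    """
    firmware, _, tp = field.partition(", TP ")
    return firmware.strip(), (tp.strip() or None)


if __name__ == "__main__":
    info = parse_ethtool_info(ETHTOOL_SAMPLE)
    firmware, tp = split_firmware_version(info["firmware-version"])
    print(f"firmware={firmware} tp_microcode={tp}")
```

In my output the TP part reads 0.0.0.0, which lines up with the "No TP Microcode loaded" line in the debug-mode dmesg above.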

Does anyone know how to fix this? I've been waiting about a week for Chelsio support to answer my emails on another subject, so I don't have much hope of help from them.

Edit: I have a second, functional card here, though it's a T420-CR, not a T420-LL-CR. If there's any way to dump what I need from the second card, I'm happy to give it a try.

Edit 2: Chelsio support actually wrote back and requested a dump from the card. That was several days ago, and I'm waiting on further communication.
 

arglebargle

The final word from Chelsio support is that the card suffered some kind of hardware failure during the firmware update. Because the card is EOL, they're unable to debug any further.

I want to stress that there was no evidence of a failure during the firmware update: everything appeared to complete normally, and the firmware and driver on the machine are from the current Chelsio driver package. All of this also happened without any user intervention. On load, the kernel driver automatically flashes the installed firmware if it's later than the firmware on the card, and as far as I know there's no way to defer or prevent this.
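
To make the "later than the firmware on the card" rule concrete, here is a rough Python sketch of that kind of version check. This is only my illustration of the behaviour described above, not the driver's actual code (the real logic lives in the cxgb4 kernel driver and may apply further conditions). The function names are hypothetical, and the "on card" version below is made up for the example, since I don't know what the card shipped with.

```python
def parse_fw_version(text):
    """Convert a dotted version such as '1.20.8.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def would_auto_flash(on_disk, on_card):
    """Return True if the firmware on disk is newer than the card's.

    Tuples of ints compare field by field, so 1.20.8.0 correctly sorts
    after 1.9.0.0, which a plain string comparison would get wrong.
    """
    return parse_fw_version(on_disk) > parse_fw_version(on_card)


if __name__ == "__main__":
    # 1.20.8.0 is the firmware file from this thread; 1.19.1.0 is a
    # made-up older version used purely for illustration.
    print(would_auto_flash("1.20.8.0", "1.19.1.0"))
    print(would_auto_flash("1.20.8.0", "1.20.8.0"))
```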

Anyway, if anyone else encounters the same error message, it indicates that the card is bricked, at least going by what happened to mine.
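
If you want to check a saved dmesg log for the same signature, here is a small Python sketch. The regular expression is written against the exact lines from my logs, so treat the pattern and the function name as assumptions rather than anything official. On Linux, errno 6 is ENXIO ("No such device or address"), which is what the script prints next to the code.

```python
import errno
import re

# Matches lines such as:
# [    3.939783] cxgb4 0000:01:00.4: could not connect to FW, error -6
# The sign is optional because my later log shows "error 6" without it.
FW_CONNECT_RE = re.compile(
    r"cxgb4 (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"could not connect to FW, error (?P<code>-?\d+)"
)

DMESG_SAMPLE = """\
[    3.939706] cxgb4 0000:01:00.4: Firmware reports adapter error: During Device Preparation
[    3.939783] cxgb4 0000:01:00.4: could not connect to FW, error -6
"""


def find_fw_connect_failures(dmesg_text):
    """Return (pci_address, errno_value, errno_name) for each match."""
    failures = []
    for line in dmesg_text.splitlines():
        match = FW_CONNECT_RE.search(line)
        if match:
            code = abs(int(match.group("code")))
            name = errno.errorcode.get(code, "unknown")
            failures.append((match.group("bdf"), code, name))
    return failures


if __name__ == "__main__":
    for bdf, code, name in find_fw_connect_failures(DMESG_SAMPLE):
        print(f"{bdf}: could not connect to FW, errno {code} ({name})")
```

To run it against a live system, you could feed it the output of `dmesg` instead of the sample string.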
 