Proxmox upgrade 7.x to 8.x: Intel I350-T4 NIC no longer working (driver issues?)

cbm128

New Member
Feb 21, 2023
Hello,

I have 2 thin clients (Dell Wyse 5070) with the same hardware configuration, each with a 4-port Intel I350 NIC.

One host is still at PVE 7.4 and works as expected.

The other one has been reinstalled with PVE 8 (first 8.0 then 8.1, now at 8.1.10).

Since installing PVE 8 the Intel I350 stopped working.

It seems that the 6.x kernels available in PVE 8.x no longer like the I350.
Prior to the upgrade, when the system was running 7.x, the NIC worked without issues.

During boot, when the igb module is loaded, the kernel spews some errors (it looks like the driver is crashing) and refuses to use the NIC:

Apr 19 14:52:18 isis kernel: igb: Intel(R) Gigabit Ethernet Network Driver
Apr 19 14:52:18 isis kernel: igb: Copyright (c) 2007-2014 Intel Corporation.
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: flags: 64bit ncq sntf pm clo only pmp pio slum part deso sadm sds apst
Apr 19 14:52:18 isis kernel: idma64 idma64.0: Found Intel integrated DMA 64-bit
Apr 19 14:52:18 isis kernel: igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
Apr 19 14:52:18 isis kernel: ------------[ cut here ]------------
Apr 19 14:52:18 isis kernel: igb: Failed to read reg 0x18!
Apr 19 14:52:18 isis kernel: WARNING: CPU: 3 PID: 131 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Modules linked in: intel_lpss_pci(+) cqhci igb(+) i2c_i801 intel_lpss xhci_pci(+) xhci_pci_renesas i2c_smbus sdhci i2c_algo_bit idma64 ahci(+) xhci_hcd libahci r8169 dca realtek video wmi pinctrl_geminilake aesni_intel crypto_simd cryptd
Apr 19 14:52:18 isis kernel: CPU: 3 PID: 131 Comm: (udev-worker) Not tainted 6.8.4-2-pve #1
Apr 19 14:52:18 isis kernel: Hardware name: Dell Inc. Wyse 5070 Extended Thin Client/012KND, BIOS 1.29.0 02/05/2024
Apr 19 14:52:18 isis kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Code: c7 c6 03 e4 53 c0 e8 8c 13 8e d9 48 8b bb 28 ff ff ff e8 c0 9d 3c d9 84 c0 74 c1 44 89 e6 48 c7 c7 f8 f0 53 c0 e8 bd 3a be d8 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff c3 cc cc cc cc 66 0f
Apr 19 14:52:18 isis kernel: RSP: 0018:ffffbcffc0363848 EFLAGS: 00010246
Apr 19 14:52:18 isis kernel: RAX: 0000000000000000 RBX: ffff97f712364f38 RCX: 0000000000000000
Apr 19 14:52:18 isis kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 19 14:52:18 isis kernel: RBP: ffffbcffc0363858 R08: 0000000000000000 R09: 0000000000000000
Apr 19 14:52:18 isis kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000018
Apr 19 14:52:18 isis kernel: R13: ffff97f701e7a0c0 R14: ffff97f7123649e0 R15: ffff97f712364000
Apr 19 14:52:18 isis kernel: FS: 0000701e554e48c0(0000) GS:ffff97fb6bd80000(0000) knlGS:0000000000000000
Apr 19 14:52:18 isis kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 14:52:18 isis kernel: CR2: 0000605acc650098 CR3: 00000001120c0000 CR4: 0000000000350ef0
Apr 19 14:52:18 isis kernel: Call Trace:

I have included only the first occurrence of the issue, but it repeats for all 4 ports of the NIC.

So the card is seen as a PCI device, but the driver/kernel crashes when it tries to initialize it.
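To check that every port really fails the same way, the igb messages in the journal can be grouped by PCI address. This is only a minimal sketch: it assumes the kernel log was saved as text (for example with `journalctl -k -b`), and the function name `igb_messages_by_port` is made up for illustration. The sample lines are the two from the trace above.

```python
import re
from collections import defaultdict

# Matches igb kernel lines that carry a PCI address, e.g.
# "kernel: igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost"
IGB_LINE = re.compile(
    r"kernel: igb (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b.*?: (?P<msg>.+)$"
)

def igb_messages_by_port(log_text):
    """Group igb kernel messages by the PCI address of the port (hypothetical helper)."""
    by_port = defaultdict(list)
    for line in log_text.splitlines():
        match = IGB_LINE.search(line)
        if match:
            by_port[match.group("addr")].append(match.group("msg").strip())
    return dict(by_port)

SAMPLE = """\
Apr 19 14:52:18 isis kernel: igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
Apr 19 14:52:18 isis kernel: igb: Failed to read reg 0x18!
"""

print(igb_messages_by_port(SAMPLE))
# {'0000:01:00.0': ['PCIe link lost']}
```

Lines without a PCI address (such as "igb: Failed to read reg 0x18!") are skipped on purpose, so the result has one entry per port that logged something.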

I have tried, without success, the following to solve the issue:

- flashing the NIC with the latest firmware
- triggering a recreation of the NIC's NVM storage (one of the complaints is that the NVM is corrupted) by resetting the NIC to its defaults (bootutil64e -ALL -DEFAULTCONFIG)
- disabling/enabling WOL and PXE boot on the NIC
- updating the Wyse 5070 system BIOS
- installing PVE kernel 6.8 (isis_journalctl-b.txt)
- adding kernel boot parameters to disable the PCIe power-saving features (pcie_port_pm=off pcie_aspm=off)
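For the last item it is worth confirming that the parameters actually reached the running kernel, since a bootloader entry that was edited but not refreshed fails silently. A minimal sketch, assuming a Linux host where `/proc/cmdline` is readable; the helper names are hypothetical:

```python
# Parameters from the list above that should be present on the kernel command line.
WANTED = ["pcie_port_pm=off", "pcie_aspm=off"]

def missing_cmdline_params(cmdline, wanted):
    """Return the wanted parameters that do not appear in the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]

def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()

# On the affected host:
#   print(missing_cmdline_params(read_cmdline(), WANTED))
# An empty list means both parameters are active.
```

If the list is not empty, the parameters were probably added to the wrong place for the bootloader in use, or the boot configuration was not regenerated afterwards.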

Suspecting a hardware issue, I also moved the NIC to the other host running Proxmox 7.4, and there it worked like a charm.
No NVM corruption messages or any issue.
NIC worked at full speed.

Then I tried booting the Proxmox 8 host with a different OS:
  • Ubuntu Server 24.04 behaves like Proxmox, spewing the same kernel trace
  • with Arch Linux (2024.04.01 ISO), which is based on kernel 6.8.2, the NIC works!!!
What is even more puzzling is that the igb driver version in Arch Linux (checked with "lsmod igb") is the same as the one in the Proxmox 6.8.4 kernel.

There must be something else which eludes my analysis capabilities! :)

Possibly something else in the kernel?
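One way to look for that "something else" would be to diff the build configuration of the working Arch kernel against the Proxmox one, restricted to the PCI options. A minimal sketch, assuming both configs have been saved as plain text (typically `/boot/config-<version>` on Debian-based systems and `/proc/config.gz` on Arch, though that is an assumption about these installs). The function names are hypothetical and the sample values are invented purely to show the output shape; they are not the real settings of either kernel.

```python
def parse_kconfig(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options

def diff_kconfig(a, b, prefix="CONFIG_PCI"):
    """Return {option: (value_in_a, value_in_b)} for options under prefix that differ."""
    keys = sorted(k for k in set(a) | set(b) if k.startswith(prefix))
    return {k: (a.get(k, "n"), b.get(k, "n")) for k in keys if a.get(k, "n") != b.get(k, "n")}

# Invented sample data, NOT taken from the real kernels.
arch = parse_kconfig("CONFIG_PCIEASPM=y\nCONFIG_PCIEASPM_DEFAULT=y\n# CONFIG_PCIE_PTM is not set\n")
pve = parse_kconfig("CONFIG_PCIEASPM=y\n# CONFIG_PCIEASPM_DEFAULT is not set\nCONFIG_PCIE_PTM=y\n")
print(diff_kconfig(arch, pve))
# {'CONFIG_PCIEASPM_DEFAULT': ('y', 'n'), 'CONFIG_PCIE_PTM': ('n', 'y')}
```

Options missing from one file are treated as not set, so the diff also catches options that exist in only one of the two kernels.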

But I'm not an expert, and I am quite stuck, since I would like to have both hosts on PVE 8.

Do you have suggestions?

Thanks,

Max
 