I've been troubleshooting a fatal PCIe error on Linux that occurs when booting with an HBA card installed. I've gone through a lot of debugging steps, too many to dump here, so here are the cliff notes:
- My goal is to run Proxmox. I've run it on this system before, just without the HBA card.
- I have an R720 server.
- The HBA card is a NetApp 111-00341+F2.
- I'm using the latest Proxmox installer (Linux version 5.15.30).
- Someone else has used this card on Ubuntu 18.04.2 successfully (on a different server).
- There is a thread here linked from here where someone claiming to be a NetApp dev states the cards should work in FreeBSD and Linux.
The PCIe error I get when booting Linux is:
Code:
PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
device [8086:0e04] error status/mask=00004000/00318000
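For reference, the status/mask pair in that message can be decoded by hand. Below is a minimal sketch, assuming the values follow the standard PCIe AER Uncorrectable Error Status bit layout (the same bit values Linux defines as `PCI_ERR_UNC_*`). The helper names `parse_status_line` and `decode` are my own, not from any tool.

```python
import re

# Bit values of the AER Uncorrectable Error Status/Mask registers.
# Assumption: the dmesg "status/mask" field uses this standard layout.
UNCORRECTABLE_BITS = {
    0x00000010: "Data Link Protocol Error",
    0x00000020: "Surprise Down",
    0x00001000: "Poisoned TLP",
    0x00002000: "Flow Control Protocol Error",
    0x00004000: "Completion Timeout",
    0x00008000: "Completer Abort",
    0x00010000: "Unexpected Completion",
    0x00020000: "Receiver Overflow",
    0x00040000: "Malformed TLP",
    0x00080000: "ECRC Error",
    0x00100000: "Unsupported Request",
    0x00200000: "ACS Violation",
}


def parse_status_line(line):
    """Extract (status, mask) as integers from a 'status/mask=X/Y' field."""
    m = re.search(r"status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})", line)
    if m is None:
        raise ValueError("no status/mask field found")
    return int(m.group(1), 16), int(m.group(2), 16)


def decode(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in UNCORRECTABLE_BITS.items() if value & bit]


line = "device [8086:0e04] error status/mask=00004000/00318000"
status, mask = parse_status_line(line)
print("status:", decode(status))
print("mask:  ", decode(mask))
```

If that layout applies, the only status bit set is Completion Timeout, which would be consistent with the "Transaction Layer" type in the message. The mask bits only show which error types are suppressed from reporting, not errors that occurred.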
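Unfortunately, I can't find any instances of this error on search engines. There are plenty of similar errors, but the exact error severity and error type are critical here.
OSes and configs I've tried: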
- Proxmox: boots into the above fatal PCIe error.
- Ubuntu 22.04 Desktop: boots into the above fatal PCIe error.
- Ubuntu 16.04.7 Desktop: installer can't boot; kernel panic and PCI13120 error on the front panel screen.
- Arch installer v20220701: boots into the above fatal PCIe error.
- Arch was the fastest to boot, so I also tried it with these PCI kernel options (none of them booted successfully): conf1, conf2, nommconf, noearly.
- FreeBSD 13.1 installs and boots successfully. All disks connected to the HBA appear. I didn't try much else with FreeBSD but it seems to work.
I think that FreeBSD working demonstrates that the server and card are both working and compatible. Now the question is, what's different between how FreeBSD and Linux handle PCI? What other dials do I have to tweak Linux's PCI behavior? In general, how do I proceed debugging this problem?