Dell R630 rNDC in a Dell R620


aarcane

Member
Feb 16, 2016
So I'm running a Dell R620, and I picked up this nice Mellanox ConnectX-4 Lx EN rNDC (part number R887V), which was designed for the Dell R630. Once I'm in the OS running Debian or FreeBSD, it's detected perfectly with no issues and, AFAICT, it passes traffic fine. However, during POST I see errors like the ones below, and startup visibly hangs while initializing firmware interfaces, trying to negotiate with the card's firmware. I also don't get any option ROM configuration screens or PXE boot options from the rNDC, so I strongly suspect it's all related to the Mellanox rNDC. But I'm not sure how to fix it, or whether it's even possible to do so with software or firmware.

Has anybody run into this problem before? If so, how can I make these errors, and the associated autoconfiguration slowdown at boot, go away? How can I enable PXE booting from these cards? If all else fails, will erasing the existing option ROM with the Mellanox tools get rid of the error?

Code:
Plug & Play Configuration Error:
Option ROM Device Location Table Error
 Bus#01/Dev#00/Func#1: Unknown PCI Device
 
Plug & Play Configuration Error:
IRQ Allocation
 Bus#01/Dev#00/Func#0: Unknown PCI Device

Plug & Play Configuration Error:
IRQ Allocation
 Bus#01/Dev#00/Func#1: Unknown PCI Device
Note: Error message transcribed by hand from a screenshot, so any minor typos are mine, not Dell's
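For reference, the Mellanox firmware tools (MFT) can at least show what expansion ROM the card is carrying before I try to erase anything. A rough sketch, assuming the stock mstflint/MFT package; the /dev/mst path below is just an example for a ConnectX-4 Lx, so substitute whatever mst status reports on your system:

Code:
# Load the Mellanox tools kernel module and list detected devices
mst start
mst status
# Query the firmware image; the "Rom Info" line shows whether a
# PXE/UEFI expansion ROM is present, and at what version
flint -d /dev/mst/mt4117_pciconf0 query full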
 

oneplane

Well-Known Member
Jul 23, 2021
Are you booting in UEFI or Legacy CSM mode? Maybe the option ROM only works in one mode but not the other.

Edit: digging a little in my memory and checking Google, I see that this exact issue is something I had with a 7xx series, and apparently so did someone else. It's definitely an issue with the generational difference: the firmware can't use the data provided by the PCI device to automatically set up memory regions for the option ROM, assign IRQs, and so on. That's not really a problem if you aren't trying to use iSCSI or PXE during boot, but you can't really disable it either, since the system will try to discover all PCI devices via PnP during firmware startup. Technically it should be feasible to tell the firmware to ignore a certain bus/device combo, but I doubt that's something any commercial firmware has added.
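You can at least sanity-check that from the OS side: lspci will show whether the expansion ROM BAR got assigned once Linux is up. A quick sketch, using the bus/device address from the POST error above (adjust if your layout differs):

Code:
# Dump config space for the NIC from the error message; the
# "Expansion ROM" line shows the BAR the firmware was trying to place
sudo lspci -s 01:00.0 -vv | grep -i -E 'expansion rom|region'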
 

oneplane

Well-Known Member
Jul 23, 2021
Quote:
It's a firmware-level issue for which there is no fix that I'm aware of?
Yeah, it's essentially the firmware not knowing what to do with the PCI device data. Sometimes it only happens during CSM launch, but most of the time this is just a DXE-phase UEFI limitation, and while Dell could fix it, they'd rather just sell you a new server that matches the 'generations' of devices. The side effect is that other PCI devices suffer from the same issue.
 

LodeRunner

Active Member
Apr 27, 2019
oneplane said:
Yeah, it's essentially the firmware not knowing what to do with the PCI device data. Sometimes it only happens during CSM launch, but most of the time this is just a DXE-phase UEFI limitation, and while Dell could fix it, they'd rather just sell you a new server that matches the 'generations' of devices. The side effect is that other PCI devices suffer from the same issue.
Yeah. Isn't the rNDC connector on the Rx20 series PCIe 2.0, and bumped to PCIe 3.0 on the Rx30? I seem to recall that being a thing, though PCIe is backwards compatible; Dell just doesn't want to certify everything and bother with a firmware update and all that entails.
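Easy enough to check from a running OS what the link actually negotiated; a quick sketch, again using the 01:00.0 address from the OP's error output:

Code:
# LnkCap is what the card supports, LnkSta is what was actually
# negotiated with the slot (speed in GT/s and lane width)
sudo lspci -s 01:00.0 -vv | grep -i -E 'lnkcap:|lnksta:'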
 

neggles

is 34 Xeons too many?
Sep 2, 2017
I've run these in 12G and 13G with no trouble. What you have to do is switch the boot mode to UEFI-only, disable network booting / the UEFI network stack, and disable option ROM loading for those slots. Otherwise you can just wait out the slow boot :p
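If you'd rather script that than click through F2 setup, it can be done through iDRAC. A rough racadm sketch; the attribute names here are examples from 12G/13G BIOS releases and may differ on yours (and an rNDC may live under the integrated devices group rather than a slot), so list what your BIOS actually exposes with racadm get BIOS first:

Code:
# Switch to UEFI-only boot
racadm set BIOS.BiosBootSettings.BootMode Uefi
# Keep a slot usable by the OS but skip its option ROM at boot
racadm set BIOS.SlotDisablement.Slot1 BootDriverDisabled
# Queue a BIOS setup job and power-cycle so the changes apply
racadm jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW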
 

oneplane

Well-Known Member
Jul 23, 2021
neggles said:
I've run these in 12G and 13G with no trouble. What you have to do is switch the boot mode to UEFI-only, disable network booting / the UEFI network stack, and disable option ROM loading for those slots. Otherwise you can just wait out the slow boot :p
That's what it generally boils down to, yes. On some older firmwares there's a weird PEI stage that just 'breaks' no matter what you do when you don't use the 'whitelisted' parts from the manufacturer (or use 'dumb' parts). That said, the only reason you'd want a device in your system to be touched by your firmware is if you want to use it for pre-boot I/O or for boot selection. If you just want to use it inside an OS, the firmware really doesn't need to know about it.
 

aarcane

Member
Feb 16, 2016
So I tried to 'delete' the oprom, and flint told me it couldn't delete it because it contained important data, so I gave up on that. Then I tried to set the configuration to enable the 62 SR-IOV VFs the device allegedly supports. After rebooting, Debian was like "is this a NIC?" and hung after loading the initrd but before outputting anything else to the console, and when I tried to load the MST service on Windows Server 2019, Windows shat itself and rebooted. So sadly, I no longer have the card in a useful state. I'm not about to try booting into Linux and then hot-plugging the rNDC unless I see someone reputable show that it's actually a viable solution, since I'm not rich enough to replace R620s left and right. As of now, the card is unfortunately a paperweight for me.
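If anyone else ends up in the same spot: the mlxconfig settings live in non-volatile configuration that can be rolled back, so before writing the card off it may be worth trying a reset from a box that still boots with it installed. A sketch assuming the standard MFT tools; the /dev/mst path is an example, check mst status for yours:

Code:
# Find the device
mst start
mst status
# Show the current (broken) non-volatile configuration
mlxconfig -d /dev/mst/mt4117_pciconf0 query
# Reset all mlxconfig settings to firmware defaults, then
# power-cycle the host so the card reloads them
mlxconfig -d /dev/mst/mt4117_pciconf0 reset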