Did you enable PCIe ACS and AER in the firmware? PCIe ACS is required for proper IOMMU group granularity, INCLUDING for SR-IOV Virtual Functions.
IOMMU, SR-IOV and ACS are all enabled in the BIOS. I'm not sure there is an AER option at all, but anything that looks even remotely related is set to Enabled. To be sure, I also tried different combinations of these settings, as well as "Auto".
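Whether ACS is actually taking effect shows up in how the kernel groups the devices. Below is a minimal Python sketch, assuming the standard Linux sysfs layout where each IOMMU group is a numbered directory under /sys/kernel/iommu_groups containing a devices/ subdirectory named by PCI address. It lists every group and flags the ones shared by more than one device; with ACS working, the ConnectX-3 VFs would ideally each end up in a group of their own. The helper names iommu_groups and shared_groups are made up for this example.

```python
import os

def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or the kernel exposes no groups
    for name in os.listdir(root):
        devdir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devdir):
            groups[int(name)] = sorted(os.listdir(devdir))
    return groups

def shared_groups(groups):
    """Keep only the groups that contain more than one device."""
    return {g: devs for g, devs in groups.items() if len(devs) > 1}

if __name__ == "__main__":
    groups = iommu_groups()
    for g in sorted(groups):
        print(g, " ".join(groups[g]))
    for g, devs in sorted(shared_groups(groups).items()):
        print(f"group {g} is shared by {len(devs)} devices: {', '.join(devs)}")
```

Run it on the host after the VFs have been created; a VF that shares its group with other devices generally cannot be handed to a VM on its own, since VFIO works on whole groups.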
Regarding the Mellanox NIC, it's set like this:
Code:
mlxconfig -d /dev/mst/mt4099_pciconf0 q
Device #1:
----------
Device type:    ConnectX3
Device:         /dev/mst/mt4099_pciconf0
Configurations:               Next Boot
    SRIOV_EN                  True(1)
    NUM_OF_VFS                24
    LINK_TYPE_P1              ETH(2)
    LINK_TYPE_P2              ETH(2)
    LOG_BAR_SIZE              3
    BOOT_PKEY_P1              0
    BOOT_PKEY_P2              0
    BOOT_OPTION_ROM_EN_P1     False(0)
    BOOT_VLAN_EN_P1           False(0)
    BOOT_RETRY_CNT_P1         0
    LEGACY_BOOT_PROTOCOL_P1   None(0)
    BOOT_VLAN_P1              1
    BOOT_OPTION_ROM_EN_P2     False(0)
    BOOT_VLAN_EN_P2           False(0)
    BOOT_RETRY_CNT_P2         0
    LEGACY_BOOT_PROTOCOL_P2   None(0)
    BOOT_VLAN_P2              1
    IP_VER_P1                 IPv4(0)
    IP_VER_P2                 IPv4(0)
    CQ_TIMESTAMP              True(1)
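The SR-IOV related firmware settings can also be checked mechanically rather than by eye. A minimal Python sketch, assuming the two-column "NAME VALUE" layout of the mlxconfig query output above, with enumerated values printed as Label(n); parse_mlxconfig and sriov_ready are hypothetical helper names, not part of the Mellanox tools.

```python
import re

# A parameter row such as "SRIOV_EN True(1)" or "NUM_OF_VFS 24".
ROW = re.compile(r"^\s*([A-Z][A-Z0-9_]+)\s+(\S+)\s*$")
# Enumerated values are printed as "Label(number)", e.g. "ETH(2)" or "IPv4(0)".
ENUM = re.compile(r"^\w+\((\d+)\)$")

def parse_mlxconfig(text):
    """Return {parameter: int} for the rows of an 'mlxconfig ... q' dump."""
    params = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines such as "Device type:" or "Configurations:"
        name, raw = m.groups()
        enum = ENUM.match(raw)
        if enum:
            params[name] = int(enum.group(1))
        elif raw.lstrip("-").isdigit():
            params[name] = int(raw)
    return params

def sriov_ready(params, wanted_vfs):
    """True if SR-IOV is enabled and the firmware allows at least wanted_vfs VFs."""
    return params.get("SRIOV_EN") == 1 and params.get("NUM_OF_VFS", 0) >= wanted_vfs
```

On a live host the text could come from subprocess.run(["mlxconfig", "-d", "/dev/mst/mt4099_pciconf0", "q"], capture_output=True, text=True).stdout. Only the "Next Boot" column is parsed, so a settings change is not reflected by the running firmware until after a reboot.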
Also, "/etc/modprobe.d/mlx4_core.conf" contains the line "options mlx4_core port_type_array=2,2 num_vfs=8,8,8 log_num_mgm_entry_size=-1".
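As I read the mlx4_core parameter descriptions, num_vfs takes either a single count or a "port1,port2,port1+2" triplet (VFs bound to port 1 only, port 2 only, and both ports), and port_type_array value 2 means Ethernet. On that reading, 8,8,8 requests 24 VFs in total, which matches NUM_OF_VFS 24 in the firmware. A small sketch that decodes the options line and makes the arithmetic explicit, assuming the single-line "options" syntax used here; the helper names are hypothetical, and the "total must not exceed NUM_OF_VFS" check is my assumption about how the driver and firmware limits interact.

```python
# Values documented for the mlx4_core port_type_array parameter.
PORT_TYPES = {0: "HW default", 1: "InfiniBand", 2: "Ethernet"}

def parse_modprobe_options(line):
    """Split an 'options <module> key=value ...' line into (module, {key: value})."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError(f"not an options line: {line!r}")
    opts = {}
    for field in fields[2:]:
        key, _, value = field.partition("=")
        opts[key] = value
    return fields[1], opts

def vf_split(num_vfs):
    """Decode num_vfs as a single count or a 'port1,port2,port1+2' triplet."""
    counts = [int(v) for v in num_vfs.split(",")]
    if len(counts) == 1:
        return {"total": counts[0]}
    if len(counts) != 3:
        raise ValueError("expected one value or a port1,port2,port1+2 triplet")
    p1, p2, dual = counts
    return {"port1_only": p1, "port2_only": p2, "dual_port": dual,
            "total": p1 + p2 + dual}

if __name__ == "__main__":
    module, opts = parse_modprobe_options(
        "options mlx4_core port_type_array=2,2 num_vfs=8,8,8 log_num_mgm_entry_size=-1")
    ports = [PORT_TYPES[int(t)] for t in opts["port_type_array"].split(",")]
    split = vf_split(opts["num_vfs"])
    firmware_num_of_vfs = 24  # NUM_OF_VFS from the mlxconfig query
    print(module, ports, split)
    print("fits firmware limit:", split["total"] <= firmware_num_of_vfs)
```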
All of this works flawlessly when I put my older EPYC 7401 into the same machine, or even in a Ryzen box.