R86s U2 SFP+ interfaces not showing up


sketch

New Member
Nov 1, 2023
9
0
1
Hi, I recently got an R86S U2, but I'm having trouble getting the SFP+ interfaces to work.

I can see the three 2.5Gbps interfaces, but not the two SFP+ ones.

Code:
$ ip l | grep -P '^\d'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: enp1s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

$ lspci | grep -i eth
01:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)
02:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)
03:00.0 Ethernet controller: Intel Corporation Device 125c (rev 04)
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
If I check dmesg, I see some errors from mlx4_core.

Code:
$ lsmod | grep mlx
mlx4_core             405504  0

$ sudo dmesg | grep -i mlx
[    1.397624] mlx4_core: Mellanox ConnectX core driver v4.0-0
[    1.397657] mlx4_core: Initializing 0000:05:00.0
[   67.788745] mlx4_core 0000:05:00.0: command 0x34 timed out (go bit not cleared)
[   67.788758] mlx4_core 0000:05:00.0: device is going to be reset
[   67.788761] mlx4_core 0000:05:00.0: crdump: FW doesn't support health buffer access, skipping
[   68.805242] mlx4_core 0000:05:00.0: device was reset successfully
[   68.805246] mlx4_core 0000:05:00.0: Failed to override log_pg_sz parameter
[   68.805248] mlx4_core 0000:05:00.0: Failed to init fw, aborting.
[   69.829020] mlx4_core: probe of 0000:05:00.0 failed with error -5

$ sudo mstconfig q
-E- Failed to open device: /sys/bus/pci/devices/0000:05:00.0/config. Cannot perform operation, Driver might be down.
Has anyone run into this before? I'm hoping I'm missing something and didn't get a dud.

I'm running Ubuntu 22.04, although I also booted into the OpenWrt install it shipped with and didn't see the interfaces there either.

Code:
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

$ uname -a
Linux r86su2 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
 

sketch

I thought maybe the firmware needed to be updated, but as best I can tell 2.42.5000 should be fine.

Code:
$ sudo mstflint -d 5:00.0 q
Image type:            FS2
FW Version:            2.42.5000
FW Release Date:       5.9.2017
Product Version:       02.42.50.00
Rom Info:              type=UEFI version=14.11.45 cpu=AMD64
                       type=PXE version=3.4.752
Device ID:             4099
Description:           Node             Port1            Port2            Sys image
GUIDs:                 ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
MACs:                                       f45214cd2d06     f45214cd2d07
VSD:                   
PSID:                  MT_1680110023
 

sko

Active Member
Jun 11, 2021
246
129
43
Mellanox ConnectX cards usually support both Ethernet and InfiniBand, so you'd need to load the appropriate driver to switch the card into the corresponding mode. The 'core' driver on its own doesn't register any device in the OS.

No idea how Linux handles this. On the BSDs, to get it running in Ethernet mode you'd simply load the mlx4en/mlx5en (FreeBSD) or mcx (OpenBSD) driver via rc.conf and that's it...
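A minimal sketch of the FreeBSD side of this, assuming the stock mlx4en kernel module: one rc.conf line loads the ConnectX-3 Ethernet driver at boot, and loading mlx4en also brings in the mlx4 core driver (as the `kldload mlx4en` output later in the thread shows). The loader.conf line is an alternative, not an addition. Either way, this only helps once mlx4_core can actually attach to the card.

```
# /etc/rc.conf: load the ConnectX-3 Ethernet driver (and its core dependency) at boot
kld_list="mlx4en"

# Alternative: load it earlier from the boot loader via /boot/loader.conf
# mlx4en_load="YES"
```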
 

sketch

Thanks for the suggestion, sko.

I booted OPNsense, since I found several references online to people using it successfully with the R86S, but no difference there. I tried loading the driver with `kldload mlx4en`, but no cigar.

I also enabled debugging in mlx4_core and tried loading mlx4_en in Ubuntu, but still no luck.

Code:
$ cat /etc/modprobe.d/mlx4.conf
options mlx4_core debug_level=1

$ sudo modprobe -vv mlx4-en
modprobe: INFO: ../libkmod/libkmod.c:367 kmod_set_log_fn() custom logging function 0x55c0e0783830 registered
insmod /lib/modules/5.15.0-88-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko debug_level=1
insmod /lib/modules/5.15.0-88-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko
modprobe: INFO: ../libkmod/libkmod.c:334 kmod_unref() context 0x55c0e1742440 released

$ lsmod | grep mlx
mlx4_en               155648  0
mlx4_core             405504  1 mlx4_en
The debug logging got me some additional error messages in dmesg.

Code:
$ sudo dmesg | grep mlx
[  542.242739] mlx4_core: Mellanox ConnectX core driver v4.0-0
[  542.242773] mlx4_core: Initializing 0000:05:00.0
[  543.270559] mlx4_core 0000:05:00.0: FW version 2.42.5000 (cmd intf rev 3), max commands 16
[  543.270563] mlx4_core 0000:05:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
[  543.270566] mlx4_core 0000:05:00.0: Communication vector bar:2 offset:0x800
[  543.270567] mlx4_core 0000:05:00.0: FW size 385 KB
[  543.270569] mlx4_core 0000:05:00.0: Internal clock bar:0 offset:0x78f50
[  543.270570] mlx4_core 0000:05:00.0: Clear int @ f0058, BAR 0
[  543.274135] mlx4_core 0000:05:00.0: Mapped 26 chunks/6168 KB for FW
[  608.653452] mlx4_core 0000:05:00.0: command 0x34 timed out (go bit not cleared)
[  608.653458] mlx4_core 0000:05:00.0: device is going to be reset
[  608.653484] mlx4_core 0000:05:00.0: crdump: FW doesn't support health buffer access, skipping
[  609.669943] mlx4_core 0000:05:00.0: device was reset successfully
[  609.669968] mlx4_core 0000:05:00.0: Failed to override log_pg_sz parameter
[  609.669970] mlx4_core 0000:05:00.0: Failed to init fw, aborting.
[  610.693692] mlx4_core: probe of 0000:05:00.0 failed with error -5
Looking at the `lspci` output, it still lists only `mlx4_core` under "Kernel modules", with no "Kernel driver in use" line. I suspect mlx4_en depends on mlx4_core successfully initializing the device, but I don't know.

Code:
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
    Flags: fast devsel, IRQ 29
    Memory at 7fd00000 (64-bit, non-prefetchable) [size=1M]
    Memory at 6000000000 (64-bit, prefetchable) [size=8M]
    Expansion ROM at 7fc00000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [148] Device Serial Number ███████████████████████
    Capabilities: [154] Advanced Error Reporting
    Capabilities: [18c] Secondary PCI Express
    Kernel modules: mlx4_core
I also tried setting the port type to Ethernet in mlx4.conf and reloading mlx4_en, but I'm still seeing the same errors in dmesg.

Code:
$ cat /etc/modprobe.d/mlx4.conf
options mlx4_core debug_level=1 port_type_array=2,2

$ sudo modprobe -vv mlx4-en
modprobe: INFO: ../libkmod/libkmod.c:367 kmod_set_log_fn() custom logging function 0x56440c724830 registered
insmod /lib/modules/5.15.0-88-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko debug_level=1 port_type_array=2,2
insmod /lib/modules/5.15.0-88-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko
modprobe: INFO: ../libkmod/libkmod.c:334 kmod_unref() context 0x56440c97b440 released
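For a persistent version of the same Linux setup, here is a sketch assuming Ubuntu's stock in-kernel mlx4 modules: keep the options line in /etc/modprobe.d and list mlx4_en in /etc/modules-load.d so systemd loads it at boot. According to `modinfo mlx4_core`, port_type_array uses 1 for InfiniBand and 2 for Ethernet. Module options are only read when mlx4_core is loaded, so if it is already in memory, unload it first with `sudo modprobe -r mlx4_en mlx4_core`. None of this can help while mlx4_core itself fails during firmware init, as in the dmesg output above.

```
# /etc/modprobe.d/mlx4.conf: request Ethernet on both ports (1 = IB, 2 = Ethernet)
options mlx4_core port_type_array=2,2

# /etc/modules-load.d/mlx4.conf: load the ConnectX-3 Ethernet driver at boot
mlx4_en
```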
 

sko

Try setting the "sys.device.mlx4_core0.mlx4_port1" sysctl to "eth". I just checked some FreeBSD hosts with ConnectX-3 NICs, and I've set that sysctl on those too.
 

sketch

No luck.

I loaded mlx4en with `kldload mlx4en` and then tried `sysctl sys.device.mlx4_core0.mlx4_port1=eth`, but I get an "unknown oid" error.
 

MrTeeJay

New Member
Feb 19, 2019
6
4
3
Curious question: have you installed any SFP modules? I don't think anything shows up if you don't have anything installed in the SFP ports...
 

sketch

Yes, I’ve put an Ethernet transceiver in one port and a fiber one in the other. Both transceivers work in CX3 cards I have in other machines.

Come to think of it, those other CX3s work out of the box with Ubuntu. I can see mlx4_core and mlx4_en loading on them in dmesg without errors, too.

I’m starting to wonder if there’s something wrong with the R86S, although I’m not sure what my options will be if that’s the case, since I ordered it from AliExpress.
 

sko

What's the output of 'sysctl sys | grep mlx' and 'sysctl dev | grep mlx'?

Also, I *strongly* suggest using vanilla FreeBSD rather than OPNsense or pfSense. Especially the latter is built on the development branch, which was NEVER intended for production use (and *will* break from time to time, like during the latest switch to OpenSSL 3 in base), and they have also often shipped beta drivers (with known results...)
Their crappy middleware will also constantly overwrite/change settings you made the "right way" and make easy things hard or impossible.
There's a good chance the NICs won't work for one or more of those reasons, or because those distributions don't include some drivers.

I'm running 6 FreeBSD hosts (12.4-RELEASE & 13.2-RELEASE) with ConnectX-3 and ConnectX-4 NICs, and those NICs work fine there...
 

sketch

I tried again with FreeBSD 12.4, but I'm getting the same results.

Here is the output of those sysctl commands:

Code:
root@:~ # sysctl sys | grep mlx
root@:~ # sysctl dev | grep mlx
dev.mlx4_core.%parent:
The dmesg errors are the same: `mlx4_core0: Failed to override log_pg_sz parameter` and `device_attach: mlx4_core0 attach returned 5`.



Code:
root@:~ # ifconfig | grep '^\w'
igc0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
igc1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
igc2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384

root@:~ # dmesg | grep -i mlx
mlx5en: Mellanox Ethernet driver 3.6.0 (December 2020)

root@:~ # kldload -v mlx4en
Loaded mlx4en, id=7

root@:~ # dmesg | grep -i mlx
mlx5en: Mellanox Ethernet driver 3.6.0 (December 2020)
mlx4_core0: <mlx4_core> mem 0x7fd00000-0x7fdfffff,0x6000000000-0x60007fffff at device 0.0 on pci5
mlx4_core: Mellanox ConnectX core driver v3.6.0 (December 2020)
mlx4_core: Initializing mlx4_core
mlx4_core0: command 0x34 timed out (go bit not cleared)
mlx4_core0: device is going to be reset
mlx4_core0: device was reset successfully
mlx4_core0: Failed to override log_pg_sz parameter
mlx4_core0: Failed to init fw, aborting.
device_attach: mlx4_core0 attach returned 5

root@:~ # freebsd-version
12.4-RELEASE
 

sko

sketch said:
mlx4_core0: command 0x34 timed out (go bit not cleared)

According to a bunch of search results, this error points to MSI being deactivated...

Can you give me the output of 'sysctl -a | grep msi'?
Is MSI(-X) perhaps in some legacy mode or disabled in the BIOS?


And just to be clear: this is all bare metal? Not on some hypervisor and SR-IOV magic?
 

sketch

sko said:
According to a bunch of search results, this error points to MSI being deactivated...

Can you give me the output of 'sysctl -a | grep msi'?

Code:
root@:~ # sysctl -a | grep msi
hw.ice.rdma_max_msix: 64
hw.sdhci.enable_msi: 1
hw.puc.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.msix_rewrite_table: 0
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.mfi.msi: 1
hw.malo.pci.msi_disable: 0
hw.ix.enable_msix: 1
hw.bce.msi_enable: 1
hw.aac.enable_msi: 1
machdep.disable_msix_migration: 0
machdep.num_msi_irqs: 2048
dev.igc.2.iflib.disable_msix: 0
dev.igc.1.iflib.disable_msix: 0
dev.igc.0.iflib.disable_msix: 0
compat.linuxkpi.mlx4_msi_x: 1

sko said:
Is MSI(-X) perhaps in some legacy mode or disabled in the BIOS?

Not that I can tell; I didn't see anything named MSI or MSI-X in the BIOS.
Could it be under some other name?

sko said:
And just to be clear: this is all bare metal? Not on some hypervisor and SR-IOV magic?

Yeah, bare metal, and SR-IOV is disabled in the BIOS too.
 

sketch

I tried looking at the mlx4 source, but I’m not sure what to make of this:


Code:
mlx4_cfg.log_pg_sz_m = 1;
mlx4_cfg.log_pg_sz = 0;

err = mlx4_MOD_STAT_CFG(dev, &mlx4_cfg);
if (err)
    mlx4_warn(dev, "Failed to override log_pg_sz parameter\n");

Code:
/* Attempt to access reserved or unallocaterd resource: */
    CMD_STAT_BAD_RESOURCE    = 0x05,
 

sketch

I compared the lspci output on the R86S to the output on the other machines I have with working CX3 cards.

The R86S shows:
Code:
Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
On the working machines it's `MSI-X: Enable+`, which seems to point at MSI like sko suggested.

I've been looking through the BIOS to see if there's a setting, but I haven't found anything yet.
I also tried resetting the BIOS to the defaults, but still nothing.
 

jmilleriec

New Member
Feb 24, 2024
1
0
1
Did you ever solve this problem? I'm having the same problem: the SFP interfaces are not showing up under OPNsense or VyOS...
 

cloudhax

New Member
Feb 29, 2024
18
19
3
I've found that my problem seems intermittent: I semi-reliably see the SFP+ ports on the first boot after plugging in the USB-C power input. On subsequent reboots the ports are just missing, as if they don't power up. I'll try another power adapter or just hope it never needs to reboot :rolleyes: