ConnectX-3 dual port 40Gbit MCX354-QCBT NICs flashed to FCBT work, but show "cable disconnected" after Windows computer goes to sleep or shuts down

unphased

Active Member
Jun 9, 2022
148
26
28
I'm connecting my Windows HTPC in the living room to my NAS/workstation in the office over an MPO fiber run, using 10GTek transceivers. The workstation runs Ubuntu 20.04 and is on 24/7, while the Windows PC is powered down most of the time unless I'm using it.

When I bring the Windows PC back online, the network does not come up. I have to reseat the transceiver at the workstation side. Rebooting the workstation would not be an acceptable fix anyway, but it does NOT work either. Needless to say, this is highly inconvenient!

I noticed that on Ubuntu 20.04 the Mellanox NIC works out of the box. I also know that the OFED drivers can't do their kernel build here because the kernel is too new for them. I'm a bit concerned about that, actually. Maybe the installer can be hacked to run on more recent kernels, because I'm looking to move to 22.04 soon as well.

Anyway, since reseating at the workstation side makes it work again, I suspect I just need a better driver there than the one the kernel ships with. But I'm still hopeful that there is some "reset command" I could run to make it redetect the link when the other machine comes online.

Any tips?
 

prdtabim

Active Member
Jan 29, 2022
170
66
28
That is a well-known "bug". The problem is the VPI protocol trying to choose between Ethernet and InfiniBand...
Solution 01: change the protocol from auto to Ethernet in the adapter properties in Windows.
Solution 02: set the protocol to Ethernet with mlxconfig (2 = Ethernet).
Example (adjust the device to match your card):
mst start
mlxconfig -d /dev/mst/mt4099_pci_cr0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
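
To help with "adjust the device to match your card", here is a small sketch that lists the device nodes `mst start` has created. It assumes the nodes live in /dev/mst and follow the mt4099_pci_cr0 / mt4099_pciconf0 naming seen in this thread (4099 is the ConnectX-3 device ID); `list_mst_devices` is a made-up helper name, not part of the Mellanox tools.

```shell
#!/bin/sh
# Sketch: list the MST device nodes so the right path can be passed to
# `mlxconfig -d`. Run `mst start` first so the nodes exist.
# ASSUMPTION: nodes live in /dev/mst and are named like mt4099_pci_cr0
# or mt4099_pciconf0. list_mst_devices is a hypothetical helper.

list_mst_devices() {
    # $1 = directory to scan (defaults to /dev/mst)
    dir="${1:-/dev/mst}"
    for node in "$dir"/mt*_pci*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    return 0
}
```

Calling `list_mst_devices` with no argument scans /dev/mst; `mst status` prints similar information together with the PCI addresses.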
 

unphased

I have these programs on my Linux machine and had set them up before. OK, so the command from my history is:

mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Note this is pciconf0 instead of your pci_cr0.

I had assumed that since the setting seemed to have "taken" across reboots, I was done with it. But yes, at your suggestion, it does seem likely that the VPI behavior is causing what I'm seeing!

I will test some more.
 

unphased

For about two days, while the Linux machine was online the whole time, the connection worked as expected: whenever the Windows machine woke from sleep or cold booted, the link came up every time. Then I had to bring the Linux machine down to reorganize the hard drives, and the strange behavior is back.

I have a suspicion that running mst start and then getting the connection working might be what makes the difference. At any rate, at least as reported by mlxconfig, both ports are already set to Ethernet mode... But I suppose it's possible that it does not report things properly. Perhaps I should just run the mlxconfig command each time the Linux machine reboots as well.
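
One way to check what mlxconfig reports after each reboot, rather than re-applying the setting blindly, is to parse its query output. This is only a sketch: it assumes the LINK_TYPE lines look like `LINK_TYPE_P1   ETH(2)` (or a bare `2` on older tool versions), and `ports_are_ethernet` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm both ports are pinned to Ethernet in `mlxconfig query` output.
# ASSUMPTION: each LINK_TYPE line has the parameter name in the first column
# and "ETH(2)" or "2" in the second. Adjust the match if your output differs.

ports_are_ethernet() {
    # Reads query output on stdin; succeeds only if P1 and P2 both report Ethernet.
    awk '
        $1 == "LINK_TYPE_P1" || $1 == "LINK_TYPE_P2" {
            seen++
            if ($2 == "ETH(2)" || $2 == "2") eth++
        }
        END { exit !(seen == 2 && eth == 2) }
    '
}

# Example use (needs root and a started mst service):
#   mlxconfig -d /dev/mst/mt4099_pciconf0 query | ports_are_ethernet \
#       && echo "both ports Ethernet" || echo "check LINK_TYPE settings"
```

If this passes after every reboot and the link still fails to come up, the stored setting is probably not what is getting lost.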
 

prdtabim

Settings made with mlxconfig persist across resets and power cycles.
I use HP ConnectX-3 Pro cards for a 40Gb/s point-to-point link between two Linux machines on port 1, and 10Gb/s on port 2 to a CRS309 switch. No issues since I fixed both machines to Ethernet mode.
 

unphased

I still can't make this work. It works some of the time but not all of the time, and each time it fails I have to pull out the fiber transceiver at the Linux end to get the network connection to re-establish. I thought that always keeping MST active (sudo mst start after booting the Linux side) would do the trick, but nope, on some cold boots of the Windows side I still get "network cable disconnected".

This is very frustrating because the transceiver is going to get worn out after repeated insertions, and besides, it's insanely inconvenient if I'm trying to start playing games in the living room but then realize that I have to go upstairs to my office, and bend down under my desk and reach awkwardly.
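
A possible software stand-in for the physical reseat is to watch the carrier state on the Linux side and bounce the interface while the link is down. This is an untested sketch: whether a down/up cycle clears this particular fault is an assumption, the interface name `enp1s0` is a placeholder, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: bounce the NIC in software while the link is down, instead of
# reseating the transceiver by hand.
# ASSUMPTIONS: enp1s0 (in the usage note) is a placeholder interface name,
# and it is unverified that a software bounce recovers this fault.

SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

link_is_up() {
    # The kernel exposes carrier state as 1 (link) or 0 (no link).
    # A failed read (missing or admin-down interface) counts as "not up".
    [ "$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null)" = "1" ]
}

bounce_link() {
    # Take the interface down and up again to force link renegotiation.
    ip link set dev "$1" down
    sleep 2
    ip link set dev "$1" up
}

watch_link() {
    # $1 = interface, $2 = seconds between checks
    while :; do
        if ! link_is_up "$1"; then
            bounce_link "$1"
        fi
        sleep "$2"
    done
}
```

Run as root, `watch_link enp1s0 30` would retry every 30 seconds for as long as the Windows machine is off, so a generous interval keeps the log noise down. If a bounce does not help, that points back toward the transceiver or card rather than the driver.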
 

unphased

What else is super annoying about this: once I go upstairs to fiddle with the silly cable and come back, I have to restart Steam as well. With the network down, the mapped drive isn't connected, so the Steam library folder isn't accessible when Steam launches at startup, and the only way to make Steam see my installed games is to restart it.

This is way too many hoops to jump through.
 

klui

Well-Known Member
Feb 3, 2019
832
455
63
Get a 1-3m DAC cable and test with your system next to your server for a couple of hours. Run through your use cases and see if the problem still shows up. If it does, it could be your card(s). If not, it's your transceiver or cable.
 

unphased

Thanks. I was using hibernate, but I'm fine with ditching it if necessary. However, the fast startup option is not there.

[Attached image: 1661676118102.png]

I will also check in the BIOS and switch all fast boot related options off. It already boots really quite fast (around 10 seconds) so it's not really a problem. I suppose I could run the machine 24/7 at least in the cool months but I would prefer not to. If I set up solar power for the house this may become the norm though.
 

unphased

Windows Fast Startup is disabled... It's well hidden but I found it. I'll try swapping to the other dual port cx3 cards that I've got for the Linux machine. But it's not looking promising. Maybe I will try to stick to using sleep instead of hibernate and see if it helps. There are still a few variables like this that I can test.