[SOLVED] 1Gb uplink from SX6036 does not re-enable after power cycle. Needs transceiver resocketing


unphased

Active Member
Jun 9, 2022
148
26
28
I use a 10Gtek 1 Gb (1.25 Gb?) RJ45 to SFP+ transceiver (from Amazon) in an SFP+ to QSFP+ adapter (the eBay listing title was HP MELLANOX 655902-001 655874-B21 QSFP/SFP ADAPTER) to connect my LAN switches to my SX6036. The path to my router goes through one UniFi Switch Flex Mini powered over USB-C.

I set up my basement "cluster" of a few machines on 40 Gbit off the switch. I am currently away on a trip for several months. We had a power outage early on Christmas Day morning, and now the switch just reports "Cable Disconnected" for this uplink port.

I already knew of this behavior from before, to be honest. I think I just didn't want to believe it. Yes, I had to manually configure the specific port to 1 Gb in the switch web UI to get it to work. It's just very frustrating that even when I reboot the switch, this uplink port does not come online on the Mellanox side. The UniFi web UI shows all of my ports as active, so the UniFi Switch Flex side of the connection seems to be happy. I have done a firmware upgrade on both of these Switch Flex units, hoping that would do something, but it didn't.

Does anyone know of other things I can try from the Mellanox web UI or the console (I am connected to the switch over both serial and Ethernet on the mgmt port) to bring the port back online? I'm really hoping not to have to send my neighbor back into the house one more time. The first time was because I had failed to set up Tailscale etc. on any of my many Raspberry Pis before the power outage and lost remote access that way, but that has now been rectified.

I was able to connect to my workstation (which sits behind this Mellanox switch relative to my LAN) via my pre-configured wifi on the neighbor's network, and I used a shell script that I wrote with a wifi reconnect fallback to re-point the wifi at my home routers. I might expand this script into a long-running one that checks for internet and attempts reconnection round-robin across all the available wifi networks, so I can stay connected whenever either of the ISPs available to me is down. That seems to be a less frequent type of event than short power outages, but at least it's something I can do.
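The long-running version of that script could look roughly like the sketch below. This is not the shell script mentioned above; it is a Python illustration that assumes the workstation uses NetworkManager (so `nmcli connection up id <profile>` works) and that each wifi network already has a saved profile. The profile names, the probe address `1.1.1.1:53`, and the function names are all hypothetical.

```python
import itertools
import socket
import subprocess
import time


def has_internet(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def nmcli_connect(profile):
    """Activate a saved NetworkManager profile (assumes nmcli is installed)."""
    result = subprocess.run(
        ["nmcli", "connection", "up", "id", profile],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def keep_connected(profiles, check=has_internet, connect=nmcli_connect,
                   sleep=time.sleep, interval=60, max_checks=None, log=print):
    """Poll connectivity; on failure, try the profiles in round-robin order.

    max_checks=None loops forever; a number bounds the loop (handy for tests).
    """
    rotation = itertools.cycle(profiles)  # remembers position across outages
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not check():
            # Try each profile at most once per polling pass.
            for _ in range(len(profiles)):
                profile = next(rotation)
                log(f"no internet, trying {profile}")
                if connect(profile) and check():
                    break
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical profile names; replace with the saved ones on the machine.
    keep_connected(["home-isp-a", "home-isp-b", "neighbor-wifi"])
```

The check, connect, and sleep functions are passed in as parameters so the rotation logic can be exercised without touching the real network.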

I might attempt to reboot both the mellanox switch and the unifi switch flex units at the same time in hopes it will bring it up somehow. But I think it's some bad state that the transceiver gets itself into.

I guess technically I am not losing a huge amount of capability with the Mellanox switch unable to link to the rest of the LAN. I should be able to set up DHCP on it and still get high-speed networking between the machines in my cluster. They just won't be able to get internet through it, and internet access over wifi only would be slow. So I would probably send my neighbor in to resocket the transceiver. I can't believe that I am currently contemplating building an Arduino/Raspberry Pi solenoid actuator to resocket network transceivers. I guess the clear next step when I'm home would be to try a bunch of other equipment combos until I end up with something that is actually solid.

Really hoping there is a way to hard reset/cycle the port from the mellanox switch cli or something.
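For anyone landing here with the same question, an administrative port bounce from the console might look like the sketch below. It only builds and prints the command lines; it does not talk to the switch. The commands assume MLNX-OS-style syntax (`interface ethernet`, `shutdown`, `no shutdown`) and should be checked against the command reference for the installed release. The port `1/36` is a placeholder, not the port from this thread.

```python
def port_bounce_commands(port):
    """Build CLI lines to administratively bounce one Ethernet port.

    ASSUMPTION: MLNX-OS-style syntax; `port` is a string such as "1/36".
    """
    return [
        "enable",
        "configure terminal",
        f"interface ethernet {port}",
        "shutdown",      # take the port administratively down
        "no shutdown",   # bring it back up, forcing a fresh link attempt
        "exit",
    ]


if __name__ == "__main__":
    # "1/36" is a placeholder port for illustration only.
    print("\n".join(port_bounce_commands("1/36")))
```

Whether a software bounce is enough to wake a transceiver that is stuck in a bad state is exactly the open question here, so treat this as something to try rather than a known fix.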
 

unphased

I guess I should also put this switch behind a network-switchable smart plug, as I suspect that the transceiver can be woken up by a physical power cycle where a switch reboot fails. What I'm trying now is disabling the port, then rebooting the switch, and re-enabling it, possibly rebooting again after that. Worth a shot, but I doubt it will improve the situation.
 

unphased

Once my helpful neighbor came over to replug the cable, I realized I had misremembered which port I plugged it into, which is important to get right for configuring it from the web UI, of course. Now I'm back up, and I am still hopeful it may come back up properly on its own at the next power outage. I am also starting to wonder if I have to manually save any and all config (such as 1 Gbit mode for a given Ethernet port) for it to survive to the next reboot, which is likely what actually screwed me this time! It's possible that there is no physical need to reseat the transceiver at all. I'll have to re-evaluate that.

Yeah, not saving config to disk without a manual action actually seems extremely likely to be how they built this thing, now that I think about it. I got my brain rotted by more "modern" settings UIs. Manual save is definitely a pre-2020 software design, and a solid and predictable one. In any event, I now have tons of backup Raspberry Pi Tailscale setups, so I will be able to survive a wiped Mellanox switch config.
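If the unsaved config really is the culprit, the fix would be to set the port speed and then explicitly write the running config to persistent storage. The sketch below builds that command sequence as plain text. It assumes MLNX-OS-style syntax (`speed 1000`, `configuration write`); the exact speed keyword may differ by release, and some releases may want the port shut down before the speed is changed, so verify before pasting. The port `1/36` is again a placeholder.

```python
def persist_speed_commands(port, speed_mbps=1000):
    """Build CLI lines that set a fixed port speed and save the config.

    ASSUMPTION: MLNX-OS-style syntax; keywords may vary between releases.
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    return [
        "enable",
        "configure terminal",
        f"interface ethernet {port}",
        "shutdown",               # speed changes may need the port down first
        f"speed {speed_mbps}",    # 1000 = the 1 Gb mode set in the web UI
        "no shutdown",
        "exit",
        "configuration write",    # persist running config across reboots
    ]


if __name__ == "__main__":
    # "1/36" is a placeholder port for illustration only.
    print("\n".join(persist_speed_commands("1/36")))
```

The important line is the last one: without an explicit save, the 1 Gb setting would live only in the running config and vanish at the next power loss.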