Hi folks. Long-time lurker, first-time poster. I have a recently acquired Brocade 6610 that seems to get in a pissing match with my server every morning before it just stops talking to it like a jilted lover. Wondering if anyone else has had similar issues.
Server is running latest edition of Proxmox.
NIC is a Mellanox CX354-FCCT, latest firmware. Port being used is set to Ethernet mode. Bought used from eBay, but shows no problems elsewhere.
Cable is an FS.com QSFP+ 40 Gbps passive DAC. Brand new from vendor.
Switch is a 6610-48p, updated and fully licensed thanks to a rockstar member here we all know and appreciate. Used from a local sale.
At some point overnight, the 40Gb connection starts to flap, going up and down within a second, every second, until the link error disable threshold is reached and the port goes into err-disable for 5 seconds. The cycle repeats until the port is eventually released from err-disable, but the link doesn't come back up. Then I wake up, replug the DAC, and it works fine again all day. The statistics on the switch show several hundred InErrors accumulated, but that's all. The server shows no sign that anything is wrong (other than losing the connection). If it's worth noting, during the day the interface accumulates a few more InErrors (fewer than 10) with no effect on performance or link. Heavy traffic doesn't seem to be the trigger, as a VM backup completed successfully over the connection about an hour before the incident.
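To pin down exactly when the flapping starts as seen from the server, something like the following could be left running overnight. It's only a rough sketch, not something I've validated on this setup: it polls the standard Linux sysfs counters for one interface and prints a timestamped line whenever any of them change. The interface name `enp1s0` is a placeholder (an assumption, substitute the real one), and the particular set of counters is just my guess at what's useful here.

```python
#!/usr/bin/env python3
"""Poll link counters from sysfs and log a timestamped line whenever they change."""
import sys
import time
from pathlib import Path

# Standard Linux sysfs entries under /sys/class/net/<iface>/.
# Which ones matter for this problem is an assumption.
FIELDS = {
    "carrier": "carrier",
    "carrier_changes": "carrier_changes",
    "rx_errors": "statistics/rx_errors",
    "rx_crc_errors": "statistics/rx_crc_errors",
    "rx_dropped": "statistics/rx_dropped",
    "tx_errors": "statistics/tx_errors",
}


def read_counters(iface, root="/sys/class/net"):
    """Return {name: int or None} for one interface; None if a file is unreadable."""
    base = Path(root) / iface
    out = {}
    for name, rel in FIELDS.items():
        try:
            out[name] = int((base / rel).read_text().strip())
        except (OSError, ValueError):
            # Reading 'carrier' fails when the interface is administratively down.
            out[name] = None
    return out


def changed(prev, cur):
    """Return {name: (old, new)} for counters that differ between two samples."""
    return {k: (prev.get(k), v) for k, v in cur.items() if prev.get(k) != v}


def watch(iface, interval=1.0):
    """Print the initial sample, then one line per change, until interrupted."""
    stamp = "%Y-%m-%d %H:%M:%S"
    prev = read_counters(iface)
    print(time.strftime(stamp), "start", prev, flush=True)
    while True:
        time.sleep(interval)
        cur = read_counters(iface)
        delta = changed(prev, cur)
        if delta:
            print(time.strftime(stamp), delta, flush=True)
        prev = cur


if __name__ == "__main__":
    # "enp1s0" is a placeholder interface name, not the real one on this server.
    watch(sys.argv[1] if len(sys.argv) > 1 else "enp1s0")
```

Run it as `python3 linkwatch.py <iface> >> linkwatch.log` (the script and log file names are made up) and compare the timestamps against the switch log in the morning. If `carrier_changes` climbs on the server at the same time the switch logs the flaps, both ends agree the link is dropping. If the server counters stay quiet, that points more toward the switch side.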
I've tried swapping NICs and DACs for identical models with the same configuration. Tried changing to the other QSFP port (yes, the 40Gb port, not the breakouts). Flow control is enabled on the switch and NIC (ethtool reports supported pause frame use as Symmetric Receive-only, but ethtool -a lists RX and TX as on). Link error disable is set on the interface with a 5 second recovery time. Errdisable recovery is set for all causes with a 10 second recovery interval. Network is flat, no VLANs or special configurations apart from the error handling mentioned.
Unfortunately I don't have another switch to test with, but directly linking two servers together on the same port/DAC results in a stable connection with no hiccups.
Has anyone had a similar issue before, or am I chasing ghosts when they're in the house next door? Any troubleshooting tips would be appreciated as well.