Sorry to hear that. It definitely helped in my case, even though the logic behind it was unclear to me. I also tried to find any log entry related to that port blocking and wasn't able to spot anything.
If it helps, I've been using Mellanox ConnectX-3 cards (flashed to Ethernet) with both 10G and 40G links to the 6610. I don't have any LAG towards ESXi; it's all an active + spare configuration (40G active, 10G spare).
ESXi 6.7 and 7.0 also behaved exactly the same, simply reporting the physical port as down.
If you can 100% rule out hardware, optics and DACs, I'd ask @fohdeesha for any hints.
Thanks for the follow-up.
Yep, I am pretty sure I have ruled out almost everything.
I started on switch #1 (a 6610) with FS.COM QSFP to SFP+ breakouts. That's where the problems started, and at first I thought it was the FS cables (I had three of them), so I tried each one and got the same problems. That started me down the path of thinking it was a cable issue. I then purchased some known-working Arista cables (the same QSFP to SFP+ breakouts) and got the same sort of intermittent issues.
So I decided it must be a switch problem (even though it felt very much like a spanning tree issue).
So I purchased a 6610 PoE and cabled it to the 6610 with a 3 metre Dell SFP+ DAC (one I had on hand and have had for about 6 years).
That's when I started having problems with the PoE switch rebooting, so I gave up on that and swapped in a spare 6610. I now have the first 6610, which has 2 x 1Gb copper links in a trunk/channel to my Cisco 4948 (which I am trying to retire), where nearly all my devices are connected.
On this 6610 I have one FS.COM QSFP to SFP+ breakout, with two of its ports connected to a single Dell host on a dual-port card. That is working and is solid, as long as I do not make any VLAN or other config changes on the ports.
This 6610 also has a fibre SFP+ module going to another Linux host that has not missed a beat at any point.
The second 6610 is now mounted in my rack (and will become the permanent one) and is attached through a single copper DAC cable to one of the 1/3/x ports on the first 6610.
This is the one I am doing all the testing on at the moment, and it's the one I can't nail down.
So I am pretty confident it is not:
- a cable problem
- a switch problem (as in a faulty switch)
- a card problem (although all the cards I have tested have been Intel 82599-based)
- a transceiver problem
Last night, after a switch restart with the no spanning-tree lines on each of the 1/2/x ports and a host restart, I now have two Intel cards in the one host talking to that switch. I will do more VLAN changes and updates tonight and see if it breaks again.
The other thing that points to some form of spanning tree issue is that the port/cable stays blocked, which rules out a card issue on the host. For example, if I start plugged into (say) 1/2/2 and it has a problem and drops out, I cannot connect that port to anything else and get it to come back up. But I can take the cable for (say) 1/2/3, connect it to the same port on the same host, and it comes back up at both the ESXi level and in the switch's show int brief output.
There must be a table somewhere on the switch of ports that are blocked for whatever reason, but it is not being reported in the logs (or to the syslog server I have set up), nor anywhere else I can find.
Craig