spent a bit more time on the board with the bad NIC; still not working, but adding some notes here.
to be more precise, the issue is that both 10G interfaces on node1 of this board do not negotiate a connection (node2 is fine).
- activity/connectivity LEDs do not light up, except (steady) for a few seconds when the board is first powered up
- BIOS configuration is identical on both nodes, jumper configuration is identical to another working board
- probed all the pins of the SFP+ cage, all voltages were comparable across the two nodes, with (predictably) more variability on the differential rx/tx pairs of the working node
- probed a handful of voltages around the Inphi PHY, all seem to be comparable to a working node
- infrared camera shows similar temperatures for all components around the transceiver cages / PHYs
- I realized that the SFP cages/connectors are snap-fit rather than soldered, so I was also able to rule out the connection between the transceiver and the board: I swapped the connectors and tried the same connector/transceiver pairing on both nodes (I had kinda already ruled them out as a failure point by probing the SFP pins, though)
- the OS does see two 10G network interfaces and seems to be able to interact with them in a limited capacity
  - `ethtool -p` is not able to blink interface LEDs, `ethtool -r` does not have an effect
  - I can get an eeprom dump using `ethtool -e`, and it's mostly similar to one from a working node.
  - most surprising (and encouraging!) to me, `ethtool -m` can actually interact with the transceiver and get real data. nothing differs from a working node with the same 10GBASE-T transceiver apart from the serial number and the temperature/voltage readings.
  - in case you're wondering: yes, I was just going down the list of options in `ethtool -h` to see what I could do with it at this point
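
For anyone wanting to repeat the `ethtool -m` comparison without eyeballing two dumps, here is a rough Python sketch. It assumes the output was saved as plain text (for example `ethtool -m eth0 > node1.txt`, where `eth0` and the file name are placeholders) and that each field is printed as `name : value`. The ignored field names (`Vendor SN`, `Module temperature`, `Module voltage`) are assumptions about how ethtool labels them and may need adjusting for a given ethtool version or module.

```python
# Fields that are expected to differ between two otherwise identical modules.
# These key names are assumptions about ethtool's labels, not confirmed output.
EXPECTED_TO_DIFFER = ("Vendor SN", "Module temperature", "Module voltage")


def parse_module_dump(text):
    """Parse 'name : value' lines into a dict of name -> list of values.

    Values are kept in a list because some names can repeat on
    consecutive lines of the dump.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name : value' line
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def unexpected_differences(good_text, bad_text, ignore=EXPECTED_TO_DIFFER):
    """Return {name: (good_values, bad_values)} for fields that differ."""
    good = parse_module_dump(good_text)
    bad = parse_module_dump(bad_text)
    diffs = {}
    for key in sorted(set(good) | set(bad)):
        if key in ignore:
            continue
        if good.get(key) != bad.get(key):
            diffs[key] = (good.get(key), bad.get(key))
    return diffs


# Usage (file names are placeholders):
#   with open("node2.txt") as g, open("node1.txt") as b:
#       print(unexpected_differences(g.read(), b.read()))
```

An empty result would match the observation above that only the serial number and sensor readings differ.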
I also followed traces on the bottom of the board between the SFP pins and the FPGA. If I'm remembering correctly, the only exposed traces run to pins 2 (TX fault), 3 (TX disable) and 8 (receiver loss of signal). There are clearly a lot of buried traces going to other pins, so I can't rule out additional connectivity to the FPGA. I still plan to watch for chattiness at boot at some point.
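
As a reference for reading multimeter values on those three pins, here is a small Python sketch. It assumes the usual SFP+ convention that all three signals are active-high, and it assumes LVTTL-style thresholds (at most 0.8 V is low, at least 2.0 V is high); both should be checked against the SFP+ spec and the board before trusting the result. A steady DC reading also would not reveal short pulses on these lines.

```python
# Low-speed SFP+ pins named above: pin number -> (signal name, meaning when high).
SFP_PINS = {
    2: ("TX_FAULT", "module reports a transmitter fault"),
    3: ("TX_DISABLE", "host is asking the module to turn its transmitter off"),
    8: ("RX_LOS", "module reports loss of receive signal"),
}

# Assumed LVTTL-style logic thresholds, in volts.
V_LOW_MAX = 0.8
V_HIGH_MIN = 2.0


def classify(voltage):
    """Map a measured DC voltage to 'low', 'high' or 'indeterminate'."""
    if voltage <= V_LOW_MAX:
        return "low"
    if voltage >= V_HIGH_MIN:
        return "high"
    return "indeterminate"


def interpret(pin, voltage):
    """Describe what a measured voltage on one of the known pins would mean."""
    name, meaning_when_high = SFP_PINS[pin]
    level = classify(voltage)
    if level == "high":
        return f"{name}: high, {meaning_when_high}"
    if level == "low":
        return f"{name}: low, signal not asserted"
    return f"{name}: indeterminate, between logic thresholds"


# Usage (the voltage here is a made-up placeholder, not a measurement):
#   print(interpret(3, 0.05))
```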
I keep coming back to a theory that the FPGA is for some reason setting the TX disable / fault pins, but the measured voltages don't support that theory. So I'm unsure whether I'll get much further without finally teaching myself to use a logic analyzer and inspecting protocol data on the RX/TX lines... which does sound fun, but isn't exactly where I was hoping this project would take me.
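
One possible software cross-check of the TX disable / fault theory before reaching for a logic analyzer: SFF-8472 defines an optional status byte (byte 110 of the A2h page) that mirrors the TX_DISABLE, TX_FAULT and RX_LOS pin states as the module sees them. The Python sketch below decodes it from a binary dump, assuming the dump came from something like `ethtool -m eth0 raw on > node1.bin` (placeholder names) and that it is laid out as 256 bytes of A0h followed by the A2h page. Whether this particular 10GBASE-T module implements these optional bits is unknown, which is why the sketch also reports the "implemented" flags from byte 93.

```python
A2_BASE = 256            # assumed offset of the A2h page within the dump
ENHANCED_OPTIONS = 93    # A0h byte advertising which optional soft bits exist
STATUS_CONTROL = 110     # A2h byte holding the pin-state bits


def decode_status(dump):
    """Return {signal: (implemented, asserted)} from a module EEPROM dump.

    'implemented' comes from the enhanced-options byte; if it is False,
    the corresponding 'asserted' value should not be trusted.
    """
    if len(dump) < A2_BASE + STATUS_CONTROL + 1:
        raise ValueError("dump too short to contain the A2h status byte")
    options = dump[ENHANCED_OPTIONS]
    status = dump[A2_BASE + STATUS_CONTROL]
    return {
        "tx_disable": (bool(options & 0x40), bool(status & 0x80)),
        "tx_fault": (bool(options & 0x20), bool(status & 0x04)),
        "rx_los": (bool(options & 0x10), bool(status & 0x02)),
    }


# Usage (file name is a placeholder):
#   with open("node1.bin", "rb") as f:
#       print(decode_status(f.read()))
```

If the module does report these bits, comparing the result between the two nodes would show whether the bad node's transceiver believes it is being held disabled, independent of the multimeter readings.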