SR-IOV problem with CX3


llowrey

I have a Mellanox MCX354A-FCBT (flashed from HP OEM) that I'm running in SR-IOV mode with 8 VFs. The host is using the PF as its primary interface and, at present, I only have one guest running a VF.

The guest is experiencing tons of problems with the VF. Connections run at full speed for a variable amount of time, then hang. When they hang, I see messages like the following in dmesg on the guest. I don't see anything unusual on the host.

Code:
[53005.122622] mlx4_en: eth0: TX timeout on queue: 4, QP: 0xb94, CQ: 0xd6, Cons: 0xffffffff, Prod: 0x3d7
[53005.275385] mlx4_en: eth0: Steering Mode 2
[53441.341698] mlx4_en: eth0: TX timeout on queue: 4, QP: 0xb94, CQ: 0xd6, Cons: 0xffffffff, Prod: 0x3d8
[53441.495372] mlx4_en: eth0: Steering Mode 2
[53472.061348] mlx4_en: eth0: TX timeout on queue: 4, QP: 0xb94, CQ: 0xd6, Cons: 0xffffffff, Prod: 0x3d7
[53472.217599] mlx4_en: eth0: Steering Mode 2
It's always queue 4. I'm always able to open new connections and resume, so this glitch does not kill the entire VF, just one active connection at a time.
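
In case anyone wants to check their own logs for the same pattern, here is a rough Python sketch that tallies the TX timeout lines per interface and queue. The regex is based only on the three timeout lines quoted above, so treat the field layout as an assumption; other driver versions may print something different. The names parse_tx_timeouts and count_by_queue are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the mlx4_en lines quoted in this post (an assumption:
# other kernel/driver versions may format the message differently).
TX_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+mlx4_en: (?P<dev>\S+): "
    r"TX timeout on queue: (?P<queue>\d+), "
    r"QP: (?P<qp>0x[0-9a-fA-F]+), CQ: (?P<cq>0x[0-9a-fA-F]+), "
    r"Cons: (?P<cons>0x[0-9a-fA-F]+), Prod: (?P<prod>0x[0-9a-fA-F]+)"
)

# The log excerpt from this post, used as sample input.
SAMPLE = """\
[53005.122622] mlx4_en: eth0: TX timeout on queue: 4, QP: 0xb94, CQ: 0xd6, Cons: 0xffffffff, Prod: 0x3d7
[53005.275385] mlx4_en: eth0: Steering Mode 2
[53441.341698] mlx4_en: eth0: TX timeout on queue: 4, QP: 0xb94, CQ: 0xd6, Cons: 0xffffffff, Prod: 0x3d8
[53441.495372] mlx4_en: eth0: Steering Mode 2
[53472.061348] mlx4_en: eth0: TX timeout on queue: 4, QP: 0xb94, CQ: 0xd6, Cons: 0xffffffff, Prod: 0x3d7
[53472.217599] mlx4_en: eth0: Steering Mode 2
"""


def parse_tx_timeouts(text):
    """Return one dict per TX timeout line; all other lines are ignored."""
    events = []
    for line in text.splitlines():
        m = TX_TIMEOUT.search(line)
        if m is None:
            continue
        events.append({
            "ts": float(m.group("ts")),        # seconds since boot
            "dev": m.group("dev"),
            "queue": int(m.group("queue")),
            "qp": int(m.group("qp"), 16),
            "cq": int(m.group("cq"), 16),
            "cons": int(m.group("cons"), 16),  # consumer index
            "prod": int(m.group("prod"), 16),  # producer index
        })
    return events


def count_by_queue(events):
    """Count timeouts per (interface, queue) pair."""
    return Counter((e["dev"], e["queue"]) for e in events)


if __name__ == "__main__":
    events = parse_tx_timeouts(SAMPLE)
    for (dev, queue), n in sorted(count_by_queue(events).items()):
        print(f"{dev} queue {queue}: {n} timeout(s)")
```

To run it against a real log, replace SAMPLE with the saved output of dmesg from the guest. If every hit lands on the same queue, QP and CQ, as it does here, that at least confirms the hangs are tied to one TX ring and are not spread across the VF.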

Host OS: CentOS 8
Guest OS: Fedora 32

Any ideas? Bad card?