Proxmox network - not working with Xeon D 10Gbase-T NIC


Patrick

Administrator
Staff member
Dec 21, 2010
12,513
5,804
113
I have been having a strange issue. I added a new Xeon D node to the Proxmox hosting cluster. There are four other Xeon D nodes and I set this one up the exact same way as another node. The primary 1GbE cluster NIC works fine.

I have tried using eth2 and making a vmbr1 with the first 10Gbase-T NIC. I am trying to put that NIC on 10.0.104.0/24 as a dedicated Ceph NIC.

The link lights on the physical machine's NICs come on, but ethtool reports no link and "ip a" shows NO-CARRIER on the interface:
Code:
16: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:25:90:5d:74:84 brd ff:ff:ff:ff:ff:ff
    inet 10.0.104.208/24 brd 10.0.104.255 scope global vmbr1
Code:
# ethtool vmbr1
Settings for vmbr1:
        Link detected: no
Any ideas on this one?
 

Patrick

OK - now enter the bizarre:

Code:
# ping 10.0.104.201
PING 10.0.104.201 (10.0.104.201) 56(84) bytes of data.
64 bytes from 10.0.104.201: icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from 10.0.104.201: icmp_seq=2 ttl=64 time=0.163 ms
64 bytes from 10.0.104.201: icmp_seq=3 ttl=64 time=0.152 ms
64 bytes from 10.0.104.201: icmp_seq=4 ttl=64 time=0.196 ms
64 bytes from 10.0.104.201: icmp_seq=5 ttl=64 time=0.203 ms
How I fixed it: watched the first part of the Sunday Night Football Patriots-Cardinals game. Seriously, it just started working.
 

Jb boin

Member
May 26, 2016
49
16
8
36
Grenoble, France
www.phpnet.org
It's the bridge interface, not the physical one, so it might be a configuration or software issue.
What is the state of the physical interface that is part of the bridge and the output of "brctl show"?
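A rough sketch of that check without brctl, reading the kernel's sysfs entries directly. The function names carrier_state and bridge_ports are made up for illustration, and the optional second argument (an alternate sysfs root) is only there so the functions can be tried against a scratch directory:

```shell
#!/bin/sh
# Report the carrier state of one interface from sysfs.
# $1 = interface name (e.g. eth2)
# $2 = sysfs root (hypothetical override for testing; defaults to /sys/class/net)
carrier_state() {
    iface=$1
    root=${2:-/sys/class/net}
    if [ ! -d "$root/$iface" ]; then
        echo "missing"
        return 1
    fi
    # Reading "carrier" fails while the interface is administratively down.
    if state=$(cat "$root/$iface/carrier" 2>/dev/null); then
        case $state in
            1) echo "up" ;;
            0) echo "no-carrier" ;;
            *) echo "unknown" ;;
        esac
    else
        echo "admin-down"
    fi
}

# List the ports enslaved to a bridge (the entries under <bridge>/brif).
# $1 = bridge name (e.g. vmbr1), $2 = same optional sysfs root as above
bridge_ports() {
    root=${2:-/sys/class/net}
    ls "$root/$1/brif" 2>/dev/null
}
```

Running "carrier_state eth2" and "bridge_ports vmbr1" on the node should show whether the physical port itself has link and whether it is actually attached to the bridge.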
 

Patrick

@Jb boin That was my guess, but I also tried it without the bridge (i.e. just enabling eth2 on its own). It seems like it decided to start working... strange.
 

Patrick

@_alex here is the output:

Code:
vmbr1
 bridge id              8000.9e59f4c243c1
 designated root        8000.9e59f4c243c1
 root port                 0                    path cost                  0
 max age                  20.00                 bridge max age            20.00
 hello time                2.00                 bridge hello time          2.00
 forward delay             0.00                 bridge forward delay       0.00
 ageing time             300.00
 hello timer               0.00                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                 186.05
 flags


eth2 (1)
 port id                8001                    state                  disabled
 designated root        8000.9e59f4c243c1       path cost                100
 designated bridge      8000.9e59f4c243c1       message age timer          0.00
 designated port        8001                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags
Trying to force the link to 10GbE. Also removing the bridge to see if that helps.
 

_alex

Active Member
Jan 28, 2016
866
97
28
Bavaria / Germany
hm, obviously the phy eth2 is down ...
i'd try to get a direct link up between two hosts, without any switch, and then go further. i wouldn't be surprised if there is still a driver issue with those nics :(

recently had problems with 10gbe on my connectx-2 sfp+ ports. it turned out it was an stp issue on the bridges.
 

Patrick

It is strange that this one is not coming up:
Code:
root@fmt-pve-08:~# ip link show | grep UP
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
14: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
And I did double-check the physical port was lit and the switch port was as well.
 

_alex

NO-CARRIER looks like a Layer-1 problem, but it is strange that the lights show ok.
have you tried connecting to eth3 / the other 10gbe port to see if it still states no-carrier?
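To spot this on any node, here is a small sketch that filters "ip -o link show" output for interfaces that are administratively UP but have no carrier. The function name no_carrier_ifaces is made up for illustration; it only reads text on stdin, so it can be fed saved output as well:

```shell
#!/bin/sh
# Read `ip -o link show` (or plain `ip link show`) output on stdin and print
# the names of interfaces whose flags contain both NO-CARRIER and UP.
# The UP test requires a comma before UP, so LOWER_UP does not count.
no_carrier_ifaces() {
    awk -F': ' '
        $3 ~ /^<[^>]*NO-CARRIER/ && $3 ~ /^<([^>]*,)?UP[,>]/ { print $2 }
    '
}
```

Usage would be something like "ip -o link show | no_carrier_ifaces". Any name it prints has been brought up by the OS but sees no link partner, which points at cable, switch port, or PHY rather than the bridge config.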
 

Patrick

Maybe tomorrow. Machine is in the data center. The really strange thing is that there are three other Xeon D nodes with the Eth2 configuration all working flawlessly.
 

_alex

oh, when other nodes work on the same port it may really be a bad cable or nic. i would also check whether eth3 / the other 10gbe nic comes up on that node.
 

Patrick

It is the onboard NIC. I will try connecting the other 10GbE port using a different cable. This one is a 1.5 m Cat 7 cable, which should be fine for 10GbE.
 

RobstarUSA

Active Member
Sep 15, 2016
233
104
43
I have a friend who has a Xeon-D with an onboard 10Gbit NIC. I remember he was having issues similar to yours, and it seemed that the kernel chose the wrong/older driver, so it only partially worked. I ended up grabbing the newest driver from Intel and compiling it manually, and that solved his issue. I will ping him to have him run a "modinfo" to find out the version and the exact model of his Xeon-D/board, and I'll post a follow-up.

Do the Proxmox nodes all have the same updates/kernel version? Does modinfo on the driver show the same version? You can find out the driver the kernel selected with "ethtool -i <ethinterfacename>"
 

Patrick

I didn't even think about that! Thanks for the idea. I will check this out tomorrow.
 

Patrick

Not working:
Code:
# ethtool -i eth2
driver: ixgbe
version: 4.4.6
firmware-version: 0x800003e7
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Working:
Code:
# ethtool -i eth2
driver: ixgbe
version: 4.4.6
firmware-version: 0x800000ea
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Maybe it is the firmware revision.
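For anyone comparing more than two nodes by eye, a minimal sketch that prints only the fields that differ between two saved "ethtool -i" outputs. The function name ethtool_info_diff and the file names are made up for illustration; it assumes each output was saved to a file first (for example "ethtool -i eth2 > node-a.txt" on each node):

```shell
#!/bin/sh
# Print the fields that differ between two saved `ethtool -i` outputs.
# $1 = file from the first node, $2 = file from the second node
ethtool_info_diff() {
    awk -F': ' '
        NR == FNR { a[$1] = $2; next }   # first file: remember each field
        ($1 in a) && a[$1] != $2 {       # second file: report mismatches
            printf "%s: %s -> %s\n", $1, a[$1], $2
        }
    ' "$1" "$2"
}
```

With the two outputs above, "ethtool_info_diff bad.txt good.txt" should report only the firmware-version line, since the driver and version fields are identical.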