napp-it ixgbe error


animefans

New Member
Jul 18, 2019
23
5
3
I have a Fujitsu D2755-A11 (Intel X520 clone) installed in my napp-it box (omnios-r151036-a13510b579)

I only plugged in one port (ixgbe2), yet ixgbe3 also shows as up...
Code:
kkfong@nappit:~$ dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
igb0         Ethernet             up         1000   full      igb0
igb1         Ethernet             unknown    0      half      igb1
ixgbe2       Ethernet             up         10000  full      ixgbe2
ixgbe3       Ethernet             up         10000  full      ixgbe3
kkfong@nappit:~$
Right now I have an IP configured, but it can't ping the other end (Ruckus router)
Code:
ixgbe2: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 9000 index 6
        inet 10.10.30.80 netmask ffffff00 broadcast 10.10.30.255
Code:
kkfong@nappit:~$ ping 10.10.30.3
no answer from 10.10.30.3
kkfong@nappit:~$ ^C
On the router side, I also can't ping the napp-it box
Code:
SSH@ICX7150-C12 Router#show ip cache
Entries in default routing instance:
Total number of cache entries: 2
D:Dynamic  P:Permanent  F:Forward  U:Us  C:Complex Filter
W:Wait ARP  I:ICMP Deny  K:Drop  R:Fragment  S:Snap Encap
      IP Address         Next Hop        MAC            Type Port           Vlan Pri
1     192.168.1.11       DIRECT          0000.0000.0000 PU   n/a                 0
2     10.10.30.3         DIRECT          0000.0000.0000 PU   n/a                 0
Code:
SSH@ICX7150-C12 Router#ping 10.10.30.80
Sending 1, 16-byte ICMP Echo to 10.10.30.80, timeout 5000 msec, TTL 64
Type Control-c to abort
Request timed out.
No reply from remote host.
SSH@ICX7150-C12 Router#
But I can ping another host just fine
Code:
SSH@ICX7150-C12 Router#ping 10.10.30.140
Sending 1, 16-byte ICMP Echo to 10.10.30.140, timeout 5000 msec, TTL 64
Type Control-c to abort
Reply from 10.10.30.140    : bytes=16 time<1ms TTL=64
Success rate is 100 percent (1/1), round-trip min/avg/max=0/0/0 ms.
SSH@ICX7150-C12 Router#
In /var/adm/messages, it shows the link is up
Code:
Dec 30 23:14:12 nappit mac: [ID 469746 kern.info] NOTICE: ixgbe2 registered
Dec 30 23:14:12 nappit ixgbe: [ID 611667 kern.info] NOTICE: ixgbe2: Intel 10Gb Ethernet
Dec 30 23:14:12 nappit mac: [ID 469746 kern.info] NOTICE: igb1 registered
Dec 30 23:14:12 nappit pseudo: [ID 129642 kern.info] pseudo-device: pm0
Dec 30 23:14:12 nappit genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
Dec 30 23:14:12 nappit pseudo: [ID 129642 kern.info] pseudo-device: eventfd0
Dec 30 23:14:12 nappit genunix: [ID 936769 kern.info] eventfd0 is /pseudo/eventfd@0
Dec 30 23:14:12 nappit mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 10000 Mbps, full duplex
Dec 30 23:14:13 nappit mac: [ID 469746 kern.info] NOTICE: ixgbe3 registered
Dec 30 23:14:13 nappit ixgbe: [ID 611667 kern.info] NOTICE: ixgbe3: Intel 10Gb Ethernet
and then it complains that the device may be gone about 90 minutes later...
Code:
Dec 31 00:48:48 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
Dec 31 00:48:48 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
Dec 31 00:48:48 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
Dec 31 00:48:48 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
Dec 31 00:48:48 nappit mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 10000 Mbps, full duplex
Dec 31 00:48:50 nappit smbd[518]: [ID 413393 daemon.error] dyndns: failed to get domainname
Dec 31 00:48:50 nappit smbd[518]: [ID 413393 daemon.error] dyndns: failed to get domainname
Dec 31 00:48:57 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
Dec 31 00:48:57 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
Dec 31 00:48:57 nappit ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: failed to read status register: device may be gone
If I reboot, it works fine, until it doesn't

Does anyone have an idea how to troubleshoot this problem?

Thanks
 

gea

Well-Known Member
Dec 31, 2010
3,156
1,195
113
DE
I would try

1. update to current OmniOS stable 151040
2. revert MTU to 1500 (disable Jumbo)

- optionally try a fan for the NIC in case it's an overheating problem
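
A minimal sketch of step 2, assuming the illumos `dladm`/`ipadm` tools and the link name and address from the first post (ixgbe2, 10.10.30.80/24). The helper name `revert_mtu` and the `DRY_RUN` switch are made up for illustration; by default the commands are only printed, not executed. The IP interface is removed first because the MTU usually cannot be changed while the link is in use.

```shell
# Hypothetical helper: drop a link back to MTU 1500 and re-create its address.
# With DRY_RUN=1 (the default here) the commands are only printed.
revert_mtu() {
    link=$1
    addr=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then run=echo; else run=; fi
    $run ipadm delete-if "$link"                       # unplumb so the link is not busy
    $run dladm set-linkprop -p mtu=1500 "$link"        # disable jumbo frames
    $run ipadm create-if "$link"                       # plumb the interface again
    $run ipadm create-addr -T static -a "$addr" "$link/v4"
}

revert_mtu ixgbe2 10.10.30.80/24
```

Running it with `DRY_RUN=0` over a session that goes through the same link would cut that session, so a console or the igb0 management address is the safer place to run it from.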
 

Freebsd1976

Active Member
Feb 23, 2018
390
73
28
Maybe upgrade the NIC firmware?
Btw, my X552 NIC also uses ixgbe and has run normally from 151036 through 151040
 

animefans

I will look into OS update, MTU, and cooling

As for NIC firmware, any idea if I can flash Intel's, or do I have to stick with Fujitsu's?

BTW, it doesn't look good at all now: ixgbe2 is completely missing

and ixgbe3 can ping after boot, but fails 2 minutes later?

Code:
Last login: Fri Dec 31 11:56:33 2021 from 192.168.50.139
OmniOS r151036  omnios-r151036-a13510b579       January 2021
kkfong@nappit:~$ dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
igb0         Ethernet             up         1000   full      igb0
igb1         Ethernet             unknown    0      half      igb1
ixgbe3       Ethernet             up         10000  full      ixgbe3
kkfong@nappit:~$ ping 10.10.30.3
10.10.30.3 is alive
kkfong@nappit:~$ vi /var/adm/messages
kkfong@nappit:~$ ping 10.10.30.3
^C
kkfong@nappit:~$ dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
igb0         Ethernet             up         1000   full      igb0
igb1         Ethernet             unknown    0      half      igb1
ixgbe3       Ethernet             up         10000  full      ixgbe3
kkfong@nappit:~$ ping 10.10.30.3
^C
kkfong@nappit:~$ dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
igb0        phys      1500   up       --         --
igb1        phys      1500   unknown  --         --
vnicomnidev0 vnic     1500   up       --         igb0
ixgbe3      phys      9000   up       --         --
kkfong@nappit:~$ dladm show-ether
LINK            PTYPE    STATE    AUTO  SPEED-DUPLEX                    PAUSE
igb0            current  up       yes   1G-f                            bi
igb1            current  unknown  yes   0G-h                            bi
ixgbe3          current  up       yes   10G-f                           bi
kkfong@nappit:~$ ping 10.10.30.3
no answer from 10.10.30.3
kkfong@nappit:~$
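
To pin down when the link stops passing traffic (so the time can be matched against /var/adm/messages), a small loop like the following might help. It is only a sketch: `watch_link` and the `PROBE` override are hypothetical names, and the `ping host timeout` form is the illumos syntax used in the transcripts here; other systems take different arguments.

```shell
# Hypothetical helper: poll a host until it stops answering, then print a timestamp.
# Usage: watch_link HOST [INTERVAL_SECONDS]
watch_link() {
    host=$1
    interval=${2:-5}
    probe=${PROBE:-ping}     # overridable so the loop can be tried without a network
    # illumos ping: "ping host timeout" exits 0 while the host is alive
    while $probe "$host" 2 >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "$(date '+%b %e %H:%M:%S') $host stopped answering"
}
```

For example, `watch_link 10.10.30.3 5` started right after boot would print the time of the first failed ping in the same format the system log uses.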
 

gea

To decide whether it is a firmware problem or simply a bad NIC, one would need a second NIC. Maybe ask at Topicbox whether someone else has this NIC.
 

animefans

When OmniOS first booted with the Fujitsu NIC in it, it recognized both ports: ixgbe2 and ixgbe3
In fact, ixgbe2 was working (traffic going through, serving files over NFS, etc.) for about an hour
Then it slowly started degrading
* lost network until reboot
* lost ixgbe3
* working port/ixgbe2 not even coming up

Your hint about heat COULD be an issue
the NIC is installed on a riser, and close to a fan blowing air out
and to unmount that fan I had to take the motherboard out

To make matters worse, I reinstalled my original/working 10GbE NIC (Supermicro AOC-STGN-I2S)
It has 2 ports, and I also lost one port (ixgbe1)
ixgbe0 works 95% of the time
There are instances where it loses network and I have to reboot to get it working again. This is the reason I looked into a new/different NIC...

Now that this "working" NIC is back, it's not working...
ixgbe0 comes up
Code:
kkfong@nappit:~$ dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
igb0         Ethernet             up         1000   full      igb0
igb1         Ethernet             unknown    0      half      igb1
ixgbe0       Ethernet             up         10000  full      ixgbe0
IP assigned
Code:
kkfong@nappit:~$ ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
igb0/v4           static   ok           192.168.1.80/24
ixgbe0/v4         static   ok           10.10.30.80/24
lo0/v6            static   ok           ::1/128
it can ping itself
Code:
kkfong@nappit:~$ ping -c 5 10.10.30.80
10.10.30.80 is alive
but it can't ping the router on the other end...
Code:
kkfong@nappit:~$ ping -c 5 10.30.30.3
no answer from 10.30.30.3
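
Note that the ping above targets 10.30.30.3, while the interface and the router are on 10.10.30.x, so this particular failure may just be a typo (the router-side ping to 10.10.30.80 below fails regardless). A rough sanity check, assuming a /24 netmask as shown by `ipadm show-addr`; `same_slash24` is a made-up helper name:

```shell
# Hypothetical helper: succeed if two IPv4 addresses share the same /24 prefix.
same_slash24() {
    # ${var%.*} strips the last octet, leaving the first three
    [ "${1%.*}" = "${2%.*}" ]
}

same_slash24 10.10.30.80 10.30.30.3 || echo "10.30.30.3 is not in 10.10.30.0/24"
```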
On the router side, it can ping my other 10gbe NIC on another host, just not the nappit host
Code:
SSH@ICX7150-C12 Router#ping 10.10.30.3
Ping self done.
SSH@ICX7150-C12 Router#ping 10.10.30.140
Sending 1, 16-byte ICMP Echo to 10.10.30.140, timeout 5000 msec, TTL 64
Type Control-c to abort
Reply from 10.10.30.140    : bytes=16 time=1ms TTL=64
Success rate is 100 percent (1/1), round-trip min/avg/max=1/1/1 ms.
SSH@ICX7150-C12 Router#ping 10.10.30.80
Sending 1, 16-byte ICMP Echo to 10.10.30.80, timeout 5000 msec, TTL 64
Type Control-c to abort
Request timed out.
No reply from remote host.
SSH@ICX7150-C12 Router#
I also need to get napp-it up and running ASAP, so I will park this issue for the moment
 

gea

Problems that appear only after some time mostly indicate a temperature problem. If there are other problems or log entries besides the NIC ones, even RAM or bad connectors can be the reason.
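
One way to see whether anything besides the NIC is complaining is to tally warning and error lines per source in the system log. A sketch, assuming the /var/adm/messages layout shown earlier in the thread (source tag in the fifth field, `[ID n facility.level]` after it); `warn_summary` is a hypothetical name:

```shell
# Hypothetical helper: count warning/error log lines per source tag.
# Usage: warn_summary [LOGFILE]   (defaults to /var/adm/messages)
warn_summary() {
    awk '/\.(warning|err|error)\]/ {
             sub(/:$/, "", $5)      # "ixgbe:" -> "ixgbe"
             n[$5]++
         }
         END { for (src in n) print src, n[src] }' "${1:-/var/adm/messages}" | sort
}
```

If only `ixgbe` shows up, the NIC, its slot or riser, or its cooling stays the prime suspect; warnings from unrelated drivers would point more toward RAM, power, or connectors. On illumos, `fmadm faulty` and `fmdump -e` may also be worth a look for hardware faults recorded by the fault manager.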
 

animefans

Gea,
As usual, thank you for all your feedback!
Hopefully I can find time to get to the bottom of it