Solaris 11.1 upgrade - vlan stopped working

PigLover · Oct 31, 2012

Sorry for the long post. Cross-posted on [H].

I upgraded a test server from Solaris 11 11/11 to Solaris 11.1 a few days ago. After the upgrade, my one VLAN-based network connection stopped working. I've been messing with it for days and am a bit baffled.

Note that everything was up and working just before the upgrade. This is not a switch configuration problem or a cables issue.

All of the other networking survived the upgrade just fine...

The link that stopped working is a VLAN running on top of a two-link LAG. The untagged link running over the same LAG works just fine.

So here's the strange part. As I was trying to get underneath it all today I fired up wireshark to see if I could figure it out. As soon as I put wireshark up on the interface (in its default promiscuous mode) the link started working. All the packets in the trace looked normal. All was good. As soon as I stopped the trace the link was dead again. Start a trace - link in promiscuous mode - and all is good again. Stop and it stops...

If I bring up wireshark on the link without promiscuous mode, the link does NOT start working. If I initiate a ping that should go out on the broken VLAN, I see a series of ARP requests but no answers. Running wireshark on the machine being pinged, I see all of the ARP requests come in and the answers go out, but the Solaris machine never sees the answers.

So - did Oracle manage to break VLANs in 11.1? Any ideas how to get it working again?

A few bits of info from the machine. The only thing that looks odd/wrong I've highlighted below.

Phil@TEST:~$ dladm show-link
LINK CLASS MTU STATE OVER
e1000g1 phys 1500 up --
e1000g0 phys 1500 up --
ixgbe0 phys 9000 up --
ixgbe1 phys 9000 up --
aggr2 aggr 9000 up ixgbe0 ixgbe1
aggr2vlan5 vlan 9000 up aggr2

Phil@TEST:~$ dladm show-vlan
LINK VID OVER FLAGS
aggr2vlan5 5 aggr2 -----

Phil@TEST:~$ ipadm show-addr aggr2vlan5
ADDROBJ TYPE STATE ADDR
aggr2vlan5/v4 dhcp ok 192.168.5.101/24

Phil@TEST:~$ dladm show-linkprop aggr2vlan5
LINK PROPERTY PERM VALUE DEFAULT POSSIBLE
aggr2vlan5 autopush rw -- -- --
aggr2vlan5 zone rw -- -- --
aggr2vlan5 state r- unknown up up,down
aggr2vlan5 mtu rw 9000 1500 1500-9000
aggr2vlan5 maxbw rw -- -- --
aggr2vlan5 cpus rw -- -- --
aggr2vlan5 cpus-effective r- 0-7 -- --
aggr2vlan5 rxfanout rw -- 8 --
aggr2vlan5 rxfanout-effective r- 16 -- --
aggr2vlan5 pool rw -- -- --
aggr2vlan5 pool-effective r- -- -- --
aggr2vlan5 priority rw high high low,medium,high
aggr2vlan5 forward rw 1 1 1,0
aggr2vlan5 protection rw -- -- mac-nospoof,restricted,ip-nospoof,dhcp-nospoof
aggr2vlan5 mac-address r- 0:1b:21:6b:23:98 0:1b:21:6b:23:98 --
aggr2vlan5 allowed-ips rw -- -- --
aggr2vlan5 allowed-dhcp-cids rw -- -- --
aggr2vlan5 rxrings r- -- -- --
aggr2vlan5 rxrings-effective r- -- -- --
aggr2vlan5 txrings r- -- -- --
aggr2vlan5 txrings-effective r- -- -- --
aggr2vlan5 txrings-available r- 0 -- --
aggr2vlan5 rxrings-available r- 0 -- --
aggr2vlan5 rxhwclnt-available r- 0 -- --
aggr2vlan5 txhwclnt-available r- 0 -- --
aggr2vlan5 vsi-mgrid rw -- -- --
aggr2vlan5 etsbw-lcl rw -- 0 --
aggr2vlan5 etsbw-lcl-effective r- -- -- --
aggr2vlan5 etsbw-rmt-effective r- -- -- --
aggr2vlan5 etsbw-lcl-advice r- -- -- --
aggr2vlan5 cos rw -- 0 --

PigLover · Nov 1, 2012

Fixed. Don't know exactly how/why, but fixed.

When I came home this afternoon I deleted all of the IPs associated with the VLAN and the LAG. I deleted the VLAN itself and deleted the LAG. Basically tore down all of the datalink and IP layers, leaving only the raw interface cards. Rebuilt the LAG, rebuilt the VLAN and reinstalled the IPs. And like magic, the whole thing is happy.
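
For anyone who wants to try the same thing, the teardown and rebuild would look roughly like the sketch below. It is a dry run: it only prints the commands so they can be reviewed before being run by hand as root. The link names, VLAN ID and MTU are taken from the output in the first post. Everything else is an assumption: the exact commands used were not recorded, the LACP mode of the LAG is not shown (so no LACP option is set here), and the address for the untagged link on aggr2 is not known, so re-adding it is left out.

```sh
#!/bin/sh
# Dry-run sketch: prints the teardown/rebuild commands, runs nothing.
# Names and values below come from the dladm/ipadm output in the first post.
AGGR=aggr2
VLAN=aggr2vlan5
VID=5
MTU=9000

rebuild_plan() {
    # Tear down from the top of the stack: IP, then VLAN, then LAG.
    echo "ipadm delete-ip $VLAN"
    echo "dladm delete-vlan $VLAN"
    echo "ipadm delete-ip $AGGR"
    echo "dladm delete-aggr $AGGR"
    # Rebuild from the bottom: LAG, then VLAN, then IP.
    # Assumption: default (no) LACP; add the mode your switch expects.
    echo "dladm create-aggr -l ixgbe0 -l ixgbe1 $AGGR"
    # Assumption: MTU may need to be set again after the LAG is recreated.
    echo "dladm set-linkprop -p mtu=$MTU $AGGR"
    echo "dladm create-vlan -l $AGGR -v $VID $VLAN"
    echo "ipadm create-ip $VLAN"
    echo "ipadm create-addr -T dhcp $VLAN/v4"
    # Not shown: re-creating the untagged address on $AGGR.
}

rebuild_plan
```

Deleting the IP interfaces first matters, since a datalink that still has an IP interface plumbed on it cannot be removed.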

Something in the configs of the LAG or the VLAN must have been corrupted during the upgrade. But now it's all fat, dumb and happy again.

Very odd.
