Figured I'd post this as an FYI.
Spent a fair few days banging my head against a LAG problem that seems to trace back to FreeNAS.
Long and short of it: after setting up the lag on the FreeNAS box, I HAD TO RESTART it. Otherwise, bad times. I swear I read a post in this thread nonchalantly mentioning that fact (probably from grand master fohdeesha), but I can't for the life of me find it.
Here are the log dumps (truncated and ham-fistedly sanitized, so the time codes are a mess). What happened was the IP next hop for the pfSense firewall (10.x.x.1) and for my laptop (SSH to the switch, 10.x.x.206) flapping across pretty much any interface that carried more than a single device (another switch, UniFi AP, pfSense, etc.). Flapping the next hop between ports 'in the flow path', including the firewall port, I could sort of understand, but when the FreeNAS ports were included I knew something was really, truly wrong. I'm sure working out why the switch would do this is a fun intellectual exercise, but at this point I'm in the 'glad it's fixed/not broken any more' category.
May or may not have had something to do with my jails. I swear that in the first scenario below the flapping started once I began bringing the jails back up. Jails are Plex, Syncthing, and the UniFi controller, all using VNET.
Scenario 1
Code:
Apr 1 16:45:35:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
Apr 1 16:45:35:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
Apr 1 16:45:35:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
Apr 1 16:45:34:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
Apr 1 16:45:34:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
Apr 1 16:45:34:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
Apr 1 16:45:33:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
Apr 1 16:45:32:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
Apr 1 16:45:32:D:next hop router 10.X.X.1 moved from port 1/2/5 to port 1/2/2
Apr 1 16:45:31:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/4
Apr 1 16:45:31:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/5
Apr 1 16:45:25:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
Apr 1 16:45:25:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
Apr 1 16:45:10:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
Apr 1 16:45:09:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/4
Apr 1 16:45:01:D:next hop router 10.X.X.1 moved from port 1/2/4 to port 1/2/2
Apr 1 16:45:01:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/4
Apr 1 16:44:56:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
Apr 1 16:44:55:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/4
Apr 1 16:44:51:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
Apr 1 16:44:50:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
Apr 1 16:31:03:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/1/1
Apr 1 16:29:14:I:System: Interface ethernet 1/1/1, state up
Apr 1 16:29:12:I:System: Interface ethernet 1/1/1, state down
Apr 1 16:28:17:I:System: PoE: Allocated power of 30000 mwatts on port 1/1/1.
Apr 1 16:29:07:I:Security: SSH login by public key from src IP 10.X.X.206 from src MAC XXX to PRIVILEGED EXEC mode using RSA as Server Host Key.
Apr 1 16:29:00:I:Security: SSH login by public key from src IP 10.X.X.206 from src MAC XXX to USER EXEC mode using RSA as Server Host Key.
Apr 1 16:28:49:I:System: Interface ethernet 1/1/1, state up
Apr 1 16:28:47:I:System: Interface ethernet 1/1/1, state down
Apr 1 16:28:24:I:System: Interface ethernet 1/1/1, state up
Apr 1 16:28:22:I:NTP: System clock is synchronized to 10.X.X.1.
Apr 1 16:28:19:I:System: PoE: Power enabled on port 1/1/1. #setting up for upstairs Unifi AP
Apr 1 16:28:09:I:System: Logical link on dynamic lag interface ethernet 1/2/4 is up.
Apr 1 16:28:09:I:System: Interface ethernet 1/2/4, state up
Apr 1 16:28:08:I:System: Logical link on dynamic lag interface ethernet 1/2/5 is up.
Apr 1 16:28:08:I:System: Interface ethernet 1/2/5, state up
Apr 1 16:28:08:I:Trunk: Group (1/2/4, 1/2/5) created by 802.3ad link-aggregation module.
Apr 1 16:28:08:I:System: dynamic lag 2, has new peer info (priority=32768,id=XXX,key=338) (N/A)
Scenario 2
Code:
00 days 00h10m20s:D:next hop router 10.X.X.1 moved from port 1/2/4 to port 1/2/2
00 days 00h10m19s:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/4
00 days 00h10m13s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
00 days 00h10m13s:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/4
00 days 00h10m09s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
00 days 00h09m54s:I:System: Logical link on dynamic lag interface ethernet 1/2/4 is up.
00 days 00h09m54s:I:System: Interface ethernet 1/2/4, state up
00 days 00h09m53s:I:Trunk: Group (1/2/4, 1/2/5) created by 802.3ad link-aggregation module.
00 days 00h09m53s:I:System: dynamic lag 2, has new peer info (priority=32768,id=XXX,key=338) (N/A)
00 days 00h09m50s:I:System: Logical link on dynamic lag interface ethernet 1/2/4 is down.
00 days 00h09m47s:I:System: dynamic lag 2 peer info (priority=32768,id=XXX,key=338) remove (LagExpiry)
00 days 00h09m43s:I:Trunk: Group (1/2/4, 1/2/5) removed by 802.3ad link-aggregation module.
00 days 00h07m59s:I:System: Interface ethernet 1/2/5, state down
00 days 00h07m57s:I:System: Interface ethernet 1/2/4, state down
00 days 00h07m43s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
00 days 00h07m43s:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/4
00 days 00h07m43s:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
00 days 00h07m42s:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
00 days 00h07m41s:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
00 days 00h07m41s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
00 days 00h07m40s:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/4
00 days 00h07m40s:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
00 days 00h07m40s:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
00 days 00h07m40s:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/1
00 days 00h07m40s:D:next hop router 10.X.X.206 moved from port 1/1/1 to port 1/2/5
...
00 days 00h05m17s:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/2/5
00 days 00h05m14s:D:next hop router 10.X.X.206 moved from port 1/2/5 to port 1/1/3
00 days 00h05m13s:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/2/5
00 days 00h05m11s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/3
00 days 00h05m10s:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/2/4
00 days 00h05m09s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/3
00 days 00h05m09s:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/2/4
00 days 00h05m09s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/3
00 days 00h05m08s:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/2/4
00 days 00h05m08s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/3
00 days 00h05m08s:D:next hop router 10.X.X.206 moved from port 1/1/3 to port 1/2/4
00 days 00h05m05s:I:Security: SSH login by public key from src IP 10.X.X.206 from src MAC XXX to USER EXEC mode using RSA as Server Host Key.
00 days 00h04m07s:I:System: Interface ethernet 1/1/1, state up
00 days 00h04m04s:I:System: Interface ethernet 1/1/1, state down
00 days 00h03m42s:I:System: Interface ethernet 1/1/1, state up
00 days 00h03m40s:I:System: Interface ethernet 1/1/1, state down
00 days 00h03m17s:I:System: Interface ethernet 1/1/1, state up
00 days 00h03m13s:I:System: PoE: Power enabled on port 1/1/1.
00 days 00h03m10s:I:System: PoE: Allocated power of 30000 mwatts on port 1/1/1.
00 days 00h03m05s:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/5
00 days 00h03m05s:D:next hop router 10.X.X.1 moved from port 1/2/5 to port 1/2/2
00 days 00h03m04s:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/5
00 days 00h03m03s:D:next hop router 10.X.X.1 moved from port 1/2/5 to port 1/2/2
00 days 00h03m03s:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/5
00 days 00h03m02s:D:next hop router 10.X.X.1 moved from port 1/2/5 to port 1/2/2
00 days 00h03m01s:I:System: Logical link on dynamic lag interface ethernet 1/2/5 is up.
00 days 00h03m01s:I:System: Interface ethernet 1/2/5, state up
00 days 00h03m00s:I:System: Logical link on dynamic lag interface ethernet 1/2/4 is up.
00 days 00h03m00s:I:System: Interface ethernet 1/2/4, state up
00 days 00h03m00s:I:Trunk: Group (1/2/4, 1/2/5) created by 802.3ad link-aggregation module.
00 days 00h03m00s:I:System: dynamic lag 2, has new peer info (priority=32768,id=XXX,key=338) (N/A)
00 days 00h02m58s:I:System: Interface ethernet 1/1/40, state up
00 days 00h02m58s:I:System: Interface ethernet 1/1/37, state up
00 days 00h02m58s:I:System: Interface ethernet 1/1/39, state up
00 days 00h02m58s:I:System: Stack unit 1 POE Power supply 2 with 748000 mwatts capacity is up
00 days 00h02m58s:I:System: Stack unit 1 POE Power supply 1 with 748000 mwatts capacity is up
00 days 00h02m57s:I:System: Interface ethernet 1/1/3, state up
00 days 00h02m57s:I:System: Interface ethernet 1/1/40, state down
00 days 00h02m57s:I:System: Interface ethernet 1/1/38, state up
00 days 00h02m56s:I:System: Interface ethernet 1/1/40, state up
00 days 00h02m55s:I:System: Interface ethernet 1/2/2, state up
00 days 00h02m55s:I:System: Interface ve 1, state up
00 days 00h02m55s:I:System: Warm start
(N/A)
In the above:
10.x.x.1 is the pfSense firewall
10.x.x.206 is my laptop, on wifi (yes, I know), so it could be coming through 1/1/1 or 1/1/3
1/1/1 is the UniFi AP, PoE from the switch
1/1/3 is the link to a DGS-1100 switch that runs downstairs (TV area, consoles, another AP, etc.)
1/2/2 is the link to pfSense, which firewalls the WAN
1/2/4 & 1/2/5 are the LAG to FreeNAS
The Scenario 2 log starts at the warm start (the dumps read newest-first, so it's at the bottom), and the flapping begins almost immediately. This dump was taken right after a 'reload' command following the first scenario. I tried unplugging the LAG, as can be seen in the log, then unconfiguring and reconfiguring it, and the flapping started right back up.
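If you'd rather not eyeball a wall of 'moved from port' lines, something like the following can tally them. This is only a rough sketch: the regex is written against the log lines exactly as pasted above, and the names count_moves, flapping_on_lag, and SAMPLE are my own inventions for illustration, not anything from the switch or from fohdeesha's guide.

```python
import re
from collections import Counter

# Matches the "next hop router ... moved" lines as they appear in the dumps above.
MOVE_RE = re.compile(
    r"next hop router (?P<ip>\S+) moved from port (?P<src>\S+) to port (?P<dst>\S+)"
)

# A few lines copied from Scenario 2, used only as sample input.
SAMPLE = """\
00 days 00h10m20s:D:next hop router 10.X.X.1 moved from port 1/2/4 to port 1/2/2
00 days 00h10m19s:D:next hop router 10.X.X.1 moved from port 1/2/2 to port 1/2/4
00 days 00h10m13s:D:next hop router 10.X.X.206 moved from port 1/2/4 to port 1/1/1
00 days 00h09m54s:I:System: Interface ethernet 1/2/4, state up
"""


def count_moves(log_text):
    """Count next-hop moves per (ip, port pair), ignoring move direction."""
    counts = Counter()
    for line in log_text.splitlines():
        m = MOVE_RE.search(line)
        if m:
            # Sort the two ports so A->B and B->A land in the same bucket.
            pair = tuple(sorted((m["src"], m["dst"])))
            counts[(m["ip"], pair)] += 1
    return counts


def flapping_on_lag(counts, lag_ports):
    """Keep only the entries where at least one port is a LAG member."""
    members = set(lag_ports)
    return {key: n for key, n in counts.items() if members & set(key[1])}


if __name__ == "__main__":
    totals = count_moves(SAMPLE)
    # 1/2/4 and 1/2/5 are the FreeNAS LAG members in this setup.
    for (ip, pair), n in sorted(flapping_on_lag(totals, ["1/2/4", "1/2/5"]).items()):
        print(ip, pair, n)
```

Paste the real log into a file and feed its contents to count_moves instead of SAMPLE. Any address that keeps bouncing onto a LAG member port it has no business being on (like the firewall showing up on the FreeNAS ports) stands out right away.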
Config file wasn't captured in my terminal dumps, but it was basically fohdeesha's guide:
unstack 1/2/1 etc, ntp, gateway, etc
plus the basic lag config:
lag XXX dynamic, ports eth 1/2/4 eth 1/2/5, primary 1/2/4, deploy
no layer 3 stuff yet, haven't gotten that far.
So yah, restart your FreeNAS box after configuring the lag, if you lag. If you have ideas on what my logs are saying, I'm all ears and very curious, but I think I learned my lessons from this, which are the usual:
start simple
test first
don't tear down what's working, add to it first
don't orphan your config interfaces
read this thread
read your logs
read your logs
by the way your switch has logs