Mikrotik SwOS LAG Configuration


mattlach

Active Member
Aug 1, 2014
323
89
28
Hey all,

I recently reached out on the Mikrotik forums here, and unfortunately did not receive any replies. I was hoping maybe someone in here has some experience I can lean on:

Hey all,

My main switch in my rack is a CRS317-1G-16S+.

I also have a CSS326-24G-2S+ in that same rack, for gigabit copper stuff.

I wanted to link up the CSS326-24G-2S+ to the CRS317-1G-16S+ using both of its SFP+ ports in a LACP configuration in order to get almost non-blocking performance on the gigabit ports, but I am having a little trouble.
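
To put a number on "almost non-blocking", here is a rough back-of-the-envelope sketch. It assumes the worst case where all 24 gigabit ports push traffic toward the CRS317 at line rate at once, and that the LAG spreads that traffic evenly over both SFP+ links (which real hashing will not do perfectly). The constant and function names are made up for illustration.

```python
# Worst-case oversubscription of the CSS326-24G-2S+ uplink.
# Assumption: every copper port sends toward the uplink at line rate.
GIGABIT_PORTS = 24       # copper ports on the CSS326-24G-2S+
PORT_SPEED_GBPS = 1
UPLINK_SPEED_GBPS = 10   # each SFP+ port


def oversubscription(uplinks: int) -> float:
    """Ratio of worst-case edge demand to total uplink capacity."""
    demand = GIGABIT_PORTS * PORT_SPEED_GBPS
    capacity = uplinks * UPLINK_SPEED_GBPS
    return demand / capacity


print(oversubscription(1))  # single 10G uplink
print(oversubscription(2))  # both SFP+ ports aggregated
```

With one uplink the ratio is 2.4:1; with both aggregated it drops to 1.2:1, which is where the "almost" comes from.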

Back on my old HP ProCurves this was a manual process: tell the switch which ports to group together into what ProCurve called a "Trunk" (which was ambiguous, as Cisco uses the same term for a link carrying multiple VLANs), tell the switch which link aggregation mode to use, and then connect it to a similarly manually configured device on the other end.

Mikrotik's SwOS seems to automate things a little more....


From the Wiki:

Mode (default: passive): Specify LACP packet exchange mode or Static LAG mode on ports:
    Passive: Place port in listening state; use LACP only when its contrary port uses active LACP mode
    Active: Prefer to start LACP regardless of contrary port mode
    Static: Set port in a Static LAG mode
Group: Specify a Static LAG group
Trunk (read only): Represents the group number the port belongs to.
Partner (read only): Represents the partner MAC address.
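
As a rough illustration of how these three modes combine across the two ends of a link, here is a minimal sketch. It models general 802.3ad/LACP behaviour (at least one side must be active for negotiation to start, and a static side does not speak LACP at all); it is an assumption that SwOS follows this exactly, and the function name `lag_forms` is invented for the example.

```python
MODES = {"passive", "active", "static"}


def lag_forms(local: str, partner: str) -> bool:
    """Return True if the two port modes are expected to yield a working LAG."""
    if local not in MODES or partner not in MODES:
        raise ValueError("mode must be passive, active or static")
    # A static port sends no LACP PDUs, so it only pairs with another static port.
    if local == "static" or partner == "static":
        return local == partner
    # With LACP on both ends, at least one side has to initiate (active).
    return "active" in (local, partner)


for pair in [("active", "active"), ("active", "passive"),
             ("passive", "passive"), ("static", "static"),
             ("static", "active")]:
    print(pair, lag_forms(*pair))
```

Note that passive on both ends is the combination that never negotiates, since neither side sends the first LACP PDU.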




The only way to manually select which ports are members of the LAG group seems to be to select "static" mode; otherwise the group column cannot be populated. My gut was to use this method, as I usually don't trust automated things, but the manual is a little ambiguous about whether this results in true link aggregation that provides extra bandwidth, or just failover.

Because of this I used the Active/Passive mode. I selected active on two SFP ports on both sides (CRS317 and CSS326) and just plugged in the short 1ft DAC cables (Molex Branded) and to my astonishment, it just worked. Both switches correctly auto-identified that they were in link aggregated mode, with the correct other port, and everything just worked.

I was pretty impressed, but that only lasted for 3-4 days.

Suddenly I had no connectivity across the switches. Troubleshooting ensued (at first I thought it was my pfSense router, but it checked out).

Finally, I figured out that it was being caused by my beautiful automated LAG group. Somehow it had randomly forgotten that it was part of a LAG group, and the resultant loop was causing all sorts of problems network wide. Nothing obvious occurred that caused this to happen. There were no other changes made to any configuration.

So,

A few questions:
1.) Did I do something wrong in configuring this? It seems possible, as good documentation seems difficult to find.

2.) Is forgetting aggregated links a common problem?

3.) In order to use link aggregation in the future, without this happening again, what should I do?

4.) If I use manually configured LAG, will I still get the full bandwidth doubling benefits, or will it just go into a fallback configuration?
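
Some background on question 4, as a sketch rather than a statement about SwOS internals: in link aggregation generally, each flow is pinned to one member link by a hash of header fields, whether the group was built statically or negotiated by LACP. LACP adds negotiation and link monitoring, not a different way of spreading traffic. The snippet below assumes a simple hash over source and destination MAC; the actual hash inputs used by the switch chip are not given in this thread, and the port names are hypothetical.

```python
import zlib


def pick_member(src_mac: str, dst_mac: str, members: list[str]) -> str:
    """Pick the LAG member link for a flow by hashing its MAC pair."""
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    # Same MAC pair always hashes to the same link, so one flow never
    # exceeds the speed of a single member.
    return members[zlib.crc32(key) % len(members)]


lag = ["sfp1", "sfp2"]  # hypothetical port names
flows = [
    ("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:10"),
    ("aa:bb:cc:00:00:02", "aa:bb:cc:00:00:10"),
    ("aa:bb:cc:00:00:03", "aa:bb:cc:00:00:11"),
]
for src, dst in flows:
    print(src, "->", dst, "uses", pick_member(src, dst, lag))
```

The practical upshot is that the "doubling" only shows up as aggregate throughput across many flows; a single transfer between two hosts still tops out at one link's speed.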

I appreciate any help!

--Matt
 

nickf1227

Active Member
Sep 23, 2015
197
128
43
33
Just like on Cisco, I would put both links in "Active" mode, not "Static" mode.
"Static" mode should only be used as a last resort. Passive mode is a little better, but can result in exactly the results you saw. You really want an active LACP negotiation between the ports for the most stability.

PS, I prefer ProCurve's way of doing it. :)
 

mattlach

nickf1227 said: "Just like on Cisco, I would put both links in "Active" mode, not "Static" mode. "Static" mode should only be used as a last resort. Passive mode is a little better, but can result in exactly the results you saw. You really want an active LACP negotiation between the ports for the most stability."
That's exactly what I did. I set the switches to active on both sides, and while it worked fine at first, a week or so later they forgot they were supposed to be aggregated and created some horrific loops, causing the entire network to become non-functional.

nickf1227 said: "PS, I prefer ProCurve's way of doing it. :)"
You and me both. I prefer controlling things manually, and knowing what is going on, over black box shit going on in the background...
 

nickf1227

mattlach said: "That's exactly what I did. I set the switches to active on both sides, and while it worked fine at first, a week or so later they forgot they were supposed to be aggregated and created some horrific loops, causing the entire network to become non-functional."
That shouldn't be possible. In Active configuration, the switches should disable any links that do not receive LACP PDUs, which would prevent a loop. Unless, for some reason, they switched from "Fast" to "Slow" (I don't know how Mikrotik handles that).
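
For context on why "Fast" versus "Slow" matters, here is a small sketch using the standard LACP timer values: PDUs every 1 second in fast mode and every 30 seconds in slow mode, with the partner declared gone after three missed intervals. Whether SwOS exposes or ever changes this rate is not something established in this thread, so treat the mapping to Mikrotik as an assumption.

```python
# Standard LACP periodic transmit intervals, in seconds.
LACP_PDU_INTERVAL = {"fast": 1, "slow": 30}
MISSED_PDUS_BEFORE_TIMEOUT = 3


def partner_timeout(rate: str) -> int:
    """Seconds without LACP PDUs before the partner is considered lost."""
    return LACP_PDU_INTERVAL[rate] * MISSED_PDUS_BEFORE_TIMEOUT


for rate in ("fast", "slow"):
    print(rate, partner_timeout(rate), "seconds")
```

So in slow mode a link that has stopped exchanging PDUs can linger for up to 90 seconds before it is pulled from the aggregate, versus 3 seconds in fast mode.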

Are you on an old firmware? Are both switches on the same firmware?