Mikrotik network redundancy


charlie

Member
Jan 27, 2016
Budapest, HU
Hi all,

We are planning a new network infrastructure based on Mikrotik devices for ~60 servers.

The plan is to use a CCR1072-1G-8S+ (or maybe a CCR1036-8G-2S+ plus a TGE switch) as the "core router", plus three CRS226-24G-2S+RM switches (with TGE links between the router and the switches).

The CCR will be the default gateway for the network, and it will talk BGP to the upstream provider over a TGE link.

However, this setup has a SPOF (the single CCR). If we add another CCR router to the system (and another uplink), we can configure VRRP on them, which gives us redundancy at that level, but then the CRS switches become a SPOF. If we add three more CRS switches connected to the second router (i.e. duplicate the original setup), we can connect each server to two switches and configure bonding on them. Theoretically, in this setup the servers can always reach the routers and the Internet.
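To make the VRRP part concrete, here is a simplified Python model (not RouterOS configuration) of how the two CCRs would decide which one answers for the shared gateway address: among the routers that are up, the one with the highest priority becomes master, and per RFC 5798 a priority tie goes to the higher primary IP address. The router names, priorities and addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Router:
    name: str
    priority: int              # 1-254 for backup routers; 255 is reserved for the address owner
    primary_ip: IPv4Address
    alive: bool = True


def elect_master(routers):
    """Return the router that should own the virtual gateway IP, or None if all are down.

    Simplified VRRP rule: among routers that are up, the highest priority wins;
    on a priority tie, the higher primary IP address wins.
    """
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, int(r.primary_ip)))


# Hypothetical pair of core routers sharing one virtual gateway address.
ccr1 = Router("ccr1", 200, IPv4Address("10.0.0.2"))
ccr2 = Router("ccr2", 100, IPv4Address("10.0.0.3"))

print(elect_master([ccr1, ccr2]).name)  # ccr1
ccr1.alive = False
print(elect_master([ccr1, ccr2]).name)  # ccr2
```

With preemption enabled (the VRRP default), ccr1 would take the gateway address back once it is up again; the servers keep using the same virtual IP either way.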

Is this a good idea, or are we going about it totally the wrong way?
 

ultradense

Member
Feb 2, 2015
IMO a good idea. However, you cannot create a bond spanning ports on different switches. If your servers support switch-independent load balancing (like the Hyper-V switch or VMware) it can be a great solution. If your servers do not support this, you should try to find a modular switch chassis or some expensive switches with support for MLAG or TRILL or the like.
 

wildchild

Active Member
Feb 4, 2014
You would need switches that support stacking. Cisco, Juniper and, I believe, ZyXEL have this option, but there could be others as well.
 

charlie

Member
Jan 27, 2016
Budapest, HU
ultradense said:
> IMO a good idea. However, you cannot create a bond spanning ports on different switches. If your servers support switch-independent load balancing (like the Hyper-V switch or VMware) it can be a great solution. If your servers do not support this, you should try to find a modular switch chassis or some expensive switches with support for MLAG or TRILL or the like.
Hi,
Linux supports this kind of bonding (of course it's not real bonding, it only has the name), so that shouldn't cause an issue, I guess.

It would be much easier if MikroTik supported STP within their switches...
 

charlie

Member
Jan 27, 2016
Budapest, HU
wildchild said:
> You would need switches that support stacking. Cisco, Juniper and, I believe, ZyXEL have this option, but there could be others as well.
This setup can come in under 10k € (including the routers); I don't think I can buy stackable switch systems at that price (a CRS226 is around 220 €!).
 

ultradense

Member
Feb 2, 2015
charlie said:
> Hi,
> Linux supports this kind of bonding (of course it's not real bonding, it only has the name), so that shouldn't cause an issue, I guess.
>
> It would be much easier if MikroTik supported STP within their switches...
Spanning Tree Protocol is fully supported within MT switches, but I don't see how that would be a factor in static bonding?
 

wildchild

Active Member
Feb 4, 2014
charlie said:
> This setup can come in under 10k € (including the routers); I don't think I can buy stackable switch systems at that price (a CRS226 is around 220 €!).
Well... you could take a look at used 3750s.

Another option, depending on what your servers are running, could be an internal vSwitch running OSPF or (i)BGP.
Since you didn't spec 10G, 3750s should work fine.
 

aero

Active Member
Apr 27, 2016
You have a good setup in mind. Use the fault-tolerant active/standby bonding mode and you do not need stacked switches or any special switch configuration to support it.
 

charlie

Member
Jan 27, 2016
Budapest, HU
wildchild said:
> Well... you could take a look at used 3750s.
>
> Another option, depending on what your servers are running, could be an internal vSwitch running OSPF or (i)BGP.
> Since you didn't spec 10G, 3750s should work fine.
Hi,

A TGE connection is required between the switches and the routers (24 servers easily generate more than 1 Gbit/s of traffic).
 

wildchild

Active Member
Feb 4, 2014
Well, there goes the 3750 idea...
I don't agree with aero about the bonding.
Bonding is usually based on virtual MACs.
If the switches aren't aware of those virtual MACs, STP will get in the mix and most likely give you a myriad of issues.
The only way this will work across multiple hosts is running a virtual switch and uplinking from it, or stacking...
 

NetWise

Active Member
Jun 29, 2012
Edmonton, AB, Canada
A few questions...

You mention 60 servers, but don't mention the OS. That suggests they may be a mix, not "All Linux" or "All ESXi", etc., which means whatever you do won't necessarily be consistent across the board and may require different tricks (at least as far as teaming/bonding/redundancy/etc. go).

There are 60 servers with 2 ports each, i.e. 60 ports per side. I'm going to assume anything with 60 servers is going to have IPMI of some sort, so another 60 ports, 50% of them on one set and 50% on the other: 30 more per side. So you need to accommodate 90 ports, twice.
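The port arithmetic, as a quick Python sketch. It assumes each server's two data ports are split one per side, each server has one IPMI port placed 50/50 across the sides, and switch uplinks use separate SFP+ ports; the function names are just for illustration.

```python
import math


def ports_per_side(servers: int, data_ports_per_server: int = 2,
                   ipmi_ports_per_server: int = 1) -> int:
    """Access ports each switch set (side A or side B) has to provide.

    Data ports are split evenly between the two sides (one NIC to each),
    and IPMI ports are placed 50/50 across the sides.
    """
    data = servers * data_ports_per_server / 2
    ipmi = servers * ipmi_ports_per_server / 2
    return math.ceil(data + ipmi)


def switches_needed(ports: int, ports_per_switch: int) -> int:
    """Number of access switches needed to provide `ports` ports on one side."""
    return math.ceil(ports / ports_per_switch)


per_side = ports_per_side(60)
print(per_side)                       # 90
print(switches_needed(per_side, 24))  # 4 x 24-port switches per side
print(switches_needed(per_side, 48))  # 2 x 48-port switches per side
```

If IPMI lives on a separate network instead (`ipmi_ports_per_server=0`), it drops to 60 ports per side, which three 24-port CRS226 units (72 ports) would cover.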

Traffic flow isn't mentioned. If it's all outbound/internet based (eg: like a hosting provider), then it has to flow to a central point. But if it's server to server traffic (eg: like a business server room might be), then traffic might go from port to port. If traffic has to go East/West, then it's going to have to go all the way up to the router, then over to the other side and back. So having everything go back to a core router may not be ideal there.

Is the traffic intended to be "teamed for throughput/balancing" or "redundant for failover"?

Setting aside the particular switches you've picked, I'd probably put in 4x 48-port stacking switches with 2x 10GbE uplinks for the switching. In the past I've used PowerConnect 5548s with HDMI stacking, Dell N2048s, and Cisco 3750Es with 10GbE uplinks. In this case any port-channel/LACP/bonding you might want to do is easily supported. East/West traffic doesn't have to go up to the router, or even use the 10GbE links, so that helps a lot.

A comment was made that 24 1GbE servers will easily saturate a 1GbE uplink, which is totally true. But 60 will also easily saturate a single 10GbE uplink out from the edge. Is that uplink just going to be routed, or will it also be firewalled and inspected? That would drastically reduce the throughput or increase the CPU requirements on your edge.
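The uplink point can be expressed as an oversubscription ratio: total edge bandwidth divided by uplink bandwidth. A minimal Python sketch, assuming every server port could run at line rate simultaneously (a worst case that real traffic rarely reaches):

```python
def oversubscription(edge_ports: int, edge_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Worst-case edge bandwidth divided by uplink bandwidth (1.0 = non-blocking)."""
    return (edge_ports * edge_gbps) / (uplinks * uplink_gbps)


print(oversubscription(24, 1, 1, 1))   # 24.0 -> 24 x 1GbE servers behind a 1GbE uplink
print(oversubscription(24, 1, 1, 10))  # 2.4  -> the same switch with a 10GbE uplink
print(oversubscription(60, 1, 1, 10))  # 6.0  -> all 60 servers behind one 10GbE link
```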

The last problem is how you're getting to that 10GbE uplink. Unless the provider is providing you with dual 10GbE ports, you're going to have to put something in front of your router, to split that single cable into two so that you can have a pair of redundant routers.

Without knowing the parts about:
* IPMI
* OS's in use
* Traffic expectations - East/West, North/South, Outbound, Firewalled or Routed, etc.
* Uplink/Internet/Downstream configuration

It's hard to know if your solution is a "good idea" beyond anything more than an electrical/port-count level. There's a lot more to designing a network than "do I have enough ports".
 

charlie

Member
Jan 27, 2016
Budapest, HU
Hi,

We have mostly HP and SuperMicro servers. The usage is mixed: we have our own servers, we provide dedicated servers to our clients, etc. And I should note that not every server needs this kind of redundancy! For customer-rented servers this will definitely be a paid add-on, and we expect that not all clients will order it. But for planning purposes, we're designing for the best solution.

For the IPMI network we use cheap 100/1000 Mbit/s HP switches, because that network is totally independent from our live network.

We are a hosting provider, so internal traffic is not very high; only backups generate heavy internal traffic at night (and we don't back up every server). Currently we have two LB4M switches connected via TGE links to our network provider (they do the routing for us now; we would like to switch to a BGP-based solution in the future because we want to connect to some local IXes, as we are a RIPE member). We now have two 42U cabinets and will open a new one when we deploy this (or another) network. These cabinets are quite far from each other.

We're not planning to do firewalling on the CCR, because it doesn't have enough power for that, and our network provider has a DDoS filtering solution, etc.

The main goal is redundant connectivity to the Internet; a 1 Gbit/s port speed is always enough (where it isn't, we will use a full TGE network).
 

NetWise

Active Member
Jun 29, 2012
Edmonton, AB, Canada
Okay, so some more questions:

* IPMI is completely separate, so is it for your access or for customers' access to the bare metal? If the latter, that seems to muddy things a little.
* Now that you bring up racks, especially the fact that a new rack will be added and that it's distant, that changes your requirements. I'd assume you want your redundancy isolated to each rack, so having one set of switches on one side and another set on the other doesn't work: you'll be crossing racks, and that's ugly. Let's assume you have 30 servers per rack; you're going to want those to go to, say, 2x 48-port switches at the top of the rack, and from there via 2x 10GbE uplinks back to wherever your core routers are (which could be the top of one of the racks). If you start doing rack-to-rack cabling of 30-60 cables, that's going to be a nightmare to troubleshoot.

So knowing this, and realizing that you can't have more than 42-48U worth of 1U servers in the rack anyways, I would:
* Put 1x of your TGE-capable routers at the top of each rack; that splits your failure domain.
* Put 2x 48-port 1GbE switches, stacking or vPC capable (stacking preferred, as you can't assume what your customers will do with their OSes, so you have to support the lowest common denominator), in each rack. They each uplink via 2x 10GbE, each switch to one of the 2 TGE routers. This gives each set of 96x 1GbE ports a total of 4x 10GbE uplinks to the core router. You'll still have cross-cabinet cabling, but it's now limited to 4 in each direction, which is minimal.
* Your 3rd rack, when added, follows the same design.
* Include 1x 48-port 1GbE switch per rack for your OoB/IPMI, but this has to uplink back to somewhere as well.
* Because of the backup requirements, you're going to want to design for East/West traffic behind the router. The last thing you want is to saturate the 10GbE uplinks during backups, which would be pretty easy to do with this number of boxes. If that traffic can avoid going through the core, so much the better.
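The per-rack numbers above, including what happens when one of the two routers is down, as a rough Python model. It assumes each access switch sends one of its two 10GbE uplinks to each router (one reading of the layout above) and counts only the 1GbE server ports as edge bandwidth; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RackDesign:
    access_switches: int = 2      # 2x 48-port 1GbE switches per rack
    ports_per_switch: int = 48
    uplinks_per_switch: int = 2   # assumed: one 10GbE link to each of the 2 routers
    uplink_gbps: float = 10.0

    def edge_gbps(self) -> float:
        # All 1GbE access ports at line rate.
        return self.access_switches * self.ports_per_switch * 1.0

    def uplink_gbps_total(self, failed_routers: int = 0) -> float:
        # Losing a router removes one uplink from every access switch.
        per_switch = max(self.uplinks_per_switch - failed_routers, 0)
        return self.access_switches * per_switch * self.uplink_gbps

    def oversubscription(self, failed_routers: int = 0) -> float:
        capacity = self.uplink_gbps_total(failed_routers)
        return float("inf") if capacity == 0 else self.edge_gbps() / capacity


rack = RackDesign()
print(rack.uplink_gbps_total())            # 40.0 (4x 10GbE)
print(round(rack.oversubscription(), 2))   # 2.4 with both routers up
print(round(rack.oversubscription(1), 2))  # 4.8 with one router down
```

East/West backup traffic that stays inside the stack never touches these uplinks, which is why keeping it behind the router matters.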

Are you able to get enough power per rack for this? If you do have 30+ 1U servers in one rack, even assuming 30A circuits, in my experience you get to use 24A (an 80% load). Then you're recommended to stay under 50% of that, so that if Circuit A dies and shifts its load to Circuit B, you don't go over. So you're generally looking at < 12A per circuit. That means your servers need to be drawing about 0.3A each at any given time, less if you're doing > 30 servers per rack, and you need to account for the switching as well.
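The power budget above as a small Python calculation. The 80% continuous-load factor and the <50% A/B headroom are the rules of thumb from the post; `other_load_amps` (switches and other gear) is a hypothetical parameter. Dividing 12 A by 30 servers gives 0.4 A each with nothing else in the rack; the ~0.3 A figure is what remains if, hypothetically, about 3 A goes to switching and other gear.

```python
def per_server_amps(circuit_amps: float, servers: int,
                    continuous_load: float = 0.8,
                    failover_share: float = 0.5,
                    other_load_amps: float = 0.0) -> float:
    """Average current each server may draw so one circuit can still carry the whole rack.

    continuous_load: usable fraction of the breaker rating (80% rule of thumb).
    failover_share: keep each circuit below this fraction so the surviving
    circuit can take over the other circuit's load if it fails.
    """
    usable = circuit_amps * continuous_load                   # 30 A -> 24 A
    budget = usable * failover_share - other_load_amps        # 24 A -> 12 A, minus other gear
    return budget / servers


print(round(per_server_amps(30, 30), 2))                     # 0.4
print(round(per_server_amps(30, 30, other_load_amps=3), 2))  # 0.3
```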
 

Jb boin

Member
May 26, 2016
Grenoble, France
www.phpnet.org
A Juniper EX4200 costs about 650 € with dual PSUs and can be stacked up to 10 switches in one virtual chassis (32 Gbps between switches connected with a VCP cable, 10 Gbps between those connected through an uplink module (SFP+ or XFP)), and they can do 10Gb uplinks as well.
 

charlie

Member
Jan 27, 2016
Budapest, HU
We have mostly 2U servers, and two cabinets are already full, so this isn't a plan, it's a running setup. We don't worry about power per rack; ~10 kW is available per rack in our DC.

Internal backup traffic is not that much; last month it was only 10 TB (mission-critical systems are backed up to a backup server in a different DC, over the Internet), and we don't back up all data/servers.
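For scale, 10 TB a month is a modest average rate. A quick Python check, assuming decimal terabytes, a 30-day month and, as a hypothetical, that all of it moves in an 8-hour nightly window:

```python
def average_mbps(terabytes: float, hours: float) -> float:
    """Average rate in Mbit/s needed to move `terabytes` (decimal TB) in `hours`."""
    bits = terabytes * 1e12 * 8
    return bits / (hours * 3600) / 1e6


month_hours = 30 * 24  # assumed 30-day month, traffic spread evenly
night_hours = 30 * 8   # hypothetical 8-hour backup window every night

print(round(average_mbps(10, month_hours)))  # 31 (Mbit/s)
print(round(average_mbps(10, night_hours)))  # 93 (Mbit/s)
```

Even compressed into nights, that averages under a tenth of one 1GbE port, although individual backup jobs will burst higher.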

But first: would the CCR1072 be a good choice for the "core" router?
 

Jb boin

Member
May 26, 2016
Grenoble, France
www.phpnet.org
If it's not "risky" traffic that could attract DDoS or high PPS (especially UDP), it might be enough.

I put a CCR1036 in as the BGP router at my previous job and it's still working fine, but stressing it with small UDP packets overloaded the router (one of the CPU cores stuck at 100%), made it lose packets, and at first even made it crash/reboot (it stopped doing that after an upgrade).
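Small packets hurt because a router's work is largely per packet, so the packet rate rather than the bit rate is what loads the CPU. A Python sketch of the theoretical Ethernet packet rate at 10GbE for a few frame sizes, counting the 20 bytes of preamble, start delimiter and inter-frame gap that each frame also costs on the wire:

```python
def line_rate_pps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum Ethernet packets per second for a given frame size.

    Each frame carries 20 extra bytes on the wire: 7-byte preamble,
    1-byte start-of-frame delimiter and 12-byte inter-frame gap.
    """
    wire_bits = (frame_bytes + 20) * 8
    return link_gbps * 1e9 / wire_bits


for size in (64, 512, 1518):
    print(size, f"{line_rate_pps(10, size) / 1e6:.2f} Mpps")
# 64 14.88 Mpps
# 512 2.35 Mpps
# 1518 0.81 Mpps
```

A flood of minimum-size UDP packets therefore asks for roughly 18x the packet rate of the same bandwidth in full-size frames.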

If it's a backup infrastructure or for hosting internal services, it should be good enough, but if it's for a shared hosting/VPS/dedicated server infrastructure, I would first make sure the current firmware doesn't have these issues anymore.

--
You can also do BGP directly on Juniper EX switches; they require a license for it, which depends on the switch model and its number of ports.
You could use two 24-port switches (for example the EX4300-24T, which is quite powerful and can be found at a decent price) in a virtual chassis doing only distribution, uplinked to the distribution virtual chassis (which would be the 48-port switches) with 10 Gbps to each of the VC members, so if any switch is down or needs to be upgraded, the links stay up.

It will be more expensive, but it's still way less than a Juniper MX router, for example.
You would also have to add uplink modules if using the EX4200/4300 (the EX3300 has 4x SFP+ built in), plus the Virtual Chassis cables.

PS: I've never used EX switches for BGP; we have an MX80 at my current job, which is too expensive for your needs.
 

aero

Active Member
Apr 27, 2016
wildchild said:
> Well, there goes the 3750 idea...
> I don't agree with aero about the bonding.
> Bonding is usually based on virtual MACs.
> If the switches aren't aware of those virtual MACs, STP will get in the mix and most likely give you a myriad of issues.
> The only way this will work across multiple hosts is running a virtual switch and uplinking from it, or stacking...
I'm sorry, this is categorically false. An active/standby bond presents the virtual MAC address on only one port. The switch neither knows nor cares that it is a virtual MAC address. I think you may also not understand how STP works. It doesn't come into play in this scenario at all.

Stacked switches are most certainly not needed for active/standby fault tolerance.

I have run configurations like this at many corporations over the years. Can you describe the issues you think will occur?

Here's a brief description of what this bonding mode does:
1 (active-backup) Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond’s MAC address is externally visible on only one port (network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary option affects the behavior of this mode.
excerpts from https://www.kernel.org/doc/Documentation/networking/bonding.txt

The active-backup, balance-tlb and balance-alb modes do not
require any specific configuration of the switch.
11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
-------------------------------------------------------------

In a topology such as the example above, the active-backup and
broadcast modes are the only useful bonding modes when optimizing for
availability; the other modes require all links to terminate on the
same peer for them to behave rationally.
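To illustrate the behaviour described in that excerpt, here is a toy Python model of mode 1 (active-backup) with two NICs cabled to two different switches. It is a simplification, not the kernel's code: link state is set by hand rather than detected by MII/ARP monitoring, and a configured `primary` takes over again as soon as its link returns, which matches the default `primary_reselect=always`. The interface names are hypothetical.

```python
class ActiveBackupBond:
    """Toy model of Linux bonding mode 1 (active-backup).

    Only one slave carries traffic, so the bond's MAC address is seen on
    only one switch port at a time.
    """

    def __init__(self, slaves, primary=None):
        self.order = list(slaves)
        self.link_up = {s: True for s in self.order}
        self.primary = primary
        self.active = primary if primary is not None else self.order[0]

    def _reselect(self):
        if self.primary is not None and self.link_up[self.primary]:
            # Primary reclaims the active role whenever its link is up.
            self.active = self.primary
        elif self.active is None or not self.link_up[self.active]:
            # Fail over only if the current active slave is down:
            # pick the first slave that is still up (None if all are down).
            self.active = next((s for s in self.order if self.link_up[s]), None)

    def set_link(self, slave, up):
        self.link_up[slave] = up
        self._reselect()


bond = ActiveBackupBond(["eth0", "eth1"], primary="eth0")  # eth0 -> switch A, eth1 -> switch B
bond.set_link("eth0", False)  # switch A dies
print(bond.active)            # eth1
bond.set_link("eth0", True)   # switch A comes back
print(bond.active)            # eth0 (primary reclaims)
```

On a real Linux host the equivalent bond would be created with something like `ip link add bond0 type bond mode active-backup miimon 100` before adding the two NICs to it; because only one port is active at a time, neither switch needs any bonding or stacking configuration.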
 

NetWise

Active Member
Jun 29, 2012
Edmonton, AB, Canada
Active/Standby works that way in my experience as well.

I think the question is more whether the end user with the server has access to change network settings, be it via console, drivers, etc. If they do, and especially if the hosting provider does NOT, I think it is prudent to design the network in a way that 'trusts no one' and assumes the end user will do 'whatever the hell' they feel like, policy be damned. If that is true, then you have to build on the assumption that everyone is a jerk, regardless of whether you know a good and stable way to do it.

Users. They'll kill your design every time. Especially if you let them in the datacenter like little chaos monkeys.


 

aero

Active Member
Apr 27, 2016
Agreed, I just wanted to clear up technically how that bonding mode works so others reading this thread know it's an option for certain use cases.
 