Ring network with dual port network cards without a switch


40gorbust

New Member
Jan 12, 2019
25
2
3
[Attachment: ring-network-dual-port-ethernet.png]

I have a few servers that I'd like to connect at 10 Gbit and later 40 Gbit (Mellanox ConnectX-3 cards in ETH mode), but without an 'expensive' and loud switch.

I can get dual-port cards for a little more than single-port cards, so the initial investment is nearly the same; 6 cards for 6 servers in total.

If I connect the cards with DAC cables (cost effective) and give each pair of connected ports their own network (/24), then I can manually add routes on each server so it knows where its neighbors are.
The server that is furthest away would just have one route to it, so packets know which interface they should use.
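Roughly what I have in mind, as a sketch (subnets, addresses and interface names below are made up; each link gets its own /24 and the middle boxes forward):

Code:
# Sketch for server B in a chain A <-> B <-> C <-> D. B owns 10.0.12.2 on the
# A-facing port and 10.0.23.1 on the C-facing port, and must forward packets
# that are not addressed to itself.
sysctl -w net.ipv4.ip_forward=1

# Anything beyond C (the C-D link) is reachable via C.
ip route add 10.0.34.0/24 via 10.0.23.2 dev enp1s0f1

# On A (the end of the chain) everything simply goes via B:
# ip route add 10.0.23.0/24 via 10.0.12.2 dev enp1s0f0
# ip route add 10.0.34.0/24 via 10.0.12.2 dev enp1s0f0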

Downside:

- bandwidth is shared, so when server A is sending to D all the data goes through B and C
- higher latency due to multiple hops
- CPU usage on each server even when the traffic is not meant for it
- routing is slightly more complicated to set up
- cannot leverage the dual network ports for redundancy or bonding
- if one cable breaks, half of the network is down
- if one server is down, half the network is down/unreachable

Upside:

- no need to buy a switch
- no electricity costs for a switch
- no noise (network cards are notoriously silent)
- can keep adding servers without running out of ports on a switch

This is for a 'home-lab', so these are not really mission-critical servers. All
servers would be running services like file serving, machine learning, VMs, etc.

Has anyone done this with 10 Gbit or 40 Gbit Ethernet, and do you have suggestions, remarks or advice?
 

zxv

The more I C, the less I see.
Sep 10, 2017
156
57
28
Mellanox has a discussion of switchless network layouts for Mellanox ConnectX-5 cards. It also talks about the tradeoffs of various forms of rings and trees.

http://www.mellanox.com/related-doc...infiniband-interconnect-solutions-archive.pdf

The motivation there may be similar: eliminating a costly 100Gb switch. The ConnectX-5 does the switching and routing in the cards, so there is good network performance and minimal impact on the host.

I've built HPC clusters of 16 hosts using 4-port cards for MPI applications on a shoestring budget, some years ago. It can certainly be done with static routing tables like you suggest.

But I agree with Patrick. The performance of a host routing packets at 10Gb will not be great. You might try running iperf3 between 'A' and 'C' to get an idea of how it might perform.
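Something along these lines, as a rough sketch (addresses are placeholders, reusing the one-/24-per-link idea from above, with B as the middle hop):

Code:
# On B (the middle hop): let it forward between its two ports.
sysctl -w net.ipv4.ip_forward=1

# On C: add a route back to A's subnet via B, then start an iperf3 server.
ip route add 10.0.12.0/24 via 10.0.23.1
iperf3 -s

# On A: add a route to C's subnet via B, then measure throughput through B.
ip route add 10.0.23.0/24 via 10.0.12.2
iperf3 -c 10.0.23.2 -t 30 -P 4    # 30 seconds, 4 parallel streams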
 

40gorbust

New Member
Jan 12, 2019
25
2
3
What about using a 10GbE switch or two and using 2-4 10GbE per box? https://forums.servethehome.com/index.php?threads/mikrotik-crs317-1g-16s-rm-10gbe.16428/
Interesting switch. I see it's nearly silent until it isn't. That's good. So no noise at night in the home-lab (if I need to keep the network running anyway).

Issues:

16 ports. Full is full.
No (high-bandwidth) uplink, so connecting two switches requires 2 bonded ports (for at least 20 Gbit), which leaves 2 x 14 ports. While that's a nice amount, it comes at the price of 2 x $399, so $800 for 28 effective ports with a 20 Gbit bottleneck in between. That's about $28 per port, which is definitely reasonable.

Now I got a bunch of dual-port 10 Gbit cards for $14 (thank you, second-hand market) and we're testing with 40 Gbit Mellanox InfiniBand (and Ethernet) cards. So far the latter is great (37 Gbit on a single link!) but the cards cost me $130 each (second hand). Two 40 Gbit switches are on the way (18 port and 32 port) but I've already been warned they are loud. So I thought: I can hack the switches, remove the 40mm fans, put in 12cm fans and drill holes through the top and bottom (the switches don't have to go in a rack, and if they do I can leave 1U free above and below for airflow).

For a home-lab that's a bit overkill (32 x 40 Gbit ports), so I thought: why not do everything in software?

(FYI, I have a Mellanox IS5023 and an IS5024 ordered, at $200 and $140 respectively.)
 

40gorbust

New Member
Jan 12, 2019
25
2
3
Mellanox has a discussion of switchless network layouts for Mellanox ConnectX-5 cards. It also talks about the tradeoffs of various forms of rings and trees.

http://www.mellanox.com/related-doc...infiniband-interconnect-solutions-archive.pdf

The motivation there may be similar: eliminating a costly 100Gb switch. The ConnectX-5 does the switching and routing in the cards, so there is good network performance and minimal impact on the host.

I've built HPC clusters of 16 hosts using 4-port cards for MPI applications on a shoestring budget, some years ago. It can certainly be done with static routing tables like you suggest.

But I agree with Patrick. The performance of a host routing packets at 10Gb will not be great. You might try running iperf3 between 'A' and 'C' to get an idea of how it might perform.
Interesting that the 'just connect hosts and forget about a switch' approach is actually discussed by vendors. I'm less crazy than I thought.

I've yet to build the full setup (the cards are coming, spread between office and home). I'll test between 2 servers first, with 2 DACs between them: from port 1A to 2A, internally to 2B (on server 2), and back to port 1B. That's 2 hops and 1 internal pass, which should give some idea without having to put all 6 servers up first. Good suggestion.
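One thing I expect to need for that loop test (if I understand Linux routing right): since both endpoints live on server 1, the kernel would normally short-circuit the traffic over loopback, so port 1B probably has to go into its own network namespace. A sketch, with made-up addresses and interface names:

Code:
# --- Server 1: move "port 1B" into a namespace so it behaves like a separate host.
ip netns add farend
ip link set enp1s0f1 netns farend                   # port 1B
ip addr add 10.0.12.1/24 dev enp1s0f0               # port 1A, link to 2A
ip link set enp1s0f0 up
ip route add 10.0.21.0/24 via 10.0.12.2             # far side is reached via server 2

ip netns exec farend ip addr add 10.0.21.1/24 dev enp1s0f1   # port 1B, link to 2B
ip netns exec farend ip link set enp1s0f1 up
ip netns exec farend ip route add 10.0.12.0/24 via 10.0.21.2
ip netns exec farend iperf3 -s &

# --- Server 2: 10.0.12.2/24 on the port facing 1A, 10.0.21.2/24 on the port
# facing 1B, and net.ipv4.ip_forward=1 so it forwards between them.

# --- Server 1 again: this traffic now has to cross both DACs and server 2.
iperf3 -c 10.0.21.1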
 

Haitch

Member
Apr 18, 2011
122
14
18
Albany, NY
If you have sufficient cards and slots, for six servers you could look at doing a three-tier setup:

Tier 1: single host, single dual-port card, each port linked to one of the two Tier 2 hosts
Tier 2: two hosts, 2 x dual-port cards - one port up to the Tier 1 host, the other three down to the Tier 3 hosts. Single 1Gb link between the two hosts for redundancy.
Tier 3: three hosts, single dual-port card, uplinking to the two Tier 2 hosts.

no decent drawing software here, but I think you can visualize it.

Using weighted static routes you have redundancy. Tier 3 has redundant links up to Tier 2: if a Tier 3 host goes down, that's the only node lost. If a Tier 2 host goes down, you have redundant paths via the other host. If Tier 1 goes down, you have the 1Gb link between the two Tier 2s for redundancy.
 

40gorbust

New Member
Jan 12, 2019
25
2
3
If you have sufficient cards and slots, for six servers you could look at doing a three-tier setup:

Tier 1: single host, single dual-port card, each port linked to one of the two Tier 2 hosts
Tier 2: two hosts, 2 x dual-port cards - one port up to the Tier 1 host, the other three down to the Tier 3 hosts. Single 1Gb link between the two hosts for redundancy.
Tier 3: three hosts, single dual-port card, uplinking to the two Tier 2 hosts.

no decent drawing software here, but I think you can visualize it.

Using weighted static routes you have redundancy. Tier 3 has redundant links up to Tier 2: if a Tier 3 host goes down, that's the only node lost. If a Tier 2 host goes down, you have redundant paths via the other host. If Tier 1 goes down, you have the 1Gb link between the two Tier 2s for redundancy.
[Attachment: 3-tiered-network.jpg]

You mean this? I whipped out my best drawing skills!
I added optional 1Gbit links between the T3 hosts.

The next challenge would be how to set up fail-over with static routing without using scripts to detect failed links. Is there some gateway-discovery protocol that would follow the links from node to node, map the network on its own, and make paths with weights based on bandwidth?
 

aero

Active Member
Apr 27, 2016
353
90
28
54
Why static routing? Check out quagga.

Or search for the BFD protocol on Linux. Maybe there's such a daemon... never looked.
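A minimal sketch of what that could look like with FRR (quagga's successor); the router-id and subnets are placeholders, each host would list its own link subnets, and ospfd has to be enabled in /etc/frr/daemons first:

Code:
sudo vtysh <<'EOF'
configure terminal
router ospf
 ospf router-id 10.255.255.1
 network 10.0.12.0/24 area 0
 network 10.0.23.0/24 area 0
 auto-cost reference-bandwidth 100000
end
write memory
EOF

With the reference bandwidth raised like that, OSPF derives link costs from interface speed, which loosely gives the "paths with weights based on bandwidth" asked about above. FRR apparently also ships a BFD daemon (bfdd) for faster failure detection.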
 

zxv

The more I C, the less I see.
Sep 10, 2017
156
57
28
(FYI, I have a Mellanox IS5023 and an IS5024 ordered, at $200 and $140 respectively.)
If you are considering a switch, a used Brocade ICX6610 could provide you with gigabit, 10Gb and 40Gb Ethernet ports, and thus the possibility to uplink to 40GbE later, should you upgrade to a faster network. They list for around $300 lately on eBay, and can be had for less via best offer if you are patient enough.

https://forums.servethehome.com/index.php?threads/brocade-icx6450-icx6610-etc.21107/

However, even with a fast Ethernet switch, there is still significant value in being able to troubleshoot certain things, like RDMA on a ConnectX-3, over a direct host-to-host connection.
 

Haitch

Member
Apr 18, 2011
122
14
18
Albany, NY
[Attachment: 3-tiered-network.jpg]

You mean this? I whipped out my best drawing skills!
I added optional 1Gbit links between the T3 hosts.

The next challenge would be how to set up fail-over with static routing without using scripts to detect failed links. Is there some gateway-discovery protocol that would follow the links from node to node, map the network on its own, and make paths with weights based on bandwidth?
Yep, exactly like that.

For the failover routing, if dynamic routing is not available you can use static routing, just with different weighted (metric) routes. If the preferred route is not available, the route with the next-best metric is used. No route is required for directly connected subnets, so:

Tier 3: routes to 0.0.0.0/0 with the next hop being the IPs of the two Tier 2 hosts
Tier 2: routes to 0.0.0.0/0 with the next hop being the IP of the Tier 1 host, and the T2 peer.
Tier 1: static routes to all the T2 <-> T3 subnets via both of the T2 hosts.
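On Linux, for one of the Tier 3 hosts, that could look roughly like this (addresses and interface names are placeholders):

Code:
# Lower metric wins. With the sysctl below, a route whose link has lost carrier
# is skipped, so the metric-200 route takes over on a local link failure.
sysctl -w net.ipv4.conf.all.ignore_routes_with_linkdown=1
ip route add default via 10.0.102.1 dev enp1s0f0 metric 100
ip route add default via 10.0.103.1 dev enp1s0f1 metric 200

Note this only reacts to local link loss; detecting a failed node further away is where the dynamic routing / BFD suggestions above come in.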
 

Marsh

Moderator
May 12, 2013
2,659
1,506
113
zxv mentioned the ICX switches.

Cost was $75 (before the eBay seller bumped the price to $100) for an ICX 6450, a switch with 4 x 10GbE ports.
$125-$150 for an ICX 6610, which has more than 8 x 10GbE ports.

For 40GbE ports, there is the SX60xx switch for not more than $200 shipped.

Question is: why?
 

blood

Member
Apr 20, 2017
42
14
8
45
Why not simplify this by taking it down to layer 2 and just use bridges to "switch" the packets between systems? It sounds like you'll have nice NICs, so something like Open vSwitch compiled with the right DPDK support could give you pretty fast switching (for software) with one flat L2 and a single IP subnet. No dynamic routing (or even static routing other than a default gateway), and you could still enable a full ring for resiliency as long as you lay down spanning tree of some flavor.
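Minus the DPDK parts (which need their own build and configuration and are not shown), the basic Open vSwitch side could look something like this on each host — a sketch with placeholder interface names and addresses:

Code:
# Bridge both NIC ports into one software switch and let (R)STP break the loop.
ovs-vsctl add-br br0
ovs-vsctl add-port br0 enp1s0f0
ovs-vsctl add-port br0 enp1s0f1
ovs-vsctl set bridge br0 rstp_enable=true    # or stp_enable=true for classic STP

# The host's own IP now lives on the bridge, not on the physical ports.
ip addr add 10.0.0.11/24 dev br0
ip link set br0 up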
 

40gorbust

New Member
Jan 12, 2019
25
2
3
Why not simplify this by taking it down to layer 2 and just use bridges to "switch" the packets between systems? It sounds like you'll have nice NICs, so something like Open vSwitch compiled with the right DPDK support could give you pretty fast switching (for software) with one flat L2 and a single IP subnet. No dynamic routing (or even static routing other than a default gateway), and you could still enable a full ring for resiliency as long as you lay down spanning tree of some flavor.
I never thought about L2 bridging. I'm not sure how to set that up in either Windows or Linux clients on these cards, so any tips in that direction would be appreciated.

It would definitely make things much easier: no static routing, no dynamic routing using daemons, not even BGP, which I know of but haven't mastered at all. Yet.
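For the Linux side, from what I can tell a plain kernel bridge with spanning tree enabled would be one way to try it first (a sketch; interface names and the address are made up):

Code:
# Enslave both NIC ports to a bridge and enable STP to break the loop.
ip link add br0 type bridge
ip link set br0 type bridge stp_state 1
ip link set enp1s0f0 master br0
ip link set enp1s0f1 master br0
ip link set enp1s0f0 up
ip link set enp1s0f1 up
ip addr add 10.0.0.12/24 dev br0
ip link set br0 up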

There are several reasons not to go for a $200 switch with enough ports:

1. Those things are loud and would need to be torn apart to swap the fans, or to have all the 40mm fans removed and replaced with 12cm ones inside the chassis, or similar. Doable, but I'm not looking forward to it.

2. The switches in the network room are quite far (well, relatively far) from both the servers and the desktops (which I intend to upgrade later to 10 and 40 Gbit connections). In short: it would take a lot of DAC cables, each from 3 meters (fine, cheap) up to 10 meters (not so cheap).

3. If I'm correct, I could make an old-fashioned ring network using 2-port NICs in the desktops in the office (small office, 20 desktops) using a lot of relatively short cables (solves 1) and just "1" connection back to the server room. It doesn't even have to be a ring; it can be a bus. The last desktop on the bus has a bit more latency than the others and the total bandwidth is limited to the maximum speed of the card, say 40 Gbit, but that is fast enough for the bursts we see on the LAN of the desktops.

In particular, I'm experimenting with a cheap SAN made from a few SSDs that serves iSCSI volumes to the desktops. With 40 Gbit (if everything works well) that could give a burst speed of nearly 5 GBytes/second when getting data/files from the fileservers. Plenty of bandwidth, and even if that 5 GByte/sec has to be shared by 2, 3, 5 or even 10 people at the same time (highly unlikely), 0.5 GByte/sec is still higher than what the (cheap SATA) SSDs in the servers can deliver.

Compared to the current 1 Gbit network, which has a limit of around 110-120 MBytes/sec, that is in the worst-case scenario still roughly 500% better and in the best case roughly 5000% better. Now those are nice numbers, worth the hassle :)

Do we need the speed? No. I could even make up a story, reduce everyone's speed to 100 Mbit, and still nobody would complain. But that's no fun, right?
 

ttabbal

Active Member
Mar 10, 2016
763
212
43
47
For an office, even a small one, I wouldn't use a setup like this. It has too many possible failure points. There are good reasons we don't use token-ring anymore. :) For a homelab, maybe. For business, no way.

For the longer cables, use fiber. It's cheaper once you get past the cheap short DACs. Even with the transceiver costs. And if you're already doing off-lease etc, transceivers are only about $10/ea. And patch cables are almost as cheap as CAT6 now, even new.

I would do a switch, 40G in the server room for servers, 10G for the clients. Or even 10G/1G. Even if you must mod the switch for noise, it's still less work than maintaining software switching on the machines with failover setups.
 

40gorbust

New Member
Jan 12, 2019
25
2
3
Hmmm. While I still think it's feasible to have a ring network even in 2019 (I think we've only had one Cat5 cable die on us in 4 years), I do agree that it's looking for trouble.

I've found a Dell S4810-ON switch with 48 ports @ 10 Gbit (SFP+) and 4 ports @ 40 Gbit (QSFP+). Before, I had only looked at 10 Gbit-only switches and wasn't happy with the limited bandwidth from switch to server. With 4 x 40 Gbit ports, however ...

The downside is that the thing is $720 (USD) in this area, which is twice as expensive as the 36-port 40 Gbit switch that is on its way here.

48 ports @ 10 Gbit should be good for a looong time, though. Even if half of the ports failed it would still be good enough.
 

Frank173

Member
Feb 14, 2018
75
9
8
You can get a used Dell switch with 32 x 100Gb ports for around USD 3,300. That's around 500 dollars per server spread over 6 servers; take away the extra 150-200 dollars you pay for a 2-port 100Gb NIC over a single-port one, and you pay an extra ~300 dollars per server but get a full L2/L3 100Gb switch and a 100Gb network that is future proof. Quite a no-brainer IMO, unless you are severely budget constrained.

What about using a 10GbE switch or two and using 2-4 10GbE per box? https://forums.servethehome.com/index.php?threads/mikrotik-crs317-1g-16s-rm-10gbe.16428/