Network routing issues

Quartzeye

New Member
Jul 29, 2013
All,
I have three Proxmox/Debian multi-homed servers. They each have (1) 1GbE, (2) 10GbE, and (2) 40Gb InfiniBand adapters.

I have added (4) entries to /etc/iproute2/rt_tables:
2 rt10G
3 rt10G02
4 rt40G01
5 rt40G02

I have set the routes in /etc/network/interfaces for each adapter (only one is shown here, but each is set up the same way with the values changed for that adapter):
auto vmbr1
iface vmbr1 inet static
address 172.20.10.15
netmask 255.255.255.0
bridge_ports eno1
bridge_stp off
bridge_fd 0
mtu 9000
post-up ip route add 172.20.10.0/24 dev vmbr1 src 172.20.10.15 table rt10G
post-up ip route add default via 172.20.10.1 dev vmbr1 table rt10G
post-up ip rule add from 172.20.10.15/24 table rt10G
post-up ip rule add to 172.20.10.15/24 table rt10G
#10Gbe Primary Network
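
To check whether a stanza like this did what was intended, it helps to look at what the kernel actually installed. A minimal sketch, assuming the rt10G table name from above; the run wrapper and the DRY_RUN switch are illustrative helpers, not part of iproute2:

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the policy rules, then the routes held in one custom table.
show_policy_table() {
    table=$1
    run ip rule show
    run ip route show table "$table"
}

show_policy_table rt10G
```

With DRY_RUN=0 the output should list the "from" and "to" rules pointing at rt10G and both routes added by the post-up lines; if either is missing, the post-up commands did not run or failed.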

This is an example of what I am trying to do (don't get hung up on the different IP addresses, as it is an example only):
[Attached image: Selection_999(470).png]


The problem I am running into is that while normal network traffic "SEEMS" to be working correctly, I cannot cross-mount directories between the servers.

On each server I have a storage array mounted at "/pve-ischeme/{servername}/data"

I export the same directory in the /etc/exports file on each server to each static route, as shown below:
/pve-ischeme/r720xd-02/data 172.20.1.0/24(rw,fsid=0,sync,no_root_squash,crossmnt,no_subtree_check)
/pve-ischeme/r720xd-02/data 172.20.10.0/24(rw,fsid=1,sync,no_root_squash,crossmnt,no_subtree_check)
/pve-ischeme/r720xd-02/data 172.20.11.0/24(rw,fsid=2,sync,no_root_squash,crossmnt,no_subtree_check)
/pve-ischeme/r720xd-02/data 172.20.40.0/24(rw,fsid=3,sync,no_root_squash,crossmnt,no_subtree_check)
/pve-ischeme/r720xd-02/data 172.20.41.0/24(rw,fsid=4,sync,no_root_squash,crossmnt,no_subtree_check)

In my /etc/fstab on a different server, I try to mount the same directory from one server to (5) mount points on the other servers, using the static route for each, as shown below:
172.20.10.17:/pve-ischeme/r720xd-02/data /pve-ischeme/r720xd-02/data nfs4 rw,soft,intr,rsize=8192,wsize=8192,timeo=600,retrans=5
172.20.10.17:/pve-ischeme/r720xd-02/data /pve-ischeme/r720xd-02/10G01 nfs4 rw,soft,intr,rsize=8192,wsize=8192,timeo=600,retrans=5
172.20.11.17:/pve-ischeme/r720xd-02/data /pve-ischeme/r720xd-02/10G02 nfs4 rw,soft,intr,rsize=8192,wsize=8192,timeo=600,retrans=5
172.20.41.17:/pve-ischeme/r720xd-02/data /pve-ischeme/r720xd-02/40G01 nfs4 rw,soft,intr,rsize=8192,wsize=8192,timeo=600,retrans=5
172.20.42.17:/pve-ischeme/r720xd-02/data /pve-ischeme/r720xd-02/40G02 nfs4 rw,soft,intr,rsize=8192,wsize=8192,timeo=600,retrans=5
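
One thing that can be checked mechanically is whether every address used in /etc/fstab falls inside a subnet listed in /etc/exports. The sketch below is only a rough check and rests on an assumption the post does not state: that each client mounts from an address in the same /24 as the server address it targets. The function names are made up for illustration.

```shell
#!/bin/sh
# Pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on "."
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet ADDR CIDR: succeeds when ADDR lies inside CIDR.
in_subnet() {
    ip=$1; net=${2%/*}; bits=${2#*/}
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# covering_export ADDR CIDR...: print the first CIDR containing ADDR, or "none".
covering_export() {
    addr=$1; shift
    for cidr in "$@"; do
        if in_subnet "$addr" "$cidr"; then
            echo "$cidr"
            return 0
        fi
    done
    echo "none"
    return 1
}

# Subnets copied from the /etc/exports lines, addresses from the /etc/fstab lines.
EXPORTS="172.20.1.0/24 172.20.10.0/24 172.20.11.0/24 172.20.40.0/24 172.20.41.0/24"
for server in 172.20.10.17 172.20.11.17 172.20.41.17 172.20.42.17; do
    # EXPORTS is left unquoted on purpose so it splits into one argument per subnet.
    echo "$server -> $(covering_export "$server" $EXPORTS)"
done
```

Run against the values as posted, this reports "none" for 172.20.42.17, and nothing in the fstab uses the exported 172.20.40.0/24. If those are the real addresses rather than typos in the post, the 40G mounts and exports do not line up.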

The mount -a command doesn't throw any errors, but when I navigate to the mounted folders, I can only access the mount on the default route. When I try to ls any of the other directories, it just hangs. I can ping across the static routes, and the VMs/containers on each server seem to communicate properly. pfSense is showing traffic across the three Ethernet routes. InfiniBand cannot be bridged, so I was planning to do OS-level direct mounts and leverage the directories at the OS level. NFS seems unable to find anything that is not on the default 1GbE route. SSH cannot connect when I use the -b option to force it onto a specific static route.
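
A symptom like this can be narrowed down per path by asking the kernel which route it would pick for a given source address and then pinging from that same source. A minimal sketch: the .15 local addresses on subnets other than 172.20.10.0/24 are guesses that follow the pattern in the post, and run/DRY_RUN are illustrative helpers.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# probe_path SRC DST: show the route chosen for SRC -> DST, then ping from SRC.
probe_path() {
    src=$1; dst=$2
    run ip route get "$dst" from "$src"
    run ping -c 3 -W 2 -I "$src" "$dst"
}

probe_path 172.20.10.15 172.20.10.17
# Assumed local address on the second 10GbE subnet (not given in the post).
probe_path 172.20.11.15 172.20.11.17
```

If ip route get reports a device other than the one that owns the source address, traffic for that path is leaving through the wrong interface, which would be consistent with ssh -b and the extra NFS mounts hanging.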

My entire premise here is to have the storage in each server mounted on the other servers across multiple static routes, so that I can share all the local storage across the cluster and segregate traffic across the (5) subnets to utilize their inherent speed. Having mounts like this would presumably allow me to attach the directories to my containers and VMs and reduce bottlenecks while keeping traffic isolated.

As an example, I want all traffic heading out to the WAN to be on my 1GbE subnet. I want all my native inter-VM/container traffic on my two 10GbE subnets. Lastly, I want all my backend data traffic between servers, VMs, and containers running on the two 40Gb InfiniBand subnets.

From everything I have been reading and testing, this should be doable. I just don't know why it isn't working. I would appreciate any help in understanding the following:

1) Can I segregate traffic over static routes and leverage the different speeds that come with the various adapters?
1a) Have I done that correctly?
2) Can I share a directory from one server over multiple static routes?
2a) Have I done that correctly?
3) Can I mount a single directory from a remote server to multiple locations on a different server using different static routes?
4) Why does my routing from my VM's and containers seem to work but I cannot navigate the mounted folders across any route other than the default route?
5) Why when using SSH I cannot bind traffic across any route but the default route?

I know these are advanced topics, and while I have learned a lot on the internet, I still have more to learn on this subject. Again, ANY ASSISTANCE would be appreciated.
 

Takrbark3

New Member
Dec 17, 2017
Hi!

Don't use the "rt_table" solution; it's an old, hackish Linux way of implementing VRF (Virtual Routing and Forwarding).
A simpler and less painful way to get the same separation on Linux is "network namespaces".

You need to bind each interface/subinterface to a specific namespace. After that, it uses the separate routing table inside that namespace (a separate instance), and each namespace also has its own iptables rules.


Here is an example (based on your drawing)
(In this example the inter-vlan-traffic is handled on the PFSENSE side - for simplicity)
Code:
------Proxmox-Hosts---------

# LEFT-SIDE - VM-TO-VM #
---------------------------------------------------------------------------
vlan10 - mgmt
vlan 100 - 10.10.1.0/24 - proxmox: .1
vlan 200 - 10.10.10.0/24 - proxmox: .1
vlan 300 - 10.10.15.0/24 - proxmox: .1
vlan 400 - 10.10.20.0/24 - proxmox: .1
vlan 500 - 10.10.25.0/24 - proxmox: .1
#

# RIGHT-SIDE - VM-TO-GW (pfsense) #
vlan 101 - 10.11.1.0/24 - proxmox: .1 ; pfsense: .2
vlan 201 - 10.11.10.0/24 - proxmox: .1 ; pfsense: .2
vlan 301 - 10.11.15.0/24 - proxmox: .1 ; pfsense: .2
vlan 401 - 10.11.20.0/24 - proxmox: .1 ; pfsense: .2
vlan 501 - 10.11.25.0/24 - proxmox: .1 ; pfsense: .2
---------------------------------------------------------------------------



# interfaces - interface PHY #
-------------------------------------------
nic-1GB -> eth0
nic-2x10GB -> bond10
nic-2x40GB -> bond40
-------------------------------------------


# proxmox - interface bridges #
-------------------------------------------
vmbr0
vmbr100
vmbr200
vmbr300
vmbr400
vmbr500
#
vmbr101
vmbr201
vmbr301
vmbr401
vmbr501
-------------------------------------------


# I'm using "openvswitch", you could use the "bridge-utils" too
--------------------------------------------------------------------------------------
$> ovs-vsctl add-port vmbr0 eth0
$> ovs-vsctl add-port vmbr100 bond10.100
$> ovs-vsctl add-port vmbr200 bond10.200
$> ovs-vsctl add-port vmbr300 bond10.300
$> ovs-vsctl add-port vmbr400 bond10.400
$> ovs-vsctl add-port vmbr500 bond10.500
#
$> ovs-vsctl add-port vmbr101 bond40.101
$> ovs-vsctl add-port vmbr201 bond40.201
$> ovs-vsctl add-port vmbr301 bond40.301
$> ovs-vsctl add-port vmbr401 bond40.401
$> ovs-vsctl add-port vmbr501 bond40.501
--------------------------------------------------------------------------------------


# Create Namespaces #
--------------------------------------------------------------------------------------
$> ip netns add vrf100
$> ip netns add vrf200
$> ip netns add vrf300
$> ip netns add vrf400
$> ip netns add vrf500
--------------------------------------------------------------------------------------


# bind interface to Namespace #
--------------------------------------------------------------------------------------
$> ip link set dev vmbr100 netns vrf100
$> ip link set dev vmbr101 netns vrf100
#
$> ip link set dev vmbr200 netns vrf200
$> ip link set dev vmbr201 netns vrf200
#
$> ip link set dev vmbr300 netns vrf300
$> ip link set dev vmbr301 netns vrf300
#
$> ip link set dev vmbr400 netns vrf400
$> ip link set dev vmbr401 netns vrf400
#
$> ip link set dev vmbr500 netns vrf500
$> ip link set dev vmbr501 netns vrf500
--------------------------------------------------------------------------------------


# configure the interfaces and routing inside each namespace #
--------------------------------------------------------------------------------------
vmbr100 > 10.10.1.1
vmbr101 > 10.11.1.1 ; pfsense: .2
$> ip netns exec vrf100 bash
>> ip addr add 10.10.1.1/24 dev vmbr100
>> ip addr add 10.11.1.1/24 dev vmbr101
>> ip link set lo up
>> ip link set vmbr100 up
>> ip link set vmbr101 up
>> ip route add default via 10.11.1.2 dev vmbr101
>> exit
#
vmbr200 > 10.10.10.1
vmbr201 > 10.11.10.1 ; pfsense: .2
$> ip netns exec vrf200 bash
>> ip addr add 10.10.10.1/24 dev vmbr200
>> ip addr add 10.11.10.1/24 dev vmbr201
>> ip link set lo up
>> ip link set vmbr200 up
>> ip link set vmbr201 up
>> ip route add default via 10.11.10.2 dev vmbr201
>> exit
#
vmbr300 > 10.10.15.1
vmbr301 > 10.11.15.1 ; pfsense: .2
$> ip netns exec vrf300 bash
>> ip addr add 10.10.15.1/24 dev vmbr300
>> ip addr add 10.11.15.1/24 dev vmbr301
>> ip link set lo up
>> ip link set vmbr300 up
>> ip link set vmbr301 up
>> ip route add default via 10.11.15.2 dev vmbr301
>> exit
#
vmbr400 > 10.10.20.1
vmbr401 > 10.11.20.1 ; pfsense: .2
$> ip netns exec vrf400 bash
>> ip addr add 10.10.20.1/24 dev vmbr400
>> ip addr add 10.11.20.1/24 dev vmbr401
>> ip link set lo up
>> ip link set vmbr400 up
>> ip link set vmbr401 up
>> ip route add default via 10.11.20.2 dev vmbr401
>> exit
#
vmbr500 > 10.10.25.1
vmbr501 > 10.11.25.1 ; pfsense: .2
$> ip netns exec vrf500 bash
>> ip addr add 10.10.25.1/24 dev vmbr500
>> ip addr add 10.11.25.1/24 dev vmbr501
>> ip link set lo up
>> ip link set vmbr500 up
>> ip link set vmbr501 up
>> ip route add default via 10.11.25.2 dev vmbr501
>> exit
--------------------------------------------------------------------------------------


# Simple startup script #
---------------------------------------------------------------------------------------------------------
Create the files vrf100.sh, vrf200.sh, vrf300.sh, vrf400.sh and vrf500.sh and put them in the /etc/network/ directory

/etc/network/interfaces
----------------
auto vmbr100
iface vmbr100 inet manual
ovs_type OVSBridge
ovs_ports bond10.100
post-up ip netns add vrf100
#
auto vmbr101
iface vmbr101 inet manual
ovs_type OVSBridge
ovs_ports bond40.101
post-up ip netns exec vrf100 bash -c "exec /etc/network/vrf100.sh"
#
auto vmbr200
iface vmbr200 inet manual
ovs_type OVSBridge
ovs_ports bond10.200
post-up ip netns add vrf200
#
auto vmbr201
iface vmbr201 inet manual
ovs_type OVSBridge
ovs_ports bond40.201
post-up ip netns exec vrf200 bash -c "exec /etc/network/vrf200.sh"
#
auto vmbr300
iface vmbr300 inet manual
ovs_type OVSBridge
ovs_ports bond10.300
post-up ip netns add vrf300
#
auto vmbr301
iface vmbr301 inet manual
ovs_type OVSBridge
ovs_ports bond40.301
post-up ip netns exec vrf300 bash -c "exec /etc/network/vrf300.sh"
#
auto vmbr400
iface vmbr400 inet manual
ovs_type OVSBridge
ovs_ports bond10.400
post-up ip netns add vrf400
#
auto vmbr401
iface vmbr401 inet manual
ovs_type OVSBridge
ovs_ports bond40.401
post-up ip netns exec vrf400 bash -c "exec /etc/network/vrf400.sh"
#
auto vmbr500
iface vmbr500 inet manual
ovs_type OVSBridge
ovs_ports bond10.500
post-up ip netns add vrf500
#
auto vmbr501
iface vmbr501 inet manual
ovs_type OVSBridge
ovs_ports bond40.501
post-up ip netns exec vrf500 bash -c "exec /etc/network/vrf500.sh"
...
----------------
---------------------------------------------------------------------------------------------------------
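
The contents of the vrfNNN.sh scripts are not shown above. A minimal sketch of what one could look like, built only from the per-namespace commands in the "configure the interfaces and routing" step: the vrf_setup function, its arguments and the DRY_RUN switch are assumptions made for illustration, and the sketch presumes the two bridges have already been moved into the namespace as in the "bind interface to Namespace" step.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 when the script
# is started inside the namespace (ip netns exec vrfNNN ...) as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# vrf_setup ID OCTET: address vmbr<ID> (VM side) and vmbr<ID+1> (pfsense side),
# bring the links up and point the default route at pfsense (.2).
vrf_setup() {
    id=$1; octet=$2
    inner="vmbr${id}"
    outer="vmbr$((id + 1))"
    run ip addr add "10.10.${octet}.1/24" dev "$inner"
    run ip addr add "10.11.${octet}.1/24" dev "$outer"
    run ip link set lo up
    run ip link set "$inner" up
    run ip link set "$outer" up
    run ip route add default via "10.11.${octet}.2" dev "$outer"
}

# vrf100.sh would call this with the VLAN id and third octet from the table above.
vrf_setup 100 1
```

The other scripts would differ only in the two arguments, for example vrf_setup 400 20 for vrf400, which puts the default route on 10.11.20.2, the pfsense address listed for vmbr401.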
 

coxhaus

Member
Jul 7, 2020
Takrbark3 said:
(In this example the inter-vlan-traffic is handled on the PFSENSE side - for simplicity)
If pfSense is doing the layer 3 routing, then the switches are only running at layer 2. That does make for a simpler setup, but a slower one.