Switching over Multiple Uplinks on VMware VDS

CA_Tallguy

New Member
May 19, 2020
28
4
3
So I'm trying to cross connect my physical ESXi hosts with 40GbE direct connect cables because that's the only way I can get 40GbE (and I don't want to pay the data center to host a 1U switch that would be only 1GbE or 10GbE max anyway). I do know that VMware only officially supports 2-node direct connect -- so I'm trying to figure out an unsupported way to do this.

Should something like the following drawing work? Right now, it doesn't seem like I have active-active working on ESXi 1 -- only one of the links seems to work at a time. Trying to figure out if there is some port isolation going on, failover settings I have not properly adjusted, etc.

As these are "virtual switches" after all, shouldn't this work? I'm no networking expert but I think you could connect physical switches together just fine like this. I could install RouterOS or pfSense and set up a switch that way, but I'd rather not have to worry about those systems since this seems like basic switching and vSphere should handle it.

Or should I go ahead with RouterOS etc.? (Or is NSX-T an option? Seems like it would be a bloated solution for my needs.)

RouterOS's VRRP may work great and provide some resilience... just setup primary/failover routers on 2 of 3 (or 4) nodes and let them handle switching. I just don't want to introduce complexity into the system if there is a way to handle it within VMware.
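For reference, a minimal RouterOS VRRP pair would look something like this. This is a sketch, not tested config -- the interface name, VRID, and addresses are placeholders for my setup:

```shell
# Primary router (priority 254); the backup uses the same vrid
# on its own box with a lower priority, e.g. 100
/interface vrrp add name=vrrp1 interface=ether1 vrid=49 priority=254
# Real address on the physical interface
/ip address add address=10.0.0.2/24 interface=ether1
# Virtual address that fails over between the two routers
/ip address add address=10.0.0.1/32 interface=vrrp1
```

Hosts would then point at the virtual address (10.0.0.1 here), so a failed router node shouldn't take the path down.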

The blue lines here are my 40GbE links.....

switching.png
 

CA_Tallguy
May 19, 2020
This is frustrating. Why can't I add a physical nic to a vDS and get it to function like a normal switch and switch port?

Back in 2015, someone on the VMware boards made some extensive attempts to get something like this working. He thought he had it working by using several vDS's instead of one (I pondered this idea myself) but it didn't work out in the end: https://communities.vmware.com/thread/500023

The above link is a good read for anyone attempting this. It is an older post, though, so things may be different now.

In my case, so far I have learned that with one cable connected I can ping just fine from host 1 to that host (2 or 3). But I can never ping over both cables -- only one cable seems to be active no matter what settings I try. At least with another ESXi. Maybe I'll go plug my laptop in to a port and see if I can get it to ping while another link is active.

(I'm not even trying to ping from host 2 to 3 right now as I can't even get pings over both cables to the common host.)

I've been trying every setting possible and can't find anything that helps. This sure does seem consistent with having only one uplink active and that my active-active intended setup is not working. But based on the failures of other people to do this I am guessing there is more at play. Perhaps even that VMware has safeguards in place to prevent this topology.
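For anyone else poking at this, these are the kinds of commands I've been using from the ESXi shell to check link state and teaming and to test each path (the vmkernel interface name and target IP are from my lab, adjust to yours):

```shell
# Link state of the physical NICs (are both 40GbE ports actually up?)
esxcli network nic list
# Distributed switch config, including which vmnics are attached as uplinks
esxcli network vswitch dvs vmware list
# Ping out a specific vmkernel interface; -d (don't fragment) with
# -s 8972 also verifies jumbo frames end to end if MTU 9000 is set
vmkping -I vmk1 -d -s 8972 10.0.0.2
```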
 

CA_Tallguy
May 19, 2020
Shifting gears to investigate options such as RouterOS, pfSense, vyOS etc as I am going to need more than just a port on a vDS anyway.

Maybe Open vSwitch. There is a nice configuration detailed on Proxmox wiki where it looks like an instance is hosted on every physical host and then there is a single uplink active from the switch with others blocked by STP. Open vSwitch - Proxmox VE


Code:
 X     = 10Gbps port
 G     = 1Gbps port
 B     = Blocked via Spanning Tree
 R     = Spanning Tree Root
 PM1-3 = Proxmox hosts 1-3
 SW1-2 = Juniper Switches (stacked) 1-2
 * NOTE: Open vSwitch cannot do STP on bonded links, otherwise the links to the core
         switches would be bonded in this diagram :/

 |-----------------------------|
 | G           G           G   | SW1
 |-|-----------|-----------|---| R
 |-+-----------+-----------+---|
 | | G         | G         | G | SW2
 |-+-|---------+-|---------+-|-|
   | |         | |         | |
   | |         | |         | |
   | B         B B         B B
   | |         | |         | |
|--|-|--|      | |      |--|-|--|
|  G GX--------+-+--------XG G  |
|     X |      | |      | X     |
|------\|      | |      |/------|
   PM1  \      | |      /  PM3
         \     | |     B
          \    | |    /
           \|--|-|--|/
            \  G G  /
            |X     X|
            |-------|
               PM2
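If I go the Open vSwitch route, enabling STP on a bridge is only a few ovs-vsctl calls. A sketch, assuming a bridge br0 with two physical ports (names are placeholders, untested on my hardware):

```shell
# Create the bridge and attach both physical links
ovs-vsctl add-br br0
ovs-vsctl add-port br0 eth1
ovs-vsctl add-port br0 eth2
# Enable spanning tree so the redundant link gets blocked instead of looping
ovs-vsctl set Bridge br0 stp_enable=true
# Optionally lower this bridge's priority to bias root election toward it
ovs-vsctl set Bridge br0 other_config:stp-priority=0x7800
```

Note the caveat from the Proxmox diagram above still applies: OVS can't do STP on bonded links.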
 

CA_Tallguy
May 19, 2020
Couple of points for anyone considering trying this.... don't overlook/underestimate the chicken and egg problems in a setup like this, especially with vSAN.

I got a setup working at like 2.5gbps with Mikrotik Cloud Hosted Router (no optimizations or jumbo frames) and then mindlessly made changes to the interfaces and took down my vSAN and with it my vCenter. Since I hadn't taken care to put things into maintenance mode, my system came back up with big red X's indicating my VM's were corrupted.

I was able to reconfigure networking using ESXCLI back to my physical switch in order to bring the system back up and my VM's were fine after that. (The side benefit of these experiments is learning about resilience of VMware and how to recover should things go haywire when there is more at risk in an operational system.) But this certainly illustrated points of failure I need to avoid.....
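For the record, the recovery was along these lines, run from the ESXi shell. The vmnic, dvPort ID, and switch names here are examples from my hosts, not a recipe:

```shell
# See which physical NICs exist and their link state
esxcli network nic list
# Pull the vmnic off the distributed switch (legacy esxcfg syntax;
# 16 is the dvPort ID shown by `esxcfg-vswitch -l`)
esxcfg-vswitch -Q vmnic0 -V 16 DSwitch-40GbE
# Re-attach it as an uplink to the standard vSwitch that still
# carries the management network
esxcli network vswitch standard uplink add --uplink-name=vmnic0 --vswitch-name=vSwitch0
```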

(1) DO NOT rely on a single point of failure. Seems obvious but it's easy to forget. (And don't forget that a HARDWARE SWITCH is also a single point of failure.... so I don't think trying to create a more resilient setup in software is entirely ridiculous/dangerous because of redundancy.) Anyway, a single VM soft switch on a single host running the networking is not enough. You can't reboot the VM or host that is running the switching without shutting down the whole environment or toasting/crashing it when you forget and do something dumb as I did yesterday.

(2) Get HA failover in place quickly. For example, the Open vSwitch topology in my last post should be fine, even without the backing switch for uplinks. You could in theory just have a single uplink and in (hopefully) rare circumstances when you needed to reboot a host just physically move the uplink to another box.

(3) You need to be sure your software switch/router VM will be bootable on individual hosts in order to bring up your environment. I may keep one or more virtual switches on local disks, or mirror to local disks, just to avoid any reliance on vSAN because I have seen too many cases already where my VM's are not accessible due to interhost connectivity. Or perhaps there are sufficiently safe methods for forcing vSAN to keep all data accessible on a single host that I have not enabled. In any case, it is obviously critical to be able to bring up at least one method for connecting the cluster from a powered off state.
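On that bootability point: a soft-switch VM kept on a local (non-vSAN) datastore can be registered and started directly on a host with no vCenter at all, roughly like this (paths and the VM ID are examples):

```shell
# Register the VM from its .vmx on a local datastore
vim-cmd solo/registervm /vmfs/volumes/local-ssd/softswitch/softswitch.vmx
# Find the VM ID it was assigned
vim-cmd vmsvc/getallvms
# Power it on directly on this host
vim-cmd vmsvc/power.on 12
```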
 

CA_Tallguy
May 19, 2020
SOME IDEAS ON MOVING FORWARD FROM HERE.....

I'm going to try a hybrid of direct connections and soft switches (Open vSwitch, RouterOS, pfSense, etc.).

GOAL: Since directly connecting 2 nodes DOES work, is a very robust connection, and provides the best line speeds, it would be a shame not to leverage it.

A work-around may be to direct connect 2 nodes and then connect it to another 2 nodes with pfSense, routerOS, Open vSwitch etc. Or maybe without a soft-switch at all if you can figure out another way to get a cross connection that will allow packets to switch somewhere natively within ESXi. If you have 3 nodes, and 2 are direct connected with just a cable, they can come up and form an environment even without your 3rd node. Then you only have 1 node at risk relying on routerOS, Open vSwitch etc.

This becomes a bit like a stretched cluster configuration. I have seen VMware recommendation for using stretched cluster networking for rack isolation at a single location so not terribly different. In theory you are supposed to have a witness for a stretched cluster, however, so may need that or to find a way around it. Especially if you have an even number of nodes.

ISSUES:

(1) I will still need to figure out what platform to use for a soft switch for interconnecting links between sets of directly connected hosts. Unfortunately, Mikrotik's CHR (cloud hosted router) does not currently support SR-IOV and I'm thinking they do not have drivers for my Mellanox cards anyway. So if I want to have any decent speeds I need to find something that I can optimize to leverage my hardware.

(2) In my Mikrotik test where I got 2.5gbps, that was just on a network speed test. When I tried to vMotion, I was prevented by a vSphere check for a physical adapter from every host on the vDS. At least for that one precheck, it did not take into consideration that the other hosts were indeed reachable. They apparently insist on using physical adapters. I believe that vSAN traffic may have been flowing OK. In any case, I'm starting to think there could be some circuit breakers I may need to defeat or work around.
 

CA_Tallguy
May 19, 2020
Came across this info and wanted to cross link it under my original question for any others trying to understand why a vSwitch is not a switch...


I wonder if there is a way to override some of this default behavior but I doubt it. I guess that's where NSX comes in.... to build out an actual SDN.

NSX seems to be overkill for what I need so I'm continuing to look for a lightweight, efficient (not CPU intensive) solution. Mellanox NICs have something called an eSwitch that might be perfect. I think that is OpenFlow based and works with standard controllers. Newer cards also have ASAP2, which seems like it can do hardware offloading for Open vSwitch.
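From what I've read, turning on the OVS offload path for ASAP2 looks roughly like this on the Linux side. The PCI address is a placeholder, and this requires ConnectX-4 Lx or newer, so probably not my ConnectX-3s:

```shell
# Put the ConnectX eSwitch into switchdev mode so VF representors appear
devlink dev eswitch set pci/0000:03:00.0 mode switchdev
# Tell Open vSwitch to offload flows to hardware via TC flower
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch
```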

Here's an older page discussing this eSwitch on a ConnectX-3 card in conjunction with a 3rd party application.