Optimal Network Setup / Structure for Virtualized HPC Homelab


NablaSquaredG

Layer 1 Magician
Aug 17, 2020
So... I am currently rebuilding / restructuring one of my labs, or rather: my homelab

And because it's my homelab and I'm building everything from scratch, it needs to be perfect (TM) of course!

Let's start with the basics:

Usage: Automated testing, CI and HPC (numerical simulations)
Networking:
  • All (max 18) servers connected to a fully licensed Mellanox SX6036, each server with both a 56G Ethernet and an InfiniBand connection
  • Another 1G or 10G network for IPMI, possibly with internet access? Switch not determined yet
Software / OS: Debian 11 with Proxmox for a variety of VMs and a Kubernetes Cluster running outside the VMs for less overhead

Servers: Wild mixture, ranging from quad-socket machines with 3TB RAM down to single-socket E5-2680 v4 boxes

Misc:
  • Homelab will often be turned off at night to save power
  • There will be a Raspi / comparable mini computer providing the following services
    • Private CA for homelab (SSL + SSH certificates), based on Hardware RNG (Infinite Noise TRNG) and Yubikey
    • Stratum 1 timeserver with GPS clock
    • DNS + DHCP?
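
Something like this is what I have in mind for the Pi (assuming gpsd feeding chrony, and dnsmasq for DNS + DHCP; addresses, MAC and the lab domain are made up):

    # /etc/chrony/chrony.conf -- stratum 1 from GPS + PPS (gpsd assumed running)
    refclock SHM 0 refid GPS precision 1e-1 offset 0.2 delay 0.2   # NMEA time via gpsd shared memory
    refclock PPS /dev/pps0 refid PPS lock GPS                      # PPS pulse disciplines the NMEA time
    allow 10.0.0.0/16                                              # serve time to the lab only

    # /etc/dnsmasq.conf -- DNS + DHCP for the lab
    domain=lab.example
    local=/lab.example/                              # answer for the lab domain locally
    dhcp-range=10.0.20.100,10.0.20.200,12h
    dhcp-host=aa:bb:cc:dd:ee:01,node01,10.0.20.11    # static lease per server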

So, now for a couple of points I can't decide on because, you know, it needs to be perfect o_O

So basically, I want a certain degree of separation between the VMs, because some may be "rented out" to different customers, and in case malicious software ends up on one of the VMs, it should not be able to reach the others.
  • Completely separate subnet just for private inter-VM communication?
    • Realised with VLANs and SR-IOV Virtual Functions (rough sketch after this list)
    • Can be extended to multiple VMs on different servers for customer requirements by adding another VLAN and having spare SR-IOV Virtual Functions
  • Best way to connect the Raspi to everything (servers, VMs)
  • Completely isolate VMs from the rest of the cluster? But private test VMs etc. may still need access to other parts of the cluster (Kubernetes, bare metal servers, ...)
  • Internet access for VMs via Bridge or NAT / Routed?
  • How to ensure access from workstation to everything (VMs, IPMI, etc...)
    • This basically rules out NAT for VM networking, because it would prevent easy access from the workstation to specific VMs
  • Tool to manage DHCP and DNS? By hand / Bare metal?
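
To make the VLAN + SR-IOV idea concrete, the sketch I have in mind (PF name, VF count, VLAN ID, PCI address and VM ID are all made up; this assumes a NIC that creates VFs via sysfs, ConnectX-3 needs the mlx4_core num_vfs module option instead):

    # on the Proxmox host: create 8 VFs on the ConnectX PF
    echo 8 > /sys/class/net/enp65s0/device/sriov_numvfs
    # pin VF 0 to the customer VLAN, so the VM cannot change or escape the tag
    ip link set dev enp65s0 vf 0 vlan 100
    # pass the VF through to the VM (VF PCI address via: lspci -d 15b3:)
    qm set 101 -hostpci0 0000:41:00.1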

I think many of those points could be solved quite nicely with the Proxmox firewall, e.g. more restrictive settings for rented-out VMs and less restrictive settings for internal VMs where necessary
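
E.g. something along these lines per rented-out VM (VM ID, subnet and allowed port are made up, just to show the shape):

    # /etc/pve/firewall/101.fw -- hypothetical rented-out VM
    [OPTIONS]
    enable: 1
    policy_in: DROP        # default deny inbound

    [RULES]
    IN ACCEPT -source 10.0.100.0/24 -p tcp -dport 22   # SSH from its customer VLAN only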
 

Rand__

Well-Known Member
Mar 6, 2014
Usage: Automated testing, CI and HPC (numerical simulations)
Networking:
  • All (max 18) servers connected to a fully licensed Mellanox SX6036, each server with both a 56G Ethernet and an InfiniBand connection
Just make sure that's working with your NICs. I think we discussed this before and you confirmed it worked, but at some point some CX3s were not able to run both port types at the same time.
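
IIRC on mlx4 the port personality can be checked / set via sysfs, something like this (PCI address made up):

    # query / set port types on a ConnectX-3; valid values: ib, eth, auto
    cat /sys/bus/pci/devices/0000:41:00.0/mlx4_port1
    echo eth > /sys/bus/pci/devices/0000:41:00.0/mlx4_port2   # port 1 IB, port 2 Ethernet

Whether the mixed combination sticks is exactly the part that depended on the specific card.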

  • Another 1G or 10G network for IPMI, possibly with internet access? Switch not determined yet
Normally IPMI should not need/have internet access. From a security point of view a separate VLAN is recommended (but usually not implemented on home networks). 1G is plenty.
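
The VLAN itself is cheap to do on whatever box does the routing, e.g. (VLAN ID and range made up):

    # /etc/network/interfaces on the router/firewall -- dedicated IPMI VLAN
    auto eno1.30
    iface eno1.30 inet static
        address 10.0.30.1/24
    # no route from this VLAN to the internet; only forward selected
    # workstation traffic in, and point the BMCs' NTP at the local Pi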

Misc:
  • There will be a Raspi / comparable mini computer providing the following services
    • Private CA for homelab (SSL + SSH certificates), based on Hardware RNG (Infinite Noise TRNG) and Yubikey
    • Stratum 1 timeserver with GPS clock
    • DNS + DHCP?
Just make sure you have backups of the Pi (and/or run a cluster), since it's no fun to lose base services when you rely on them...

So basically, I want a certain degree of separation between the VMs, because some may be "rented out" to different customers, and in case malicious software ends up on one of the VMs, it should not be able to reach the others.
  • Completely separate subnet just for private inter-VM communication?
    • Realised with VLANs and SR-IOV Virtual Functions
    • Can be extended to multiple VMs on different servers for customer requirements by adding another VLAN and having spare SR-IOV Virtual Functions
Why SR-IOV over virtual NICs?
Multiple NICs as the default means you need a proper naming schema for the various NICs if you want to use hostname resolution, as well as multiple hostname definitions per box. Usually that's tied to VLANs, of course. This can be quite painful (resolution-wise) if you have to go to L3 networking, as it means you need to properly expose services to specific NICs/IPs/hostnames only.

This is of course not a real issue, but separation levels tend to deteriorate over time as laziness settles in.

  • Best way to connect the Raspi to everything (servers, VMs)
L3 routing to a separate VLAN. But of course a single box (or cluster) servicing everything is also a single entry point to everything... Make sure the firewall prevents unwanted services from passing through (i.e. strict service separation).


  • Completely isolate VMs from the rest of the cluster? But private test VMs etc. may still need access to other parts of the cluster (Kubernetes, bare metal servers, ...)
Not entirely sure what you mean by this. Just separate by the (possible) services being consumed at the various levels. That might mean you need to move boxes between service segments if you want to add unexpected functionality.

  • Internet access for VMs via Bridge or NAT / Routed?
Totally depends on your requirements and internally available services...
E.g. if you provide a local update server for the OSes of your choice, then no internet access at all can work for some VMs. If you want remote access to them, then a reverse proxy might be beneficial. And if you don't mind tending double-NAT rules and all the issues that come with those, that's also fine... ;)
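
For the NAT variant, the usual masquerading setup on a Proxmox host looks roughly like this (bridge name, uplink and subnet made up):

    # /etc/network/interfaces -- NAT'd bridge for VMs without a routed subnet
    auto vmbr1
    iface vmbr1 inet static
        address 10.10.10.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -s 10.10.10.0/24 -o vmbr0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s 10.10.10.0/24 -o vmbr0 -j MASQUERADE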

  • How to ensure access from workstation to everything (VMs, IPMI, etc...)
    • This basically rules out NAT for VM networking, because it would prevent easy access from the workstation to specific VMs
  • Tool to manage DHCP and DNS? By hand / Bare metal?
Similar access as for the Pi: L3 routed to a separate VLAN, just with different services enabled.
You can also consider jump hosts into the various subnets, where you can implement additional security levels (extra logging, 2FA, special users, etc.).
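
The jump host part is basically free with OpenSSH, e.g. on the workstation (names and addresses made up):

    # ~/.ssh/config on the workstation
    Host jump-lab
        HostName 10.0.40.10
        User admin

    Host 10.0.100.*          # customer VM subnet, only reachable via the jump host
        ProxyJump jump-lab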

All in all it's a very... broad... question that depends very much on the amount of work you want to put into this, the actual requirements you have, and the tools you're going to use.
You could also virtualize everything with NSX (VMware) or a similar tool (of which I am sure open-source versions exist)...

Good luck:)
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
Just make sure that's working with your NICs. I think we discussed this before and you confirmed it worked, but at some point some CX3s were not able to run both port types at the same time.
Yeah, if I am not mistaken, it should work... In the worst case I'd have to pull the upgrade to ConnectX-4 forward.

Normally IPMI should not need/have internet access. From a security point of view a separate VLAN is recommended (but usually not implemented on home networks). 1G is plenty.
I was thinking that it might be useful for NTP. But since I will have my own timeserver, it's not really necessary.

Why SR-IOV over virtual NICs?
Multiple NICs as the default means you need a proper naming schema for the various NICs if you want to use hostname resolution, as well as multiple hostname definitions per box. Usually that's tied to VLANs, of course. This can be quite painful (resolution-wise) if you have to go to L3 networking, as it means you need to properly expose services to specific NICs/IPs/hostnames only.

This is of course not a real issue, but separation levels tend to deteriorate over time as laziness settles in.
Because of performance. I need SR-IOV for the InfiniBand stuff anyway (Proxmox does not speak InfiniBand), and I don't think I would be able to make use of the full 56G (or 100G at some point) without SR-IOV.
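
Roughly, the moving parts as I understand them (device name from mst status, counts and paths made up):

    # one-time, in the NIC firmware (needs the mft tools):
    mst start
    mlxconfig -d /dev/mst/mt4099_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8
    # ConnectX-3 / mlx4 creates VFs via a module option rather than sysfs:
    echo "options mlx4_core num_vfs=8 probe_vf=0" > /etc/modprobe.d/mlx4_core.conf
    # after passing a VF through (qm set <vmid> -hostpci0 <vf-pci-addr>),
    # verify the IB link from inside the VM:
    ibstat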

L3 routing to a separate VLAN. But of course a single box (or cluster) servicing everything is also a single entry point to everything... Make sure the firewall prevents unwanted services from passing through (i.e. strict service separation).
Yeah that's going to be fun..

Not entirely sure what you mean by this. Just separate by the (possible) services being consumed at the various levels. That might mean you need to move boxes between service segments if you want to add unexpected functionality.
I'm not really planning to have service segments, as that takes away flexibility.

Similar access as for the Pi: L3 routed to a separate VLAN, just with different services enabled.
You can also consider jump hosts into the various subnets, where you can implement additional security levels (extra logging, 2FA, special users, etc.).

All in all it's a very... broad... question that depends very much on the amount of work you want to put into this, the actual requirements you have, and the tools you're going to use.
Yeah, I see... I will probably need another brainstorming session.

You could also virtualize everything with NSX (VMware) or a similar tool (of which I am sure open-source versions exist)...
Is there anything like that or do I need to fork Proxmox?