Optimal Network Setup / Structure for Virtualized HPC Homelab


NablaSquaredG

Layer 1 Magician
Aug 17, 2020
So... I am currently rebuilding / restructuring one of my labs, or rather: my homelab

And because it's my homelab and I'm building everything from scratch, it needs to be perfect (TM) of course!

Let's start with the basics:

Usage: Automated testing, CI and HPC (numerical simulations)
Networking:
  • All (max 18) servers connected to a fully licensed Mellanox SX6036, each server with both a 56G Ethernet and an InfiniBand connection
  • Another 1G or 10G network for IPMI, possibly with internet access? Switch not determined yet
Software / OS: Debian 11 with Proxmox for a variety of VMs and a Kubernetes Cluster running outside the VMs for less overhead

Servers: Wild mixture, ranging from quad-socket machines with 3TB RAM down to single-socket E5-2680 v4 boxes

Misc:
  • Homelab will often be turned off at night to save power
  • There will be a Raspi / comparable mini computer providing the following services
    • Private CA for homelab (SSL + SSH certificates), based on Hardware RNG (Infinite Noise TRNG) and Yubikey
    • Stratum 1 timeserver with GPS clock
    • DNS + DHCP?
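
Something like this is what I have in mind for the Pi (assuming gpsd feeding chrony, and dnsmasq for DNS + DHCP; addresses, MAC and the lab domain are made up):

    # /etc/chrony/chrony.conf -- stratum 1 from GPS + PPS (gpsd assumed running)
    refclock SHM 0 refid GPS precision 1e-1 offset 0.2 delay 0.2   # NMEA time via gpsd shared memory
    refclock PPS /dev/pps0 refid PPS lock GPS                      # PPS pulse disciplines the NMEA time
    allow 10.0.0.0/16                                              # serve time to the lab only

    # /etc/dnsmasq.conf -- DNS + DHCP for the lab
    domain=lab.example
    local=/lab.example/                              # answer for the lab domain locally
    dhcp-range=10.0.20.100,10.0.20.200,12h
    dhcp-host=aa:bb:cc:dd:ee:01,node01,10.0.20.11    # static lease per server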

So, now for a couple of points I can't decide on because, you know, it needs to be perfect o_O

So basically, I want a certain degree of separation between the VMs, because some may be "rented out" to different customers, and in case malicious software ends up on one of the VMs, it should not be able to reach the others.
  • Completely separate subnet just for private inter-VM communication?
    • Realised with VLANs and SR-IOV Virtual Functions (rough sketch after this list)
    • Can be extended to multiple VMs on different servers for customer requirements by adding another VLAN and having spare SR-IOV Virtual Functions
  • Best way to connect the Raspi to everything (servers, VMs)
  • Completely isolate VMs from the rest of the cluster? But private test VMs etc. may still need access to other parts of the cluster (Kubernetes, bare metal servers, ...)
  • Internet access for VMs via Bridge or NAT / Routed?
  • How to ensure access from workstation to everything (VMs, IPMI, etc...)
    • This basically rules out NAT for VM networking, because it would prevent easy access from the workstation to specific VMs
  • Tool to manage DHCP and DNS? By hand / Bare metal?
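
To make the VLAN + SR-IOV idea concrete, the sketch I have in mind (PF name, VF count, VLAN ID, PCI address and VM ID are all made up; this assumes a NIC that creates VFs via sysfs, ConnectX-3 needs the mlx4_core num_vfs module option instead):

    # on the Proxmox host: create 8 VFs on the ConnectX PF
    echo 8 > /sys/class/net/enp65s0/device/sriov_numvfs
    # pin VF 0 to the customer VLAN, so the VM cannot change or escape the tag
    ip link set dev enp65s0 vf 0 vlan 100
    # pass the VF through to the VM (VF PCI address via: lspci -d 15b3:)
    qm set 101 -hostpci0 0000:41:00.1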

I think many of those points could be solved quite nicely with the Proxmox firewall, e.g. more restrictive settings for rented-out VMs and less restrictive settings for internal VMs where necessary
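
E.g. something along these lines per rented-out VM (VM ID, subnet and allowed port are made up, just to show the shape):

    # /etc/pve/firewall/101.fw -- hypothetical rented-out VM
    [OPTIONS]
    enable: 1
    policy_in: DROP        # default deny inbound

    [RULES]
    IN ACCEPT -source 10.0.100.0/24 -p tcp -dport 22   # SSH from its customer VLAN only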
 

Rand__

Well-Known Member
Mar 6, 2014
Usage: Automated testing, CI and HPC (numerical simulations)
Networking:
  • All (max 18) servers connected to a fully licensed Mellanox SX6036, each server with both a 56G Ethernet and an InfiniBand connection
Just make sure that's working with your NICs. I think we discussed this before and you confirmed it worked, but at some point some CX3s were not able to run both port types at the same time.
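
IIRC on mlx4 the port personality can be checked / set via sysfs, something like this (PCI address made up):

    # query / set port types on a ConnectX-3; valid values: ib, eth, auto
    cat /sys/bus/pci/devices/0000:41:00.0/mlx4_port1
    echo eth > /sys/bus/pci/devices/0000:41:00.0/mlx4_port2   # port 1 IB, port 2 Ethernet

Whether the mixed combination sticks is exactly the part that depended on the specific card.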

  • Another 1G or 10G network for IPMI, possibly with internet access? Switch not determined yet
Normally IPMI should not need/have internet access. From a security point of view a separate VLAN is recommended (but usually not implemented on home networks). 1G is plenty.
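
The VLAN itself is cheap to do on whatever box does the routing, e.g. (VLAN ID and range made up):

    # /etc/network/interfaces on the router/firewall -- dedicated IPMI VLAN
    auto eno1.30
    iface eno1.30 inet static
        address 10.0.30.1/24
    # no route from this VLAN to the internet; only forward selected
    # workstation traffic in, and point the BMCs' NTP at the local Pi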

Misc:
  • There will be a Raspi / comparable mini computer providing the following services
    • Private CA for homelab (SSL + SSH certificates), based on Hardware RNG (Infinite Noise TRNG) and Yubikey
    • Stratum 1 timeserver with GPS clock
    • DNS + DHCP?
Just make sure you have backups of the Pi (and/or run a cluster), since it's no fun to lose base services when you rely on them...

So basically, I want a certain degree of separation between the VMs, because some may be "rented out" to different customers, and in case malicious software ends up on one of the VMs, it should not be able to reach the others.
  • Completely separate subnet just for private inter-VM communication?
    • Realised with VLANs and SR-IOV Virtual Functions
    • Can be extended to multiple VMs on different servers for customer requirements by adding another VLAN and having spare SR-IOV Virtual Functions
Why SR-IOV over virtual NICs?
Multiple NICs as the default means you need a proper naming schema for the various NICs if you want to use hostname resolution, as well as multiple hostname definitions per box. Usually that's tied to VLANs, of course. This can be quite painful (resolution-wise) if you have to go to L3 networking, as it means you need to properly expose services to specific NICs/IPs/hostnames only.

This is of course not a real issue, but separation levels tend to deteriorate over time as laziness settles in.

  • Best way to connect the Raspi to everything (servers, VMs)
L3 routing to a separate VLAN. But of course a single box (or cluster) servicing everything is also a single entry point to everything... Make sure the firewall prevents unwanted services from passing through (i.e. strict service separation).


  • Completely isolate VMs from the rest of the cluster? But private test VMs etc. may still need access to other parts of the cluster (Kubernetes, bare metal servers, ...)
Not entirely sure what you mean by this. Just separate by the (possible) services being consumed at the various levels. That might mean you need to move boxes between service segments if you want to add unexpected functionality.

  • Internet access for VMs via Bridge or NAT / Routed?
Totally depends on your requirements and internally available services...
E.g. if you provide a local update server for the OSes of your choice, then no internet access at all can work for some VMs. If you want remote access to them, then a reverse proxy might be beneficial. And if you don't mind tending double-NAT rules and all the issues that come with those, that's also fine... ;)
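
For the NAT variant, the usual masquerading setup on a Proxmox host looks roughly like this (bridge name, uplink and subnet made up):

    # /etc/network/interfaces -- NAT'd bridge for VMs without a routed subnet
    auto vmbr1
    iface vmbr1 inet static
        address 10.10.10.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -s 10.10.10.0/24 -o vmbr0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s 10.10.10.0/24 -o vmbr0 -j MASQUERADE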

  • How to ensure access from workstation to everything (VMs, IPMI, etc...)
    • This basically rules out NAT for VM networking, because it would prevent easy access from the workstation to specific VMs
  • Tool to manage DHCP and DNS? By hand / Bare metal?
Similar access as for the Pi: L3 routed to a separate VLAN, just with different services enabled.
You can also consider jump hosts into the various subnets, where you can implement additional security levels (extra logging, 2FA, special users, etc.).
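
The jump host part is basically free with OpenSSH, e.g. on the workstation (names and addresses made up):

    # ~/.ssh/config on the workstation
    Host jump-lab
        HostName 10.0.40.10
        User admin

    Host 10.0.100.*          # customer VM subnet, only reachable via the jump host
        ProxyJump jump-lab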

All in all it's a very... broad... question that depends very much on the amount of work you want to put into this, the actual requirements you have, and the tools you're going to use.
You could also virtualize everything with NSX (VMware) or a similar tool (of which I am sure open-source versions exist)...

Good luck:)
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
Just make sure that's working with your NICs. I think we discussed this before and you confirmed it worked, but at some point some CX3s were not able to run both port types at the same time.
Yeah, if I am not mistaken, it should work... In the worst case I'd have to pull the upgrade to ConnectX-4 forward.

Normally IPMI should not need/have internet access. From a security point of view a separate VLAN is recommended (but usually not implemented on home networks). 1G is plenty.
I was thinking that it might be useful for NTP. But since I will have my own timeserver, it's not really necessary.

Why SR-IOV over virtual NICs?
Multiple NICs as the default means you need a proper naming schema for the various NICs if you want to use hostname resolution, as well as multiple hostname definitions per box. Usually that's tied to VLANs, of course. This can be quite painful (resolution-wise) if you have to go to L3 networking, as it means you need to properly expose services to specific NICs/IPs/hostnames only.

This is of course not a real issue, but separation levels tend to deteriorate over time as laziness settles in.
Because of performance. I need SR-IOV for the InfiniBand stuff anyway (Proxmox does not speak InfiniBand), and I don't think I would be able to make use of the full 56G (or 100G at some point) without SR-IOV.
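
Roughly, the moving parts as I understand them (device name from mst status, counts and paths made up):

    # one-time, in the NIC firmware (needs the mft tools):
    mst start
    mlxconfig -d /dev/mst/mt4099_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8
    # ConnectX-3 / mlx4 creates VFs via a module option rather than sysfs:
    echo "options mlx4_core num_vfs=8 probe_vf=0" > /etc/modprobe.d/mlx4_core.conf
    # after passing a VF through (qm set <vmid> -hostpci0 <vf-pci-addr>),
    # verify the IB link from inside the VM:
    ibstat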

L3 routing to a separate VLAN. But of course a single box (or cluster) servicing everything is also a single entry point to everything... Make sure the firewall prevents unwanted services from passing through (i.e. strict service separation).
Yeah that's going to be fun..

Not entirely sure what you mean by this. Just separate by the (possible) services being consumed at the various levels. That might mean you need to move boxes between service segments if you want to add unexpected functionality.
I'm not really planning to have service segments, as that takes away flexibility.

Similar access as for the Pi: L3 routed to a separate VLAN, just with different services enabled.
You can also consider jump hosts into the various subnets, where you can implement additional security levels (extra logging, 2FA, special users, etc.).

All in all it's a very... broad... question that depends very much on the amount of work you want to put into this, the actual requirements you have, and the tools you're going to use.
Yeah, I see... I will probably need another brainstorming session.

You could also virtualize everything with NSX (VMware) or a similar tool (of which I am sure open-source versions exist)...
Is there anything like that or do I need to fork Proxmox?