Overhauling My Homelab


a5tra3a

New Member
Jun 11, 2022
7
1
3
Calgary, Alberta, Canada
Hey everyone. This is my first time posting here, though I have been watching the channel and using the articles on the site for a long time now. I signed up to join the community and hopefully get some ideas, opinions, and help with my homelab. I have been running a homelab for many years (I started back in high school, circa 2001) and am looking at modernizing the hardware, partly for a little more performance but mostly to reduce energy consumption. I apologize for this being a super long post, but I felt it was important to provide some context on the use case and the hardware before sharing the options I am considering.

My lab currently consists of seven Proxmox 7 nodes, mostly HPE DL380 G5s (32GB RAM), along with an older Supermicro custom build (96GB RAM) and a Dell PE2950 G3 (32GB RAM). They all have a 250GB SSD for the Proxmox OS and another as a Ceph OSD, and they are networked together using two D-Link DGS-1100-24 switches, with each server having the following gigabit connections:
  • 1 x IPMI
  • 1 x Proxmox Management & WebUI
  • 1 x Storage Traffic
  • 3 x Virtual Machine Traffic
On the storage side, I have two QNAP TS-451s, a QNAP TS-853A, and two custom-built NAS boxes, one with 8 drives and the other with 24. The TS-451s are what I use for virtual machine disk storage, the TS-853A is used for general storage and virtual machine backups, the 8-drive custom build is used for Proxmox ISO storage and other general storage, and the 24-drive NAS is used only for media storage. These are all networked using a single DGS-1100-24 switch.

The Proxmox nodes and the storage side are linked by another DGS-1100-24, which also connects my homelab to the rest of the house and carries my modem into the homelab on a dedicated VLAN. I use a number of VLANs to separate traffic; the important ones are:
  • 100 - Core Network (Switches and WAPs)
  • 200 - IPMI Network
  • 300 - Compute Network (Proxmox Management)
  • 400 - Storage Network
  • 500 - Virtual Machine Network
  • 600 - Testing Network
  • 700 - WAN 1 Network
  • 800 - WAN 2 Network
  • 900 - Family Network
The family side of the network is pretty standard: desktop computers, gaming consoles, smart home devices, etc. I run multiple wireless networks, one each for trusted devices, untrusted devices, and guests. Now that you have some context on my setup, here is where I am at in planning and researching my overhaul, which will hopefully happen this fall/winter; I plan to start collecting parts once the plan is finalized. I see this going in one of two directions, with a third option being more or less a mix of the first two.

The first plan was to replace all the Proxmox nodes with five custom-built 4U systems using the following:
  • Rosewill 4U 12-Bay Hot Swap Chassis
  • Ryzen 5700G CPU
  • ATX X570 motherboard
  • 64GB RAM (upgradeable to 128GB down the road)
  • single ATX power supply
  • 2 x consumer SSDs for the Proxmox OS in a ZFS mirror (250GB-ish, could be either M.2 or SATA)
  • SSDs for Ceph OSDs (would likely start with 2 x 500GB, with room for 12 total)
  • They would either have IPMI on the motherboard or would be connected to a KVM switch that is attached to a Raspberry Pi 4 running PiKVM.
  • 2 x Gigabit connections for Proxmox Management & WebUI
  • 2 x Gigabit connections for Storage (upgradeable to a dual 10 gigabit RJ45 NIC)
  • 2 x Gigabit connections for Ceph (upgradeable to a dual 10 gigabit RJ45 NIC)
  • 4 x Gigabit connections for virtual machine traffic (upgradeable to a dual 10 gigabit RJ45 NIC)

The second idea that came to me was to replace all the Proxmox nodes with multiple 1L mini PC systems that would be configured as follows:
  • Dell 7080 or Lenovo M920q
  • 1 (ideally 2) x M.2 SSD for Proxmox OS using ZFS mirror (250GB ish)
  • 5 x SATA SSD for Ceph OSD (500GB ish to start)
  • 2 x Gigabit connections for Proxmox Management & WebUI
  • 2 x Gigabit connections for Storage (upgradeable to a dual 10 gigabit RJ45 NIC)
  • 2 x Gigabit connections for Ceph (upgradeable to a dual 10 gigabit RJ45 NIC)
  • 4 x Gigabit connections for virtual machine traffic (upgradeable to a dual 10 gigabit RJ45 NIC)
In order to make this work I would be making heavy use of the rear I/O ports, configured as follows:
  • HDMI / DisplayPort = converted to VGA with an adapter to stay compatible with my existing KVM, though I might not need this with vPro; I do not know much about vPro, but I believe it is very similar to IPMI in the server world.
  • NIC 1 = Proxmox Management & WebUI
  • WIFI = Unused unless passed through to a virtual machine
  • USB 1 = Mouse/Keyboard connection to the KVM
  • USB 2 = USB Hub 01 (This would be a 4-port hub)
    • Port 01 = USB gigabit NIC for Proxmox Management & WebUI
    • Port 02 = USB gigabit NIC for Storage
    • Port 03 = USB gigabit NIC for Ceph
    • Port 04 = USB gigabit NIC for virtual machine traffic
  • USB 3 = USB Hub 02 (This would be a 4-port hub, details below)
    • Port 01 = USB gigabit NIC for Storage
    • Port 02 = USB gigabit NIC for Ceph
    • Port 03 = USB gigabit NIC for virtual machine traffic
    • Port 04 = External USB SSD for Ceph OSD (500GB ish to start)
  • USB 4 = USB Hub 03 (This would be a 4-port hub, details below)
    • Port 01 = External USB SSD for Ceph OSD (500GB ish to start)
    • Port 02 = External USB SSD for Ceph OSD (500GB ish to start)
    • Port 03 = External USB SSD for Ceph OSD (500GB ish to start)
    • Port 04 = External USB SSD for Ceph OSD (500GB ish to start)
The third plan would be a mix: three of the 4U systems and four of the 1L mini PC systems, creating a cluster where the workload could run on the power-efficient side, on the higher-performance side, or balanced across the whole cluster. The downside of the mixed option is that I could only get 6 OSDs into each 1L mini PC versus 12 in each 4U system, and I would want each side to have an equal number of OSDs.
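Assuming the pools end up as standard 3-way replicated pools with the default 85% nearfull ratio (settings I have not locked in yet), a rough usable-capacity comparison of the plans looks something like this:

```python
# Back-of-the-envelope usable-capacity estimate.
# The replica count and nearfull ratio are assumptions, not confirmed settings.
def usable_tb(nodes, osds_per_node, osd_tb, replicas=3, nearfull=0.85):
    raw_tb = nodes * osds_per_node * osd_tb
    return raw_tb / replicas * nearfull  # usable space before nearfull warnings

# Plan 1 starting point: five 4U nodes, 2 x 500GB OSDs each
print(f"Plan 1 start: {usable_tb(5, 2, 0.5):.1f} TB usable")    # ~1.4 TB
# Plan 1 fully populated: five 4U nodes, 12 x 500GB OSDs each
print(f"Plan 1 full:  {usable_tb(5, 12, 0.5):.1f} TB usable")   # ~8.5 TB
# Plan 3 mix: three 4U nodes at 12 OSDs plus four mini PCs at 6 OSDs (500GB each)
mixed_raw_tb = (3 * 12 + 4 * 6) * 0.5
print(f"Plan 3 mix:   {mixed_raw_tb / 3 * 0.85:.1f} TB usable") # ~8.5 TB
```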

Right now the big areas I am researching are:
  • How much of the 1-gigabit network capacity is actually being utilized, per network / NIC. This will determine whether I really need that many network connections and whether I need to be looking at 10-gigabit networking in the next 1 to 3 years. I am not sure yet how to measure this properly. Ideally, I would place a device (think Raspberry Pi) or create a virtual machine that monitors the various networks and/or NICs on each machine and reports how much traffic they pass and how much bandwidth they use, to see where any bottlenecks are (see the throughput-sampling sketch after this list).
  • Power consumption. Right now my homelab is drawing around 2,200 watts, which costs around $100 - $150 a month (a quick cost calculation follows this list). I know the Proxmox nodes use around 200 to 250 watts each. The NAS devices are a lot more power efficient, though the drives themselves consume a noticeable amount of power; I am willing to keep that consumption and work on the Proxmox node side to get the usage down.
  • Standardization and upgradeability. Ideally all of the Proxmox nodes would be identical, since that makes migration and HA failover a little easier. I also want to get away from the RAID cards in the Dell and HPE servers and have direct access to the disks for both ZFS and Ceph. I would like to get past the 32GB RAM limit in my current nodes, as that is where I mostly hit limits on how many virtual machines and services I can host, though grouping containerized services onto a smaller number of virtual machines has helped compared to the old one-service-per-VM approach. The last piece is support for PCIe passthrough and modern CPU feature sets, for things like hardware encryption (AES-NI) for my virtualized pfSense install or passing through the onboard GPU for a status monitor mounted in the rack.
  • I am also looking at swapping out the DGS-1100 switches for UniFi models, though I am not sure whether to jump to 10-gigabit now or go through an interim UniFi 1-gigabit setup first. I would love to have my switches and WAPs on a single pane of glass for management, mostly for VLANs.
  • Some of the services I run are pfSense, Unifi Controller, TrueCommand, Nginx as a reverse proxy, Mailcow, Proxmox Mail Gateway, Portainer, Jellyfin, Navidrome, BookStack, Gitea, Vaultwarden, Snipe-IT, OS Ticket, Sonarr/Radarr/Lidarr/Tdarr, etc., WordPress sites, and a few others I am probably forgetting.
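For the bandwidth question, this is a minimal sketch of what the monitoring could look like, assuming a small Python script run on each Proxmox node or on a monitoring VM (psutil is a third-party package that has to be installed; tools like iftop or vnstat would give similar numbers without any scripting):

```python
# Rough per-NIC throughput sampler (a sketch, not a finished monitoring tool).
# Requires the third-party psutil package: pip install psutil
import time
import psutil

INTERVAL = 5  # seconds between the two samples

def sample():
    # Snapshot cumulative byte counters for every interface
    return {nic: (c.bytes_sent, c.bytes_recv)
            for nic, c in psutil.net_io_counters(pernic=True).items()}

before = sample()
time.sleep(INTERVAL)
after = sample()

for nic, (sent0, recv0) in before.items():
    sent1, recv1 = after.get(nic, (sent0, recv0))
    tx_mbps = (sent1 - sent0) * 8 / INTERVAL / 1_000_000
    rx_mbps = (recv1 - recv0) * 8 / INTERVAL / 1_000_000
    print(f"{nic}: tx {tx_mbps:.1f} Mbit/s, rx {rx_mbps:.1f} Mbit/s")
```

Logging that output over a few weeks should show whether any of the gigabit links ever get close to saturation.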
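And for the power side, the rough math I am working from (the electricity rate below is just a placeholder that roughly lines up with the bills, not my confirmed rate):

```python
# Back-of-the-envelope monthly energy cost; the $/kWh rate is a placeholder.
avg_watts = 2200            # roughly what the lab draws today
rate_per_kwh = 0.08         # placeholder rate, roughly matching the $100 - $150 bills
hours_per_month = 730       # ~24 h * 365 d / 12
kwh_per_month = avg_watts / 1000 * hours_per_month   # ~1,600 kWh
print(f"{kwh_per_month:.0f} kWh/month -> ${kwh_per_month * rate_per_kwh:.0f}/month")
```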

I would love to hear any feedback, ideas, thoughts, suggestions, or comments from the community on this overhaul plan. I want to make sure that this next iteration of my homelab not only brings the hardware up to a more modern age, but is also deployed with the goal of reducing energy usage while expanding the capabilities of my homelab, learning and implementing new and better practices along the way.
 

oneplane

Well-Known Member
Jul 23, 2021
845
484
63
Wouldn't it start to make sense to look into convergence using something like Harvester HCI, or at least Kubernetes + Rook + KubeVirt? Most hardware pooling has moved beyond segregating storage, networking and compute, and mostly focuses on specialisation, where some nodes might be balanced differently but all nodes have some minimum capabilities.
 

a5tra3a

New Member
Jun 11, 2022
7
1
3
Calgary, Alberta, Canada
I have looked at Kubernetes, using Rancher with Longhorn I believe it was, but have not made the jump to learn it yet as it appears to be a big departure from my current setup. Currently I run most of my stuff using Docker, and I originally planned to use Docker Swarm to balance between Docker nodes (with one or two nodes as virtual machines on each Proxmox node), but a few services do not play nice with NFS shares (SQLite databases), so I ended up creating single virtual machines that each host a group of services: the documentation server hosts BookStack, Gitea, Vaultwarden, and Snipe-IT, and my media server hosts all the media-related Docker containers.

I also just started looking into what a normal number of OSD disks per Ceph node is, and it seems to be around 4 to 8. I wonder if using a 2.5" SATA to dual-M.2 adapter to get 4 M.2 drives into an M920q would be enough, with 1 M.2 for Proxmox and the other 3 as OSD disks, for a cluster of 5 to 9 of them.

I know USB for NICs and drives is not the best solution and is mostly ill-advised, but I am not sure how else to add that many NICs and disks to a 1L mini PC system.
 

zunder1990

Active Member
Nov 15, 2012
212
72
28
  • 2 x Gigabit connections for Proxmox Management & WebUI
  • 2 x Gigabit connections for Storage (upgradeable to a dual 10 gigabit RJ45 NIC)
  • 2 x Gigabit connections for Ceph (upgradeable to a dual 10 gigabit RJ45 NIC)
  • 4 x Gigabit connections for virtual machine traffic (upgradeable to a dual 10 gigabit RJ45 NIC)
Oh boy, that is a lot of connections. Save the money and put it toward a dual-port 10Gb SFP+ card and a switch like the ICX6610.
 

a5tra3a

New Member
Jun 11, 2022
7
1
3
Calgary, Alberta, Canada
I am leaning towards the 4U option, possibly going 10Gb Ethernet and running fewer nodes, though I would still like to investigate how much of the existing 1Gb connections' bandwidth I am actually using first. The main reason I have so many connections now was to reduce or limit bottlenecks on the 1Gb network I am currently running, plus I had a ton of spare 1Gb NICs, lots of PCIe slots in the various Proxmox nodes to put them in, and lots of spare ports on the switches.
 

oneplane

Well-Known Member
Jul 23, 2021
845
484
63
Technically, any SSD can saturate a 1G link with ease. Using Proxmox's built-in Ceph provisions is feasible; three big nodes would work, and five medium nodes would work too. With three nodes the downside is that a node failure causes a lot of rebalancing, but the upside is that fitting each node with a dual 40Gb NIC is enough to interlink all of them, even without a switch, which solves that problem.
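To put rough numbers on the SSD point (the SSD figure is a typical SATA spec-sheet number, not a measurement):

```python
# Quick sanity check: 1 Gbit/s link vs a typical SATA SSD (spec-sheet figure).
link_mb_s = 1000 / 8      # 1 Gbit/s ~= 125 MB/s
ssd_mb_s = 550            # typical SATA SSD sequential throughput
print(f"link ~{link_mb_s:.0f} MB/s vs SSD ~{ssd_mb_s} MB/s "
      f"({ssd_mb_s / link_mb_s:.1f}x the link)")
```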
 

i386

Well-Known Member
Mar 18, 2016
4,247
1,547
113
34
Germany
I would love to hear any feedback, ideas, thoughts, suggestions, or comments from the community on this overhaul plan. I want to make sure that this next iteration of my homelab not only brings the hardware up to a more modern age, but is also deployed with the goal of reducing energy usage while expanding the capabilities of my homelab, learning and implementing new and better practices along the way.
Do you really need that many nodes or would a beefy server suffice?
You never had trouble with the family stuff being on the same infrastructure as the homelab stuff? I know I like to try new things and occasionally it ends with "disasters" :D

When I originally planned my homelab I wanted to use 8 ESXi 1U units, but I scrapped that after getting a workstation board that supports 256GB RAM and can run those nodes as VMs. (That was around 2014/15; now you can get boards supporting up to 2TB of RAM.)
 

a5tra3a

New Member
Jun 11, 2022
7
1
3
Calgary, Alberta, Canada
I originally ran just 2 mid-tower PCs as my homelab, but when I converted to rack-mounted gear I went from 2 to 5 nodes because I was moving from new hardware to used hardware and wanted some redundancy in case the aging hardware failed. I added the 2 other nodes a year or so later when I was able to pick them up super cheap. Now that I am looking to go back to new hardware, I will scale back down from 7 nodes to 5, and possibly 3, depending on what the specifics of each node end up being.

In my mind, when it comes to resource planning with 3 nodes, each node has to be able to cover itself plus half of another node's workload. With 5 nodes the extra workload each node has to absorb becomes smaller, unless you have a second failure, and that is also why I see myself still going with 5 nodes: to cover a second hardware failure, since my homelab makes up the infrastructure of my home network.
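As a rough rule of thumb (a simple sketch that assumes identical nodes and an evenly spread workload):

```python
# Max average utilization per node if the cluster must absorb F node failures
# without running out of capacity; assumes identical nodes and an even spread.
def max_safe_utilization(nodes, failures=1):
    return (nodes - failures) / nodes

print(f"3 nodes, 1 failure:  {max_safe_utilization(3):.0%}")     # ~67%
print(f"5 nodes, 1 failure:  {max_safe_utilization(5):.0%}")     # 80%
print(f"5 nodes, 2 failures: {max_safe_utilization(5, 2):.0%}")  # 60%
```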

My network is separated across multiple VLANs, and I also have a backup all-in-one ASUS router connected directly to the modem, so if my hardware totally fails we still have internet access. I used to have issues with my tinkering in the homelab affecting the rest of the network, but that has been mostly mitigated/resolved with some changes I implemented last fall.