It doesn't do any of those except DHCP, VLANs, and routing (plus PBR, VRFs, etc...).
My .02: You're making this more complex than it needs to be. There are a few principles that keep me sane.
1. Never virtualize your firewall, primary storage or standalone router (not the L3 routing we're talking here).
2. You can go wild with VLANs, but keep asking yourself this.
Do I really need to segment these devices off?
3. Use Figma/Mural/Miro/Visio, whatever, and keep a current network diagram handy. Keyword: current, i.e. you make changes to your network, you update your diagram.
4. Script/Automate as much as you can.
5. Look at 1-4 again.
I'm totally overthinking it, but that's just the shell shock of the beginning stages of learning the nuances and ins and outs of new gear/systems. It's part of my process:
Discover the thing >
Research the thing >
Think enough has been learned to be knowledgeable >
Place the orders for the gear > Unbox and connect >
Discover you're basically "Jon Snow" >
Keep throwing good money after bad trying to make the original misconceptions work, like a square peg in a round hole >
Admit defeat >
Finally get it right ~
I'm still admittedly in a Jon Snow phase
*For anyone that comes across this and doesn't know the admittedly dated Pop Culture reference, this gif gives enough of the idea to follow:
- That is one AWESOME looking rack. I'll assume you're pushing some pretty impressive applications with a need for that kind of bandwidth and compute!
- I'm also all about that "cheaper for the same performance" life. If it works, it works. And it leaves more funds for other performance improvements (or... vacations? pay raises?). My setup is WAY more modest than that, but it wouldn't even be what it is if I were always buying "the ideal" or "top of the line" gear, so I totally get it. Hence the whole thread starter - the cheapness of older Mellanox gear, and now exploring the cheap 40Gbe Brocade stuff. A lot of times we could just wait a few years and pick it all up at a (dramatic) discount, too. Unless you're pushing some kind of Global AI or NASDAQ-listed business assets, I've found that's the way to go (especially when it's "your money").
- That's not to say using more expensive (and hopefully ideal) gear is the wrong approach, but that timing the "buy" is important, too.
I feel like I messed up when I got my first pair of switches with single 10Gbe SFP+ ports. I didn't think I could connect a low-cost NIC to 10Gbe SFP+ ports, so I shopped for 2 transceivers instead. It all worked out in the end because I decided to just run a long CAT7 to link 2 parts of the house together, and the longest DAC I've ever found under $100 was about 25 ft? They go up slightly longer, but then the expense starts climbing past $100 anyway. Inventory and spares are nice for some things, I guess.
Case in point, I looked high and low for a used/refurb Rack of some kind to start being more legit than "curb candy furniture / cabinets with server chassis mounted to them" and 2-3+ Generation old Compute and Networking, but there's a time and place for everything. I have a 1 Generation old "mini server" that has RIDICULOUS performance, but the form factor limits its capacity. Picking and choosing what I actually need the highest performance for in my App Stack and general homelab workload is key to balancing unwasted capacity against optimal performance. But everything eventually evolves, too. Hence the considerations for reduced power consumption, heat/noise reduction and interoperability of "legacy" gear - the reality is most things aren't a "start from scratch", but a slow(ish) evolution. Like my learning curve for 10+Gbe networking!
As for the management side of the house: yes, trying to organize everything across a mix of Physical Hardware is ideal, but I haven't found an affordable 2.5+ Gbe Router, so I'm planning and experimenting with a somewhat simple vRouter to drive a segment across the 2.5G+ Switches on the network, buuuuuuut..... also while learning/studying/working with other things like some new tool/cloud platform/hypervisor/programming language/Security+/CKAD/etc., etc., all in tandem. So... I have learned enough to know that keeping it all straight in my recovering coder brain gets REALLY hard, since I'm often just like:
Step 1 of the Slow descent into madness: "oh, I need that port on that subnet? - Done. Oh, I need access to that SR? that DB? done."
Step 2: - Oh I need that protocol to be allowed between those two segments at 8:03 am for 2 hours and 36 minutes every 3 days, except Sundays? Done.
Step C: but... wait... what happened now? Why did everything break...? What was I doing? How did I do that?
Part (00000100) { #Where am I? What causes the Sun to flare? What is the point of existence?... What have I done...} ;
Tombstone quote: It was working before...
Usually during the whole "skill and career growth" phases of my life, where things start to overlap a bit more often and lightbulb moments start coming faster, it starts to hit me that "There's a lot to this I just don't know... *yet". I'm having that moment again.
The Network Diagrams are a great suggestion. I've done a few segments before, application/cloud side, but the general rule of "K.I.S.S." goes out the window in the playground phases - I've spent a few weekends trying to figure out why my network segments disconnect every 10 minutes (*shakes fist in the air at Cisco*...) and reconfiguring 5 physical routers to work together again after doing firmware updates to patch the latest CVE. "Tech Debt" is REAL. And it sneaks up on you, every time. I think it's showing that I still believe it can be mostly avoided with proper research and upfront knowledge, but deep down I know you're right. The only real solution is "learn from your mistakes, keep it simple (, stupid...), and get your hands dirty by *actually* doing it." I'm just trying to keep my blast radius small, and my ramp-up time short. Tapping Expert / "Sr level" folks is as good as it gets (why I'm even here, ranting like a madman on *too much coffee*).
Trying to soak up some of those sweet, sweet sanity vibes, I'll break it down the way you suggested:
1. Never virtualize your firewall, primary storage or standalone router (not the L3 routing we're talking here).
- I generally don't do a ton of FWing or Static Routing for internal networks/segments; I just generally "block it all" at the "tap" (re: modem/Public Internet/L3) and make a few exceptions if I have a need or desire (going out of town, hosting a service to support a cloud app without paying $1000s for a month of testing something, etc.). - So this is a check in the Yes box (1 of 5 is a good start, right?)
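In practice, "block it all and poke a few holes" is about as simple as it sounds. Purely a sketch - assuming a Linux-based edge box running nftables, with the interface name, address, and port all invented for the example:

```bash
# Default-drop everything crossing the edge router, let replies back in,
# then add one explicit exception (a hypothetical test service on 443).
nft add table inet edge
nft add chain inet edge forward '{ type filter hook forward priority 0; policy drop; }'
nft add rule inet edge forward ct state established,related accept
nft add rule inet edge forward iifname "wan0" ip daddr 192.0.2.10 tcp dport 443 accept
```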
But in some cases there might be a need to do something funky like "multi-hop configs" with Static Routes for recovering to a backed-up snapshot/golden image if I fubar something so badly I need to start over (this is often when I am experimenting with new topologies or "big tools" like K8s, Observability or Security-related tooling). The general idea being "break stuff and learn" - but... also NOT breaking my "primary segments" in that process (ever again...), hence the favoring of virtualizing the experimental configs on vRouters/vSwitches.
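For the "multi-hop" recovery-path idea, the smallest possible sketch (every address and interface here is hypothetical): a static route on the box doing the restore that points at the lab vRouter as the next hop, with forwarding turned on at that hop.

```bash
# Reach a hypothetical restore/backup segment sitting behind the lab vRouter
# without touching the primary segment's default route.
ip route add 192.0.2.0/24 via 10.10.99.1 dev eth0

# On the intermediate hop (the vRouter), forwarding has to be on for the path to work.
sysctl -w net.ipv4.ip_forward=1
```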
That said, are you recommending using only one single router with enough of a feature set to slice and dice up your segments - like Ubiquiti's EdgeOS or some kind of 3rd Party/Aftermarket/*WRT (assuming non-Enterpri$e Router$ for the above-said budget reasons)? Or is it the security implications of "soft-routers" having too many points of entry? Maybe the reliability factor of "if any one thing goes wrong, it all breaks" that comes with hosting a router in a VM? This is a question I've often pondered, but I'm not exactly "Mr Popular" since nerding out in my tech cave is how I've spent a lot of weekends, and even when I am talking about this stuff with anyone that's knowledgeable about it, we don't usually discuss the nuances of multi-router / split-trunking switches, etc. It'd be nice though! I'm into it.
2. You can go wild with VLANs, but keep asking yourself this.
Do I really need to segment these devices off?
- In my limited experience with VLANs, the answer is NO.
ALWAYS NO.
But... therein lies my lack of confidence that I understand it all "well enough", too.
I've found VLANs are fine for generic Internet Access to "workstations" used by the "normies" of Marketing/Sales, etc. And I understand the use cases for VLANs and how they can help secure a network. But I've certainly gotten WAY more creative in some experimentation in my own homelab scenarios (I keep it REAL basic and easy to understand in a "workplace scenario", for all the reasons you imply, and likely understand VERY well).
I've admittedly been trying to experiment with VLANs to secure Applications and Network Services, which I've found to be a tragic mistake. I assume it's because I just don't know how to properly manage a good "VLAN plan" in a way that I can "hack and track" as the application evolves and the needs of apps grow and change. I wrestled with the idea that it's just my limited knowledge of "advanced networking at scale", but then, when I just segment my CIDRs for the application stuff without using VLANs, it works pretty well. I've done some trickery with Cloud ACLs/NACLs that works for those "edge cases", too. The obvious "risk" here is that if I ever encountered an "inside risk", like a Developer/QA person with access to those segments, they might do something bonkers and wacky "in code" that "unblocks" them from something they THINK they need access to... and that has downstream consequences (best case), or they could be outright devious and try to hoover up internal/customer/partner/vendor data (worst case).
These "experiments and exercises" are typically more about how to not only make applications and dependent services work together in a secure, "least privileged" way, but also to add a few additional layers of observability and understanding, to help tell the difference between good intentions gone wrong (best case) and straight-up malice (worst case).
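When I do reach for VLANs in the lab, the plumbing itself is the easy part - it's the plan that bites me. A minimal sketch of the "least privileged" idea, with the interface names, VLAN IDs, subnets and DB port all invented for illustration (and reusing the hypothetical default-drop table/chain from the earlier nftables sketch):

```bash
# Two VLAN sub-interfaces on a Linux router/vRouter: "app" (VLAN 20) and "db" (VLAN 30).
ip link add link eth1 name eth1.20 type vlan id 20
ip link add link eth1 name eth1.30 type vlan id 30
ip addr add 10.20.0.1/24 dev eth1.20
ip addr add 10.30.0.1/24 dev eth1.30
ip link set eth1.20 up && ip link set eth1.30 up

# Least privilege between them: the app VLAN may only reach the DB on its port;
# anything else app->db gets logged (observability) before it hits the default drop.
nft add rule inet edge forward iifname "eth1.20" oifname "eth1.30" tcp dport 5432 accept
nft add rule inet edge forward iifname "eth1.20" oifname "eth1.30" log prefix '"app->db drop: "' counter drop
```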
3. Use Figma/Mural/Miro/Visio, whatever, and keep a current network diagram handy. Keyword: current, i.e. you make changes to your network, you update your diagram.
- I'm either N00b-ish or old school, using Diagrams.net / Draw.io and Lucidchart thus far - and I've done as much as I can to get away from Closed Source tools (sometimes they are just better, though), so if Visio were already available at work... maybe? I'm just not sure I ever want to build another GANTT Chart ever again... so I wouldn't opt for that for anything "personal". Same for Figma "nowadays" (sadly).
Miro and/or Brainboard look like they might be solid Freemium choices over the totally old-school options, though - I happened upon this corporate-y but useful-for-starter-research comparison list, too. I definitely need to be better about this, and it'd be good practice to map out the more technical aspects of my homelab segments, which would build better habits for keeping similar docs straight where "wages are involved".
But yea, to your point (again), I need to settle on the design/topology. Therein lies (the biggest) part of my problem, for sure. The homelab network is constantly in flux - adding, removing and changing systems/nodes, and then also adding, tweaking... and now removing/replacing gear - it needs to stop. And it will, now, since that "initial n00bish phase" is (mostly) over. I learned a lot, though. Like "pick a trunk switch and stick with it, or you're going to have a REAL bad time".
The Brocade learning curve and "hands on experience" nuances will come next, but at least I already know what I'm going to do with it. If the used-gear price trend keeps heading down, I will likely just offload the Brocade in favor of something 100Gbe with lower power consumption in a few years, too, and start the cycle all over. We shall see! But that said, I still often question if my actual design choices make the most sense for my mixed bag of "where I am now", to where I'm going in the short term, to what I'm thinking for the end goal.
Ideally?...
*********
Long Term - I'd like to set up a segment with super locked down access to some "cloud networks and services" on my Bare Metals and/or Guest VMs in the homelab to keep my cloud bills as low as possible
Mid Term - a segment for secure backups, ISOs and the like, then a segment for testing/development
Short Term - a segment for all things "local services" and automation (re: Locally hosted DNS, NFS/SMB/SCP/SFTP, etc., maybe Ceph? CICD, Network Services, Load Balancers, blah blah), but with Databases and "Cold Storage" in a "nested Segment".
Currently: A "Management Plane" - a segment for all things configuration and storage. A path to access any upstream device configs, services and firewalls that effect the rest of the network reaching public internet or "the segments I actually want to talk to each other (or... not)".
Previously: A STABLE, rock-solid segment as the "personal segment" that is for all other things basic, like web surfing, streaming, mobile device wifi, etc. (I think I've sorted this bit out, finally. But it feels like it took longer than it should have.)
Somewhere in the middle of all that, it might be cool to have a VPN or two as well, for the things I might want to access externally for some reason or another (and an easy and non-disruptive on/off toggle for all that).
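If I had to strawman all of the above into a first-pass "VLAN plan", it'd look something like this - every ID and CIDR below is invented purely for illustration, not my real layout (for obvious reasons):

```bash
# Hypothetical mapping of the segments above to VLANs/subnets - illustration only.
# VLAN 10  10.10.0.0/24  management plane (device configs, firewalls, switch mgmt)
# VLAN 20  10.20.0.0/24  local services / automation (DNS, NFS, CI/CD, LBs)
# VLAN 25  10.25.0.0/24  nested: databases + "cold storage"
# VLAN 30  10.30.0.0/24  backups / ISOs / golden images
# VLAN 40  10.40.0.0/24  testing / development ("playground")
# VLAN 50  10.50.0.0/24  locked-down cloud-facing segment
# VLAN 99  10.99.0.0/24  personal (wifi, streaming, web surfing)
```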
That's GREATLY simplified from what I DID have/try, but perhaps it's still a bit too ambitious. I'll find out sooner than later, once these missing pieces start showing up in boxes. And perhaps even more obviously, I'm not going to broadcast the exact details in a public setting, but this is a high-level gist for "Separation of Concerns". These nuances are where I've made some terrible miscalculations before, so in my 2nd round of "Enterprise Grade Networking gear" I'm hoping to sort it out faster/cleaner - less "stuff" to plug in, WAY less complexity, more performance, WAY more stability.
So...
It might be worth asking:
How many "Bare Metal" / Physical Routers do you suggest for such a set up?
I'm trying to get it all down to One Physical Router "at the tap", and drive everything through 1-to-3 vRouters that slice everything up into the segments I've described via their own firewalls / routing configurations, so I don't have unintended consequences for other parts of the network.
But based on what you've said and implied, maybe that's not a great approach? In my mind, managing each segment as if it were its own network would make settings like "only allow access on port 443 to the Load Balancer" a lot less complicated, but it's not exactly "basic" either.
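Just to make that 443 example concrete - same pattern as the earlier nftables sketch, only scoped to one segment's vRouter, and with the VLAN interface, table name, and load balancer VIP all invented for the example:

```bash
# Hypothetical services-segment vRouter: drop forwarded traffic by default,
# allow replies, and open exactly one door - HTTPS from the personal VLAN
# (eth0.99 here) to the load balancer's VIP.
nft add table inet svc_vrouter
nft add chain inet svc_vrouter forward '{ type filter hook forward priority 0; policy drop; }'
nft add rule inet svc_vrouter forward ct state established,related accept
nft add rule inet svc_vrouter forward iifname "eth0.99" ip daddr 10.20.0.5 tcp dport 443 accept
```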
Do you keep a "playground segment" separate? Or do you use multiple physical routers?
*************
4. Script/Automate as much as you can.
I try to do this with each "service machine" (usually VMs, but sometimes Bare Metals, too) because I TOTALLY agree. But...
- I haven't mastered a flexible enough use case for anything other than Bash (or PowerShell/BATs - but I'm trying to get off most M$ stuff anyway), so if you have any solid suggestions for Python, maybe Ansible-family tools (AWX/RunDeck/etc.) with generic templates, or maybe something Terraform/OpenTofu, I'd love to learn about it!
Right now I'm doing it all "custom" but its been tedious and slow.
I default to a lot of VM guest backups and dd images, most days. I'd like to get better at automating the homelab stuff, even if it's just for restoring backups, but mostly for purposes like deploying, say, a private DNS or NFS server in a specific segment with a configuration template I can make a few changes to, like we do "at work". The "nice thing" about "at work", though, is there are usually some extremely predictable guard rails implied by the tools used throughout the workday. In the homelab, though, the sky is the limit, and things get REAL messy. Hence why we often get those yummy exceptions from departments like the ones you're likely calling the shots for, boss man.
Maybe there are some "less extensive" tools of the trade than full-blown, large-scaling tools like TF and Ansible that I'm not aware of for more basic use cases, though? I've been loving how good Xen is at making "instant copies" of guest VMs, so I have a "palette of templates" to draw from that makes it fast and easy to blow away something I've fubarred and start over, or simply make slight tweaks for different purposes. This is a far cry from "automating a stack", though. AWX and RunDeck are pretty heavy-handed, and Terraform isn't much easier to maintain, hence why the templates make it "faster/easier" to spin up and experiment with, given I already have a few OSes and initial setups that I'm familiar with and like (Network Manager, versus ...). But maybe there's a better way I'm not aware of?
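For what it's worth, the "palette of templates" part can be scripted pretty cheaply long before reaching for AWX or Terraform. This is only a rough sketch assuming XCP-ng/XenServer's xe CLI - the template name, VM name, and what happens after boot are all made up for the example:

```bash
#!/usr/bin/env bash
# Stamp out a service VM from a hypothetical golden template on XCP-ng.
set -euo pipefail

TEMPLATE="debian12-golden"   # hypothetical golden-image template
NEW_VM="dns01-services"      # hypothetical new guest name

# Clone the template into a new VM (vm-install returns the new VM's UUID), then boot it.
UUID=$(xe vm-install template="$TEMPLATE" new-name-label="$NEW_VM")
xe vm-start uuid="$UUID"

# From here you'd typically attach a VIF to the right segment's network
# (xe vif-create) and let cloud-init / a first-boot script apply the config template.
```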
I know we are straying off the originally explicit topic of Networking with 40Gbe, but all that rambling above is where the Mellanox VPI/VDI stuff starts coming in (how they are used in the network, anyway).
5. Look at 1-4 again.
Stellar list. Your insights are deeply appreciated!