Thunderbolt 4 ring network between 3-5 node Intel 12th Gen NUC cluster


SapphironZA

New Member
Mar 21, 2023
6
1
3
Hi All

I am working on a proof of concept for a micro hosting requirement. The client needs to host many small VMs, roughly 30-100 of them with 2GB of RAM and 30GB of storage each. The CPU and disk load generated by these VMs is very low.
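For a rough sense of scale, here is a back-of-envelope sketch in Python; the 64GB RAM ceiling per NUC 12 Pro and the 8GB host overhead per node are assumptions rather than measured values.

```python
# Back-of-envelope sizing for the top of the 30-100 VM range.
# The 64GB RAM ceiling per NUC and the 8GB host overhead are assumptions.
import math

vm_count = 100
ram_per_vm_gb, disk_per_vm_gb = 2, 30
ram_per_node_gb, host_overhead_gb = 64, 8

total_ram_gb = vm_count * ram_per_vm_gb
total_disk_gb = vm_count * disk_per_vm_gb
min_nodes_for_ram = math.ceil(total_ram_gb / (ram_per_node_gb - host_overhead_gb))

print(f"RAM needed:  {total_ram_gb} GB -> at least {min_nodes_for_ram} nodes")
print(f"Disk needed: {total_disk_gb} GB raw, before any replication")
```

At the top of the range, RAM alone already calls for 4 nodes before any headroom for a node failure.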

They are looking to retire their two old Dell R540 servers due to very high datacentre power costs.
Rather than buying a single replacement server and accepting that single point of failure, we are thinking of going a homelab-ish route and using micro desktops as servers in a cluster. We can't seem to find clustering hardware that isn't total overkill for this requirement, or that isn't terribly expensive relative to the resources you get.

We are investigating the option of setting up a cluster of Intel NUCs like these ones: https://www.intel.com/content/www/u…nuc-12-pro-kit-nuc12wshi5/specifications.html

We like that it has a 2.5GbE NIC with vPro. We will likely put public traffic on a VLAN interface set up in Proxmox and use the base interface for the management LAN and for backups to external storage.

We are also wondering, since each NUC has two Thunderbolt 4 ports, whether it would be possible to build a 10-gigabit ring network with Thunderbolt cables and avoid having to buy expensive Thunderbolt-to-10GbE adapters. This ring network would likely use OSPF, like in apaird's video here: Fully Routed Networks in Proxmox! Point-to-Point and Weird Cluster Configs Made Easy - YouTube
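To make the ring idea concrete, here is a small Python sketch that works out point-to-point /30 addressing for a ring of N nodes. The interface names (en05/en06) and the 10.10.10.0/24 range are purely illustrative assumptions; each link would still need to be announced into OSPF (for example via FRR) so traffic can route the other way around the ring if a cable or node drops out.

```python
# Sketch: derive the point-to-point /30 links for a Thunderbolt ring.
# Interface names and the 10.10.10.0/24 range are made-up placeholders.
import ipaddress

def ring_links(node_count: int, base_net: str = "10.10.10.0/24"):
    subnets = list(ipaddress.ip_network(base_net).subnets(new_prefix=30))
    links = []
    for i in range(node_count):
        a, b = i + 1, (i + 1) % node_count + 1      # neighbouring node numbers
        hosts = list(subnets[i].hosts())            # two usable IPs per /30
        links.append(f"node{a} en05 {hosts[0]}/30 <-> node{b} en06 {hosts[1]}/30")
    return links

for link in ring_links(5):
    print(link)
```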

We are looking at 3 or 5 nodes initially, up to a maximum of about 9 if the concept works very well. If we need more than that, we can probably justify buying a Supermicro 4-node server to replace it.

We plan on using a SATA SSD for boot and an M.2 SSD for VM data. We know there is no disk redundancy, but the requirement can tolerate 5 minutes of downtime and a minute or two of data loss in the case of a node failure. We are wondering what would work best for storage.

We are wondering if it is viable to set up the M.2 SSDs as a Ceph cluster with one OSD per node. We will use something decent for the M.2 SSD, and at 2.5GbE or 10GbE networking I don't see the SSD being the performance bottleneck. The shared-storage nature should allow for migration and HA in case of node failures or maintenance. I know general practice is to use at least 4 OSDs per node, but I am not certain of the thinking behind that; I have seen people using single-OSD nodes in their lab environments.
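As a capacity sanity check, here is a quick calculation assuming one 4TB OSD per node, Ceph's default 3x replication, and its default 0.85 nearfull warning threshold.

```python
# Usable-capacity check for a replicated pool: one assumed 4TB OSD per node,
# Ceph's default size=3 replication, and the default 0.85 nearfull threshold.
def ceph_usable_tb(nodes: int, osd_tb: float = 4.0,
                   replicas: int = 3, nearfull: float = 0.85) -> float:
    return nodes * osd_tb * nearfull / replicas

need_tb = 100 * 30 / 1000       # 100 VMs x 30GB each
for n in (3, 5, 9):
    print(f"{n} nodes: ~{ceph_usable_tb(n):.1f} TB usable vs ~{need_tb:.1f} TB needed")
```

Capacity looks comfortable even at 3 nodes; the open question with a single OSD per node is how recovery behaves when a whole node's worth of data has to rebalance onto the survivors.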

Is there anything less obvious that we might be missing, or is anyone using hardware other than Intel NUCs for a similar purpose?
 

amalurk

Active Member
Dec 16, 2016
312
116
43
102
I don't think this is going to work with NUCs and Ceph, for a lot of good reasons.

Proxmox Ceph needs three networks in production. One is for corosync, and that is not negotiable: if public or Ceph traffic saturates it, you have a bunch of problems, so it needs to be separate. You might combine the public and Ceph networks, but that is still a bad idea. If public traffic eats most of the bandwidth, IOPS grind to a halt; if Ceph eats most of it, whatever the VMs are doing with the outside world grinds to a halt.

I don't think you will get anywhere close to the IOPS you need from one OSD per node for 30-100 VMs, even low-IOPS ones. You might only have IOPS in the low hundreds with 5 nodes.
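Rough math behind that kind of estimate; the per-OSD sync-write figure and the overhead factor below are assumptions for a consumer drive without PLP, not benchmarks.

```python
# Back-of-envelope cluster write IOPS with one OSD per node.
# 400 sync-write IOPS per consumer NVMe (no PLP) and the 0.5 factor for
# Ceph/network overhead are assumptions.
def cluster_write_iops(nodes: int, osd_sync_iops: int = 400,
                       replicas: int = 3, overhead: float = 0.5) -> float:
    # Every client write must land on `replicas` OSDs before it is acked.
    return nodes * osd_sync_iops * overhead / replicas

for n in (3, 5, 9):
    print(f"{n} nodes x 1 OSD: ~{cluster_write_iops(n):.0f} client write IOPS")
```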

Many people disable the deep C-states to get better IOPS, which I think is a bad idea on NUCs where you are already thermally limited.

What NVMe drives are you using? Consumer ones with abysmal sync-write performance? There go your IOPS. Or enterprise drives with PLP and good sync writes? In which form factor? If you are limited to M.2 2280, there is not much choice. Those drives also draw more watts on reads and writes and idle higher, so that is even more heat in a NUC chassis where you may have already disabled the deep C-states, meaning the processor is making extra heat too. Now you have a thermal problem.

A ring network means multiple hops to some nodes, which adds latency, and there go your Ceph IOPS as well: Ceph is sensitive to network latency.
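To put rough numbers on the hop penalty (the per-hop forwarding latency is an assumed figure for a software-routed Thunderbolt link):

```python
# On a ring, some destinations are multiple hops away, and each forwarded
# hop adds latency on top of Ceph's own round trips.
def avg_extra_hops(n: int) -> float:
    # Shortest-path distance from one node to every other node on a ring,
    # averaged; "extra" counts hops beyond the first, direct one.
    dists = [min(k, n - k) for k in range(1, n)]
    return sum(d - 1 for d in dists) / len(dists)

per_hop_ms = 0.05               # assumed forwarding cost per extra hop
for n in (3, 5, 9):
    extra = avg_extra_hops(n)
    print(f"{n}-node ring: avg {extra:.2f} extra hops, "
          f"~{extra * per_hop_ms * 1000:.0f} us added each way")
```

A 3-node ring has no extra hops at all; the penalty only appears as the ring grows.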

This might be something you have fun with at home, and granted, that is all I have done with Ceph so far (building a second, larger cluster now after a test one), but it just seems like a really bad idea for a paying customer of yours.

What might work is keeping the storage local, but then you lose HA and migration; or build a separate storage server they all connect to.
 
  • Like
Reactions: Amrhn

SapphironZA

New Member
Mar 21, 2023
6
1
3
Thanks for the info. I see the same challenges you do. We are now planning to separate the Ceph traffic onto Thunderbolt-to-SFP+ interfaces connected to a storage switch, since the ring network's latency won't scale beyond 3 nodes. Public traffic stays on a separate interface, and we are splitting corosync onto its own VLAN anyway so it is at least on a separate logical network.

Regarding the load of the VMs: it is currently served by two Dell R540s, each with dual 8-core Xeons and spinning rust. They sit at about 5% IOwait and 20% CPU load each, so the load is really minimal and easily within the NUCs' capabilities. Hell, I think one NUC has more CPU power than those old Xeons.
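Turning those utilisation figures into busy-core equivalents (dual 8-core sockets per R540 is an assumption about how the servers are configured):

```python
# Translate the observed utilisation on the two R540s into rough
# busy-core equivalents; dual 8-core sockets per server is assumed.
servers = 2
cores_per_server = 2 * 8        # dual socket x 8 cores
cpu_util = 0.20                 # ~20% CPU load per server

busy_cores = servers * cores_per_server * cpu_util
print(f"~{busy_cores:.1f} cores' worth of work across both servers")
```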

Regarding the SSDs, we will go as high-end as we can within the 2280 form factor. The Kingston KC3000 4TB seems to be the most capable; its sustained write performance once the SLC buffer runs out is still more than good enough for what we need. We are also going with the 4TB models, even though we only need about 1TB, to allow for better performance and endurance. That said, we are trying to see if we can source some Kioxia XG6 2TB M.2 SSDs in our country.
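A quick endurance sanity check; the rated write endurance below is an assumption to be checked against the datasheet, and the daily write volume is a guess for this low-load fleet including Ceph replica writes landing on the node.

```python
# Endurance sanity check for a 4TB data drive.
# Both figures below are assumptions, not datasheet values.
rated_endurance_tb = 3200       # assumed terabytes-written rating
writes_gb_per_day = 100         # assumed per-node write volume

years_to_wear_out = rated_endurance_tb * 1000 / (writes_gb_per_day * 365)
print(f"~{years_to_wear_out:.0f} years to reach the rated write endurance")
```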
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
I actually think you could do it. Proxmox requires three "networks" as @amalurk described, but there is no requirement that they be physically separate networks. Yes, best practice says that having storage and corosync competing on the same wire is not a good idea, but nothing stops you from doing it.

Take a look at how this guy did a 3- and then 5-node cluster using a high-speed "ring" plus a central "internet" connection, with OSPF to manage the routing. His presentation style is not really clear, but his approach is spot-on. He doesn't provide enough info to easily replicate it, but he leaves enough breadcrumbs that it shouldn't be hard to work out.
 

SapphironZA

New Member
Mar 21, 2023
6
1
3
It's his video that gave me the idea.
 
  • Like
Reactions: PigLover

ano

Well-Known Member
Nov 7, 2022
653
271
63
Computing uses watts, easy as that.

Yes, you can do a Ceph/Proxmox ring.

The power savings will be eaten up by setup time here, and the performance per watt of AMD datacentre gear, and datacentre gear in general, is pretty darn good. The trouble is that you get so much more compute per node these days: pick something like a 6354 and tune the C-states and it EATS power, but you get a lot of compute for it.

To illustrate how bad power costs are these days: when we were doing the calculations for a new cluster, 40% (!) of the 3-year cost was power and rackspace, and 60% was the drives/NVMe/hardware.
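The arithmetic behind that kind of split, with made-up but plausible inputs (the average draw and the all-in electricity rate are assumptions; only the 40/60 proportions come from our actual numbers):

```python
# What a single machine's power draw costs over 3 years.
# Draw and rate are assumptions; adjust to your own datacentre pricing.
watts = 300                     # assumed average draw of one server
price_per_kwh = 0.30            # assumed all-in rate, currency per kWh
hours = 3 * 365 * 24

energy_cost = watts / 1000 * hours * price_per_kwh
print(f"{watts} W for 3 years at {price_per_kwh}/kWh: ~{energy_cost:,.0f} in energy alone")
```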
 
Last edited:

SapphironZA

New Member
Mar 21, 2023
6
1
3
We are also looking into importing some ASRock Rack AMD AM5 barebones 1U servers. They come with dual 1GbE and dual 10GbE, and with a Ryzen 7900 (non-X) we can get the power draw way down. They also give us a PCIe slot to install up to 4 M.2 drives for storage.
 

BoomBangCrash

New Member
May 21, 2019
21
15
3
Don't use consumer gear for corporate use; you won't be the first to die from the pain of doing so. If they don't need that much computing power, the quad-core Intel servers are basically the same CPU as the consumer gear but with the extra validation and features of enterprise. They won't use much more power than consumer kit, and the cost of that extra power will be easily offset by the time/support savings.
 

oneplane

Well-Known Member
Jul 23, 2021
845
484
63
This can be done with Harvester HCI (or whatever the big boys repackage and 'license'). But you will run into problems when a node fails, and adding more for redundancy gives you problems on the network side of things. The issue isn't CPU or RAM, but everything else. A slightly easier model would be active-passive replicated storage and either Rancher or KubeVirt.
 

SapphironZA

New Member
Mar 21, 2023
6
1
3
The quad-core Intel Xeons have really poor performance per watt and are exceptionally poor value for money.

Where is the Intel Xeon version of a 12500 or 12700 CPU?

EDIT:

We want to do this the right way, but the hardware choices are crap unless you need something like 1.5TB of RAM and 50+ CPU cores across 3 nodes. We need something about 25% of that size.

This is about as close as we can get with enterprise hardware

but the local distributors suck
 
Last edited: