Thunderbolt 4 ring network between 3-5 node Intel 12th Gen NUC cluster


SapphironZA

New Member
Mar 21, 2023
6
1
3
Hi All

I am working on a proof of concept for a micro hosting requirement. The client needs to host many small VMs, roughly 30-100 of them with 2GB of RAM and 30GB of storage each. The CPU and disk load generated by these VMs is very low.
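For a rough sense of scale, here is a back-of-envelope sketch in Python; the 64GB RAM ceiling per NUC 12 Pro and the 8GB host overhead per node are assumptions rather than measured values.

```python
# Back-of-envelope sizing for the top of the 30-100 VM range.
# The 64GB RAM ceiling per NUC and the 8GB host overhead are assumptions.
import math

vm_count = 100
ram_per_vm_gb, disk_per_vm_gb = 2, 30
ram_per_node_gb, host_overhead_gb = 64, 8

total_ram_gb = vm_count * ram_per_vm_gb
total_disk_gb = vm_count * disk_per_vm_gb
min_nodes_for_ram = math.ceil(total_ram_gb / (ram_per_node_gb - host_overhead_gb))

print(f"RAM needed:  {total_ram_gb} GB -> at least {min_nodes_for_ram} nodes")
print(f"Disk needed: {total_disk_gb} GB raw, before any replication")
```

At the top of the range, RAM alone already calls for 4 nodes before any headroom for a node failure.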

They are looking to retire their two old Dell R540 servers due to very high datacentre power costs.
Rather than buying a single replacement server and accepting that single point of failure, we are thinking of going a homelab-ish route and using micro desktops as servers in a cluster. We can't seem to find clustering hardware that isn't total overkill for this requirement, or that isn't terribly expensive relative to the resources you get.

We are investigating the option of setting up a cluster of Intel NUCs like these ones: https://www.intel.com/content/www/u…nuc-12-pro-kit-nuc12wshi5/specifications.html

We like that it has a 2.5GbE NIC with vPro. We will likely put public traffic on a VLAN interface set up in Proxmox and use the base interface for the management LAN and for backups to external storage.

We are also wondering, since each NUC has two Thunderbolt 4 ports, whether it would be possible to build a 10-gigabit ring network with Thunderbolt cables and avoid having to buy expensive Thunderbolt-to-10GbE adapters. This ring network would likely use OSPF, like in apaird's video here: Fully Routed Networks in Proxmox! Point-to-Point and Weird Cluster Configs Made Easy - YouTube
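To make the ring idea concrete, here is a small Python sketch that works out point-to-point /30 addressing for a ring of N nodes. The interface names (en05/en06) and the 10.10.10.0/24 range are purely illustrative assumptions; each link would still need to be announced into OSPF (for example via FRR) so traffic can route the other way around the ring if a cable or node drops out.

```python
# Sketch: derive the point-to-point /30 links for a Thunderbolt ring.
# Interface names and the 10.10.10.0/24 range are made-up placeholders.
import ipaddress

def ring_links(node_count: int, base_net: str = "10.10.10.0/24"):
    subnets = list(ipaddress.ip_network(base_net).subnets(new_prefix=30))
    links = []
    for i in range(node_count):
        a, b = i + 1, (i + 1) % node_count + 1      # neighbouring node numbers
        hosts = list(subnets[i].hosts())            # two usable IPs per /30
        links.append(f"node{a} en05 {hosts[0]}/30 <-> node{b} en06 {hosts[1]}/30")
    return links

for link in ring_links(5):
    print(link)
```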

We are looking at 3 or 5 nodes initially, up to a maximum of about 9 if the concept works very well. If we need more than that, we can probably justify buying a Supermicro 4-node server to replace it.

We plan on using a SATA SSD for boot and an M.2 SSD for VM data. We know there is no disk redundancy, but the requirement can tolerate 5 minutes of downtime and a minute or two of data loss in the case of a node failure. We are wondering what would work best for storage.

We are wondering if it is viable to set up the M.2 SSDs as a Ceph cluster with one OSD per node. We will use something decent for the M.2 SSD, and at 2.5GbE or 10GbE networking I don't see the SSD being the performance bottleneck. The shared-storage nature should allow for migration and HA in case of node failures or maintenance. I know general practice is to use at least 4 OSDs per node, but I am not certain of the thinking behind that; I have seen people using single-OSD nodes in their lab environments.
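As a capacity sanity check, here is a quick calculation assuming one 4TB OSD per node, Ceph's default 3x replication, and its default 0.85 nearfull warning threshold.

```python
# Usable-capacity check for a replicated pool: one assumed 4TB OSD per node,
# Ceph's default size=3 replication, and the default 0.85 nearfull threshold.
def ceph_usable_tb(nodes: int, osd_tb: float = 4.0,
                   replicas: int = 3, nearfull: float = 0.85) -> float:
    return nodes * osd_tb * nearfull / replicas

need_tb = 100 * 30 / 1000       # 100 VMs x 30GB each
for n in (3, 5, 9):
    print(f"{n} nodes: ~{ceph_usable_tb(n):.1f} TB usable vs ~{need_tb:.1f} TB needed")
```

Capacity looks comfortable even at 3 nodes; the open question with a single OSD per node is how recovery behaves when a whole node's worth of data has to rebalance onto the survivors.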

Is there anything less obvious that we might be missing, or is anyone using hardware other than Intel NUCs for a similar purpose?
 

amalurk

Active Member
Dec 16, 2016
312
116
43
102
I don't think this is going to work with NUCs and Ceph, for a lot of good reasons.

Proxmox Ceph needs three networks in production. One is for corosync, and that is not negotiable: if public or Ceph traffic saturates it, you have a bunch of problems, so it needs to be separate. You might combine the public and Ceph networks, but that is still a bad idea. If public traffic eats most of the bandwidth, IOPS grind to a halt; if Ceph eats most of it, whatever the VMs are doing with the outside world grinds to a halt.

I don't think you will get anywhere close to the IOPS you need from one OSD per node for 30-100 VMs, even low-IOPS ones. You might only have IOPS in the low hundreds with 5 nodes.
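Rough math behind that kind of estimate; the per-OSD sync-write figure and the overhead factor below are assumptions for a consumer drive without PLP, not benchmarks.

```python
# Back-of-envelope cluster write IOPS with one OSD per node.
# 400 sync-write IOPS per consumer NVMe (no PLP) and the 0.5 factor for
# Ceph/network overhead are assumptions.
def cluster_write_iops(nodes: int, osd_sync_iops: int = 400,
                       replicas: int = 3, overhead: float = 0.5) -> float:
    # Every client write must land on `replicas` OSDs before it is acked.
    return nodes * osd_sync_iops * overhead / replicas

for n in (3, 5, 9):
    print(f"{n} nodes x 1 OSD: ~{cluster_write_iops(n):.0f} client write IOPS")
```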

Many people disable the deep C-states to get better IOPS, which I think is a bad idea on NUCs where you are already thermally limited.

What NVMe drives are you using? Consumer ones with abysmal sync-write performance? There go your IOPS. Or enterprise drives with PLP and good sync writes? In which form factor? If you are limited to M.2 2280, there is not much choice. Those drives also draw more watts on reads and writes and idle higher, so that is even more heat in a NUC chassis where you may have already disabled the deep C-states, meaning the processor is making extra heat too. Now you have a thermal problem.

A ring network means multiple hops to some nodes, which adds latency, and there go your Ceph IOPS as well: Ceph is sensitive to network latency.
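To put rough numbers on the hop penalty (the per-hop forwarding latency is an assumed figure for a software-routed Thunderbolt link):

```python
# On a ring, some destinations are multiple hops away, and each forwarded
# hop adds latency on top of Ceph's own round trips.
def avg_extra_hops(n: int) -> float:
    # Shortest-path distance from one node to every other node on a ring,
    # averaged; "extra" counts hops beyond the first, direct one.
    dists = [min(k, n - k) for k in range(1, n)]
    return sum(d - 1 for d in dists) / len(dists)

per_hop_ms = 0.05               # assumed forwarding cost per extra hop
for n in (3, 5, 9):
    extra = avg_extra_hops(n)
    print(f"{n}-node ring: avg {extra:.2f} extra hops, "
          f"~{extra * per_hop_ms * 1000:.0f} us added each way")
```

A 3-node ring has no extra hops at all; the penalty only appears as the ring grows.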

This might be something you have fun with at home, and granted, that is all I have done with Ceph so far (building a second, larger cluster now after a test one), but it just seems like a really bad idea for a paying customer of yours.

What might work is keeping the storage local, but then you lose HA and migration; or build a separate storage server they all connect to.
 
  • Like
Reactions: Amrhn

SapphironZA

New Member
Mar 21, 2023
6
1
3
Thanks for the info. I see the same challenges you do. We are now planning to separate the Ceph traffic onto Thunderbolt-to-SFP+ interfaces connected to a storage switch, since the ring network's latency won't scale beyond 3 nodes. Public traffic stays on a separate interface, and we are splitting corosync onto its own VLAN anyway so it is at least on a separate logical network.

Regarding the load of the VMs: it is currently served by two Dell R540s, each with dual 8-core Xeons and spinning rust. They sit at about 5% IOwait and 20% CPU load each, so the load is really minimal and easily within the NUCs' capabilities. Hell, I think one NUC has more CPU power than those old Xeons.
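Turning those utilisation figures into busy-core equivalents (dual 8-core sockets per R540 is an assumption about how the servers are configured):

```python
# Translate the observed utilisation on the two R540s into rough
# busy-core equivalents; dual 8-core sockets per server is assumed.
servers = 2
cores_per_server = 2 * 8        # dual socket x 8 cores
cpu_util = 0.20                 # ~20% CPU load per server

busy_cores = servers * cores_per_server * cpu_util
print(f"~{busy_cores:.1f} cores' worth of work across both servers")
```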

Regarding the SSDs, we will go as high-end as we can within the 2280 form factor. The Kingston KC3000 4TB seems to be the most capable; its sustained write performance once the SLC buffer runs out is still more than good enough for what we need. We are also going with the 4TB models, even though we only need about 1TB, to allow for better performance and endurance. That said, we are trying to see if we can source some Kioxia XG6 2TB M.2 SSDs in our country.
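A quick endurance sanity check; the rated write endurance below is an assumption to be checked against the datasheet, and the daily write volume is a guess for this low-load fleet including Ceph replica writes landing on the node.

```python
# Endurance sanity check for a 4TB data drive.
# Both figures below are assumptions, not datasheet values.
rated_endurance_tb = 3200       # assumed terabytes-written rating
writes_gb_per_day = 100         # assumed per-node write volume

years_to_wear_out = rated_endurance_tb * 1000 / (writes_gb_per_day * 365)
print(f"~{years_to_wear_out:.0f} years to reach the rated write endurance")
```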
 

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
I actually think you could do it. Proxmox requires three "networks" as @amalurk described, but there is no requirement that they be physically separate networks. Yes, best practice says that having storage and corosync competing on the same wire is not a good idea, but nothing stops you from doing it.

Take a look at how this guy did a 3- and then 5-node cluster using a high-speed "ring" plus a central "internet" connection, with OSPF to manage the routing. His presentation style is not really clear, but his approach is spot-on. He doesn't provide enough info to easily replicate it, but he leaves enough breadcrumbs that it shouldn't be hard to work out.
 

SapphironZA

New Member
Mar 21, 2023
6
1
3
It's his video that gave me the idea.
 
  • Like
Reactions: PigLover

ano

Well-Known Member
Nov 7, 2022
653
271
63
Computing uses watts, easy as that.

Yes, you can do a Ceph/Proxmox ring.

The power savings will be eaten up by setup time here, and the performance per watt of AMD datacentre gear, and datacentre gear in general, is pretty darn good. The trouble is that you get so much more compute per node these days: pick something like a 6354 and tune the C-states and it EATS power, but you get a lot of compute for it.

To illustrate how bad power costs are these days: when we were doing the calculations for a new cluster, 40% (!) of the 3-year cost was power and rackspace, and 60% was the drives/NVMe/hardware.
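The arithmetic behind that kind of split, with made-up but plausible inputs (the average draw and the all-in electricity rate are assumptions; only the 40/60 proportions come from our actual numbers):

```python
# What a single machine's power draw costs over 3 years.
# Draw and rate are assumptions; adjust to your own datacentre pricing.
watts = 300                     # assumed average draw of one server
price_per_kwh = 0.30            # assumed all-in rate, currency per kWh
hours = 3 * 365 * 24

energy_cost = watts / 1000 * hours * price_per_kwh
print(f"{watts} W for 3 years at {price_per_kwh}/kWh: ~{energy_cost:,.0f} in energy alone")
```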
 
Last edited:

SapphironZA

New Member
Mar 21, 2023
6
1
3
We are also looking into importing some ASRock Rack AMD AM5 barebones 1U servers. They come with dual 1GbE and dual 10GbE, and with a Ryzen 7900 (non-X) we can get the power draw way down. They also give us a PCIe slot to install up to 4 M.2 drives for storage.
 

BoomBangCrash

New Member
May 21, 2019
21
15
3
Don't use consumer gear for corporate use; you won't be the first to die from the pain of doing so. If they don't need that much computing power, the quad-core Intel servers are basically the same CPU as the consumer gear but with the extra validation and features of enterprise. They won't use much more power than consumer kit, and the cost of that extra power will be easily offset by the time/support savings.
 

oneplane

Well-Known Member
Jul 23, 2021
845
484
63
This can be done with Harvester HCI (or whatever the big boys repackage and 'license'). But you will run into problems when a node fails, and adding more for redundancy gives you problems on the network side of things. The issue isn't CPU or RAM, but everything else. A slightly easier model would be active-passive replicated storage and either Rancher or KubeVirt.
 

SapphironZA

New Member
Mar 21, 2023
6
1
3
The quad-core Intel Xeons have really poor performance per watt and are exceptionally poor value for money.

Where is the Intel Xeon version of a 12500 or 12700 CPU?

EDIT:

We want to do this the right way, but the hardware choices are crap unless you need something like 1.5TB of RAM and 50+ CPU cores across 3 nodes. We need something about 25% of that size.

This is about as close as we can get with enterprise hardware

but the local distributors suck
 
Last edited: