Harvester HCI, anyone working with it?


Greg_E

Active Member
Oct 10, 2024
That's actually important information to know. Consider that many new people moving to "something else" will be coming from VMware, and knowing the different ways to move VMs to the new system is important. Yes, we always want to live migrate from the old hypervisor to the new, but that's not always reality. In the end, there might be a bunch of overnights with cold migrations happening just to get moved over. That should be a realistic part of the plan, even if it is plan B.

The different network names are always going to haunt this process, something I'll have to work on once I get deeper into it.
 

Greg_E

Active Member
Oct 10, 2024
Gave Longhorn v1 one last try, using VirtIO, SATA, and SCSI configurations and drivers. Here is what I saw; labels are in the descriptions:

sata2_driver.jpg

scsi_driver.jpg

virtio_driver.jpg

Overall, VirtIO seems to be a good balance between max read and max write, but in my lab none of these are going to be faster than an old spinning drive at roughly 130 MB/s write.

Waiting for new, larger OS drives to arrive; then I'll use part of the SATA drive as Longhorn v1 and the NVMe drive as Longhorn v2. I really hope v2 will be faster, but I'm a little skeptical here. The way it does the replication, I'm not sure how much more I can gain. When I'm running these tests, I'm seeing the source host transmit at up to 6 Gbps on a 10 Gbps connection, with half of that going to the second host and half to the third. The only ways to speed things up are significantly faster networking, or a change where the second copy is sent to the second host and that host then relays it to the third. That would be a significantly faster workflow and would allow more bandwidth to each host in the array.
 

Greg_E

Active Member
Oct 10, 2024
I've run out of kidneys to sell, but I'm watching some 25GbE cards on eBay... I have 3 ports open on my Extreme switch, which would double throughput across the network more effectively than bonding two ports. I think I'll try bonding both 10G ports as a future test. First I need to compare to the Longhorn v1 testing, then I can start bonding ports to see what's what. And then maybe buy some 25G cards, and then...
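If I do go the bonding route, Harvester exposes that through its cluster/VLAN network config. A rough sketch of what I think the VlanConfig would look like (the NIC names, cluster network name, and bond mode below are placeholders, not my actual setup, and I haven't applied this yet):

```yaml
apiVersion: network.harvesterhci.io/v1beta1
kind: VlanConfig
metadata:
  name: bonded-10g-uplink        # placeholder name
spec:
  clusterNetwork: vm-network     # placeholder cluster network
  uplink:
    nics:
      - ens1f0                   # the two 10G ports to bond (placeholders)
      - ens1f1
    bondOptions:
      mode: 802.3ad              # LACP; the switch ports need a matching LAG
      miimon: 100
```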
 

Greg_E

Active Member
Oct 10, 2024
So... Was the wait and money worth putting into this cluster? At the moment, no.

I have Longhorn v2 running on the NVMe disks and am still getting similar or worse performance. The worse performance is on the small side, where we really want the speed for day-to-day tasks. I'm hitting 7.3 Gbps at the peaks, and I'm guessing that some 25 Gbps cards and cables might help here. I need to think about this a bit more before dumping more money into this lab. I do happen to have 3 ports on one of my switches for this, so maybe.

Here are the numbers, repeating the Longhorn v1 as a reference:

virtio_driver.jpg

Longhorn v2 virtio

longhorn-v2-nvme.jpg

And Longhorn V2 with SCSI driver

longhorn-v2-nvme-scsi.jpg

I guess there are some small gains, but certainly not what I was hoping to find. I have to think about the 25GbE cards for the management/storage network. I'd really like to see the speeds at 16K and below about double (or more); that's where a lot of the work happens. I'll look into optimizations for v2, and maybe see what locality = best-effort might do for those small-block speeds. At the top end, yes, it's nice to go faster, but that's not the real day-to-day of most of my systems, so what is shown above 256K would be fine (though again, faster is better when you are migrating VMs around).
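For reference, the locality setting lives on the Longhorn StorageClass, so testing it should just be a matter of pointing the VM's volume at a class like this (a sketch; the name and replica count are placeholders, nothing special):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-best-effort     # placeholder name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  dataLocality: "best-effort"    # keep a replica on the node running the VM when possible
  staleReplicaTimeout: "30"
```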
 

Greg_E

Active Member
Oct 10, 2024
Finally got a Rancher node up; it's on openSUSE Leap Micro on a Pi 4. Not sure if the OS mattered, but it seems like the container does a lot of stuff where it shuts down and relaunches many times. It finally started working about six minutes after I created it. Now at least I have a target time frame for the host in my rack when I can get back to it.

Also have the 25G cards to install and see what happens; it would be nice if the NVMe speeds doubled, or even tripled, on the small end of things.
 

Greg_E

Active Member
Oct 10, 2024
Going in too many directions at once and not using my time the way I should be using it. Ever have those days where it's a struggle to even go into work and pretend you care? Far too many weeks of this lately.

Anyway, Harvester really should have Rancher with it, Rancher really should be running on a Kubernetes cluster, a new cluster needs new hardware... When vcluster gets out of experimental and into tech preview or production, this might be sweet if you have enough CPU to handle the increase. You should be able to just install Rancher with a Helm chart onto the same cluster you are using for Harvester. Or Harvester might provide a full Rancher within its framework (most of the management is now Rancher based).
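For the "just install Rancher with a Helm chart" route, on k3s/RKE2 you can even do it declaratively with a HelmChart resource instead of running helm by hand. Something like this, assuming cert-manager is already installed and the hostname gets replaced with your own (a sketch, not tested on my cluster yet):

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: rancher
  namespace: kube-system          # where the k3s/RKE2 Helm controller watches
spec:
  repo: https://releases.rancher.com/server-charts/latest
  chart: rancher
  targetNamespace: cattle-system  # namespace needs to exist first
  set:
    hostname: rancher.example.lab # placeholder hostname
    replicas: "1"
```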

I think I'm just going to build a Pi cluster for this; openSUSE Leap Micro runs on a Pi 3 or better, is designed for lightweight server applications (like Kubernetes), and is still free. I've spent way too much time looking around, trying to find the cheapest way to get the most cores and the most RAM, and it's just too much with few good options. The Pi 4 CM4 (8 GB, Wi-Fi, Lite) looks like the choice. And maybe down the road I ditch the carrier boards and buy one of the cluster boards for a single-chassis cluster (4 or 6 modules) with NVMe.

Still haven't taken the time to set Harvester back up with the 25G cards, and that doesn't help with the mental "duress".
 

Greg_E

Active Member
Oct 10, 2024
Hmmm... The 25G connection didn't seem to do anything. I set this one up with locality set to best-effort, which did slow things down a little last time.

longhornv2-25g-nvme-virtio.jpg

I'm going to need to verify that the storage network is using the 25G connections, and verify that they are really connected at 25G. And then decide whether I bother with the higher-power-draw XXV710 cards or go back to the X520 cards. I'm not seeing any benefit from matching the card speed to the slot speed (both are now PCIe 3.0 x8).
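For the verification part, the storage network is a Harvester setting that points at a cluster network and VLAN, so the first thing I'll check is that it actually references the cluster network the 25G ports belong to. As far as I can tell it looks roughly like this (the VLAN, network name, and range here are placeholders):

```yaml
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
value: '{"vlan":100,"clusterNetwork":"storage","range":"192.168.100.0/24"}'
```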
 

Greg_E

Active Member
Oct 10, 2024
Hmmm... Decided to try something that I should have done a while ago. I set the replication to 1 and locality to best-effort; what this did was put the storage on a single local disk (NVMe) using Longhorn v2. Not exactly impressed: it was faster, but nowhere near what ESXi did to the same local disk, slower by a very large factor of about 5x. Seeing these results, I have one more thing to test, but I doubt there will be any significant gains; it just looks like Longhorn has some pretty high overhead that I can't recover with these little computers (or maybe any computer).

lhv2-block-local-nvme.png

The above was done in block mode; I'm going to try one with file mode as soon as I can get that set up and a new VM created (yes, created for each new storage). The ESXi test had this moving data at up to around 3 GB/s, which was pretty blazing.
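For anyone following along, the single-replica test above corresponds roughly to a StorageClass like this. Note that the dataEngine parameter and locality support on the v2 engine depend on the Longhorn release, so treat this as a sketch rather than gospel:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-v2-local-test   # placeholder name
provisioner: driver.longhorn.io
parameters:
  dataEngine: "v2"               # v2 (SPDK) data engine, if enabled in Longhorn
  numberOfReplicas: "1"          # single copy, so everything stays on one disk
  dataLocality: "best-effort"    # keep that copy on the node running the VM
```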
 

Greg_E

Active Member
Oct 10, 2024
This has been kind of resting as I work on methods to get Rancher up. I decided that I'm building a small k3s or RKE2 cluster on openSUSE Leap Micro with some "cheap" mini-PCs that will hopefully be "good enough" for small lab use. I went back and forth on setting up Pi 4s and Pi 5s, but looking at what you get for the money, I went with N95 mini-PCs. They are wickedly thermally throttled, so I'm not sure I'll get better than Pi 5 speed out of them, but it is what it is right now since they have been bought. Attached them to a rack shelf with hook and loop, along with PoE++ 12V splitters to supply the 30 watts of needed power. I have an 802.3af/at switch that can do 30 W per port until I hit around 120 W total, and I also have an 802.3bt switch with up to 90 W per port in my lab rack with dual 900-watt supplies, so I have lots of power in the big rack.

The twists and turns in this project are big and varied. But the more I dig into Harvester, the clearer it becomes that having a reasonable grasp of Kubernetes and Helm charts is important. Just about everything (maybe really everything) can be configured by scripting in YAML, with Ansible being even more flexible for certain aspects.
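As a concrete example of the "everything is YAML" point, a VM in Harvester is ultimately just a KubeVirt VirtualMachine object. A stripped-down sketch (the names, sizes, and PVC are placeholders) looks something like this:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: test-vm                    # placeholder VM name
  namespace: default
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 2
        memory:
          guest: 4Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio        # same VirtIO bus tested earlier in the thread
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: test-vm-root   # placeholder PVC backed by a Longhorn StorageClass
```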

For anyone taking this journey, I strongly recommend the book from back on page one: Mastering SUSE Harvester: The VMware Alternative by Cassian Smith. There are a lot of tips for setting this up correctly when looking to move to production at scale. Even if you are familiar with enterprise-level projects similar to this, there are tips that will make your project go better and that you can plan for today. I've almost finished the book, and every chapter has things in it that I would not have thought about. The Kindle version is cheap enough, but it's also available from alternate press sites where it looks like you can buy the EPUB version and break free of the Kindle stranglehold on your "rented" books. Wish I had seen that before I purchased.

And if there are any Harvester admins reading this, I do have a question: how can I migrate/move a VM from, say, Longhorn v2 storage to Longhorn v1 storage?
 

Greg_E

Active Member
Oct 10, 2024
Looks like I need to do some work on the networking side. I'm having trouble getting the card to connect at 25 Gbps, which is why the last results looked the same. It keeps negotiating down to 10 Gbps, which is odd because I forced the switch out of auto-negotiation to land on 25G, but the connections are still 10G. I think another reboot might be in order.