What is your failover migration strategy in terms of hosts and memory?


frogtech

Well-Known Member
Jan 4, 2016
I'm starting to build out my C6100 chassis and kind of getting annoyed with how much RAM costs once you get to 8GB sticks and beyond.

I'm also starting to realize that clustering one C6100 chassis with all 4 nodes in an MSFT Failover Cluster is pointless if you don't have hosts with enough memory available to receive the migrations or answer the heartbeat.

Unfortunately this kind of puts a weird dent in my original plan: to have 3 C6100 chassis, each used for a different hyperconverged implementation, where at least 1 is "home-prod" with a consistent environment that doesn't change (that one was to be a single C6100 chassis with 4 nodes in a ScaleIO setup). The other 2 chassis would be used to test other technologies and do development.

If I have 4 C6100 nodes in a single chassis, each hosting its own VMs, then their memory usage will likely be near capacity at some point, which means migrating to other nodes in the same chassis wouldn't be possible. So I'd need either replication targets set up in a separate chassis, or additional hosts in the cluster that aren't really running VMs (so they have the RAM free) ready to receive the migrations. Either way, that's one less set of nodes to experiment and play around with.

I am wondering, what kinds of strategies or policies do you guys have at home or in the workplace for this? Do you just make sure your allocated RAM never passes a specific threshold, or is the industry practice to keep empty hosts on hot standby ready to receive migrations?
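To put that concretely, here's the kind of check I have in mind, as a quick Python sketch. The node names and allocations below are made-up numbers for illustration only, not my actual setup:

```python
# Rough N+1 headroom check: for every possible single-node failure, can the
# VMs on that node fit into the free RAM left on the surviving nodes?
# All numbers here are made-up examples.

HOST_RAM_GB = 48      # e.g. 12 x 4GB DIMMs per C6100 node
RESERVE_GB = 4        # rough guess at hypervisor/parent partition overhead

# VM memory currently allocated on each node (hypothetical)
allocated = {"node1": 38, "node2": 40, "node3": 35, "node4": 30}

def survives_single_failure(allocated, host_ram_gb, reserve_gb):
    for failed, failed_load in allocated.items():
        free_elsewhere = sum(
            (host_ram_gb - reserve_gb) - load
            for node, load in allocated.items()
            if node != failed
        )
        if failed_load > free_elsewhere:
            print(f"{failed} dies: {failed_load}GB of VMs vs {free_elsewhere}GB free elsewhere")
            return False
    return True

print("N+1 OK" if survives_single_failure(allocated, HOST_RAM_GB, RESERVE_GB)
      else "over-committed for failover")
```

With identical nodes this boils down to never allocating more than (N-1)/N of the usable cluster RAM, i.e. roughly 75% of a 4-node chassis.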

I guess another main issue is that the C6100 nodes only have 12 DIMM slots, so you can only do 48GB of RAM with 4GB sticks, 96GB with 8GB, or 192GB with 16GB sticks. This wouldn't be an issue if 16GB sticks were cheap(er/ish), since I don't think I'd ever come close to utilizing 192GB of RAM per node. So it limits me to 48/96GB for now.

In anticipation of some possible questions: right now I don't really see myself using a lot of RAM across a load of VMs, since you don't need to allocate much memory to things like AD, but I was definitely planning on implementing a SharePoint farm based on MSFT's requirements (which range anywhere from 12 to 24GB of RAM depending on the server), since I've started doing a lot of SP development.
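Just to show why that worries me on 48GB nodes, here's a back-of-envelope sizing of a small SP farm. The farm layout is purely hypothetical; the only real numbers are the 12-24GB per-server range from the MSFT requirements:

```python
# Hypothetical small SharePoint dev farm; the roles and counts are just an
# example, and the 12-24GB per-server range comes from MSFT's requirements.
farm_gb = {
    "sp-wfe": (12, 24),   # web front end
    "sp-app": (12, 24),   # application server
    "sp-sql": (12, 24),   # SQL Server backing the farm
    "dc01":   (2, 4),     # AD domain controller (needs very little)
}

low = sum(lo for lo, hi in farm_gb.values())
high = sum(hi for lo, hi in farm_gb.values())
print(f"Farm total: {low}-{high} GB")   # -> Farm total: 38-76 GB
```

So even a minimal farm can eat most of a 48GB node by itself, which is a big part of why the 48 vs 96GB decision matters to me.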

Thanks for your input! (I am sure this same question applies to pretty much any clustered virtualization environment so VMware folks, XenServer, Nutanix, etc, please feel free to chip in!)
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
You can get 16GB quad-rank for dirt cheap, and last I saw 2R was ~$48-55 - not exactly expensive, the way it was for many years... it's more expensive now than last spring, or last fall for that matter, but it's far from expensive.

@frogtech -- Figure out exactly how much you can afford and how much you want/need, then e-mail NATEX and tell them you need X number of 16GB or 8GB DIMMs and see if they'll work with you a little on price; it never hurts to ask. Same is true for eBay! 12 DIMMs x 8 nodes is 96 DIMMs, which should warrant a price break :)

96GB * 4 = 384GB in 1 chassis. That's a shitload of RAM for a home setup to utilize even 50% of 24/7.
Add in a second chassis and you're pushing 768GB.

Unless you're running a lot of in-memory DBs or cache VMs, I just don't see you running out of memory with 96GB per node at home.

Since you have 2 chassis and it sounds like you're gonna run them both at once, have you looked into your power usage? I don't see you utilizing them fully at home because of power, unless you've brought in extra power for the rack. That obviously won't completely keep RAM usage down, but it should keep the CPU from bottlenecking a migration.

I personally don't overload the servers; they usually operate at a fraction of what they could handle at 100%.

Just some of my thoughts. I don't manage any large VM clusters, HA clusters, etc...
 

frogtech

Well-Known Member
Jan 4, 2016
Well, I originally wanted to use each chassis for separate things. They won't all necessarily be up at the same time. I'll have one main "usually online, basically home production" chassis, and then the rest are for testing different things like OpenStack, Nutanix, Ceph, etc.

 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Ah, ok. I was thinking you had a huge excess of ALL resources ;)

Maybe specifying what you plan to have up 24/7 and able to migrate to, vs. only up sometimes, would help others point you in the right direction?

I.e. having 3 of 8 nodes up 24/7 vs. 6 of 8 is a rather big deal, and would probably change the suggestions.
 

DavidRa

Infrastructure Architect
Aug 3, 2015
Central Coast of NSW
www.pdconsec.net
Sounds like you only need a single 384GB chassis and the others could have much less, which would definitely cut the RAM costs. I'm looking at the same thing: do I grab 24x 16GB 8500R DIMMs for 96GB per node (possibly a touch of power savings), or 48x 8GB 10600L? I currently have 47 VMs (41 running) in a 4-node cluster; RAM is the "limit", and it's more that I want more memory in some of the VMs than more VMs. It would also mean I could shut down a node to save power and still have better capacity.

There is (was?) a deal for 16x 16GB DIMMs at around USD 400 floating around - I've looked at it and wondered about C6100 compatibility. You could probably ask that eBay seller for 24x and offer USD 500 or so; that should be almost line-ball price-wise with double the count of 2-rank 8GB DIMMs.
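Quick back-of-envelope on those numbers in Python; the last line just spells out what "line ball" would imply for the 8GB sticks, it's not a price I've actually checked:

```python
# $/GB on the options mentioned in this thread.
options = {
    "16x 16GB deal (~USD 400)":    (400, 16 * 16),
    "24x 16GB offer (~USD 500)":   (500, 24 * 16),
    "24x 16GB 2R at ~USD 50 each": (50 * 24, 24 * 16),
}
for name, (usd, gb) in options.items():
    print(f"{name:30s} {gb:3d} GB  {usd / gb:.2f} USD/GB")

# For 48x 8GB to be line-ball with the ~USD 500 spend:
print(f"48x 8GB breaks even at about {500 / 48:.0f} USD per 8GB DIMM")
```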
 

frogtech

Well-Known Member
Jan 4, 2016
In your cluster, where you have the 41 VMs running, do you have the operational capacity to migrate most of them to another host in the event of 'downtime' or a failed host?
 

DavidRa

Infrastructure Architect
Aug 3, 2015
Central Coast of NSW
www.pdconsec.net
Almost always. Hyper-V clustering marks VMs with a priority. I have a few I don't care about - and enough capacity (N+1) for all the rest and then some. With 4x 48GB hosts (think about 46GB usable each), I have the following RAM available right now:

Host01 20.9GB
Host02 24.6GB
Host03 23.6GB
Host04 12.1GB

So right now - yes, all could run. But I've got a stack of "decommissioned" VMs turned off, and when I build a few new ones it'll vary greatly. I've seen it at 8, 6, 6, 7 GB available (so about 20GB of VMs would have to be offline in a failure scenario).
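For anyone who wants the arithmetic behind that, here's the worst case of the 8/6/6/7 snapshot in Python, assuming ~46GB usable on each of the four hosts as above:

```python
# Worst-case single-host failure for the 8/6/6/7 GB free snapshot,
# with ~46GB usable per host.
USABLE_GB = 46
free = [8, 6, 6, 7]
used = [USABLE_GB - f for f in free]   # VM RAM running on each host

worst = max(
    used[i] - sum(f for j, f in enumerate(free) if j != i)   # stranded if host i dies
    for i in range(len(free))
)
print(f"Up to {worst} GB of VMs stay offline after a single-host failure")
```

Which lands right around the ~20GB figure above.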

Edit: I actually have 8 nodes I could (re)build, but one only reports 40GB of RAM sometimes, and the other three have an old VMware install (I should try Proxmox or something but last I looked they wanted $$$ for the way I'd want to configure it - ooh, actually, maybe oVirt could be cool).
 

Connorise

Member
Mar 2, 2017
US. Cambridge
I believe that for your case (migrating VMs when necessary) you can use any software that gives you good old replication. From my experience, I prefer block-level replication over file-level. As a software recommendation, it could be either vSAN or HPE VSA. Also, as an option, you could take a look at StarWind vSAN; as far as I remember they have a built-in heartbeat of some kind.
 