vSAN sizing recommendations

sovking

Member
Jun 2, 2011
81
6
8
Hi guys,
I would like to set up a hybrid VMware vSAN in my homelab, using 3 nodes with 32 EPYC cores each.
Each node has 10 spinning 10k SAS disks of 900GB each; I also have some SSDs like the PM983 of 960GB, and of course I could buy other SSDs as needed.
The number of VMs is quite limited: around 20 VMs running at the same time; the largest is around 50GB.
The required failures to tolerate (FTT) is 1 (the maximum allowed with only 3 nodes).

I'm uncertain about how to divide the spinning disks into disk groups. I'm considering the following options:
  • 2 disk groups with 4 hard disks and 1 SSD each (leaving out 2 hard disks as spares per node, 6 spares in total; 6 SSDs required); or
  • 2 disk groups with 5 hard disks and 1 SSD each (no spares; 6 SSDs required); or
  • 3 disk groups with 3 hard disks and 1 SSD each (leaving out 1 hard disk as spare per node, 3 spares in total; 9 SSDs required).
The third option would probably offer more performance with 3 disk groups, but it's more expensive because it requires 3 more SSDs;
the second option provides no spare disks, so I should buy at least a couple more hard disks;
the first one works, but maybe 6 hard disks as spares is too many.
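For what it's worth, the trade-offs above can be sketched numerically. Here is a back-of-the-envelope comparison in Python (assumptions: 3 nodes, 10x 900GB HDDs per node, one cache SSD per disk group, FTT=1 with RAID-1 mirroring roughly halving usable capacity; vSAN metadata and slack overhead are ignored):

```python
# Back-of-the-envelope comparison of the three disk-group layouts.
# Assumptions: 3 nodes, 10x 900GB HDDs per node, one cache SSD per disk
# group, FTT=1 with RAID-1 mirroring (usable ~= raw / 2); vSAN metadata
# and slack overhead are ignored.

def layout_stats(groups, hdd_per_group, nodes=3, hdd_gb=900, hdds_per_node=10):
    """Return (spare HDDs, cache SSDs, usable TB) for the whole cluster."""
    used = groups * hdd_per_group                # HDDs in use per node
    spares = (hdds_per_node - used) * nodes      # leftover HDDs, cluster-wide
    ssds = groups * nodes                        # one cache SSD per disk group
    raw_tb = used * hdd_gb * nodes / 1000        # raw HDD capacity, cluster-wide
    return spares, ssds, raw_tb / 2              # FTT=1 mirroring halves it

for name, (g, h) in {"A: 2x4": (2, 4), "B: 2x5": (2, 5), "C: 3x3": (3, 3)}.items():
    spares, ssds, usable = layout_stats(g, h)
    print(f"{name}: {spares} spare HDDs, {ssds} cache SSDs, ~{usable:.1f}TB usable")
```

Under these assumptions option B yields the most usable space, while option C trades 3 extra SSDs for an additional cache drive per node.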

What is your opinion? Any suggestions?

Thanks in advance!
 

markpower28

Active Member
Apr 9, 2013
415
103
43
Both 2 & 3 work. Just be aware that, unlike all-flash, you can only do RAID 1, without deduplication and compression.
 

Rand__

Well-Known Member
Mar 6, 2014
6,084
1,458
113
And don't get your hopes up re performance.

At least write performance will be limited to a single cache disk per VM; total performance capability of course increases with more disk groups, but that headroom isn't really used with relatively few VMs (my personal experience in a homelab environment).

Don't get me wrong, performance is usually "enough", but it's not what I would call "fast" (disappointingly so, given the hardware I use, but at least it's HA). Of course, YMMV :)
 

sovking

Member
Jun 2, 2011
81
6
8
Ok, I will decide on a budget basis for SSDs :)

@Rand__ Regarding performance: right now all my VMs share, over 56Gbps, a TrueNAS Core server with plenty of capacity and an HGST HUSMR SAS SSD used as SLOG, with HUSMM drives on the way... but I want to try the vSAN path to gain more experience.

I've read the documentation's advice about 3 DWPD SSDs, but looking at my SLOG usage so far, maybe I could live with less endurance: I'm more interested in IOPS and latency, without going all the way to Optane.
Which SSDs should I buy for cache on a reasonable budget? (interested in both new and used ones)
 

Rand__

Well-Known Member
Mar 6, 2014
6,084
1,458
113
Well, let's put it this way: I run 4800X and P3600s and think my performance is OK-ish as long as I don't strain it.

I assume I have a lot of reserves (and I see them when moving many VMs back and forth between my TNC box and vSAN), but under everyday load it's not noticeable that I run an all-NVMe setup.

Now that doesn't tell you what you should get, but the problem is that vSAN is not made for just a few VMs, but for many.
So it actually provides only a fraction of the total performance capacity to each VM.
Now I have no clue how large that fraction is, but for my use case (few VMs, few users) it meant getting the fastest drive I could at low QD/thread count (which meant Optane).

Now you might not have the same high expectations as I do, so you're possibly fine with regular SSDs; on a budget I'd probably go for 400G S3710s or, a bit higher up, some nice HGST 1640/3240/80s SAS ones (for the increased QD of SAS over SATA).
For capacity, any larger affordable drives on the HCL should be fine.

If you want to see numbers from various experiments, I should have a bunch of vSAN threads here depicting my journey over time.
 

sovking

Member
Jun 2, 2011
81
6
8
You are suggesting SATA and SAS SSDs; does that mean that going NVMe does not provide much more I/O?
 

Rand__

Well-Known Member
Mar 6, 2014
6,084
1,458
113
That was purely based on your "budget basis for SSDs" :)

I always picture it as follows: vSAN will use x% of the total capability (where x is an unknown quantity).
The only way to speed things up is to increase total capability, because 10% of 100 MB/s is less than 10% of 1 GB/s.

What you want/need/can afford/want to afford depends entirely on you and your use case :)
 

sovking

Member
Jun 2, 2011
81
6
8
Recently I paid around 135 Euro each for Samsung PM983 960GB (NVMe) drives: the spec says 1.3 DWPD, 400K/40K IOPS sustained, 85/50us latency at QD1, so a typical read-intensive drive, not so bad. (The HGST HUSMM has 10 DWPD and over 100K write IOPS, but I paid over 200 Euro for 1.6 TB.)

I could spend maybe 1.5-2x more and stay with NVMe... if it's worth it, meaning that I could use Windows VMs without the feeling that they are waiting for I/O.

The use case of course is important: my current project is all about external clients exchanging information via a publish/subscribe mechanism (with some real-time constraints). The information is produced by a large set of emulators running in different VMs (and later in containers) and by accessing 4-5 database instances, geographical data (like OpenStreetMap data), inventory tables, and so on. Everything is spread over 3 servers, 3x32 = 96 cores, each server with 256GiB of RAM.

Now, with these further details, do you confirm your suggestions?
 

Rand__

Well-Known Member
Mar 6, 2014
6,084
1,458
113
First of all, the cache drive does not need to be large; 400G is plenty. The only reason to go larger is that larger drives are usually faster.

Then,
I couldn't really follow your use-case description; maybe the meaning got garbled in your translation to English and my conversion back, sorry.

However, this sounds like a different boat from your earlier "maybe 20 smallish VMs".
I assume most of the activity happens in cache, or are we talking about write-heavy database activity here?

I would suggest that you try to come up with numbers for the IOPS you might need.
Remember that you will not be able to divide by 3 boxes, since you basically have 2 writes per client IOPS plus metadata on a third box, so either can do an HA takeover.

I also strongly recommend using 4 boxes for vSAN (no need for a 4th beefy box, just enough that it can hold data).
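To make the "come up with IOPS numbers" step concrete, here's a minimal sketch, assuming (per the above) that with FTT=1 RAID-1 each guest write is committed to two capacity components while reads are served from a single replica; the function name and the 10k/70% figures are illustrative, not from this thread:

```python
# Rough front-end vs back-end IOPS estimate for vSAN with FTT=1 (RAID-1).
# Assumption: each guest write lands on two capacity components, so
# back-end write IOPS ~= 2x front-end writes; reads hit one replica.

def backend_iops(frontend_iops, read_fraction, replicas=2):
    reads = frontend_iops * read_fraction            # served from one replica
    writes = frontend_iops * (1 - read_fraction)     # mirrored 'replicas' times
    return reads + writes * replicas

# Example: 10k front-end IOPS at a 70/30 read/write mix.
print(round(backend_iops(10_000, 0.70)))  # 7000 reads + 2x3000 writes = 13000
```

A write-heavier mix inflates the back-end load further, which is why the read/write ratio matters so much for cache-drive choice.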
 

sovking

Member
Jun 2, 2011
81
6
8
However, this sounds like a different boat from your earlier "maybe 20 smallish VMs".
I assume most of the activity happens in cache, or are we talking about write-heavy database activity here?
I would suggest that you try to come up with numbers for the IOPS you might need.
I'm in the development phase and some things are still undefined: since emulators will mostly be used, I can assume a 70/30 read/write scenario is possible. In real deployment it could be closer to 50/50 or even 30/70, but I cannot say, and I don't have IOPS numbers yet (the cluster is not finished; SSDs still to buy :))


Remember that you will not be able to divide by 3 boxes, since you basically have 2 writes per client IOPS plus metadata on a third box, so either can do an HA takeover.

I also strongly recommend using 4 boxes for vSAN (no need for a 4th beefy box, just enough that it can hold data).
Yes, I know; the 3 boxes will mostly share the compute load: one of them is a 2x 7301 @ 2.2GHz and the other two are 1x 7542 @ 2.9GHz.
I know that a fourth box is the standard for vSAN, but I need to complete the three boxes first and then see if there are still some Euros left for the fourth.

What is your opinion on the expected performance of the PM983 vs the S3710?
 

Rand__

Well-Known Member
Mar 6, 2014
6,084
1,458
113
Well, 960 GB is overkill, but I would hope that NVMe works better than the old 3710s (QD primarily).
For double that price you might get 900Ps; that's what I would run, but your budget constraints might prevent it, of course.
 

sovking

Member
Jun 2, 2011
81
6
8
Resuming this thread: currently I could buy 8-10 used HUSMM SAS SSDs in medium sizes (400/800GB) for about 80-120 Euro each, while good used NVMe drives of the same capacity, like the Intel P3600, are harder to find in such quantity in Europe; they mostly come from the US and China.
I know that 400GB is enough for caching, but prices for 800GB are sometimes similar to 400GB: it depends on actual availability.

Buying new SSDs of the same sizes, the choices are: Micron 7300 MAX (usually the cheapest, at 180 Euro/400GB, 250 Euro/800GB), Seagate Nytro 3000 (310 Euro/400GB), WD DC SS530 (320 Euro/300GB), Intel P3600 (320 Euro/400GB), P3700 (expensive) and P4610, Kioxia CD6-V (370 Euro/800GB).

So the alternatives for providing 400-800GB cache drives with at least 3 DWPD are: a) going used, whatever you can find, at around 100 Euro per drive; or b) going new, where each drive starts from 200 Euro, and it's easy to spend over 300 Euro.

For 3 hosts with 3 disk groups each, you need at least 9 drives... therefore at least 900 Euro for the best deals on SAS SSDs... over 2500 Euro with new NVMe drives.
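That drive-count and budget arithmetic can be written out as a tiny sketch (using the rough street prices quoted above, which are this thread's ballpark figures, not current market data):

```python
# Cache-drive budget sketch: one cache SSD per disk group.
# Prices are the rough street prices quoted in this thread.

hosts, groups_per_host = 3, 3
cache_drives = hosts * groups_per_host   # minimum cache SSDs needed

used_sas_each = 100   # ~80-120 Euro for used HUSMM SAS
new_nvme_each = 300   # easily 300+ Euro for new NVMe

print(cache_drives)                   # 9 drives minimum
print(cache_drives * used_sas_each)   # ~900 Euro going used SAS
print(cache_drives * new_nvme_each)   # ~2700 Euro going new NVMe
```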

This is purely economic reasoning and does not take into account the actual performance differences between NVMe drives and SAS SSDs attached to a shared SAS-3 backplane.

In my opinion, for a hybrid vSAN, SAS SSDs are a good tradeoff between budget and performance, while for an all-flash setup I would seriously consider NVMe drives for cache and SAS SSDs for capacity, or Optane for cache and NVMe for capacity.
 

Rand__

Well-Known Member
Mar 6, 2014
6,084
1,458
113
Everything depends on workload and expectations... but I agree with the general principle :)