HCI Servers Build... StarWind VSAN or VMware vSAN?

Anton aus Tirol

New Member
Oct 20, 2013
10
2
1
How are you going to feed storage to your VMware cluster? Microsoft talks SMB3 and VMware talks iSCSI or NFS. It's possible to use Storage Spaces Direct for erasure coding and replication and put StarWind VSAN on top to expose VMware-HCL'd iSCSI/iSER, but... it's kind of a train wreck :) ...and not a very cheap one, even assuming StarWind is free and the Windows Datacenter licenses for S2D were found on the street.

 
Reactions: Net-Runner

phennexion

New Member
Feb 17, 2016
10
0
1
37
Storage would be fed to the VMware (vSAN) cluster via RAID controllers in JBOD mode, but if I went StarWind VSAN, I would probably just go Hyper-V for the cost savings. Drive compatibility is a bit broader with Windows, so I wouldn't have to worry as much about HCLs with commodity drives.

Also looked into S2D a bit more; it looks like it has a ways to go before I would want to run it in production... so that's out of the question.
 

AnVil

New Member
Mar 9, 2016
2
2
3
36
Hi all!
StarWind support is officially here :)
So, I just want to shed some more light on the StarWind Virtual SAN issues under Hyper-V and VMware.
In the case of VMware, the issue looks like something related to the VMware virtual network stack. Right now we are trying to figure out a workaround. Once we have it, we will publish a KB article and push VMware to publish a similar one.
As for Hyper-V, according to the Microsoft support engineer, the issue is definitely related to the virtualization stack, but so far they haven't identified exactly what is bottlenecking. Once we have a workaround, we will publish a KB article. Obviously, we will keep pushing Microsoft until they provide everyone with a proper fix instead of workarounds.
 

Net-Runner

Member
Feb 25, 2016
83
24
8
39
Hi, Phennexion.

StarWind is rock-solid in terms of stability if set up properly. Furthermore, Hyper-V would be the best choice here, since in that case StarWind is installed directly on the physical servers instead of inside VMs, giving you a simpler and more straightforward configuration. Microsoft failover cluster live/quick migration backed by StarWind is sometimes so smooth that you might not even notice that one of the hosts went down.

We have a couple of hardware appliances from StarWind and are very happy with them. I can't speak to performance issues under ESX since we are using Hyper-V, and the performance we get on VSAN is excellent, but AFAIK it's almost the same on any hypervisor and corresponds directly to local storage performance.

S2D is a very promising technology, but do not expect to get it into production in the near future. What I saw in TP4 is quite buggy, and as we all know, anything from Microsoft is production-ready only after SP1 :) Their new licensing policy also kind of sucks, since you have to pay much more to get all the new features. Anyway, I'm still very excited that they are moving in this direction.
 
Reactions: phennexion

phennexion

New Member
Feb 17, 2016
10
0
1
37
Thanks for the heads-up, Net-Runner. I also noticed that StarWind VSAN has some measure of fault tolerance? I saw a video on YouTube where a tech failed a node and the VM live-migrated off it somehow... have you been able to reproduce this?
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
Microsoft failover cluster live/quick migration backed by StarWind is sometimes so smooth that you might not even notice that one of the hosts went down.
Ugh - I've seen claims like this one so many times...

If a host goes down unexpectedly, your VMs are going to have some downtime too - you and your users will notice. (Possible exception for VMware Fault Tolerance users, but we can ignore that for now.) All of the live migration technologies out there, on all of the hypervisors that have them, do NOT provide high availability to VMs - if a host crashes, all of the VMs on that host go down with it, and it is way too late to migrate them.

If a host crashes, your best case is that all of the VMs are automatically restarted on other hosts. But with the time it takes for something to realize that a crash occurred, get the VMs assigned to other hosts, power them on, and wait for guest OSes and applications to boot, it's probably not something that will go unnoticed unless you have very low usage.
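The phases above are serial, so they add up; a rough sketch with made-up per-phase durations (illustrative assumptions, not measurements from any particular hypervisor):

```python
# Rough model of user-visible VM downtime after an unplanned host crash.
# Every phase duration below is a hypothetical example, not a vendor number.

def crash_recovery_seconds(detect, reschedule, power_on, guest_boot, app_start):
    """Total downtime: the phases happen one after another, so they sum."""
    return detect + reschedule + power_on + guest_boot + app_start

# Even optimistic per-phase times add up to minutes of downtime.
downtime = crash_recovery_seconds(
    detect=30,      # cluster heartbeat timeout before the host is declared dead
    reschedule=5,   # HA logic assigns the VMs to surviving hosts
    power_on=10,    # VMs are powered on from shared storage
    guest_boot=60,  # guest OS boot
    app_start=30,   # services/applications come back up
)
print(downtime)  # 135 seconds - hardly "unnoticed"
```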
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
Yeah, I agree - a total power failure with StarWind results in a "quick migration" from the looks of it, but a reboot of a node will allow a live migration.
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
Yeah, I agree - a total power failure with StarWind results in a "quick migration" from the looks of it, but a reboot of a node will allow a live migration.
No... there is NO migration if there is a crash or power failure. Any hypervisor, any storage technology. Once the host goes down, the contents of RAM are gone; there is nothing to migrate - it's already too late. With HA, the VMs may be automatically restarted on other hosts.

You do your migrations BEFORE the machine goes down, whether it is a planned reboot or not.
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
From the videos I've seen, with StarWind in a Windows cluster, they say there is a quick migration if there is a power failure on a host... which I'm guessing really means a reboot.
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
From the videos I've seen, with StarWind in a Windows cluster, they say there is a quick migration if there is a power failure on a host... which I'm guessing really means a reboot.
Your video was wrong - I'm pretty sure it's still impossible for software to read the contents of RAM of a powered-off server*. I don't know how else I can word this: anything involving "migrate" - quick, live, or anything else - needs to move state from the source to the destination. It can only happen before "bad stuff" happens to the source; after "bad stuff" has happened to the host, there is no source to migrate from anymore.


*Note: theoretically it is possible to retrieve the contents of RAM after a system is shut down, but every method of doing that I've heard about involves cooling the RAM with liquid nitrogen or something similarly exotic to slow the loss of data after power is removed.
 

dwright1542

Active Member
Dec 26, 2015
371
73
28
49
Yeah, I can confirm that either way, if you crash one of the boxes, it's basically a power-on after a "hard" power-off. You'll get the "why did you shut down this machine" dialog.
 

dwright1542

Active Member
Dec 26, 2015
371
73
28
49
Hi, Phennexion.

StarWind is rock-solid in terms of stability if set up properly. Furthermore, Hyper-V would be the best choice here, since in that case StarWind is installed directly on the physical servers instead of inside VMs, giving you a simpler and more straightforward configuration. Microsoft failover cluster live/quick migration backed by StarWind is sometimes so smooth that you might not even notice that one of the hosts went down.

We have a couple of hardware appliances from StarWind and are very happy with them. I can't speak to performance issues under ESX since we are using Hyper-V, and the performance we get on VSAN is excellent, but AFAIK it's almost the same on any hypervisor and corresponds directly to local storage performance.

S2D is a very promising technology, but do not expect to get it into production in the near future. What I saw in TP4 is quite buggy, and as we all know, anything from Microsoft is production-ready only after SP1 :) Their new licensing policy also kind of sucks, since you have to pay much more to get all the new features. Anyway, I'm still very excited that they are moving in this direction.
I'm not sure how many IOPS you're trying to push, but it's been confirmed earlier in the thread that any kind of VM stack, ESX or Hyper-V, is going to have issues with anything higher than (50k?) IOPS.
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
Yeah, but there isn't much else out there in the right price range to overcome that 50k-80k IOPS iSCSI limit... hrm.
 

Net-Runner

Member
Feb 25, 2016
83
24
8
39
If one of the hosts participating in a properly set up cluster fails/powers off/crashes/BSODs, a quick migration occurs. Basically, it's not a migration in the usual sense: the VM that was running on the failed host also goes down (obviously) and then starts on another host that is still alive, and the downtime is less than a minute (depending on the failover timeout and how fast the VM boots).

According to my internal tests, StarWind immediately synchronizes not only the storage itself but its own cache as well, so if a physical node fails, you still have all the writes and data safe on the other one, which I find quite awesome.

Another problem is that each hypervisor limits single-VM performance by design, so a single VM cannot push all the performance granted by the underlying hardware/storage/VSAN and so on. Since a hypervisor host is designed to run multiple VMs at the same time, this should not be a problem. Having 4 or 6 VMs limited to, say, 50k-80k IOPS each means you have 200-300k in total. Just split the roles/applications across several VMs instead of running everything in a single one (do some load balancing) and you should be fine :)
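A minimal sketch of that scaling argument, with made-up numbers (the per-VM cap and backend total are illustrative, not measured limits of any product):

```python
# Aggregate IOPS when each VM is capped individually but the backend
# has more headroom. All figures below are hypothetical examples.

def aggregate_iops(n_vms, per_vm_cap, backend_cap):
    """Total achievable IOPS: the sum of per-VM caps, bounded by the backend."""
    return min(n_vms * per_vm_cap, backend_cap)

print(aggregate_iops(1, 50_000, 300_000))  # 50000  - one VM hits its own cap first
print(aggregate_iops(6, 50_000, 300_000))  # 300000 - six VMs saturate the backend
print(aggregate_iops(8, 50_000, 300_000))  # 300000 - beyond that, the backend is the limit
```

The point of the load-balancing advice is the first case: a single VM can never see more than its own cap, so the only way to use the remaining backend headroom is to spread the workload across VMs.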
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
Hmm... I agree, Net-Runner, but does it really work that way? Doesn't the problem relate to iSCSI's design itself? Would that mean it's a 50k IOPS limit per iSCSI connection?
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
According to my internal tests, StarWind immediately synchronizes not only the storage itself but its own cache as well, so if a physical node fails, you still have all the writes and data safe on the other one, which I find quite awesome.
Yes - that is standard operating procedure for all of the hyperconverged options on the market. Though it's not really a cache sync; rather, every individual write operation is completed on multiple nodes before the OS/application is told the write is complete. That is the only way to guarantee data protection - without that level of redundancy you couldn't use it in production. How would an important financial database server feel if even a single transaction was lost in a node failure?

That is also one of the downsides of hyperconverged, IMHO - because all writes need to be mirrored across multiple hosts, your write latencies will always involve at least a couple of network hops (from the host running the VM, to a second host for redundancy, and back), and possibly many more (e.g. a Nutanix install on ESX/Hyper-V is host -> local CVM -> remote CVM -> local CVM -> host to get the write acknowledgement back), probably including passing through multiple TCP stacks, etc.

Hmm... I agree net-runner, but does it really work that way? Doesn't the problem relate to iSCSI's design itself? would that mean it's 50k IOPs limit per iSCSI connection?
There is no inherent IOPS limitation in iSCSI - there are plenty of million-dollar-plus iSCSI arrays out there pushing way more than 100K IOPS. A hyperconverged configuration just has a lot of layers and moving parts, and they can all affect performance - whatever is limiting you is somewhere in all of that.
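The hop-counting argument can be sketched as a toy latency model; the per-hop and media costs below are assumptions for illustration, not measured values from any product:

```python
# Toy model of synchronous-replication write latency: every network hop
# in the acknowledgement path adds to the total. Costs are hypothetical.

HOP_US = 50  # assumed one-way cost of a network hop, in microseconds

def write_latency_us(n_hops, media_us):
    """Write latency = storage-media commit time + all hops in the ack path."""
    return media_us + n_hops * HOP_US

local_write = write_latency_us(n_hops=0, media_us=100)  # plain local SSD write
mirrored    = write_latency_us(n_hops=2, media_us=100)  # host -> peer host -> host
via_cvms    = write_latency_us(n_hops=4, media_us=100)  # host -> CVM -> remote CVM -> CVM -> host
print(local_write, mirrored, via_cvms)  # 100 200 300
```

Whatever the real per-hop cost is, the shape of the result is the same: the more layers a write acknowledgement has to traverse, the higher the floor on write latency, regardless of how fast the underlying media is.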
 

dwright1542

Active Member
Dec 26, 2015
371
73
28
49
The latency is kept low using direct-attach SFP+ between nodes.

My last round of testing was with StarWind on a physical server (haven't tested StorMagic yet), with a Fusion-io card that could push 150k IOPS locally. From ESXi over a 10G network, the most I could squeeze out of it was 60k (from inside a Windows guest on the same server, using the Microsoft iSCSI initiator, I got 90k), and from a Windows server on that same network to the same target I could get 145k IOPS.

This isn't per target; this is the total for the whole software initiator - no round robin, just a single 10GbE link.

All of my testing points to an ESXi software-initiator limitation, which both StarWind and StorMagic have verified as well.
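One way to see why a software initiator can cap IOPS even though iSCSI itself has no such limit: by Little's law, sustained IOPS equals outstanding I/Os divided by per-I/O latency, so extra per-I/O overhead in one software layer lowers the ceiling for the whole path. A sketch with illustrative latencies (assumed figures, not the actual measurements behind the 60k/145k results above):

```python
# Little's law applied to storage: IOPS = outstanding I/Os / per-I/O latency.
# The latency figures below are illustrative assumptions.

def iops(queue_depth, latency_s):
    """Sustained IOPS for a given number of in-flight I/Os and per-I/O latency."""
    return queue_depth / latency_s

# Same target, same queue depth - only the initiator-side per-I/O latency differs.
native_initiator = iops(32, 0.00022)  # ~145k IOPS at ~220 us per I/O
esxi_initiator   = iops(32, 0.00053)  # ~60k IOPS at ~530 us per I/O
print(round(native_initiator), round(esxi_initiator))
```

Under this model, a few hundred extra microseconds of per-I/O overhead in the initiator stack is enough to cut throughput by more than half at a fixed queue depth, which is consistent with the gap between the Windows and ESXi initiator numbers.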
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Yes - that is standard operating procedure for all of the hyperconverged options on the market. Though it's not really a cache sync; rather, every individual write operation is completed on multiple nodes before the OS/application is told the write is complete. That is the only way to guarantee data protection - without that level of redundancy you couldn't use it in production. How would an important financial database server feel if even a single transaction was lost in a node failure?

That is also one of the downsides of hyperconverged, IMHO - because all writes need to be mirrored across multiple hosts, your write latencies will always involve at least a couple of network hops (from the host running the VM, to a second host for redundancy, and back), and possibly many more (e.g. a Nutanix install on ESX/Hyper-V is host -> local CVM -> remote CVM -> local CVM -> host to get the write acknowledgement back), probably including passing through multiple TCP stacks, etc.



There is no inherent IOPS limitation in iSCSI - there are plenty of million-dollar-plus iSCSI arrays out there pushing way more than 100K IOPS. A hyperconverged configuration just has a lot of layers and moving parts, and they can all affect performance - whatever is limiting you is somewhere in all of that.
Just to chime in: I have been testing S2D on Server 2016 TP4 over 2x40Gb Ethernet with 4 nodes, using both SSDs and HDDs in hybrid pools as well as NVMe-only pools, and I have exceeded 50-90,000 IOPS by a lot in both cases. To clarify, this hyperconverged setup doesn't use a VM to manage the storage; the native OS handles the Storage Spaces Direct storage pools.