HCI Servers Build... StarWind VSAN or VMware vSAN?

Anton aus Tirol

New Member
Oct 20, 2013
10
2
1
How are you going to feed storage to your VMware cluster? Microsoft talks SMB3 and VMware talks iSCSI or NFS. It's possible to use Storage Spaces Direct for erasure coding and replication and put StarWind VSAN on top to expose VMware-HCL'd iSCSI/iSER, but... it's kind of a train wreck :) ...and not a very cheap one, even assuming StarWind is free and the Windows Datacenter licenses for S2D were found on the street.

 
Reactions: Net-Runner

phennexion

New Member
Feb 17, 2016
10
0
1
37
Storage would be fed to the VMware (vSAN) cluster via RAID controllers in JBOD mode, but if I went StarWind VSAN, I would probably just go Hyper-V for the cost savings. Drive compatibility is a bit broader with Windows, so I wouldn't have to worry as much about HCLs with commodity drives.

Also looked into S2D a bit more; it looks like it has a ways to go before I would want to run it in production... so that's out of the question.
 

AnVil

New Member
Mar 9, 2016
2
2
3
36
Hi all!
StarWind support is officially here :)
So, I just want to shed some more light on the StarWind Virtual SAN issues under Hyper-V and VMware.
In the case of VMware, the issue looks like something related to the VMware virtual network stack. Right now we are trying to figure out a workaround. Once we have it, we will publish a KB article and push VMware to publish a similar one.
As for Hyper-V, according to the Microsoft support engineer, the issue is definitely related to the virtualization stack, but so far they haven't identified exactly what is bottlenecking. Once we have a workaround, we will publish a KB article. Obviously, we will keep pushing Microsoft until they provide everyone with a proper fix instead of workarounds.
 

Net-Runner

Member
Feb 25, 2016
83
24
8
39
Hi, Phennexion.

StarWind is rock-solid in terms of stability if set up properly. Furthermore, Hyper-V would be the best choice here, since in that case StarWind is installed directly on the physical servers instead of inside VMs, giving you a simpler and more straightforward configuration. Microsoft failover cluster live/quick migration backed by StarWind is sometimes so smooth that you might not even notice that one of the hosts went down.

We have a couple of hardware appliances from StarWind and are very happy with them. I can't speak to performance issues under ESX since we are using Hyper-V, and the performance we get on VSAN is excellent, but AFAIK it's almost the same on any hypervisor and corresponds directly to local storage performance.

S2D is a very promising technology, but do not expect to get it into production in the near future. What I saw in TP4 is quite buggy, and as we all know, anything from Microsoft is production-ready only after SP1 :) Their new licensing policy also kind of sucks, since you have to pay much more to get all the new features. Anyway, I'm still very excited that they are moving in this direction.
 
Reactions: phennexion

phennexion

New Member
Feb 17, 2016
10
0
1
37
Thanks for the heads-up, Net-Runner. I also noticed that StarWind VSAN has some measure of fault tolerance? I saw a video on YouTube where a tech failed a node and the VM live-migrated off it somehow... have you been able to reproduce this?
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
Microsoft failover cluster live/quick migration backed by StarWind is sometimes so smooth that you might not even notice that one of the hosts went down.
Ugh - I've seen claims like this one so many times...

If a host goes down unexpectedly, your VMs are going to have some downtime too - you and your users will notice. (Possible exception for VMware Fault Tolerance users, but we can ignore that for now.) All of the live migration technologies out there, on all of the hypervisors that have them, do NOT provide high availability to VMs - if a host crashes, all of the VMs on that host go down with it, and it is way too late to migrate them.

If a host crashes, your best case is that all of the VMs are automatically restarted on other hosts. But with the time it takes for something to realize that a crash occurred, get the VMs assigned to other hosts, power them on, and wait for guest OSes and applications to boot, it's probably not something that will go unnoticed unless you have very low usage.
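The phases above are serial, so they add up; a rough sketch with made-up per-phase durations (illustrative assumptions, not measurements from any particular hypervisor):

```python
# Rough model of user-visible VM downtime after an unplanned host crash.
# Every phase duration below is a hypothetical example, not a vendor number.

def crash_recovery_seconds(detect, reschedule, power_on, guest_boot, app_start):
    """Total downtime: the phases happen one after another, so they sum."""
    return detect + reschedule + power_on + guest_boot + app_start

# Even optimistic per-phase times add up to minutes of downtime.
downtime = crash_recovery_seconds(
    detect=30,      # cluster heartbeat timeout before the host is declared dead
    reschedule=5,   # HA logic assigns the VMs to surviving hosts
    power_on=10,    # VMs are powered on from shared storage
    guest_boot=60,  # guest OS boot
    app_start=30,   # services/applications come back up
)
print(downtime)  # 135 seconds - hardly "unnoticed"
```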
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
Yeah, I agree - a total power failure with StarWind results in a "quick migration" from the looks of it, but a reboot of a node will allow a live migration.
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
Yeah, I agree - a total power failure with StarWind results in a "quick migration" from the looks of it, but a reboot of a node will allow a live migration.
No... there is NO migration if there is a crash or power failure. Any hypervisor, any storage technology. Once the host goes down, the contents of RAM are gone; there is nothing to migrate - it's already too late. With HA, the VMs may be automatically restarted on other hosts.

You do your migrations BEFORE the machine goes down, whether it is a planned reboot or not.
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
From the videos I've seen, with StarWind in a Windows cluster, they say there is a quick migration if there is a power failure on a host... which I'm guessing really means a reboot.
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
From the videos I've seen, with StarWind in a Windows cluster, they say there is a quick migration if there is a power failure on a host... which I'm guessing really means a reboot.
Your video was wrong - I'm pretty sure it's still impossible for software to read the contents of RAM of a powered-off server*. I don't know how else I can word this: anything involving "migrate" - quick, live, or anything else - needs to move state from the source to the destination. It can only happen before "bad stuff" happens to the source; after "bad stuff" has happened to the host, there is no source to migrate from anymore.


*Note: theoretically it is possible to retrieve the contents of RAM after a system is shut down, but every method of doing that I've heard about involves cooling the RAM with liquid nitrogen or something similarly exotic to slow the loss of data after power is removed.
 

dwright1542

Active Member
Dec 26, 2015
371
73
28
49
Yeah, I can confirm that either way, if you crash one of the boxes, it's basically a power-on after a "hard" power-off. You'll get the "why did you shut down this machine" dialog.
 

dwright1542

Active Member
Dec 26, 2015
371
73
28
49
Hi, Phennexion.

StarWind is rock-solid in terms of stability if set up properly. Furthermore, Hyper-V would be the best choice here, since in that case StarWind is installed directly on the physical servers instead of inside VMs, giving you a simpler and more straightforward configuration. Microsoft failover cluster live/quick migration backed by StarWind is sometimes so smooth that you might not even notice that one of the hosts went down.

We have a couple of hardware appliances from StarWind and are very happy with them. I can't speak to performance issues under ESX since we are using Hyper-V, and the performance we get on VSAN is excellent, but AFAIK it's almost the same on any hypervisor and corresponds directly to local storage performance.

S2D is a very promising technology, but do not expect to get it into production in the near future. What I saw in TP4 is quite buggy, and as we all know, anything from Microsoft is production-ready only after SP1 :) Their new licensing policy also kind of sucks, since you have to pay much more to get all the new features. Anyway, I'm still very excited that they are moving in this direction.
I'm not sure how many IOPS you're trying to push, but it's been confirmed earlier in the thread that any kind of VM stack, ESX or Hyper-V, is going to have issues with anything higher than (50k?) IOPS.
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
Yeah, but there isn't much else out there in the right price range to overcome that 50k-80k IOPS iSCSI limit... hrm.
 

Net-Runner

Member
Feb 25, 2016
83
24
8
39
If one of the hosts participating in a properly set up cluster fails/powers off/crashes/BSODs, a quick migration occurs. Basically, it's not a migration in the usual sense: the VM that was running on the failed host also goes down (obviously) and then starts on another host that is still alive, and the downtime is less than a minute (depending on the failover timeout and how fast the VM boots).

According to my internal tests, StarWind immediately synchronizes not only the storage itself but its own cache as well, so if a physical node fails, you still have all the writes and data safe on the other one, which I find quite awesome.

Another problem is that each hypervisor limits single-VM performance by design, so a single VM cannot push all the performance granted by the underlying hardware/storage/VSAN and so on. Since a hypervisor host is designed to run multiple VMs at the same time, this should not be a problem. Having 4 or 6 VMs limited to, say, 50k-80k IOPS each means you have 200-300k in total. Just split the roles/applications across several VMs instead of running everything in a single one (do some load balancing) and you should be fine :)
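A minimal sketch of that scaling argument, with made-up numbers (the per-VM cap and backend total are illustrative, not measured limits of any product):

```python
# Aggregate IOPS when each VM is capped individually but the backend
# has more headroom. All figures below are hypothetical examples.

def aggregate_iops(n_vms, per_vm_cap, backend_cap):
    """Total achievable IOPS: the sum of per-VM caps, bounded by the backend."""
    return min(n_vms * per_vm_cap, backend_cap)

print(aggregate_iops(1, 50_000, 300_000))  # 50000  - one VM hits its own cap first
print(aggregate_iops(6, 50_000, 300_000))  # 300000 - six VMs saturate the backend
print(aggregate_iops(8, 50_000, 300_000))  # 300000 - beyond that, the backend is the limit
```

The point of the load-balancing advice is the first case: a single VM can never see more than its own cap, so the only way to use the remaining backend headroom is to spread the workload across VMs.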
 

phennexion

New Member
Feb 17, 2016
10
0
1
37
Hmm... I agree, Net-Runner, but does it really work that way? Doesn't the problem relate to iSCSI's design itself? Would that mean it's a 50k IOPS limit per iSCSI connection?
 

TuxDude

Well-Known Member
Sep 17, 2011
618
338
63
According to my internal tests, StarWind immediately synchronizes not only the storage itself but its own cache as well, so if a physical node fails, you still have all the writes and data safe on the other one, which I find quite awesome.
Yes - that is standard operating procedure for all of the hyperconverged options on the market. Though it's not really a cache sync; rather, every individual write operation is completed on multiple nodes before the OS/application is told the write is complete. That is the only way to guarantee data protection - without that level of redundancy you couldn't use it in production. How would an important financial database server feel if even a single transaction was lost in a node failure?

That is also one of the downsides of hyperconverged, IMHO - because all writes need to be mirrored across multiple hosts, your write latencies will always involve at least a couple of network hops (from the host running the VM, to a second host for redundancy, and back), and possibly many more (e.g. a Nutanix install on ESX/Hyper-V is host -> local CVM -> remote CVM -> local CVM -> host to get the write acknowledgement back), probably including passing through multiple TCP stacks, etc.

Hmm... I agree net-runner, but does it really work that way? Doesn't the problem relate to iSCSI's design itself? would that mean it's 50k IOPs limit per iSCSI connection?
There is no inherent IOPS limitation in iSCSI - there are plenty of million-dollar-plus iSCSI arrays out there pushing way more than 100K IOPS. A hyperconverged configuration just has a lot of layers and moving parts, and they can all affect performance - whatever is limiting you is somewhere in all of that.
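The hop-counting argument can be sketched as a toy latency model; the per-hop and media costs below are assumptions for illustration, not measured values from any product:

```python
# Toy model of synchronous-replication write latency: every network hop
# in the acknowledgement path adds to the total. Costs are hypothetical.

HOP_US = 50  # assumed one-way cost of a network hop, in microseconds

def write_latency_us(n_hops, media_us):
    """Write latency = storage-media commit time + all hops in the ack path."""
    return media_us + n_hops * HOP_US

local_write = write_latency_us(n_hops=0, media_us=100)  # plain local SSD write
mirrored    = write_latency_us(n_hops=2, media_us=100)  # host -> peer host -> host
via_cvms    = write_latency_us(n_hops=4, media_us=100)  # host -> CVM -> remote CVM -> CVM -> host
print(local_write, mirrored, via_cvms)  # 100 200 300
```

Whatever the real per-hop cost is, the shape of the result is the same: the more layers a write acknowledgement has to traverse, the higher the floor on write latency, regardless of how fast the underlying media is.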
 

dwright1542

Active Member
Dec 26, 2015
371
73
28
49
The latency is kept low using direct-attach SFP+ between nodes.

My last round of testing was with StarWind on a physical server (haven't tested StorMagic yet), with a Fusion-io card that could push 150k IOPS locally. From ESXi over a 10G network, the most I could squeeze out of it was 60k (from inside a Windows guest on the same server, using the Microsoft iSCSI initiator, I got 90k), and from a Windows server on that same network to the same target I could get 145k IOPS.

This isn't per target; this is the total for the whole software initiator - no round robin, just a single 10GbE link.

All of my testing points to an ESXi software-initiator limitation, which both StarWind and StorMagic have verified as well.
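One way to see why a software initiator can cap IOPS even though iSCSI itself has no such limit: by Little's law, sustained IOPS equals outstanding I/Os divided by per-I/O latency, so extra per-I/O overhead in one software layer lowers the ceiling for the whole path. A sketch with illustrative latencies (assumed figures, not the actual measurements behind the 60k/145k results above):

```python
# Little's law applied to storage: IOPS = outstanding I/Os / per-I/O latency.
# The latency figures below are illustrative assumptions.

def iops(queue_depth, latency_s):
    """Sustained IOPS for a given number of in-flight I/Os and per-I/O latency."""
    return queue_depth / latency_s

# Same target, same queue depth - only the initiator-side per-I/O latency differs.
native_initiator = iops(32, 0.00022)  # ~145k IOPS at ~220 us per I/O
esxi_initiator   = iops(32, 0.00053)  # ~60k IOPS at ~530 us per I/O
print(round(native_initiator), round(esxi_initiator))
```

Under this model, a few hundred extra microseconds of per-I/O overhead in the initiator stack is enough to cut throughput by more than half at a fixed queue depth, which is consistent with the gap between the Windows and ESXi initiator numbers.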
 

Chuntzu

Active Member
Jun 30, 2013
383
98
28
Yes - that is standard operating procedure for all of the hyperconverged options on the market. Though it's not really a cache sync; rather, every individual write operation is completed on multiple nodes before the OS/application is told the write is complete. That is the only way to guarantee data protection - without that level of redundancy you couldn't use it in production. How would an important financial database server feel if even a single transaction was lost in a node failure?

That is also one of the downsides of hyperconverged, IMHO - because all writes need to be mirrored across multiple hosts, your write latencies will always involve at least a couple of network hops (from the host running the VM, to a second host for redundancy, and back), and possibly many more (e.g. a Nutanix install on ESX/Hyper-V is host -> local CVM -> remote CVM -> local CVM -> host to get the write acknowledgement back), probably including passing through multiple TCP stacks, etc.



There is no inherent IOPS limitation in iSCSI - there are plenty of million-dollar-plus iSCSI arrays out there pushing way more than 100K IOPS. A hyperconverged configuration just has a lot of layers and moving parts, and they can all affect performance - whatever is limiting you is somewhere in all of that.
Just to chime in: I have been testing S2D on Server 2016 TP4 over 2x40Gb Ethernet with 4 nodes, using both SSDs and HDDs in hybrid pools as well as NVMe-only pools, and I have exceeded 50-90,000 IOPS by a lot in both cases. To clarify, this hyperconverged setup doesn't use a VM to manage the storage; the native OS handles the Storage Spaces Direct storage pools.