HCI Servers Build... Starwind VSAN or VMware vSAN?

phennexion

New Member
Feb 17, 2016
Hey Everyone,

I'm hopefully going to build a few HCI node servers, with this chassis:
Supermicro | Products | SuperServers | 2U | 6028UX-TR4

I'd like to do 3 nodes, hyperconverged with VSAN software.

2x E5-2680 v3
128GB DDR4
2 in RAID 1 (Storage Spaces mirror): Samsung 950 Pro 512GB M.2, PCIe 3.0 x4 via Lycom DT-120 adapter (specs show 400 TBW)
6 in RAID 10: Hitachi 7K4000 3TB 3.5" 7200RPM SATA with 64MB cache
6 in RAID 10: Samsung 850 Pro 1TB 2.5" SATA SSD
C612 chipset

I said RAID 1 and 10 because Starwind sits on top of the server's local RAID, whereas VMware vSAN just consumes the disks JBOD-style.

I'm having a hard time deciding between Starwind VSAN and VMware vSAN. I work in education and we get Microsoft OSes at a huge discount, and I've played with TP4 of Server 2016 and it doesn't look too bad. I feel VMware is far superior, but I'd have to pay for a truckload of other VMware software at the same time...

First question: Any experience with Starwind VSAN? It seems pretty awesome from everything I've researched, especially for the price.

Second question: See any caveats with the drives? My GB/day of writes on the 750s won't be high enough to worry about premature write wear.

Third question: Does anyone have experience using the 950 SSDs in a production environment, and any reason I should spend an extra CAD $1,500 per SSD to get the P3700? Like I said, my writes per day on them will not be very high, but I would like to use them to run our local DBs.
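For what it's worth, the endurance question is just arithmetic. A quick sketch against the 950 Pro's rated 400 TBW (the 50 GB/day write rate below is a placeholder guess, not a measured workload):

```python
# Rough SSD endurance estimate: rated TBW vs. estimated daily writes.
# The 50 GB/day figure is an assumed placeholder, not a real measurement.
RATED_TBW = 400            # Samsung 950 Pro 512GB rated endurance, in TB written
writes_gb_per_day = 50     # assumed average host writes

days_to_wear_out = RATED_TBW * 1000 / writes_gb_per_day   # TB -> GB
years = days_to_wear_out / 365
print(f"{years:.1f} years to reach rated TBW")  # ~21.9 years at 50 GB/day
```

Even at several times that write rate, the drive's rated endurance outlasts a typical server refresh cycle, which is why the low-writes argument for consumer NVMe holds here.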

Thanks!!
 

markpower28

Active Member
Apr 9, 2013
For a hyperconverged solution, there is no such thing as a per-node RAID setup anymore. It's more like all the local disks become one bigger pool that is served across multiple nodes. So if you have 3 nodes, each with 2 x Intel 750, 6 x 3TB SATA, and 6 x SSD, you'd have 2 disk groups per node in vSAN: one with 1 x 750 + 6 x 3TB, and one with 1 x 750 + 6 x SSD.
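Capacity-wise, that pooling works out roughly like this (a sketch assuming vSAN's default FTT=1 mirroring policy; the NVMe cache devices contribute no usable capacity):

```python
# Usable-capacity sketch for the proposed 3-node vSAN cluster.
# Cache devices (the NVMe drives) add no capacity; with the default
# FTT=1 mirror policy every object is stored twice across the cluster.
nodes = 3
capacity_per_node_tb = 6 * 3 + 6 * 1   # 6x 3TB HDD + 6x 1TB SSD = 24 TB

raw_tb = nodes * capacity_per_node_tb  # 72 TB raw in the pool
usable_tb = raw_tb / 2                 # FTT=1 mirror -> ~36 TB usable
print(raw_tb, usable_tb)
```

That is before slack space and metadata overhead, so plan for somewhat less in practice.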

First question: Any experience with Starwind VSAN? It seems pretty awesome from everything I've researched, especially for the price.
Starwind vSAN is a great solution, but it runs on top of Windows (keep that in mind). Personally I would do VMware vSAN or Microsoft Storage Spaces Direct (which has a 4-node requirement).

Second question: See any caveats with the drives? My GB/day of writes on the 750s won't be high enough to worry about premature write wear.
This depends on workload. For a lab the 750 should be more than enough; for production I would use at least a P3600.

Third question: Does anyone have experience using the 750 SSDs in a production environment, and any reason I should spend an extra CAD $1,500 per SSD to get the P3700? Like I said, my writes per day on them will not be very high, but I would like to use them to run our local DBs.
This is one of the design questions for a hyperconverged solution. If the data is spread across multiple nodes, it will be OK if one node fails, but it is always recommended to use enterprise SSDs for production.
 

Net-Runner

Member
Feb 25, 2016
Hi, Phennexion.

"We have been using the Starwind vSAN software solution for a long time already and successfully virtualized our branches with Starwind hardware appliances half a year ago. What made me very happy is that implementing each appliance took literally less than an hour, without any downtime at all. And we have not touched them since the implementation, so I am quite confident in this product based on my experience."

https://www.reddit.com/r/sysadmin/comments/46gf6k/hci_servers_build_starwind_vsan_or_vmware_vsan
 

phennexion

New Member
Feb 17, 2016
Thanks for the replies!!

How do you like Starwind in terms of stability? Any problems? And for the hypervisor you use, do you connect via iSCSI to the local LUNs? Any performance issues?
 

dwright1542

Active Member
Dec 26, 2015
Thanks for the replies!!

How do you like Starwind in terms of stability? Any problems? And for the hypervisor you use, do you connect via iSCSI to the local LUNs? Any performance issues?

Be EXTREMELY careful with this in an ESXi environment with high IOPS. We've been fighting a battle since September with both SW and StorMagic, where the higher-IOPS PCIe cards just don't perform at all. We've got Fusion-io cards that will put out 200k IOPS locally, but only 60k in an ESXi environment.

SvSAN will only do 50k IOPS before the single CPU maxes out.
Starwind can do better, but there seem to be major issues with the ESXi software target. I get 60-70k in ESXi using the software iSCSI initiator, but I can get 90-110k if I pass it through to the guest VM and use the MS iSCSI initiator.

Crazy, huh? We've also reproduced this on AMD/HP and Intel/Dell servers. It doesn't matter.

Just today, Starwind confirmed that they have finally reproduced the problem and are working with VMware to correct it.

Also be aware that L2 write caching to SSD has been disabled since the summer.
 

markpower28

Active Member
Apr 9, 2013
dwright:

That's very interesting feedback. Mind sharing your hardware configuration?

In general, during I/O-intensive operations the CPU does become the bottleneck of the system because of the random I/O. What kind of apps are you running in your environment?

Mark
 

dwright1542

Active Member
Dec 26, 2015
dwright:

That's very interesting feedback. Mind sharing your hardware configuration?

In general, during I/O-intensive operations the CPU does become the bottleneck of the system because of the random I/O. What kind of apps are you running in your environment?

Mark
DL385 G8, 256GB RAM, P420i, 8x10k 600GB, Intel X520-DA2 adapters.

-0r-

C2100, 144GB RAM, 9260-8i, 12x10k 600GB, Intel X520-DA2

No apps yet; it has never run well enough to put anything on it. This is all IOmeter testing, comparing actual vs. tested throughput.

CPU is definitely NOT the bottleneck with SW. It is, however, with StorMagic.
 

markpower28

Active Member
Apr 9, 2013
409
103
43
Is RAID being done at the 9260 or P420 level? If so, the bottleneck is actually the RAID controller card.

IOmeter 4K testing will bring any storage to its knees (look at the CPU utilization). In general, I do see better I/O numbers from an HBA vs. a RAID controller.
 

dwright1542

Active Member
Dec 26, 2015
Is RAID being done at the 9260 or P420 level? If so, the bottleneck is actually the RAID controller card.

IOmeter 4K testing will bring any storage to its knees (look at the CPU utilization). In general, I do see better I/O numbers from an HBA vs. a RAID controller.
RAID is being done at the Fusion-io level.
 

dwright1542

Active Member
Dec 26, 2015
So here's an update. Starwind and StorMagic have now both officially said that 50-80k is about the max IOPS their virtualized storage can support. It seems we're the first company to try to push past that in a VSA environment with either product.
However, there are different reasons:

StorMagic is a single-threaded CPU VM. We hit 50k IOPS and then our CPUs get pegged. Apparently they have some faster CPUs, and their testing can hit 70k or so.

Starwind's isn't a CPU problem; it seems to be an actual iSCSI limit. They don't have a good explanation yet, but they are working with both MS and VMware to nail down the problem.
 

Chuntzu

Active Member
Jun 30, 2013
I will after I get the servers set up.

Chris
I have a four-node S2D cluster set up with two Intel 750s per node, running mirrored, and I get 50,000 to 100,000 write IOPS per node to the pool. So between 200,000 and 400,000 write IOPS with some cheap NVMe drives. On reads I am hitting 400,000 to 700,000 read IOPS per node. I actually hit the same read IOPS on my hybrid SSD/HDD array, thanks to RAM caching, when using the storage bus cache with S2D.

Write speeds on the mirrored and parity hybrid array (4x 400GB S3700 and 8x 6TB HDDs) are roughly 50-100,000 IOPS. The sequential read speeds are absurd, like 24 gigabytes per second per node, again thanks to storage bus caching on both the NVMe and hybrid arrays. Truth is, ReFS is really pissing me off, because as soon as I go over 16 threads (i.e. start utilizing both E5-2670s) IOPS cap out at 350,000.

I am impressed with how well S2D works, even compared to ScaleIO. I was able to hit 9 gigabytes per second reads from ScaleIO on an all-NVMe pool, and something like 7 gigabytes per second writes. The IOPS weren't fantastic, but I would have to set it up and run it again, because I don't remember those numbers off the top of my head.

I am writing up a main-site post, but with my terrible work schedule right now it is very slow going. Hope this is useful.
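The per-node to cluster scaling above is straight multiplication; a quick sketch with the write-IOPS range I quoted:

```python
# Aggregate write IOPS for the 4-node S2D cluster described above.
# The per-node range comes from my own testing; scaling assumes IOPS
# add up roughly linearly across nodes, which mirrored S2D gets close to.
nodes = 4
write_iops_per_node = (50_000, 100_000)   # measured low/high per node

cluster_range = tuple(iops * nodes for iops in write_iops_per_node)
print(cluster_range)  # (200000, 400000)
```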
 

phennexion

New Member
Feb 17, 2016
That's very interesting, Chuntzu! Thanks for all the great input, guys; much appreciated.

I read somewhere that S2D licensing will run roughly $6,500/node though... and a 4-node minimum is costly...

Also, an update for this build: the 512GB Samsung 950s have almost double the rated TBW, and they just need a Lycom DT-120 adapter to fit, so I'm thinking about going that route.
 

Evan

Well-Known Member
Jan 6, 2016
Yes, the cost per node for S2D is over $5k; each node needs a Datacenter license.
A shame really; it's not as if Standard is cheap, and this may limit MS products being used for storage.
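The licensing math is simple; a sketch using the rough figures quoted in this thread (actual prices vary widely by licensing program):

```python
# S2D licensing cost sketch. The $6,500/node Datacenter figure and the
# 4-node minimum are the approximations quoted earlier in this thread,
# not official list prices.
datacenter_per_node = 6500
min_nodes = 4

total = datacenter_per_node * min_nodes
print(total)  # 26000 - just for the OS licenses, before any hardware
```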
 

phennexion

New Member
Feb 17, 2016
Ahh, but we are education in Canada, and we get DC licenses for $45 a pop lol... so that's actually really interesting now. The money I could save on HCI software could buy me that 4th node... hmmm!