VMware vSAN Performance :-(


Yves

Member
Apr 4, 2017
65
15
8
38
Hi there,

After successfully building my SDS cluster I tinkered around for about 3 days configuring everything the way I wanted (JBOD/RAID0 mode, updating all the firmware and BIOSes, checking all the VIBs for all the components, etc.). Today I finally had all 3 nodes up and running ESXi 6.5u1 and made some initial baseline benchmarks on all of the systems.
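In case it helps anyone repeat the setup checks: this is roughly how I went through the VIB inventory on all three nodes. A rough sketch only; hostnames and the credential handling are placeholders, not my real setup.

```python
# Rough sketch: pull the VIB inventory from each ESXi node over SSH so the
# driver/firmware versions can be compared against the vSAN HCL.
# Hostnames and credentials are placeholders.
import paramiko

NODES = ["esxi-01.lab.local", "esxi-02.lab.local", "esxi-03.lab.local"]

for node in NODES:
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(node, username="root", password="changeme")  # placeholder
    # 'esxcli software vib list' prints name, version and vendor per VIB.
    _, stdout, _ = ssh.exec_command("esxcli software vib list")
    print(f"--- {node} ---")
    print(stdout.read().decode())
    ssh.close()
```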

This is a Win2k16 DC with pvscsi on a datastore (VMFS6) located on the Intel Optane 900p


This is the same Win2k16 DC with pvscsi on the vSAN datastore spread across 3 nodes with 3x 900p cache tiers and 6x Intel DC S4600...


Write is HORRIBLE... I can't understand what is going wrong here. Everything passes and everything is green in the vSAN menu. And yes, I know it's not HCIBench, but sorry... 148 writes at 512b? Seriously? An old QNAP TS-2xx over iSCSI is faster...
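For what it's worth, my back-of-envelope math on where 148 could even come from (every latency below is an assumption for illustration, not a measurement): at queue depth 1 a benchmark is purely latency-bound, so IOPS is just 1 divided by the per-operation latency.

```python
# Back-of-envelope: at queue depth 1, write IOPS = 1 / per-op latency.
# Every number here is an assumption for illustration, not a measurement.
optane_write_s  = 0.00002   # ~20 us cache-device write
network_rtt_s   = 0.0002    # ~200 us 10GbE round trip, generous
unknown_stack_s = 0.00654   # whatever the vSAN stack is adding -- unknown

per_op = optane_write_s + network_rtt_s + unknown_stack_s
print(f"QD1 write IOPS ceiling: {1 / per_op:.0f}")  # ~148
# So ~6.7 ms per op would explain 148 IOPS; the question is where those
# milliseconds go, because the drives themselves account for microseconds.
```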

Any ideas?

Thanks guys... I am a bit frustrated and tired from all the tinkering around... with this end result...

Ah, btw, this is the storage the vSAN cluster is supposed to replace (my good old TS-EC1279U-RP with 2x 10Gb SFP+ iSCSI uplinks, which seem to beat the shit out of the vSAN):
 

Evan

Well-Known Member
Jan 6, 2016
3,346
598
113
I would never expect it to be that bad, and I am sure something can't be right, but certainly don't expect a 3-disk-group vSAN system to run amazingly even with Optane cache drives. It's just not built like that; it doesn't do data locality etc. It's designed for equal performance for all VMs. Having said this, an all-flash config is 'ok'.
 

Dean

Member
Jun 18, 2015
116
10
18
48
Poop... I am literally about to do the same, a 3-node setup, but with some 12Gb SAS drives and SSDs for cache. Not giving me the warm and fuzzies.

Sent from my Moto Z (2) using Tapatalk
 

hlhjedsfg

Member
Feb 2, 2018
38
8
8
34
I recently built a 3-node SDS with Proxmox + Ceph (2x DC S3500 for OSDs + 1 DC S3700 for journals per node, running 10Gb/s direct attachment), and I got about 7,000 IOPS at 4k random in IOmeter; I didn't run ATTO. I was able to run 72 Windows 2016/10 VMs for my students, and everything was pretty smooth. Your number seems pretty low to me for 4k sequential (14 MB/s). I will give ATTO a try and afterwards will try with the VMware software.
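If you want to reproduce the Ceph-side number, I measured the raw pool roughly like this; the pool name, duration, and thread count are placeholders from memory:

```python
# Rough sketch of a raw Ceph pool benchmark, run from one of the nodes.
# Pool name, duration and thread count are placeholders.
import subprocess

subprocess.run([
    "rados", "bench",
    "-p", "testpool",  # placeholder pool created just for benchmarking
    "60",              # run for 60 seconds
    "write",
    "-b", "4096",      # 4k writes, to match the IOmeter profile
    "-t", "16",        # 16 concurrent operations
], check=True)
```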
 

Yves

Member
Apr 4, 2017
65
15
8
38
@Evan Thanks for your reply. I am as surprised as you are...

@Dean Yaaappp! That's how I feel right now... after spending quite a bit on my homelab SDS...

I know we are not talking dual P4800X cache tiers with P4600 capacity drives and 40Gb interconnects here, but... 148... and the rest... that can't be right. To be honest, though, that's exactly my biggest fear with VMware vSAN: not that it doesn't perform the way I want, but that I can't explain *why* it doesn't perform the way I want. Even after watching all the vSAN Architecture Series YouTube videos, where Elver explains the complete vSAN stack (very well), you still have almost a black box where you store all your data... and if you read threads like "Garbage vSAN performance from an all-flash HCL-approved rocketship" or "My All Flash VSAN Nightmare" on Reddit, you get a weird feeling in your stomach...

I will start posting some pictures of my vSAN software settings... maybe something sticks out...
 

Yves

Member
Apr 4, 2017
65
15
8
38
This is the policy I created for benchmarking. It's not even RAID1; it's only on the one system... I know I could change the number of disk stripes per object to 2 to make it use both capacity disks...


According to the vSAN health check, everything is in order


Also, a pretty common issue is the queue depth of the RAID controller (hba0), which in my case is good enough
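For anyone who wants to sanity-check their own controller, the quick test is Little's law: outstanding IOs = IOPS x average latency. If that exceeds the adapter queue depth, commands queue up in the hypervisor instead of the device. The numbers here are made up for illustration:

```python
# Little's law sanity check for HBA queue depth.
# outstanding IOs = IOPS * average service time; if this exceeds the
# adapter queue depth, the controller becomes the bottleneck.
# All numbers below are illustrative assumptions.
target_iops   = 80_000    # what the flash devices could deliver together
avg_latency_s = 0.0005    # 0.5 ms average service time

outstanding = target_iops * avg_latency_s
adapter_queue_depth = 1024  # check yours in esxtop (disk adapter view, AQLEN)

print(f"Outstanding IOs needed: {outstanding:.0f}")
print("fine" if outstanding <= adapter_queue_depth
      else "queue depth is the bottleneck")
```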


This is the result I get (right now the drives are in RAID0 mode, as they should be according to the HCL... before, they were in JBOD mode)
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
VSAN is a steaming pile so these results don't really surprise me. Try turning on EC, dedupe, and checksums!
I think he means turn it off ;) Checksums especially are known to have a significant impact.

I have had somewhat better results with similar hardware (40GbE, Optane + Intel 750s as capacity); I also tried P3700s.
It should definitely be better than what you have now, but don't expect miracles. I have similar complaint/plea-for-help threads in this forum.

Lessons learned so far:
- vSAN is designed to give consistent performance to many concurrent users; it is *not* good at providing great performance for a few users.
- vSAN does not make great use of NVMe (but no system I have tested yet does).
- vSAN speed is governed by cache drives x disk groups (see the sketch below). If you need more speed and you already have fast cache, you need to add more disk groups that are written to concurrently. For a time I ran 6 P3700s and 6 750s in 6 disk groups using 2 concurrent writes - better, but still far from good. I am still contemplating running 12 disk groups (1 S3700 cache, 1 S3700 capacity) to see what that will give me...
- vSAN is great for ease of implementation - integration into ESXi is really simple, and it's fairly easy to set up and maintain, but it's not made for speed in small deployments.
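Here is a toy model of that scaling rule, since it surprises people; the per-disk-group rate is a rough assumption, not a measured value:

```python
# Toy model of "speed = cache drives x disk groups".
# The per-group write rate is an assumed round number, not a measurement.
per_group_mbps = 400   # assumed effective write rate per cache drive
stripe_width   = 2     # concurrent component writes per object (policy)

for disk_groups in (3, 6, 12):
    # A single object only spans min(stripe width, groups) disk groups,
    # but many VMs together can keep every group busy.
    single_vm = per_group_mbps * min(stripe_width, disk_groups)
    cluster   = per_group_mbps * disk_groups
    print(f"{disk_groups:2d} groups: ~{single_vm} MB/s for one VM, "
          f"~{cluster} MB/s aggregate")
```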
 
child of wonder

Dec 30, 2016
106
24
18
44
Look at all the things a person has to consider with VSAN:

- Cache drives
- Disk Groups
- Drive types
- Controller types
- VSAN HCL
- FTT
- Erasure Coding
- Dedupe
- Compression
- Encryption
- Checksums
- on and on and on

How is this simple? Why not just get a simple shared storage array and skip all this complexity and wasted time, when the result ultimately doesn't perform very well anyway?
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
It totally depends on the use case ;)
If you are happy running 2 extra nodes next to your compute cluster, that's perfectly fine.

I'd totally love a well-performing hyperconverged 2- or 3-node cluster. Just haven't found one yet :p

What's your take on a 2-node, fully synced, HA-capable shared storage system with low power utilization that doesn't cost an arm and a leg? I have not found one yet, unfortunately.
 

Yves

Member
Apr 4, 2017
65
15
8
38
@Rand__ Thanks a lot for your very detailed response. It's exactly what I feared most: that I have to abandon the idea of a vSAN cluster, since its performance is way off compared to the NAS this vSAN cluster was actually supposed to replace... The only strange thing is that I really, really think it should perform better. But I don't need vSAN... I have a compute cluster on a C7000, but I thought, let's give vSAN a spin... I guess this is why you won't find benchmark results for vSAN.

Does anyone have another high-performance solution? I have 3 good servers that can run anything you throw at them... Nutanix / RH Ceph cluster / etc.?
 

Yves

Member
Apr 4, 2017
65
15
8
38
Look at all the things a person has to consider with VSAN:

- Cache drives
- Disk Groups
- Drive types
- Controller types
- VSAN HCL
- FTT
- Erasure Coding
- Dedupe
- Compression
- Encryption
- Checksums
- on and on and on

How is this simple? Why not just get a simple shared storage array and skip all this complexity and wasted time, when the result ultimately doesn't perform very well anyway?
Totally agree... and I really thought I had thought of everything... but it does not seem like it...


Sent from my mobile
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
I still think it's not too bad, what you have to think about; most of these things have a preconfigured setting and are just options. If you have a RAID box you have options too ;)

And no, I don't know of a high-perf solution with 3 boxes yet - ScaleIO was the closest I found so far, but that's dead (the free version). I have not tried Nutanix due to its phone-home functionality. Compuverde has that too, but at least they tell you what they are sending.

That's why I wondered what @child of wonder suggests as an option :)
 

Yves

Member
Apr 4, 2017
65
15
8
38
I still think it's not too bad, what you have to think about; most of these things have a preconfigured setting and are just options. If you have a RAID box you have options too ;)

And no, I don't know of a high-perf solution with 3 boxes yet - ScaleIO was the closest I found so far, but that's dead (the free version). I have not tried Nutanix due to its phone-home functionality. Compuverde has that too, but at least they tell you what they are sending.

That's why I wondered what @child of wonder suggests as an option :)
True, true... but I still think the numbers are way off. If I look at the 1-node tests from Florian Grehl at virten.net, or the VMworld Hackathon setup of William Lam... Still trying to find the bug...

Alternatives... what about StarWind vSAN, SmartX Halo, or a Red Hat Ceph cluster?
 

Evan

Well-Known Member
Jan 6, 2016
3,346
598
113
Ceph is the same as vSAN for small deployments.
StarWind or other replicate-a-single-node solutions probably do best in a 2- or 3-node situation.

MS Storage Replica is another option, except for the license costs, unless you already have enterprise licensing for the VMs anyway.

I pretty much just gave up on finding any workable and cheap/free solution in a 2/3-node config that I actually like to use and play with.
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
StarWind was next on my list, but I didn't have RAID cards... I might go all-NVMe instead, but depending on the write strategy many drives might be of benefit (that's why ScaleIO was good; it distributed writes to a lot of drives).
I wanted to wait for feedback on Compuverde before I went on and dismantled the test systems.

MS was an option, but I read bad things about 2-node clusters with it, so I am not sure it's really an option.

And yes, your numbers are off; check my other thread for comparison values (fio & CDM usually).
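For comparable numbers, the fio run is along these lines, executed inside a Linux guest whose disk lives on the datastore under test; the file path, size, and runtime are placeholders:

```python
# Sketch of a comparable fio run from inside a Linux guest whose disk
# lives on the datastore under test. Path, size and runtime are placeholders.
import subprocess

subprocess.run([
    "fio",
    "--name=randwrite-4k",
    "--filename=/mnt/test/fio.dat",  # placeholder test file
    "--rw=randwrite",
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=4",
    "--size=4g",
    "--runtime=60",
    "--time_based",
    "--direct=1",
    "--ioengine=libaio",
    "--group_reporting",
], check=True)
```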
 

Yves

Member
Apr 4, 2017
65
15
8
38
StarWind was next on my list, but I didn't have RAID cards... I might go all-NVMe instead, but depending on the write strategy many drives might be of benefit (that's why ScaleIO was good; it distributed writes to a lot of drives).
I wanted to wait for feedback on Compuverde before I went on and dismantled the test systems.

MS was an option, but I read bad things about 2-node clusters with it, so I am not sure it's really an option.

And yes, your numbers are off; check my other thread for comparison values (fio & CDM usually).
Didn't Dell/EMC "create" a new free program called Dell EMC ECS Community Edition? Or am I mistaken?
 

Evan

Well-Known Member
Jan 6, 2016
3,346
598
113
Didn't Dell/EMC "create" a new free program called Dell EMC ECS Community Edition? Or am I mistaken?
ECS is not ScaleIO though... at least not as far as I know, but I would need to read up on that.
 

Rand__

Well-Known Member
Mar 6, 2014
6,626
1,767
113
Yes, when I said MS I meant S2D; replication was only possible to a storage server, I think. On top of that I would have needed to put an iSCSI layer on it to get back to ESXi, which sounded like quite a hassle.

Have not heard about ECS; let me read up.

Hm, the fancy stuff seems to be in CloudArray, which is not freely available :/
 