SSD performance - issues again

gea

Well-Known Member
Dec 31, 2010
2,485
837
113
DE
Hmm...
Your values are far too low. Tuning can help, but not with values this bad.

I have just run a CrystalDiskMark benchmark on a machine I have access to:
similar RAM, ESXi 6.5u1, OmniOS 151024 with vmxnet3 and sync=standard, Windows Server 2012 (test on C:), and one 3-way mirror of 3 Intel DC-S3610-800 SSDs.

The DC S3610 is the cheaper brother of the DC S3700, nearly as fast with less overprovisioning.

Write: 246 MB/s
Read: 350 MB/s
 

Rand__

Well-Known Member
Mar 6, 2014
4,491
877
113
Can a corrupt primary disk label cause this?

Otherwise, I am looking at my network at the moment, testing with a FreeNAS filer with Optane and different ESX hosts. Maybe my experiments with PFC (and flow control on non-PFC-capable switches) broke something...
 

gea

As local disk values are far higher and the traffic is internal (ESXi <-> OmniOS NFS), neither the disks nor external networking can affect this. Only ESXi settings, OmniOS networking/NFS, or other system-related items can be the cause.
 

Rand__

Normally I'd totally agree, but it's weird.
I ran tests on Box A (Napp-It host) with a Napp-It-hosted VM and a FreeNAS-hosted VM: both slow.
On Box B (FreeNAS host), the FreeNAS-hosted VM was fine, and the Napp-It-hosted VM was bad there as well (it's all the same VM, just moved around).

Maybe it's ESX settings, maybe it's the network, but I'm no longer sure it's Napp-It ;)
 

gea

Very strange.
As you may have changed a lot of things in the meantime, I would do a clean reinstall of ESXi (probably 6.7) and OmniOS or FreeNAS, optionally on both boxes.
 

Rand__

Alright, it took me a while, and I actually forgot to record the CDM values in between, but here it goes.
Basically, there is no big difference between the ESX versions for this scenario (SSD + Optane SLOG).
I didn't have time to pursue other interesting questions (e.g. the impact on NVMe).

6.5 GA
[screenshot]

6.5 U1 (latest 2017 build)
[screenshot]

6.5 U2
[screenshot]

6.7
[screenshot]

At least the fresh install seems to have helped with the CDM values...
[screenshot]

So, lessons learnt:
- Write performance does not scale up when adding devices (or only up to a certain level) -> no advantage in getting many small drives over a few bigger ones
- Write performance on SSDs can benefit from an Optane SLOG regardless of the number of drives -> better to get a few big drives + SLOG
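The first lesson can be pictured with a simple saturation model: pool write throughput grows with the number of devices until it hits a ceiling set somewhere else in the stack. This is a toy model, not something measured in this thread; `pool_write_mbps` and the 400 MB/s per-device and 1200 MB/s ceiling figures are made-up assumptions chosen only to show the shape of the curve.

```python
def pool_write_mbps(n_devices: int, per_device_mbps: float, ceiling_mbps: float) -> float:
    """Toy model: throughput scales linearly with device count, then flattens at a ceiling."""
    if n_devices < 1:
        raise ValueError("a pool needs at least one device")
    return min(n_devices * per_device_mbps, ceiling_mbps)


if __name__ == "__main__":
    # Hypothetical figures for illustration only (not measured in this thread).
    PER_DEVICE = 400.0  # MB/s per SSD (assumed)
    CEILING = 1200.0    # MB/s cap imposed elsewhere in the stack (assumed)
    for n in range(1, 7):
        total = pool_write_mbps(n, PER_DEVICE, CEILING)
        print(f"{n} device(s): {total:.0f} MB/s")
```

In this model every device past the third adds capacity but no write throughput, which is the "few big drives over many small ones" argument; where the real ceiling sits can only be found by measuring.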



But the current performance is still not what I was looking for; compare the vSAN/ScaleIO values from earlier experiments (single Optane + Intel 750 on vSAN, and maybe an additional S3700 on ScaleIO on the right).

[screenshot]

I think I will revisit ScaleIO with a 3x8 or 4x6 setup, now that we've found it's not gone for good.
I will also check an all-Optane box (I've got four 480s now) and an SSD pool made of a bunch of S3610s I got, with an Optane SLOG.

@gea - Thanks a lot for your help !!!
 

Rand__

Update: had fun with the Optanes today...

6.5 GA (4 Optane 480s => 4x 400 GB eager-zeroed files passed through to the Napp-IT VM)
[screenshot]

CDM
[screenshot]

CDM native on Optane, as reference

[screenshot]

Moving a VM from local Optane to Napp-It (NFS-attached)
[screenshot]


6.7 GA

[screenshot]
CDM
[screenshot]


Windows on native Optane
[screenshot]



Lessons learnt
- Optane and large SSD pools are severely limited by ZFS. Of course you get amazing features in return, but you pay dearly for them in performance.
- Optane has enough headroom that the potential impact of Spectre/Meltdown is not visible (maybe also compensated by newer NVMe drivers).
 

gea

Sequential values from Windows (CDM) to a directly attached Optane (assuming Windows NTFS):
read: 2692 MB/s
write: 2268 MB/s (async)

Is this a single Optane or a Raid-10?

Sequential values from OmniOS (Filebench) to Optanes attached as ESXi vdisks (ZFS Raid-10):
read: 2800 MB/s
write: 2064 MB/s (async)

BTW: vdisks must be slower than native bare-metal OmniOS on Optane.

Overall: quite similar values.
To compare NTFS vs. ZFS, you must compare the same disks on a bare-metal setup (or ReFS on Windows with data checksums enabled, for roughly comparable basic features).
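The "quite similar values" can be put into numbers using the four figures quoted above. Treat this only as a rough comparison: the two sides were measured with different tools (CDM vs. Filebench) and, as asked above, possibly on different layouts (single Optane vs. Raid-10), so the percentages say nothing definitive about NTFS vs. ZFS. `pct_diff` is just an illustrative helper.

```python
# Sequential throughput figures quoted in the thread, in MB/s.
NTFS_CDM = {"read": 2692, "write": 2268}       # Windows, directly attached Optane
ZFS_FILEBENCH = {"read": 2800, "write": 2064}  # OmniOS, ZFS Raid-10 on ESXi vdisks


def pct_diff(baseline: float, other: float) -> float:
    """Percentage by which `other` differs from `baseline`."""
    return (other - baseline) / baseline * 100


for metric in ("read", "write"):
    delta = pct_diff(NTFS_CDM[metric], ZFS_FILEBENCH[metric])
    print(f"{metric}: ZFS vs NTFS {delta:+.1f}%")
# prints:
# read: ZFS vs NTFS +4.0%
# write: ZFS vs NTFS -9.0%
```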

In a virtualized environment, you must additionally consider:
- ESXi limitations
- virtual NIC limitations

- OmniOS network limitations
- NFS limitations

- Windows VM limitations
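The list above describes a serial path: every byte the Windows VM writes crosses each layer in turn, so the end-to-end rate can be no higher than the slowest layer. A minimal sketch of that reasoning follows; `effective_throughput` is a hypothetical helper, and the per-layer figures are placeholders, since nobody has pinned down which layer is the actual limit.

```python
from typing import Mapping, Tuple


def effective_throughput(stages: Mapping[str, float]) -> Tuple[str, float]:
    """Return the slowest stage of a serial data path and its throughput."""
    if not stages:
        raise ValueError("the path needs at least one stage")
    slowest = min(stages, key=stages.__getitem__)
    return slowest, stages[slowest]


# Placeholder limits in MB/s, for illustration only (not measured values).
PATH = {
    "ZFS pool": 2800.0,
    "NFS (OmniOS server / ESXi client)": 3000.0,
    "OmniOS network": 3200.0,
    "virtual NIC (vmxnet3)": 2600.0,
    "ESXi": 3500.0,
    "Windows VM": 4000.0,
}

print(effective_throughput(PATH))
# prints: ('virtual NIC (vmxnet3)', 2600.0)
```

Finding the real limit means measuring each layer on its own, e.g. local pool performance inside OmniOS versus the same test over NFS from the VM.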
 

Rand__

The test setup was as follows:

Windows on native Optane: the Windows VM resided on one of the four 480s, and the run was on its local disk (yes, NTFS). Also not peak performance, since the majority of the space was in use.

All Napp-It tests used 4x 400 GB virtual disks passed through to the Napp-It VM.
Windows on Napp-It resided on a datastore exported via NFS but local to the same box (exported from Napp-It, of course).
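For reference, the capacity and idealized scaling of that layout can be worked out in a few lines. This sketch assumes "Raid-10" means ZFS striped mirrors, i.e. the four 400 GB vdisks arranged as two 2-way mirror vdevs; `StripedMirror` is a hypothetical helper that ignores ZFS overhead (metadata, reserved space) and everything above the pool. The factors are upper bounds relative to a single disk, not predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripedMirror:
    """Hypothetical model of a ZFS pool built from equal-sized mirror vdevs."""
    disks: int        # total number of disks (here: vdisks)
    width: int        # disks per mirror vdev (2 = 2-way mirror)
    disk_gb: float    # size of each disk in GB

    def __post_init__(self) -> None:
        if self.width < 1 or self.disks % self.width:
            raise ValueError("disks must split evenly into mirror vdevs")

    @property
    def vdevs(self) -> int:
        return self.disks // self.width

    def usable_gb(self) -> float:
        # Each mirror vdev contributes the capacity of one disk.
        return self.vdevs * self.disk_gb

    def ideal_write_factor(self) -> int:
        # Every write lands on all disks of a vdev, so writes scale with vdev count.
        return self.vdevs

    def ideal_read_factor(self) -> int:
        # Reads can be served by any mirror member, so at best they scale with disk count.
        return self.disks


if __name__ == "__main__":
    pool = StripedMirror(disks=4, width=2, disk_gb=400)
    print(pool.vdevs, pool.usable_gb(), pool.ideal_write_factor(), pool.ideal_read_factor())
    # prints: 2 800 2 4
```

Setting `disks=3, width=3, disk_gb=800` models the 3-way S3610-800 mirror from earlier in the thread: one vdev, 800 GB usable, write factor 1.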


In the end I am content with the 4 Optanes ;) but it's obvious that, due to the factors you mentioned (whichever it specifically might be), performance (from Napp-It via ESX) is limited to around 2.5 GB/s. That is the best case, which is not bad of course, but still about what one could get from a single locally attached NVMe drive.


gea said:
sequential value from OmniOS (Filebench) to a per ESXi vdisk attached Optane (ZFS Raid-10)
read: 2800 MB/s
write: 2064 MB/s (async)

[screenshot]

That's exactly the point: that one is a single Raid-10 ZFS pool (6.5 GA), and it has the same Filebench values as the dual Raid-10.

At some point it just does not scale up any more; we saw that with the SSDs and we see it with the Optanes.
Yes, sync writes get faster, but that's still far from the 2.5-3 GB/s limit.