Ceph Benchmark request / NVMe


velocity08

New Member
Nov 30, 2019
Well, my tests trying to satisfy my (low QD, high IOPS) requirements have failed miserably with everything I have thrown at it.
If you have a recommendation for how to satisfy 10G (for starters) at qd1/t1, 64K (ESXi NFS -> sync) while maintaining at least a 2-node HA setup, then please share :)
Hey Rand

If you're willing to give up VMware and move over to Proxmox, you can use DRBD in an HA pair, which should deliver the performance you're looking for.
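
A minimal two-node resource definition is only a few lines; roughly something like this (node names, addresses and device paths below are placeholders, protocol C being the fully synchronous mode):

Code:
# /etc/drbd.d/r0.res -- example only, adjust names/addresses/devices
resource r0 {
    net { protocol C; }          # synchronous: a write is acknowledged on both nodes
    device    /dev/drbd0;
    disk      /dev/nvme0n1;      # or an LVM LV spanning several NVMe drives
    meta-disk internal;
    on pve1 { address 10.0.0.1:7789; }
    on pve2 { address 10.0.0.2:7789; }
}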

It wouldn't hurt to test ;)

You can use a single NVMe drive, or layer LVM over multiple NVMe drives to create a larger volume. Since the replication is 1:1 in real time, you'll get essentially the native performance of local disks without the latency introduced by Ceph, vSAN or any other object storage that then layers block storage on top.
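
If you go the LVM route, aggregating the drives into a single backing volume for DRBD is just a few commands (device and volume names are examples only):

Code:
pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate nvme_vg /dev/nvme0n1 /dev/nvme1n1
# one big LV handed to DRBD as its backing disk (or carve out several smaller ones)
lvcreate -l 100%FREE -n drbd_backing nvme_vg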

Using a raw disk for the VM instead of a qcow2 file will also deliver better performance than a VMDK.
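
For example, once the DRBD device is up you should be able to hand it straight to a guest as a raw block device rather than a file-backed image, something along these lines (VM ID and bus slot are made up):

Code:
# attach the replicated block device directly to VM 100 as a SCSI disk
qm set 100 --scsi1 /dev/drbd0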

There's always a copy of your data on another host ready for failover; you can have 2 or more hosts, it's really up to you.

It's a very simple and elegant solution that will deliver maximum return on NVMe performance.

No RAID overhead either, as you can't really RAID an NVMe drive natively on enterprise hardware. There is the Intel RAID 1 mirror, but that's limited to a single mirror, and why bother if you have a real-time replica on another host? (But if you want that little bit of additional protection, go with the RAID 1 mirror if you have the right controller.)

You could also use software RAID to create 0, 1, 5, 10 etc. if you like. I would test with single-drive replication first, and then you can always extend the RAID or LVM.
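
If you do want software RAID underneath, mdadm covers the usual levels, e.g. a simple NVMe mirror (drive names are placeholders):

Code:
# RAID 1 across two NVMe drives; use --level=0/5/10 etc. for other layouts
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1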

Since the block layer isn't sitting behind an object storage layer first, and doesn't need to be converted to iSCSI or exported over the network as NFS or another protocol to reach the other host, you cut out a lot of latency and layers.

DRBD (LINBIT) and LINSTOR have a plugin for Proxmox. Proxmox is really well designed and based on KVM virtualisation (which I believe has a lot less bloat than VMware); it's free to use, and if you wish to support Proxmox you can subscribe to one of their support tiers.
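
As a rough idea of the LINSTOR side (node, pool and group names here are made up), you register the nodes, point a storage pool at the LVM VG holding the NVMe drives, and ask for two replicas:

Code:
linstor node create pve1 10.0.0.1
linstor node create pve2 10.0.0.2
# one storage pool per node, backed by the NVMe volume group
linstor storage-pool create lvm pve1 pool_nvme nvme_vg
linstor storage-pool create lvm pve2 pool_nvme nvme_vg
# resource group that always places 2 replicas (one per host)
linstor resource-group create rg_nvme --storage-pool pool_nvme --place-count 2
linstor volume-group create rg_nvme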

Putting all of this to one side, Proxmox also offers ZFS natively and has done some tuning for NVMe, but it may still need further work, so you may still run into the same issues you are seeing now with NVMe on FreeNAS at some stage.
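
If you do want to test the native ZFS path, a basic NVMe mirror pool is quick to set up (pool and device names are placeholders; ashift=12 assumes 4K sectors):

Code:
zpool create -o ashift=12 nvmepool mirror /dev/nvme0n1 /dev/nvme1n1
zpool status nvmepool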

give it a try :)

Open-source virtualization management platform Proxmox VE

How to setup LINSTOR on Proxmox VE

""Cheers
G
 

Rand__

Well-Known Member
Mar 6, 2014
Heya @velocity08, welcome to StH :)

Moving away from VMWare at this point is difficult since I use it for my VDI environment (Horizon). I really got used to a silent desk, regardless of whether I browse the web or play a game - now even the quiet fan of my emergency physical access station is irritating me ;) [Of course I could get used to it again, but I have not given up yet].

I also don't have the best experiences with DRBD; we had it running at work for some appliances and sooner or later there were always issues (not necessarily triggered by DRBD, but it was affected quite often).

So at this time probably not, but thanks for the recommendation :)
 

velocity08

New Member
Nov 30, 2019

Hey @Rand__, I think your best option is the simplest: if ultimate performance is what you're chasing, then local NVMe storage will win hands down every time.

Have a cold spare host to flick over to in case of emergency and have a solid backup policy ;)

I'm not sure what vSAN experience you've had, but from what I've seen mirrors are going to be better than erasure coding for performance; in saying this, it will really come down to the workload.

Small files are slow and draining, larger files do better.

We use 3PAR extensively in our cloud deployments and it's been a solid performer; it's really built for random reads and writes due to its chunking methodology. Have you looked at 3PARs before?

I'll have a bit more of a browse through the other posts to see if I can glean more information; haven't had time yet but will slot it in, as the thread is interesting to read.

Otherwise, maybe hold out for ZFS NVMe tuning; that may be your silver bullet.

Maybe post something on the ZoL group to see if they have any movement on the NVMe tuning?

I'd be very interested in the discussion as we love ZFS. At this stage its performance for our use case is satisfactory, as it's just going to be a target for Veeam backups and off-site replication between DCs, so even maxing out at 1GBps is fine for our current requirements.

It's hard to go past the data scrubbing and verification features built into ZFS for data safety.
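
For anyone following along, those checks boil down to a scheduled scrub plus a status check (pool name is just an example):

Code:
zpool scrub tank        # re-read all data and verify checksums, repairing from redundancy
zpool status -v tank    # shows scrub progress and any errors found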

looking forward to your updates :)

""Cheers
G
 

Rand__

Well-Known Member
Mar 6, 2014
Nah, never really looked into any of the professional offerings since:
  1. They almost never specify low qd/jobs benchmarks (so it's hard to guess what they can do for my use case), especially since my capacity requirements are rather small (max 10gig)
    So happy to see a qd1/j1 fio run if you can manage ;) (something like the sketch below this list)
  2. Are usually horribly expensive for high performance setups
    1. Initial acquisition cost
    2. licenses to run advanced features
    3. expansion options are not really in the 'Great Deal' category
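
Something like this would do as a like-for-like test (file path, size and runtime are just placeholders):

Code:
# qd1, single job, 64K sync writes - roughly the pattern ESXi-over-NFS produces for me
fio --name=qd1-sync --filename=/mnt/test/fio.dat --size=10G \
    --rw=write --bs=64k --ioengine=libaio --iodepth=1 --numjobs=1 \
    --direct=1 --sync=1 --runtime=60 --time_based --group_reporting
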
Local NVMe (or actual HW RAID) is always an option, but the HA part is missing then....
Hm, one option I had not considered before would be to try to go stateless on the client VMs (i.e. move all data to a more secure/less performant remote storage and have the actual VM be interchangeable)

Re ZFS NVMe - the latest post in the discussion you linked in the FN forum actually was not that old, and they claimed progress, so I expect to see something in like Q1/20 ;) In the end I have been trying to get this to run since 2017, so a couple of months won't matter. I stopped tinkering on the main ESXi environment, so I can wait ;)
 

Rand__

Well-Known Member
Mar 6, 2014
@velocity08 ever played around with BeeGFS?
Doesn't look half bad even with low process counts: https://www.beegfs.io/docs/whitepapers/Huawei_Systems_ZFS_report_ThinkParQ.pdf

Of course that's streaming, but only a few drives in the SSD system...
The interesting part is that it's a parallel filesystem (so if I get it correctly it should distribute write slices across all storage nodes, similar to what ScaleIO claimed), and it can use IB, sit on top of any source FS (incl. ZFS) and be re-exported as NFS.

Also see https://www.beegfs.io/docs/whitepap...argets_per_Server_for_BeeGFS_by_ThinkParQ.pdf
They play with numtargets at the end, which is "the desired number of storage targets for each file".

Now, given that ZFS is currently performance-limited to a small number of drives (1-3 mirrors in my tests), adding BeeGFS with numtargets on top might actually alleviate the ZFS scaling issue (pure theory at this point, of course).
This actually might work on a single box (since it's beefy enough)... 6 targets with ZFS mirrors, numtargets=6 and off I go...
Should be easy enough to test.
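
The striping part should just be a matter of setting the pattern on a test directory, something like this (mount point, chunk size and file name are made up):

Code:
# stripe every new file in this directory across 6 storage targets
beegfs-ctl --setpattern --numtargets=6 --chunksize=512k /mnt/beegfs/bench
# check what a given file actually got
beegfs-ctl --getentryinfo /mnt/beegfs/bench/testfile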

Only downside is I need 6 SLOG slices, so I'll probably have to fall back to Optane.
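
Slicing up one Optane and handing a partition to each pool would be roughly (device, partition sizes and pool names are placeholders):

Code:
# six 20G partitions on the Optane drive
for i in $(seq 1 6); do sgdisk -n ${i}:0:+20G /dev/nvme6n1; done
# attach one slice per pool as its separate log (SLOG) device
zpool add tank1 log /dev/nvme6n1p1
zpool add tank2 log /dev/nvme6n1p2
# ...and so on for the remaining pools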
 

Rand__

Well-Known Member
Mar 6, 2014
So initial tests have not been too encouraging... not sure if I did something wrong, but I didn't see a speedup on multiple targets as hoped...