ZFS pool with 4x SSD


JimPhreak

Active Member
Oct 10, 2013
553
55
28
I'm looking to create a ZFS pool (first time using ZFS) with my 4 x Intel 730 480GB SSDs to store my VMs and my Docker containers/appdata. What kind of performance can I expect for a 4 disk (2 mirrored vdevs) pool given some tweaking? The connection between this server and my VM server will be 10gig so my network won't be a bottleneck.
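For reference, a pool of two mirrored vdevs like the one described would be created roughly as follows; the pool name and device paths are placeholders (OmniOS/Solaris uses c#t#d# device names instead):

```
# Create a pool of two mirrored vdevs ("ZFS RAID-10") from the four SSDs.
# Pool name and disk IDs are examples only.
zpool create ssdpool \
  mirror /dev/disk/by-id/ata-INTEL_SSDSC2BP480G4_A /dev/disk/by-id/ata-INTEL_SSDSC2BP480G4_B \
  mirror /dev/disk/by-id/ata-INTEL_SSDSC2BP480G4_C /dev/disk/by-id/ata-INTEL_SSDSC2BP480G4_D

# Verify the layout: two mirror vdevs with two disks each.
zpool status ssdpool
```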
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
Do you mean on sequential transfers or as VM storage? Also, will you leave sync settings at their defaults?
I don't think I'll be doing sequential transfers very often as this pool will be strictly for VM storage and for my Docker containers/appdata.

What are the advantages/disadvantages of configuring my sync settings to something other than the default in Solaris? And how does the way Proxmox handles vdisk writes differ from ESXi? I ask because I came across this post regarding FreeNAS + ESXi.
 

rubylaser

Active Member
Jan 4, 2013
846
236
43
Michigan, USA
If you are going to have your VM storage on a separate box from your VM host, you will likely be mounting the volume via NFS, which defaults to always doing sync writes for maximum safety (a good practice with VM disks). Unfortunately, you need a good ZIL (SLOG) device to overcome the performance hit (something with very low latency like NVMe, DRAM-based storage, or an Intel S3700).
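A quick way to see this behaviour is to look at the sync property on the exported dataset; the pool/dataset names below are placeholders:

```
# With the default sync=standard, ZFS honours the client's sync requests,
# and NFS clients such as ESXi issue sync writes for every vdisk update.
zfs get sync ssdpool/vm

# Share the dataset over NFS (works on OmniOS/Solaris and on Linux with
# the kernel NFS server installed).
zfs set sharenfs=on ssdpool/vm
```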

I keep my ZFS array for Proxmox right on the host to avoid this slow sync behavior over the network. I'm sure @gea will have great recommendations if you are planning on going with ZFS on OmniOS.
 

gea

Well-Known Member
Dec 31, 2010
3,141
1,184
113
DE
Basically it's a question of performance vs. security.

Performance-wise, ZFS uses a write cache to collect small random writes for a few seconds and then writes them out as a single large, fast sequential write. As ZFS is a copy-on-write filesystem, write actions are done in a consistent way (completed or discarded). A power outage in this situation does not affect pool consistency, but the last few seconds of writes may be lost.

If you use databases with transactions, or old filesystems like ext4 or NTFS on top of ZFS, the database or guest OS does not know about atomic consistent writes or the ZFS write cache. So it can happen that dependent transactions (like a financial transaction, e.g. remove money from one account and then add it to another) or filesystem updates (modify data and then update metadata) are completed only partly. In such a case you have transferred money into nirvana, or your filesystem metadata is corrupt. This is in addition to a possibly corrupt file or database, for which you additionally need journaling to be protected. For an SMB filer this is never a problem: on a power outage the file currently being written is damaged as well, but ZFS itself always remains consistent and valid.

With hardware RAID, you can use a cache plus BBU to reduce this problem. With ZFS you use sync write and a ZIL that logs all committed writes. After a power outage, any writes that were acknowledged to the database or OS but had not yet reached stable storage are replayed on the next pool import.

So if you decide you need that level of security, you need sync write, no discussion. You will probably discover that secure sync write is much slower than regular writes through the write cache, especially if you do not use a fast SSD-only pool.
This is where an Slog device helps. It lets you put the sync log on a fast dedicated SSD and use the pool itself for the regular sequential writes.

This behaviour affects local writes (Proxmox etc.), NFS and iSCSI. With the default sync setting, the client decides whether to use sync writes or not; ESXi, for example, requests sync over NFS. You can override this with sync=disabled. If your client does not request sync but you want it, you can force it with sync=always.
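In practice the three cases above map onto the per-dataset sync property; the dataset name is an example:

```
# Default: the client decides; ESXi over NFS will request sync writes.
zfs set sync=standard ssdpool/vm

# Ignore sync requests entirely: fastest, but writes acknowledged to the
# client can be lost on a power failure (the trade-off described above).
zfs set sync=disabled ssdpool/vm

# Force sync for every write, even if the client does not ask for it.
zfs set sync=always ssdpool/vm
```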

With iSCSI you have basically the same problem. If you use a zvol on Solaris and share it via iSCSI, the corresponding setting is the write-back cache flag on the logical unit. If you disable the write-back cache, sync write is forced and you will want an Slog.
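On Solaris/OmniOS with COMSTAR this looks roughly like the sketch below; names and sizes are examples, and the wcd (write cache disabled) property handling is worth verifying against the stmfadm man page:

```
# Create a zvol to export over iSCSI.
zfs create -V 200G ssdpool/vm-lun0

# Create the logical unit with the write-back cache disabled, so every
# write is treated as a sync write; with this setting you will want an Slog.
stmfadm create-lu -p wcd=true /dev/zvol/rdsk/ssdpool/vm-lun0
```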

About performance of the 4 x Intel 730 480GB SSDs:
Assuming that a single SSD can deliver around 300 MB/s of sustained sequential throughput
and around 15,000 IOPS:

A RAID-10 of 4 SSDs can give
600 MB/s sequentially on writes (2 x SSD) and 30,000 IOPS (2 vdevs),

compared to a RAID-Z1 with
900 MB/s sequentially (3 x data disks) and 15,000 IOPS (like a single SSD).

With that many IOPS, a RAID-10 layout is mainly interesting for spindles.
With SSD-only pools, IOPS is usually more than enough even with RAID-Z (any level).
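For comparison, a sketch of the RAID-Z1 layout of the same four disks, plus a simple way to watch real throughput and IOPS under a VM workload (names are placeholders):

```
# Alternative layout: one RAID-Z1 vdev across all four SSDs
# (more sequential bandwidth, but IOPS of roughly a single disk).
zpool create ssdpool raidz1 diskA diskB diskC diskD

# Watch per-vdev bandwidth and IOPS every 5 seconds while testing.
zpool iostat -v ssdpool 5
```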
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
I would see how you like performance without a SLOG on your SSD-only pool (VM usage only, it sounds like). If it's not fast enough, get a 200GB S3700; if that's still not fast enough, get an NVMe drive -- some of the Samsung 951s are under $200 now. You're going to over-provision them A LOT, and I doubt your home lab will see heavy usage -- but if it does, maybe step it up to NVMe :) Always an upgrade path ;)
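If a SLOG gets added later, the usual trick is to hand ZFS only a small partition of the S3700/NVMe drive and leave the rest unallocated as over-provisioning; a sketch with example device names (sgdisk is the Linux route, on OmniOS you would partition with format instead):

```
# Carve out a small (e.g. 16G) partition and leave the rest of the drive
# unallocated as over-provisioning; device name is an example.
sgdisk -n 1:0:+16G /dev/nvme0n1

# Add the partition as a dedicated log (SLOG) device.
zpool add ssdpool log /dev/nvme0n1p1

# A log device can be removed again later if it turns out not to help.
zpool remove ssdpool /dev/nvme0n1p1
```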
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
@gea That was an excellent breakdown; I really appreciate the in-depth and detailed response. My Intel 730's are capable of 550MB/s and 470MB/s sequential read and write respectively (see pic below). I may take @T_Minus's advice and see what performance is like with default settings, and then if I feel I'm not getting the performance I need, I can add a faster SLOG device after the fact.

 

rubylaser

Active Member
Jan 4, 2013
846
236
43
Michigan, USA
The numbers @gea gave you are pretty darn accurate based on my personal experience with four of the same drives in a ZFS RAID-10. They can peak faster than that, but under a constant multi-VM workload, what he quoted is probably closer to real-world results. The best thing to do is try it out and see how it performs for you in real life. That's the only benchmark that really matters :)
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
The numbers @gea gave you are pretty darn accurate based on my personal experience with four of the same drives in a ZFS RAID-10. They can peak faster than that, but under a constant multi-VM workload, what he quoted is probably closer to real-world results. The best thing to do is try it out and see how it performs for you in real life. That's the only benchmark that really matters :)
Yea I think that's the plan.

@gea Can I run napp-it off passed-through mirrored USB drives in a VM, and if so, how large would they need to be?
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
@gea That was an excellent breakdown; I really appreciate the in-depth and detailed response. My Intel 730's are capable of 550MB/s and 470MB/s sequential read and write respectively (see pic below). I may take @T_Minus's advice and see what performance is like with default settings, and then if I feel I'm not getting the performance I need, I can add a faster SLOG device after the fact.

Those are 'MAX' numbers, not steady state :)

Keep in mind those drives are aimed at the desktop market and are really a higher-clocked S3500, but sustained performance will be similar.
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
Those are 'MAX' numbers, not steady state :)

Keep in mind those drives are aimed at the desktop market and are really a higher-clocked S3500, but sustained performance will be similar.
Yes, I'm aware of that. I'm not expecting enterprise-level performance out of these SSDs, but I still want to get the most out of them that I can. They should work fine for my home network.
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
Maybe possible if you get it booting, but I would not do it.
So what do you recommend as a redundant boot drive for the VM appliance? The board I'm looking to purchase has an onboard LSI controller, but I'll need to use that for my bulk storage VM (12 disks). Therefore I'm stuck with the 4 onboard SATA ports unless I add a second HBA, which I might have to do in this case. I do have an M1015 lying around I could use.
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
I support ESXi only
Hmmmm, that makes my decision tougher. I'll have to think about going this route now. I love ESXi but for my home network I want to start using clustering / HA and I have no desire to pay for VMware licensing.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
Why not pick up 2x S3500 80GB or 120GB for Napp-IT and 1 or 2 other small VMs (like a firewall), run them in RAID1 on the onboard SATA ports, and then use Napp-IT to manage your 4x SSD ZFS pool for the other VMs?
 

JimPhreak

Active Member
Oct 10, 2013
553
55
28
Why not pick up 2x S3500 80GB or 120GB for Napp-IT and 1 or 2 other small VMs (like a firewall), run them in RAID1 on the onboard SATA ports, and then use Napp-IT to manage your 4x SSD ZFS pool for the other VMs?
That's probably what I'll wind up doing, though I'll need to use 3 controllers for this setup. I thought for a second I couldn't use Napp-it since I'm planning to create a Proxmox VM cluster, but my storage server could run a standalone ESXi install since it shouldn't need to be part of the Proxmox cluster.