ZFS Advice for new Setup


humbleThC

Member
Nov 7, 2016
Only some Brits in Europe believe that you can have your cake and eat it.

In the real world you have cost, reliability, and performance, and you can only optimize two of them.

I would skip the idea of an SLOG, or buy a used Intel S3700-100.
Just purchased a pair of Intel S3700-200's for $73 ea!! Even cheaper than the ZOTACs I bought earlier today... so now I'm going to have 4x new SSDs, all with PLP, and cheaper than I could find used S3700-100s for on eBay. So I'll just have to deal with the extra ZIL capacity.
 

whitey

Moderator
Jun 30, 2014
I vote for the two pools as well; BIG fan of separate capacity and performance pools: capacity pool as 8+2 raidz2 and performance pool as a 4-SSD striped-mirror config for VMs (or as many as you can afford, hah). Grow as you go :-D

EDIT: Imma' sit back and enjoy the show...

@humbleThC, you certainly seem to have been around the block a time or two and are obviously quite knowledgeable (eerily similar background to me in some respects), but I do have to at least throw this out there. Been there, done that with wanting to 'have cake/eat a BIG ole' slice of it too,' and unfortunately what I think you will find, and you have already alluded to (your gut feeling above), is that you can brave this path and piece a humpty-dumpty (or frankenzombie, as I like to call it) solution together, but many a time a mixed-use pool will leave a bad taste in your mouth or leave you wanting more. Cool/fun to experiment though, and I encourage you to do so to your heart's content; we luv that sh|t around here. Just share any breakthroughs or 'ah ha' moments so we may either laugh/cry with you (that ZOTAC impulse buy and then S3700 proper purchase had me LMFAO). Your wallet will thank you later for hangin' round here long enough! BWA HAH

BIG SMILES :-D

Take care buddy, eagerly following for epicness or epic failure (hoping for the former of course)
 

humbleThC

Member
Nov 7, 2016
(that ZOTAC impulse buy and then S3700 proper purchase had me LMFAO)
You noticed that? Yeah... there's that :)

I'm not sure the ZOTACs are all that terrible; they seem like a middle ground between consumer and enterprise grade, with the label claiming 'enterprise grade'. But the performance reports coming out of multiple quality SSD reviewers put them within the top 16% of all enterprise SSDs in the same category. So unless they are just plain unreliable or plagued with firmware issues, they still might do the job.

I have a lot of options going forward now... 10x HDDs and 8x SSDs

Design/Thought of the morning:

Pool0 - 10x HDDs [raidz 2(4+1)] w/ 2x Intel S3700s to start for L2ARC and 2x ZOTAC for ZIL
- Will grow in packs of (5) Hitachi 4TB NAS drives
- Will monitor L2ARC and ZIL for overutilization, and add SSDs as required.
- Really depends on which set of SSDs ends up being faster.
- If the Intels are faster, L2ARC.
- If the ZOTACs are unreliable, then L2ARC (because I wouldn't want a downed ZIL device).

Pool1 - 4x SSD [raid10 2(1+1)]
- Will grow this pool by 2x or 4x Samsung Evo 850s to increase capacity + bandwidth.
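
For illustration, roughly what the initial pool creation could look like under this plan; the device names below are placeholders (not my actual disks), and which SSDs end up as SLOG vs. L2ARC is still up in the air per the notes above:

Code:
# Pool 0: two raidz (4+1) vdevs of HDDs, mirrored SLOG, striped L2ARC -- hypothetical device names
zpool create pool0 \
  raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
  raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
  log mirror c3t0d0 c4t0d0 \
  cache c3t1d0 c4t1d0

# Pool 1: striped mirrors (RAID10) out of the four SSDs
zpool create pool1 \
  mirror c3t2d0 c4t2d0 \
  mirror c3t3d0 c4t3d0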
 

Davewolfs

Active Member
Aug 6, 2015
Serious question. Is ZOL stable and ready for prime time?

I've been on OmniOS for a while, but the lack of FUSE or of the ability to use other cloud backup solutions is becoming aggravating. I'm thinking of moving the pool to Linux myself but am nervous about doing so.
 

ttabbal

Active Member
Mar 10, 2016
I would use the 3700s for ZIL. They are known to work well in that role.

ZOL uses the same ZFS code as the other open source implementations. So the only real concern is the interface to the kernel. I have not seen any significant issues reported on it for a while now. I ran it on a backup box as a pool for Crashplan for a couple years without issue, and it's been my production environment for about a year.
 

gea

Well-Known Member
Dec 31, 2010
OmniOS now includes LX zones (lightweight Linux containers) from SmartOS.
Currently in beta; stable in the next 151022 LTS release later this year.

This may be an attractive option for adding Linux services to OmniOS.
 

Davewolfs

Active Member
Aug 6, 2015
OmniOS now includes LX zones (lightweight Linux containers) from SmartOS.
Currently in beta; stable in the next 151022 LTS release later this year.

This may be an attractive option for adding Linux services to OmniOS.
I test-drove it. The program I am running is giving me OS error 13 for some reason. Also, LOFS doesn't seem to handle sparse files.
 

humbleThC

Member
Nov 7, 2016
Hardware Architecture Update :)

So this NAS is the SuperMicro 36-bay chassis, and it came with all 36x sleds for all slots, but they are 3.5"-based.
- 24 front-facing slots (4U)
- 12 rear-facing slots (2U) (which means the server portion only consumes 2U as well - go low profile or go home; 7x PCIe 2.0 x8 in x16 slots)

My goal originally was to save all 36x bays for 4TB Hitachi drives, and just literally velcro a stack of (5) SSDs to the inside of the chassis (so as to block as little airflow as possible) and cheat with 5x "internal" SSD slots, using a 4-to-1 QSFP-to-SATA3 breakout cable and borrowing 12V, split a bunch of ways, for SATA power :)

With the "new plan" adding 4x more SSDs, there just isn't enough room to keep jamming more floating SSDs. And realistically I don't see myself growing beyond the front 24x slots for the next 5 yrs+ so i'll be fine.

Now to move all of my SSDs to the 12x rear-facing slots.

Step 1) Order 8x 2.5" SSD SAS to 3.5" SATA Hard Disk Drive HDD Adapter CADDY TRAY Hot Swap Plug | eBay

Step 2) Rewire the LSI 9211-8i's QSFP cables to the backplane for SSDs

Step 3) Split up the SSDs across 2x LSI Adapters for bandwidth >= redundancy

LSI9211-8i #1
Controller 1 - Port 0 - Intel S3700 200GB --> Pool 0 - Mirror SLOG [maybe swap with ZOTAC]
Controller 1 - Port 1 - ZOTAC 250GB --> Pool 0 - L2ARC
Controller 2 - Port 0 - Samsung Evo 850 240GB --> Pool 1 - Mirror 1
Controller 2 - Port 1 - Samsung Evo 850 240GB --> Pool 1 - Mirror 1

LSI9211-8i #2
Controller 1 - Port 0 - Intel S3700 200GB --> Pool 0 - Mirror SLOG [maybe swap with ZOTAC]
Controller 1 - Port 1 - ZOTAC 250GB --> Pool 0 - L2ARC
Controller 2 - Port 0 - Samsung Evo 850 240GB --> Pool 1 - Mirror 2
Controller 2 - Port 1 - Samsung Evo 850 240GB --> Pool 1 - Mirror 2

LSI9211-8i #3
5x Hitachi 4TB NAS 7.2k --> Pool 0 - RAIDz (4+1)

LSI9211-8i #4
5x Hitachi 4TB NAS 7.2k --> Pool 0 - RAIDz (4+1)

My Assumptions are:
- Split the Intel S3700s up on separate adapters in Raid1 for additional hardware redundancy
- Split the ZOTACs as well, non-mirrored, but for additional bandwidth
- Split each individual RAIDz (4+1) onto its own LSI adapter
---- This is only an assumption that the background scheduler prefers each RAID component to be on the same controller when possible, for less overhead.
---- I still lose the entire pool if I lose either LSI, of course, but in my case I don't have 5x LSIs to split 1 drive per controller for the HDDs alone, so I'll never really achieve LSI redundancy.

That should leave me (4) more rear slots for SSDs to grow into (either to speed up Pool 0 with more PLP SSDs or grow Pool 1 with two more mirror components) --- otherwise I'm back to internal velcro action :)

i.e. Rust in the front, party in the back.
 

whitey

Moderator
Jun 30, 2014
I dunno what type of backplane you're dealing with, but I don't think QSFP is the ticket... LSI 9211-8i's use mini-SAS 8087 cables, and you can either run those mini-SAS HBA to mini-SAS backplane, or mini-SAS HBA to SATA breakout to backplane.

Also, I think there is a better 3.5" to 2.5" SuperMicro tool-less tray that some have touted around here, and I believe I have seen a video of it; they looked slick.

Here they are (MCP-220-00118-0B): another source; couldn't find it here, but I know it's hiding somewhere round' here.


EDIT: Found it.

The Best 3.5" to 2.5" adapter that I've seen from Supermicro
 

humbleThC

Member
Nov 7, 2016
Those do look snazzy... I need to be less impulsive, and learn to ask before buying :) as I immediately ordered the ones I found.

Hopefully my eBay seller lets me cancel the order (hasn't shipped), because I just Amazon'd 8x of these :)
SO NO ONE LINK ME TO BETTER ONES...
 

whitey

Moderator
Jun 30, 2014
I need to be less impulsive, and learn to ask before buying :) as I immediately ordered the ones I found.
Why yes... yes you do, grasshopper; appreciate the enthusiasm though! Can't deny I've been a victim of my own overzealousness before, hah

EDIT: P.S. Can I have some of your lab budget??? :p
 

ttabbal

Active Member
Mar 10, 2016
- Split the Intel S3700s up on separate adapters in Raid1 for additional hardware redundancy
I think you mean you're splitting them up just to have them on different controllers, no issue there.

Just in case you plan to use the card's hardware RAID to mirror them, don't do that. Let ZFS manage that stuff. Just add a partition on each of them to the pool as a log device. ZFS will take care of it. If a log device fails, ZFS will stop using it and let you know. Same with L2ARC devices. I only mention this because you wouldn't be the first enterprise storage guy I've seen do it. :)
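
For illustration, roughly how that looks on the command line; the pool and device names below are placeholders, and ZFS itself handles the log mirror (no controller RAID involved):

Code:
# Mirrored SLOG across the two S3700s (one per HBA) -- hypothetical device names
zpool add pool0 log mirror c3t0d0 c4t0d0

# L2ARC devices are never mirrored by ZFS; just add them and they stripe
zpool add pool0 cache c3t1d0 c4t1d0

# If a log or cache device fails, the pool keeps running and zpool status flags it
zpool status -v pool0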

There's no problem putting each vdev's worth of disks on its own controller. When I was doing raidz, I split them over controllers, but didn't really have a good reason to. If you lose a controller, the pool goes offline, but you can get the data back by re-importing it after the controller issue is resolved.
 

humbleThC

Member
Nov 7, 2016
Why yes... yes you do, grasshopper; appreciate the enthusiasm though! Can't deny I've been a victim of my own overzealousness before, hah

EDIT: P.S. Can I have some of your lab budget??? :p
Heh... I've pieced together this lab over a few quarterly bonuses, and a tax return or two :)

Was going to ask: what's the rated interface speed on your CX3s, and of that, what do you actually see (iperf vs. real-life ZFS in a best-case scenario)? Also was curious if you are playing with ESX inside ESX? Noticed your lab setup is pretty similar to mine.

I'm planning to do an "undercloud/overcloud" build-out,
i.e. (2) physical ESX servers getting storage from this NAS, as my physical "undercloud".

Then I plan to use vRA/vRO to basically auto-deploy "virtual labs" within my physical lab, i.e. the "overcloud".
- This will be able to spin up 1-8x vESX servers as VMs, a VCSA, and a Domain Controller to basically bring up a "twice-virtualized" lab.
- I don't have any real expectations of performance in the "overcloud" (best effort).
- But I can do stuff like test VSAN 6.2 on the vESX servers, where I can spin up many separate dedicated internal networks on separate virtual adapters to the vESX hosts.
 

whitey

Moderator
Jun 30, 2014
The CX3s I have are the FDR 56Gb IB and 40/10GbE VPI cards. I'm hooked to a 10G Ethernet switch, so right now I have them negotiating at 10G from the switch up to the vSphere ESXi hosts. iperf between two VMs on different hypervisor hosts yields near line-rate speeds, 9Gbps or so... if I bump parallel streams up to two or four, it saturates 10G. I typically use FreeNAS graphs, or a Cacti host with SNMP to my Juniper switch, and watch the interface graphs there. Either works. Fan of fio for benchmarking.
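
As a point of reference, a typical fio run for this kind of testing might look something like the following; the target path, job size, and ioengine are assumptions to adapt (libaio is Linux-specific; use posixaio or solarisaio elsewhere):

Code:
# Hypothetical 4k random-write test against a file on the pool under test
fio --name=randwrite --filename=/tank/fio-test --size=4G \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting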

[attached image: upload_2017-1-6_18-43-11.png]

Psstt, I've been doing nested ESXi since the vSphere 4.x days for sure. Hell, maybe even earlier; when you remember GSX, you know you're old-ish.

For nested vSAN you'd just have to have an LSI HBA passed through to each vESXi and attach disks; easy peasy! I have not done this, but was debating it at one point. I have three hosts, so I just use a poor man's vSAN AFA comprised of a single 200GB S3700 for the cache tier and an 800GB S3610 for the capacity tier in each host. Definitely interested in cutting my teeth more on vRA/vRO; NSX is next up to bat in my vSphere studies, though. I've managed several rather large vCD infrastructures over the last 4-5 years, ever since vCD 1.5.

Sounds like you have a lot on your plate for the next several months/years :-D Kiss your fam goodbye...j/k find that balance, it's a MUST!
 

humbleThC

Member
Nov 7, 2016
I ended up getting the Mellanox 4036-E (36-port 40Gb QDR switch) for $225,
the Mellanox CX2 adapters for $38 ea,
and 5/7-meter QSFP cables for $10 ea.
So all in, for (4) hosts + switch & cables = $457 to have dual 40Gb interfaces all around.

Current iperf3 from Windows 10 (my main desktop) to OmniOS peaks around 15-15.5Gbps at around 4-5 sockets (parallel streams):
[SUM] 0.00-10.00 sec 18.1 GBytes 15.5 Gbits/sec sender
[SUM] 0.00-10.00 sec 18.1 GBytes 15.5 Gbits/sec receiver
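
For reference, a run like the one above would come from something along these lines (the hostname and stream count are placeholders):

Code:
# 5 parallel streams for 10 seconds against the OmniOS box
iperf3 -c omnios-nas -P 5 -t 10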

I remember GSX :) Juuuust finished my VCP6-DCV & VSAN 6.2 certs, but my company keeps me mostly storage-focused, so I'm picking up VSAN/ScaleIO for now, and plan to take a look at Ceph and AWS S3/Glacier soon (and why not a little OpenStack/CloudForms). i.e. undercloud/overcloud means I can spin up clouds within clouds... it'll be raining up in here soon™
 

whitey

Moderator
Jun 30, 2014
NICE. I know a few weeks back, through my MLX IS5022 using IPoIB between CentOS 7 physical hosts, I was getting about 21Gbps, and in CentOS 7 VMs with VT-d passed-through HCAs, 18Gbps, so it sounds like you're right on target.
 

humbleThC

Member
Nov 7, 2016
BTW~ Anyone looking for a used Intel S3700-200, the dude I bought mine from has 1 more. $73
DELL INTEL SSD DC S3700 200GB SATA SSDSC2BA200G3T 06P5GN | eBay

Tried to get my buddy to buy it ("just cuz" deal), but I guess he wasn't interested. He originally had 4, and someone bought 1 before I found the listing (otherwise I'd have all 4 and be returning the ZOTACs)... But tbh I'm not upset about the ZOTAC impulse buy yet... not until I hit them real hard and see how they actually do vs. the S3700.
 

humbleThC

Member
Nov 7, 2016
Question(s) relating to ZFS using 'whole disk' vs. 'partitions'.

Is it true that ZFS will only use the on-board disk cache if the device is added as a 'whole disk'?
If not...

What's the general consensus on splitting disks into partitions and striping/mirroring across partitions for 'plaiding'?
i.e.

Instead of a two-whole-disk mirror, what if you partitioned each of the SSDs into, let's say, two partitions?
Then create a RAID1+0 set, where you mirror the 1st partitions, mirror the 2nd partitions, and then stripe across them.
Thus providing more vdevs to the underlying scheduler, and potentially using more internal drive-controller threads.

I know for HDDs this is considered a no-no (due to platter/head contention), but I'm just wondering: if ZFS can't squeeze every IOP out of a 2-disk SSD mirror, what if you RAID1+0 it? My thought is, if there is any perf boost, keep going: partition into 4 partitions, get 4 vdevs out of 2x SSDs, and try again, until you find the point of no gain.
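
To make the idea concrete, a hypothetical sketch of that partitioned striped-mirror experiment; the sgdisk commands and /dev/sdX names are Linux-style placeholders (on illumos you'd carve the partitions with format/parted instead):

Code:
# Split each 240GB SSD into two roughly equal partitions (hypothetical devices)
sgdisk -n 1:0:+110G -n 2:0:0 /dev/sda
sgdisk -n 1:0:+110G -n 2:0:0 /dev/sdb

# Mirror partition 1 of each disk, mirror partition 2 of each disk, stripe across both mirrors
zpool create testpool \
  mirror /dev/sda1 /dev/sdb1 \
  mirror /dev/sda2 /dev/sdb2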
 

humbleThC

Member
Nov 7, 2016
My Chassis Slot > SAS Controller > Disk > Function Layout/Design

Splitting L2ARC & ZIL across all (4) HBAs for BW
Splitting Pool 1 - SSD RAID1+0 - across all (4) HBAs [different onboard ASIC from L2ARC/ZIL]
Splitting Pool 0 - HDD RAID5+0 - across all (4) HBAs [split across all 4 controller / 8 ASICs]

[attached image: upload_2017-1-7_13-8-26.png - chassis slot / SAS controller / disk / function layout]

Cosmetically the drives are going to be split up all over the chassis, front & back, so it won't look pretty :( But I don't get to pick the relationship between the back of the SAS expander and the front :( So I'll design for performance > cosmetics...

I suppose if I had 1 more LSI HBA, I could split up the 10x HDDs so that no more than one RAIDz component is on any adapter, for controller redundancy across the HDD pool.
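
Once it's built, roughly how I'd verify the split is actually spreading the load (pool name is a placeholder); the per-vdev numbers show whether the SLOG and L2ARC devices are getting hit:

Code:
# Per-vdev bandwidth/IOPS every 5 seconds; log and cache devices are listed separately
zpool iostat -v pool0 5

# ARC / L2ARC hit rates (tool name varies by platform: arcstat, arcstat.pl, or arc_summary)
arcstat 5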
 

whitey

Moderator
Jun 30, 2014
Question(s) relating to ZFS using 'whole disk' vs. 'partitions'.

Is it true that ZFS will only use the on-board disk cache if the device is added as a 'whole disk'?
If not...

What's the general consensus on splitting disks into partitions and striping/mirroring across partitions for 'plaiding'?
i.e.

Instead of a two-whole-disk mirror, what if you partitioned each of the SSDs into, let's say, two partitions?
Then create a RAID1+0 set, where you mirror the 1st partitions, mirror the 2nd partitions, and then stripe across them.
Thus providing more vdevs to the underlying scheduler, and potentially using more internal drive-controller threads.

I know for HDDs this is considered a no-no (due to platter/head contention), but I'm just wondering: if ZFS can't squeeze every IOP out of a 2-disk SSD mirror, what if you RAID1+0 it? My thought is, if there is any perf boost, keep going: partition into 4 partitions, get 4 vdevs out of 2x SSDs, and try again, until you find the point of no gain.
It is generally considered a best practice NOT to use partitions for ZFS cache devices, and to just give it the whole device (KISS principle). I know some do it, but it's kind of a no-no; at least it was always frowned upon by most of the ZFS zealots/like-minded folks I have bumped into over the years. I'll punt and let @gea explain the nitty-gritty technical details down in the weeds if he feels it warrants further discussion, or if things have simply changed that I am unaware of.