ZFS/FreeNAS: Large pool design with an HGST 4U60 rack

nephri

Active Member
Sep 23, 2015
Paris, France
I have an HGST 4U60 enclosure that I'm filling with 60 HGST 2 TB SAS 6 Gb/s drives.

I would like advice on a ZFS topology to achieve:
- 1 large pool for main storage
- 1 large pool for backup of the main storage

I'm thinking of something like this:
- Main pool
- Stripe of 15 "mirror" vdevs (30 TB usable with 30 disks)
- Backup pool
- Stripe of 5 "raidz2" vdevs of 6 HDDs (40 TB usable with 30 disks)
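As a dry run, the two layouts can be spelled out like this — all daN device names are placeholders for my real disks, and the commands are echoed rather than executed so nothing gets created by accident:

```shell
# Sketch only: assume da0..da29 for the main pool, da30..da59 for the backup pool.

# Build the vdev list for 15 two-way mirrors.
main_vdevs=""
for i in $(seq 0 2 28); do
  main_vdevs="$main_vdevs mirror da$i da$((i+1))"
done
echo zpool create main $main_vdevs

# Build the vdev list for 5 raidz2 vdevs of 6 disks each.
backup_vdevs=""
for i in $(seq 30 6 54); do
  backup_vdevs="$backup_vdevs raidz2 da$i da$((i+1)) da$((i+2)) da$((i+3)) da$((i+4)) da$((i+5))"
done
echo zpool create backup $backup_vdevs
```

Drop the `echo` once the device names match the real enclosure.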

I also have S3700/S3710 SSDs to use as SLOG.
I would put a mirrored pair of 100 GB S3700 SSDs on the main pool.
I don't think I need a SLOG for the backup pool.

Any advice is welcome.
 

i386

Well-Known Member
Mar 18, 2016
30 drives in a RAID 10/SAME config?
I think that's too dangerous, especially with used(?) older HDDs. Only one mirror pair has to fail and you lose the whole pool.

Backup pool:
Why not use 2 raidz3 vdevs? 15 drives each, of which any 3 can fail without losing data, and you get more capacity (48 TB).
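Back-of-the-envelope usable capacity for 30 backup disks at 2 TB each, ignoring ZFS metadata overhead:

```shell
# Usable TB = vdevs * (disks_per_vdev - parity_disks) * disk_size_TB
echo "5 x raidz2(6):  $(( 5 * (6 - 2) * 2 )) TB"   # -> 40 TB
echo "2 x raidz3(15): $(( 2 * (15 - 3) * 2 )) TB"  # -> 48 TB
```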
 

nephri

Active Member
Yes, they're used HDDs. I have 10 HDDs as cold spares.

It's for a home lab. I have the backup if a pool failure arises; I can live with pool downtime.
But I also thought that resilvering with striped mirror vdevs won't stress a lot of disks (only one).

If you wouldn't go with striped mirrors, what would you do for the main pool?
For the backup pool, I often read that you should keep vdevs small, ideally below 8 HDDs per vdev.

Another question: on the HGST 4U60 there are 2 controllers, each handling 30 disks.
Would you stripe each pool equally across both controllers (probably best for performance) or dedicate each pool to one controller (probably best for resiliency)?
 

nitrobass24

Moderator
Dec 26, 2010
TX
For the main pool I would do the 6-disk Z2 config since it will give you more redundancy and capacity. The cost will be time spent on resilvers, but with 2 TB disks it's a great trade-off in my opinion.

SLOG - I wouldn't split it across two pools; the results will be suboptimal. Also, do you even need it on the backup pool?


 

nephri

Active Member
So, everyone recommends a 5x raidz2 of 6 disks for both pools (40 TB usable for each).

I don't plan to put a SLOG on the backup pool.
I want a SLOG only on the main pool; it will be a mirror of 100 GB S3700 SATA SSDs.
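For what it's worth, attaching the mirrored SLOG later is a one-liner — device names here are placeholders (on FreeNAS they'd normally be gptid labels), echoed as a dry run:

```shell
# Dry run: attach the two S3700s as a mirrored log vdev to the main pool.
echo zpool add main log mirror ada0 ada1
```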

In terms of performance, which will be better: striped raidz2 vs striped mirrors?

The server is built with:
- The HGST JBOD chassis is connected to a storage server with an LSI 9300-8e HBA
- The storage server uses a 40 Gb/s NIC (Chelsio T580-CR)
- The server has 64 GB RAM, but I'm thinking of upgrading to 128 GB
- The server has 2x Intel SSD DC S3700 100 GB that will be used as SLOG for the main pool
- The server has 4x Intel SSD DC S3710 400 GB for a striped-mirror pool hosting Proxmox VMs (over iSCSI)
- The server also has 8x Seagate 3 TB SAS HDDs (but I will try to resell them)
- The server also has 8x HGST 2 TB SATA HDDs (but I will try to resell them)
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
Not sure if I'd go 5x raidz2 vs. 15x mirrors for the main pool.
In terms of performance, the raidz2 layout in theory should end up with 1/3 of the IOPS, as it's 5 vdevs vs. 15 vdevs.
Sure, if you lose a whole mirror the pool is gone, but since you said you can live with that and restore from backup, it shouldn't matter.
With hot spares the chance this will ever happen should be low, but definitely not zero.
[edit: saw you have cold spares, maybe put in 1 or 2 hot spares?]
Also, resilvering is definitely faster on mirrors, and 2 TB drives should do the job quite fast.

For the controllers, why not put the first disk of each mirror on the first controller and the second on the other?
That would balance the load for the main pool and protect against controller failure.
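A sketch of that pairing, assuming (placeholder names) da0-da14 sit on controller A and da15-da29 on controller B; echoed as a dry run:

```shell
# Pair disk i on controller A with disk i on controller B in each mirror vdev.
vdevs=""
for i in $(seq 0 14); do
  vdevs="$vdevs mirror da$i da$((i+15))"
done
echo zpool create main $vdevs
```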

Another option could be 3-way mirrors: 10 vdevs of 3 disks each, unfortunately ending up with 1/3 of raw capacity (20 TB usable), but still 10 vdevs vs 5 raidz2 vdevs, and you could lose 2 disks no matter which vdev they belong to.
 

nephri

Active Member
Hi Alex,

They're not hot spares but cold spares; I'll have to handle disk replacement manually.
I don't really like the hot-spare feature for spinning disks.

For the controllers, that's exactly what I'm thinking.
But suppose you have this topology:

Controller A    Controller B
HD1a            HD1b
HD2a            HD2b
HD3a            HD3b
HD4a            HD4b


The pool is a stripe of 4 vdevs like:
- HD1a mirrored with HD1b (vdev1)
- HD2a mirrored with HD2b (vdev2)
- HD3a mirrored with HD3b (vdev3)
- HD4a mirrored with HD4b (vdev4)

If a read I/O needs blocks spread across these 4 vdevs, is ZFS smart enough to, for example:
- read the blocks on vdev1 and vdev2 through controller A
- read the blocks on vdev3 and vdev4 through controller B

in order to optimize controller throughput?
For 4 vdevs it seems pointless, but for 15+ vdevs it's another story.
I don't know enough about ZFS internals and behaviour to know what it will do under the hood.

The 3-way mirror is appealing, but it's a bit costly in terms of capacity. 50% is already a costly trade-off, but 33% is a no-go for me.
At this point a ZFS setup leaves me with about 1/8 of the raw storage capacity:
- 50%: main / backup pools
- 50%: resiliency of the pools
- 50%: using only half the capacity of each pool for best performance/health (ZFS recommendations)
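Spelled out for the full box (60 x 2 TB = 120 TB raw), with each factor halving what's left:

```shell
# 120 TB raw -> unique data stored comfortably
awk 'BEGIN { print 120 * 0.5 * 0.5 * 0.5, "TB" }'   # -> 15 TB
```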

I will probably go with striped mirrors for the main pool unless somebody convinces me it's a big mistake.

Séb.
 

_alex

Active Member
Hi Séb,
As for how ZFS reads, I'm not sure how it's handled.
It doesn't really depend on which controller the disks are attached to, but more generally on whether ZFS reads from both disks of a mirror.
I'm quite sure mdadm does so in a RAID10 setup.

Yes, the 3-way mirror is sort of a waste of disks.
Since total capacity is lower, you could put more than 30 disks / 10 vdevs in the main pool and fewer disks in the backup pool. That would balance the usable capacity of the whole box a bit more. But I totally agree it's probably too costly.

With 2-way mirrors I'd check the disks' SMART data closely and build each mirror from an older and a newer HDD whenever possible,
i.e. pair 6k running hours with 25k running hours. Or mix HDDs from different vendors. Just do everything possible to prevent losing a whole mirror.

I'm also curious what others think; these are just my thoughts on this.
In the end it totally depends on your performance, capacity and fault-tolerance needs.
 

nephri

Active Member
I just installed 36 disks in the 4U60.

Dual port is enabled and FreeNAS shows the disks under "Storage / View Multipaths".
I looked for them under "Storage / View Disks" and was a bit confused when I didn't see them...

Maybe this can help someone else: I wanted to determine where the disks are located in the enclosure so I can find them when a failure arises.

So, I found a way to do it:

I list all the disks detected on the enclosure using sas3ircu:
  • sas3ircu 1 display
which gives, for example, for each disk:

Device is a Hard disk
Enclosure # : 3
Slot # : 58
SAS Address : 5000cca-0-1b3d-c29e
State : Available (AVL)
Manufacturer : HITACHI
Model Number : HUS72302CLAR2000
Firmware Revision : C442
Serial No : YFH2YY8D
GUID : N/A
Protocol : SAS
Drive Type : SAS_HDD



The "Slot #" gives the slot number in the 4U60 enclosure.
The "Serial No" gives the serial number of the disk in that slot.
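To pull those two fields out without reading the whole dump by hand, something like this works — here fed the sample record above via a here-doc; against the real box you'd pipe `sas3ircu 1 display` into the same awk:

```shell
# Print "slot -> serial" for each disk record in sas3ircu output.
awk -F' : ' '
  /Slot #/    { slot = $2 }
  /Serial No/ { print "slot " slot " -> serial " $2 }
' <<'EOF'
Device is a Hard disk
Enclosure # : 3
Slot # : 58
Serial No : YFH2YY8D
EOF
```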

Now we have to determine which device is bound to this disk.
In "Storage / View Multipaths", each disk shows up like this:

[-] multipath/disk10
da36 PASSIVE
da35 ACTIVE


So I look up info on the active disk (the serial is the same as on the passive one):

smartctl -a /dev/da35

which gives the serial number of the disk bound to that device.

Now, i'm running smartctl and badblocks on them....
 