First time building a NAS - I have lots of questions before I start


MrCupHolder

New Member
Jun 20, 2023
4
0
1
Australia
I have 2 reasons for wanting to build my first NAS/SAN

1) I have a commercial grade SAN that I've used for iSCSI for labs and I'm finally tired of the noise it puts out, so I'd like to replace it with something more consumer-grade that's quieter. Currently I only power it on when I want to do lab stuff, and the noise annoys me enough that I don't feel motivated to do any of that stuff.
2) I'd like to get my data off my PC and onto a NAS


Unless someone can point to a good reason why I shouldn't, this is the hardware I'm considering using for the NAS/SAN

CPU: i7-4820K (For those not in the know, this is a HEDT CPU that has something in the realms of 44 PCIe lanes, this will be important later)
RAM: 64GB

I'm not sure what NAS Software I should use and I'm looking for suggestions based on my requirements below. Or maybe none of them meet my requirements, please tell me if I'm dreaming!

1) I'd like to add 10GbE networking to this.
2) It would be nice if it could make use of an NVMe drive as a cache
3) Needs to support iSCSI for my lab
4) I want to also store my regular data on this.


How do you back up your NAS?
Can NAS software do its own backups?

Currently my data is stored in HDD/SSD in my desktop.
I currently have 2x 3TB drives in RAID 0 in my desktop that stores my data.
I regularly use a Robocopy script to copy all that data to a 10TB drive that's in a drive caddy.
I have 2x 10TB drives for backup and one of them lives off-site and I simply swap them around as I create noteworthy chunks of data.
When those 2x 3TB drives become full, I'll buy 2x 20TB drives for the backup and put the 2x 10TB drives in RAID0 in the desktop.
I'd like to continue that idea but with the NAS. Obviously I become screwed when my data hits more than the size of a single available drive, but I'll jump that hurdle if I ever get there.
 

gea

Well-Known Member
Dec 31, 2010
3,163
1,195
113
DE
Many options, I would suggest

- use ZFS with decent redundancy, e.g. Z2 (two disks can fail without data loss), and regular snapshots (read-only data versioning)
- if possible use ECC RAM; without it, even ZFS can suffer undetected data corruption

- do a disaster backup, e.g. daily or weekly, to removable disks, ideally using ZFS replication to sync files
- ZFS uses RAM as cache, not disks. This means the amount of RAM matters for ZFS performance.

The most resource-efficient option is a Solaris/OmniOS based Unix NAS (where ZFS comes from; it runs from 2GB RAM upwards, with the rest used as cache) with its unique kernel-based SMB server (zero config). The mainstream option is Linux with Samba.

For iSCSI, Solaris and OmniOS come with Comstar, an enterprise-grade iSCSI stack.

Pay attention to the disk controller (best is SATA AHCI or an HBA from the LSI 9300 family) and the NICs.

For web management you can use my napp-it
free version is ok at home, napp-it // webbased ZFS NAS/SAN appliance for OmniOS, OpenIndiana and Solaris : Manual
 

rtech

Active Member
Jun 2, 2021
304
108
43
- do a disaster backup, e.g. daily or weekly, to removable disks, ideally using ZFS replication to sync files
With removable disks you cannot fully automate your backups.
If you want to automate backups to offline storage:
- get a third PC
- set up the NAS/SAN to wake up this offline backup NAS
- run the backup script
- when the backup script finishes, turn off the offline backup NAS
- ???
- profit
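The "wake up the offline backup NAS" step is usually done with Wake-on-LAN. A minimal Python sketch of that step, assuming the backup box has Wake-on-LAN enabled in its BIOS and NIC. The MAC address, broadcast address and UDP port are placeholders I made up, not values from this thread:

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice, assumed here)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


# Usage (hypothetical MAC of the backup NAS):
#   wake("00:11:22:33:44:55")
```

After the backup script finishes, the backup box can shut itself down again, so it only draws power while a backup is running.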

With this setup your disks are not protected from lightning strikes though.

I would definitely go for something with:
- IPMI
- ECC RAM, preferably RDIMM (cheap); note that LRDIMM and RDIMM don't mix

What do you need a CPU with 44 PCIe lanes for?
 

mattventura

Active Member
Nov 9, 2022
447
217
43
ZFS is good at all this, but the usual recommendation is to just use mirrors if there's any doubt as to what the best pool layout is. This becomes more true if you have a mixed or unknown workload.

Also you can upgrade from the 4820k to an E5-1650v2 to get 2 more cores for $10. The 1660v2 and 1680v2 are also options, but cost a little more. I'd recommend ECC RAM if you haven't already purchased memory.

ZFS can make use of an NVMe drive as an L2ARC. You can also use NVMe for SLOG or a dedicated metadata device, but you would want mirrored NVMes for that.

You can do iSCSI by creating a zvol and sharing that via iSCSI.
 

MrCupHolder

Thank you for the feedback so far. But I think some of you are suggesting that I take this much further than I'm prepared to take or feel that it needs to be taken.
I'm well aware that you can set up 2 NAS devices and have them sync over the internet. However, I have nowhere to put the 2nd NAS and I also feel it's overkill.
Also I don't care for running any redundancy in the NAS. So long as I'm able to easily take a backup (and continue to have 2 backups, one onsite and one offsite) I don't feel the need for running a redundant array. JBOD or RAID0 is fine. I'm not setting this up for enterprise, it's for 1 person. If a drive dies, buy a new one and restore from backup.

Whilst I'm sure that using RAM as cache is great, it's expensive and also limits how much can be stored in cache, which is why I suggested NVMe: it's still pretty quick and can hold a lot more files for the price compared to RAM.

RAM is already purchased.

Thanks for the suggestion of the COMSTAR thingy, by that statement you can probably guess how much time I spent looking at it. I'm really looking for a solution I can set up with the help of a tutorial video no longer than 2 hours. That document looks massive and is way more time than I feel I want to invest in this.

What would be real nice is if I could get a Home version of NetApp because that's what we run at work and I already know how to set that all up. But alas I don't think there's anything like that.

With regards to backup I feel it's a bit sad that none of the NAS solutions include a way of doing backup other than to mirror to a 2nd NAS.

Is it possible to virtualise a NAS so that then the backup could be done through a 2nd VM on the same box?
I'm currently using a script to do backups. But all that does is make a copy of my data, with no versioning and no keeping of deleted files. I'm only aware of commercial solutions for backup. Is there nothing better for home? (Just did a google search and so many of them want you to do a cloud backup, nope I don't want nor need a cloud backup) - Investigating Veeam right now.

If I have to I'll just run the backup on my PC. Don't really want to have to have the expense of everything a 3rd PC will involve.

ZFS is good at all this, but the usual recommendation is to just use mirrors if there's any doubt as to what the best pool layout is. This becomes more true if you have a mixed or unknown workload.

Also you can upgrade from the 4820k to an E5-1650v2 to get 2 more cores for $10. The 1660v2 and 1680v2 are also options, but cost a little more. I'd recommend ECC RAM if you haven't already purchased memory.

ZFS can make use of an NVMe drive as an L2ARC. You can also use NVMe for SLOG or a dedicated metadata device, but you would want mirrored NVMes for that.

You can do iSCSI by creating a zvol and sharing that via iSCSI.
Matt you've lost me. I'm building my FIRST NAS. This means that terms and acronyms that are specific to NAS are possibly going to go right over my head. I've worked with NetApp and EMC devices and some of these terms escape me.

What do you mean by "the usual recommendation is to just use mirrors if there's any doubt as to what the best pool layout is". Disk mirroring? So RAID1? or NAS mirroring? Either way it's more expense that I feel adds no benefit.

What is L2ARC?
What is SLOG and why is it necessary to run RAID1 NVMe?
 

mattventura

Also I don't care for running any redundancy in the NAS. So long as I'm able to easily take a backup (and continue to have 2 backups, one onsite and one offsite) I don't feel the need for running a redundant array. JBOD or RAID0 is fine. I'm not setting this up for enterprise, it's for 1 person. If a drive dies, buy a new one and restore from backup.
The risk isn't so much losing an entire disk (drives usually sit somewhere around a 0.4% annual failure rate), but instead smaller corruption from bit flips, unclean shutdowns, and the like. If you have a redundant drive, ZFS can passively correct nearly all of those issues. Most filesystems just don't bother with data integrity - you just hope that it didn't corrupt anything important, and there's usually no obvious indication of such!

Whilst I'm sure that using RAM as cache is great, it's expensive and also limits how much can be stored in cache, which is why I suggested NVMe: it's still pretty quick and can hold a lot more files for the price compared to RAM.
The good news is that you can do both. ZFS supports both RAM cache (ARC) and a disk cache (L2ARC).

Is it possible to virtualise a NAS so that then the backup could be done through a 2nd VM on the same box?
That's possible, but I'd wonder what you're really looking for. It sounds like what you're actually seeking is just snapshotting so that you can retrieve previous versions of a file (which ZFS can do within one box).

Matt you've lost me.
Sorry! I know it can be a lot.

Okay, some terminology:

ZFS pools (zpools) consist of one or more vdevs. Each vdev can be:
* A single drive. If you have multiple single drive vdevs, it would be the equivalent of RAID0. You can expand capacity by replacing an individual drive with a larger one, or by adding another vdev down the line.
* A mirror - the equivalent of RAID1. If you have multiple mirrors, it's like a RAID10. You can expand capacity by replacing both drives in a mirror with larger ones, or by adding another mirror.
* RAIDZ - the equivalent of RAID5. If you have multiple RAIDZ vdevs, it's like a RAID50. Expanding these later is harder, due to there being 3 or more drives per vdev, all of which would need to be upgraded or added.
* RAIDZ2 - equivalent to RAID6, or RAID60 for multiple. Same caveats apply.

The benefits of using mirrors compared to RAIDZ are:
* More consistent performance (RAIDZ is better for some things, worse than others)
* Capacity is easy to expand (you would only need to buy two drives to expand capacity - you can add a new mirror or replace an existing mirror)

Compared to single-drive vdevs (the equivalent of RAID0), you'll have less capacity and probably less write performance, but can survive failures and repair damaged data.
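To put rough numbers on the capacity trade-off between these layouts, here is a small Python sketch. It is back-of-the-envelope arithmetic only: it ignores ZFS metadata overhead, padding and the free space you'd want to keep, and the drive counts and the 10TB size are just illustrative.

```python
def usable_tb(layout: str, drives_per_vdev: int, drive_tb: float, vdevs: int = 1) -> float:
    """Rough usable capacity in TB; ignores ZFS metadata, padding and free-space headroom."""
    parity_drives = {"single": 0, "raidz": 1, "raidz2": 2}
    if layout == "mirror":
        data_drives = 1  # every drive in a mirror holds the same data
    elif layout in parity_drives:
        data_drives = drives_per_vdev - parity_drives[layout]
    else:
        raise ValueError(f"unknown layout: {layout}")
    if data_drives < 1:
        raise ValueError("not enough drives per vdev for this layout")
    return vdevs * data_drives * drive_tb


# Illustrative drive counts, not a recommendation.
examples = [
    ("2 single-drive vdevs (RAID0-like)", "single", 1, 2),
    ("1 two-way mirror (RAID1-like)", "mirror", 2, 1),
    ("2 two-way mirrors (RAID10-like)", "mirror", 2, 2),
    ("1 RAIDZ of 4 drives (RAID5-like)", "raidz", 4, 1),
    ("1 RAIDZ2 of 6 drives (RAID6-like)", "raidz2", 6, 1),
]
for label, layout, per_vdev, vdevs in examples:
    print(f"{label}: {usable_tb(layout, per_vdev, 10.0, vdevs):.0f} TB usable")
```

With 10TB drives this prints 20, 10, 20, 30 and 40 TB respectively, which shows what the redundancy costs in raw capacity.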

L2ARC is the feature that lets you use an SSD as a read cache. You can use one SSD, or multiple, and you can control which datasets are eligible to be in the L2ARC. Since this just caches data that lives on the hard drives, it can fail and you won't lose data.

SLOG is sort of like a dedicated write cache drive, but has some differences. You wouldn't want to run it on a single drive - you'd want redundancy so that it doesn't become a single point of failure.

There's also a third way to use an SSD, which is as a metadata-only device. Since reading or writing metadata tends to be less sequential and more random, it's good to have it on an SSD. This leaves the HDDs free for the more sequential workloads. But since the metadata would only live here, rather than being a cache, you would want redundancy here.

One more thing to keep in mind - more drives = more performance. So instead of 2x20TB, consider 4x10TB, 10x4TB, or some other combination.

Hoping that's not too much detail!
 

OrlyP

Member
May 16, 2023
31
12
8
I'll go out on a limb and suggest TrueNAS. It can do everything you need and more.... ZFS, iSCSI, backup, etc.

Up to you to validate if it fits your needs and if your hardware will play nice with it.
 

MrCupHolder

The risk isn't so much losing an entire disk (drives usually sit somewhere around a 0.4% annual failure rate), but instead smaller corruption from bit flips, unclean shutdowns, and the like. If you have a redundant drive, ZFS can passively correct nearly all of those issues. Most filesystems just don't bother with data integrity - you just hope that it didn't corrupt anything important, and there's usually no obvious indication of such!
This gives me good reason to save some extra dollars for more HDD's and adds a benefit that is worth the extra effort. I have suffered from files becoming corrupt after the fact.
To me running RAID5/6 adds no benefit. As someone who has used RAID cards in their PCs in the past, it just feels like adding more complication, which then feels like it negates the benefit. It's just for me, I can handle the downtime, and if I have good backups I'll just restore.

The good news is that you can do both. ZFS supports both RAM cache (ARC) and a disk cache (L2ARC).
Nice

That's possible, but I'd wonder what you're really looking for. It sounds like what you're actually seeking is just snapshotting so that you can retrieve previous versions of a file (which ZFS can do within one box).
It sounds like you're suggesting that I shouldn't do backups?
You mention above the annual failure rate of HDD's. Let's just say I don't remember all the HDD's that I've had fail mostly because only the first one hurt me. After that I learnt to keep good backups. I have 2 backups, one onsite and the other offsite and there's no way I'm changing that.
Now if I were to use snapshotting alongside backups I'm open to learning more about snapshotting. However my experience with snapshots comes from VMware, where they're best used temporarily. If you create too many of them it chews up disk space and can have a significant effect on performance and then can take a long time to clean up. So to me they're not something you want long term. Is that the same with ZFS?
If it's able to keep versions of my files that's great, but then how would that be backed up because that sort of thing is usually hidden from the user and you have to dig into deep menus to access them.
My current backup is simple, it's just a copy script and as such I doubt that it could also copy snapshots to the backup.
I have recently downloaded the free version of Veeam but not yet installed it. I'll play with that soon, maybe it will offer me what I'm looking for.
Sorry! I know it can be a lot.

Okay, some terminology:

ZFS pools (zpools) consist of one or more vdevs. Each vdev can be:
* A single drive. If you have multiple single drive vdevs, it would be the equivalent of RAID0. You can expand capacity by replacing an individual drive with a larger one, or by adding another vdev down the line.
* A mirror - the equivalent of RAID1. If you have multiple mirrors, it's like a RAID10. You can expand capacity by replacing both drives in a mirror with larger ones, or by adding another mirror.
* RAIDZ - the equivalent of RAID5. If you have multiple mirrors, it's like a RAID50. Expanding these later is harder, due to there being 3 or more drives per vdev, all of which would need to be upgraded or added.
* RAIDZ2 - equivalent to RAID6, or RAID60 for multiple. Same caveats apply.

The benefits of using mirrors compared to RAIDZ are:
* More consistent performance (RAIDZ is better for some things, worse than others)
* Capacity is easy to expand (you would only need to buy two drives to expand capacity - you can add a new mirror or replace an existing mirror)

Compared to single-drive vdevs (the equivalent of RAID0), you'll have less capacity and probably less write performance, but can survive failures and repair damaged data.
Failed HDD's I'm not worried about, that's what backups are for. But corrupted data on the other hand is something I am interested in.
Right now I'm using a copy script that simply mirrors your data drive to your backup drive without actually using backup software that can do comparisons and versioning. That means I'll copy corrupted data with no way of going back if I don't detect the corruption before doing the backup which is what I have suffered from.
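One way a plain copy script could catch this is to keep a checksum manifest and compare it before each backup run. Below is a minimal Python sketch of that idea. It is not what Robocopy or ZFS do internally; the function names are invented for illustration, and the rule "content changed but modification time did not" is only a heuristic for spotting silent corruption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root: Path) -> dict:
    """Map each file's relative path to its modification time and SHA-256."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[str(path.relative_to(root))] = {
                "mtime": path.stat().st_mtime_ns,
                "sha256": sha256_of(path),
            }
    return manifest


def suspicious(old: dict, new: dict) -> list:
    """Files whose content changed although their modification time did not."""
    return [
        name
        for name, entry in new.items()
        if name in old
        and old[name]["mtime"] == entry["mtime"]
        and old[name]["sha256"] != entry["sha256"]
    ]
```

The manifest from the previous run would be stored next to the data (for example as JSON). Anything returned by `suspicious()` gets checked by hand before the copy overwrites the last good version on the backup drive.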
L2ARC is the feature that lets you use an SSD as a read cache. You can use one SSD, or multiple, and you can control which datasets are eligible to be in the L2ARC. Since this just caches data that lives on the hard drives, it can fail and you won't lose data.

SLOG is sort of like a dedicated write cache drive, but has some differences. You wouldn't want to run it on a single drive - you'd want redundancy so that it doesn't become a single point of failure.

There's also a third way to use an SSD, which is as a metadata-only device. Since reading or writing metadata tends to be less sequential and more random, it's good to have it on an SSD. This leaves the HDDs free for the more sequential workloads. But since the metadata would only live here, rather than being a cache, you would want redundancy here.

One more thing to keep in mind - more drives = more performance. So instead of 2x20TB, consider 4x10TB, 10x4TB, or some other combination.

Hoping that's not too much detail!
L2ARC - Great
SLOG and metadata-only - If it's best to have them on RAID1, can I put them on the same RAID1 mirror?
Also as I type this the saying I've heard so much comes to mind, "RAID is not a backup". So I find myself asking the question, "is RAID really necessary if I'm doing backups"? Also, if it's so important that it should be on RAID1 shouldn't I be backing it up then, and if so, how?

I guess to sum up, I'd rather spend money on HDD's for backup than RAID.
RAID doesn't get my data back if the house burns down or someone breaks in and steals the PC. An offsite backup on the other hand will get me my data back.
Now if there's total data loss on the PC (theft, fire or something else) and total data loss to the offsite backup as well, I likely have much bigger problems in my life to worry about.
I'll go out on a limb and suggest TrueNAS. It can do everything you need and more.... ZFS, iSCSI, backup, etc.
Up to you to validate if it fits your needs and if your hardware will play nice with it.
Thanks, I'll check it out. I'm not using the i7-4820k system for anything right now so I can load it up and have a squiz.
 

mattventura

It sounds like you're suggesting that I shouldn't do backups?
I would certainly advocate for backups, but if you're asking "Is it possible to virtualise a NAS so that then the backup could be done through a 2nd VM on the same box?", that sort of "backup" isn't going to give you anything that snapshots wouldn't - the data is still on the same physical device, dependent on the same storage.

If you create too many of them it chews up disk space and can have a significant effect on performance and then can take a long time to clean up. So to me they're not something you want long term. Is that the same with ZFS?
Yes, they will use disk space (but only the data that has actually changed since the snapshot - it's not a full duplicate copy). ZFS snapshots don't really slow you down - they're essentially free from a performance standpoint due to how ZFS works internally.
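To illustrate why a snapshot only costs space for data that changes afterwards, here is a toy copy-on-write model in Python. It is a deliberately simplified illustration, not how ZFS is implemented; the class and method names are made up.

```python
class ToyDataset:
    """Toy copy-on-write model: a dataset is a map from block number to block contents."""

    def __init__(self) -> None:
        self.live = {}       # block number -> data currently visible
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, block_no: int, data: bytes) -> None:
        # Never modify old data in place: the live map just points at the new data.
        self.live[block_no] = data

    def snapshot(self, name: str) -> None:
        # Copying the map copies references only; no block contents are duplicated.
        self.snapshots[name] = dict(self.live)

    def snapshot_only_blocks(self, name: str) -> int:
        """Count blocks that differ from the live data, i.e. what the snapshot 'costs'."""
        snap = self.snapshots[name]
        return sum(1 for no, data in snap.items() if self.live.get(no) != data)


ds = ToyDataset()
for n in range(1000):
    ds.write(n, b"original")
ds.snapshot("monday")
print(ds.snapshot_only_blocks("monday"))  # 0: nothing has changed yet

for n in range(10):
    ds.write(n, b"edited")
print(ds.snapshot_only_blocks("monday"))  # 10: only the overwritten blocks
```

In this model the snapshot of 1000 blocks costs nothing until data changes, and then only the 10 overwritten blocks have to be kept around for it.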

If it's able to keep versions of my files that's great, but then how would that be backed up because that sort of thing is usually hidden from the user and you have to dig into deep menus to access them.
ZFS's built in send/receive functionality actually only works on snapshots - you make a snapshot and then send it to whatever is receiving the backup (could be another pool on the same machine, or another machine over SSH/other protocols). You can use something like zrepl to automatically make periodic snapshots and replicate them to another box or another pool.
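As a rough illustration of that workflow with a removable backup drive, the Python sketch below only builds the command lines for a snapshot plus a full or incremental send/receive; it does not run them. The pool and dataset names (`tank/data`, `backup/data`) and the snapshot names are hypothetical, and real use may need extra flags depending on your setup.

```python
import shlex
from typing import List, Optional


def snapshot_cmd(dataset: str, snap: str) -> List[str]:
    """Command that creates the snapshot dataset@snap."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def replicate_pipeline(dataset: str, snap: str, target: str,
                       previous: Optional[str] = None) -> str:
    """Shell pipeline that sends a snapshot to `target`.

    With `previous` set, only the changes between the two snapshots are sent.
    """
    send = ["zfs", "send"]
    if previous is not None:
        send += ["-i", f"{dataset}@{previous}"]
    send.append(f"{dataset}@{snap}")
    receive = ["zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


# First backup to the removable pool: full copy.
print(shlex.join(snapshot_cmd("tank/data", "2023-06-01")))
print(replicate_pipeline("tank/data", "2023-06-01", "backup/data"))

# Later backups: only what changed since the previous snapshot.
print(shlex.join(snapshot_cmd("tank/data", "2023-07-01")))
print(replicate_pipeline("tank/data", "2023-07-01", "backup/data", previous="2023-06-01"))
```

Because the receiving side ends up with the snapshots too, the older versions of files travel with the backup instead of being hidden on the NAS.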

To manually access the files, you'll find a hidden .zfs directory wherever the dataset is mounted, which lets you browse the snapshotted versions of that dataset.

L2ARC - Great
SLOG and metadata-only - If it's best to have them on RAID1, can I put them on the same RAID1 mirror?
Also as I type this the saying I've heard so much comes to mind, "RAID is not a backup". So I find myself asking the question, "is RAID really necessary if I'm doing backups"? Also, if it's so important that it should be on RAID1 shouldn't I be backing it up then, and if so, how?
You can use the same drives, yes. The only real requirement is that they should be significantly faster than your main storage drives (i.e. you want SSDs for those).

RAID is not a backup, but a backup isn't RAID. Restoring from a backup can be a long and time-consuming process, whereas RAID lets you avoid that for most failures. ZFS also obsoletes most of the other things you'd have to deal with on a storage stack - no need for manual partitioning (except as mentioned above), no LVM, no LUKS, no fstab.

But yes, ideally you'd want both.
 

MrCupHolder

I would certainly advocate for backups, but if you're asking "Is it possible to virtualise a NAS so that then the backup could be done through a 2nd VM on the same box?", that sort of "backup" isn't going to give you anything that snapshots wouldn't - the data is still on the same physical device, dependent on the same storage.
Oh I think I've figured out why we're misunderstanding each other.
My reason for suggesting running backup on a VM wasn't for the purpose of storing the data on the NAS. But rather if the NAS software lacked the capability to do backups. Hence run a 2nd VM with a different O/S that could handle software that can do backups.
If possible I would prefer the backups be able to be done on the same computer as the NAS so I either don't have to have a 3rd computer or have my desktop handling the backups. My order of preference would be as follows.

1) NAS box handles its own backups
2) My daily desktop handles the NAS backups
3) 3rd PC handles backups for NAS.

My intention regardless of method for doing backups is for the backup to be stored on a HDD that is kept in a removable drive bay. That way it has a SATA connection to the motherboard for best transfer speed and is easily removed for storing offsite.
Currently I use 2x 10TB HDD's for backup.
One remains onsite and the other is offsite.
I take the onsite one offsite and bring the offsite one home. I make sure to minimise how often I have all of my backup drives at home with me.
Yes I know that SATA connections aren't designed for lots of disconnecting and reconnecting. However I typically swap my onsite and offsite disks over monthly. So each disk only has 6 connections a year.

Penny just dropped on the above and so I figured I'd post it before I went to bed. I'll read the rest later. Thanks
 

name stolen

Member
Feb 20, 2018
50
17
8
Hi MrCupHolder,

There are some very knowledgeable people giving great advice here, but reading along I see that you basically want to build a NAS to back up devices TO, and here come the professionals wanting you to back up your new NAS to at least two other sites. (Sorry professionals!) For you (and me), the NAS is the backup target, and maybe also a fileserver with redundancy.

Start with your 4820K and TrueNAS, either Core or Scale. You can always add another TrueNAS system later on for replication. :) ECC is great, and I use it, but it's not strictly necessary, especially if your RAM is known good on the platform you're using. Most of my own data has been subjected to overclocked CPUs, non-ECC RAM, cloned SSDs, Realtek NICs, NTFS, maybe a USB drive here or there - before being stored on my ECC ZFS NAS. It may not be perfect by the time it reaches TrueNAS.
 

i386

Well-Known Member
Mar 18, 2016
4,245
1,546
113
34
Germany
How do you back up your NAS?
Can NAS software do its own backups?
I don't back up my NAS as it's the target backup system for my devices at home (somebody mentioned something similar already in this thread :D). Really important stuff (photos + videos, documents, licenses/keys) gets manually backed up to USB sticks, external hard drives/SSDs or other places depending on the size.
The answer to the second question depends on the software. I'm using Windows Server 2022; I think it has some integrated backup features, and there are definitely third-party backup solutions.
 

ecosse

Active Member
Jul 2, 2013
463
111
43
I'd probably go with what you are comfortable supporting, unless part of your aim is to learn a new technology and you accept the risks inherent in that choice. There's a lot of talk in this thread about how things can destroy your data, but human f-ups are far higher on the probability scale than, say, lightning strikes.
 

gea

Thanks for the suggestion of the COMSTAR thingy, by that statement you can probably guess how much time I spent looking at it. I'm really looking for a solution I can set up with the help of a tutorial video no longer than 2 hours. That document looks massive and is way more time than I feel I want to invest in this.

What would be real nice is if I could get a Home version of NetApp because that's what we run at work and I already know how to set that all up. But alas I don't think there's anything like that.

Comstar is enterprise-class iSCSI software. The Comstar manual is useful when you want to understand how it works in detail.

For a manual setup:

Step 1. create a logical unit (raw disk, file or ZFS zvol)
Step 2. create a target (this is what a client wants to connect to)
Step 3. create a view from the target to the logical unit to make it visible
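As a rough sketch of what those three steps look like on the command line, the Python snippet below just prints the commands in order. The zvol name `tank/lab-lun0` and the 100G size are made-up examples, the GUID is whatever `stmfadm create-lu` prints on your system, and I'm assuming the STMF and iSCSI target services are already enabled. Check the man pages on your release before relying on it.

```python
from typing import List


def comstar_steps(zvol: str, size: str,
                  lu_guid: str = "<GUID printed by create-lu>") -> List[str]:
    """Return the shell commands for the three manual steps, in order."""
    return [
        # Step 1: create a zvol and register it as a logical unit.
        f"zfs create -V {size} {zvol}",
        f"stmfadm create-lu /dev/zvol/rdsk/{zvol}",
        # Step 2: create an iSCSI target (the name is auto-generated).
        "itadm create-target",
        # Step 3: add a view so the logical unit becomes visible to initiators.
        f"stmfadm add-view {lu_guid}",
    ]


for command in comstar_steps("tank/lab-lun0", "100G"):
    print(command)
```

A view added without host or target groups exposes the logical unit to every initiator, which is usually fine for a one-person lab but worth restricting elsewhere.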


With the GUI:
For daily use, you just enable iSCSI sharing for a filesystem (as you do with SMB, NFS or S3), set a size and that's it.

For backups of LUNs, you can simply use ZFS replication. If you want to recreate the iSCSI LUN from a backup zvol, just re-enable iSCSI sharing with the LUN GUID intact.

iscsi.PNG

btw
NetApp sued Sun (later part of Oracle) over ZFS years ago for being too similar to NetApp (and sometimes better)
 


louie1961

Active Member
May 15, 2023
164
63
28
How do you back up your NAS?
Can NAS software do its own backups?
I don't use iSCSI and I use two NAS devices in my setup, so this may not work for you. What I do is run a Synology, and all the PCs in my home run the Synology Drive client, so if I lose a PC I don't lose any data. My Synology backs up to two places. First, to my second NAS, built on a Raspberry Pi running rsync as a server: Synology Hyper Backup backs everything up to that NAS so I have a second copy on site (technically a third or more if you count the data on the PCs). Second, my Synology backs up nightly to AWS Glacier, giving me my offsite copy.

Before I bought the Synology I would back up my Pi NAS to another USB drive using rsync, and back up offsite to Glacier using rclone. You should be able to do that with any Linux-based NAS software (and probably BSD stuff as well, but I have no first-hand experience).

I also have a Proxmox server running in my home lab and it backs up to the Raspberry Pi NAS as well, but I don't run anything critical on that server, so I don't replicate it to the cloud. If I do start putting "production" stuff on that box, I will just use rclone from the Pi NAS to back up to Glacier.
 

BoredSysadmin

Not affiliated with Maxell
Mar 2, 2019
1,053
437
83
* A mirror - the equivalent of RAID1. If you have multiple mirrors, it's like a RAID10. You can expand capacity by replacing both drives in a mirror with larger ones, or by adding another mirror.
* RAIDZ - the equivalent of RAID5. If you have multiple mirrors, it's like a RAID50. Expanding these later is harder, due to there being 3 or more drives per vdev, all of which would need to be upgraded or added.
Minor correction of a copy-and-paste issue in the RAIDZ line: "If you have multiple mirrors" should have said vdevs, not mirrors.