How to build an OpenSolaris-derived ZFS Storage Server

Discussion in 'Solaris, Nexenta, OpenIndiana, and napp-it' started by gea, Dec 31, 2010.

  1. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    Basics

    ZFS is a revolutionary file system with nearly unlimited capacity and superior data security thanks to copy-on-write, RAID-Z1 to Z3 without the RAID 5/6 write-hole problem, an online file-check/refresh feature, and the capability to create nearly unlimited data snapshots without delay or initial space consumption. ZFS boot snapshots are the way to go back to former OS states. ZFS is stable and used in enterprise storage systems.

    Features like deduplication, online encryption (from ZFS v31), triple-parity RAID and hybrid storage with SSD read/write cache drives are state of the art and just included ZFS properties. Volume, RAID and storage management are part of any ZFS system and handled with just two commands, zfs and zpool (see the sketch below). ZFS is now part not only of Solaris-derived systems but is also available on BSD, OSX and Linux under the roof of Open-ZFS.
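
    A minimal sketch of those two commands (the disk names c2t0d0/c2t1d0 are placeholders; list your own disks with the format command):

    # create a mirrored pool named tank from two disks
    zpool create tank mirror c2t0d0 c2t1d0
    # create a filesystem on it, with compression enabled
    zfs create -o compression=lz4 tank/data
    # check pool health
    zpool status tank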

    But Solaris-derived systems are more than that. ZFS is not just a file-system add-on. Sun developed a complete enterprise operating system with a unique integration of ZFS and its services, like a fast SMB server that is truly AD- and Windows-ACL-compatible, and a fast NFS server, both enabled as ZFS properties. Comstar, the included iSCSI framework, is fast and flexible, usable for complex SAN configurations. With Crossbow, the virtual switch framework, you can build complex virtual network switches in software, and DTrace helps to analyse the system. Service management is done via the unique svcadm tool (small example below). Lightweight virtualisation can be done on the application level with zones (KVM, Solaris zones, LX/Linux containers). All these features are developed and supported by Sun (now Oracle) or Illumos, perfectly integrated into the OS with the same handling and without compatibility problems between them - the main reason why I prefer ZFS on Solaris-derived systems.
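
    As a small example of that service handling (smb/server and nfs/server are the standard service names on Illumos/Solaris):

    # show the state of the SMB and NFS server services
    svcs smb/server nfs/server
    # enable the kernel SMB server including its dependencies
    svcadm enable -r smb/server
    # restart the NFS server after a configuration change
    svcadm restart nfs/server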

    Since Oracle bought Sun and closed the OpenSolaris project, the following operating systems are based on the last free OpenSolaris, which was forked into Illumos:

    1. some commercial options

    - Oracle Solaris 11

    the fastest and most feature-rich ZFS server at the moment, and the only one with encryption.
    I support it with my free napp-it Web-GUI.

    - NexentaStor Enterprise Storage Appliance (based on Illumos)

    2. some free options

    - OpenIndiana Hipster (the free successor of OpenSolaris), based on Illumos;
    always in a dev state, usable for desktop or server use. I support it with my napp-it Web-GUI.
    download: http://openindiana.org

    The Illumos Project
    is a fork of the OpenSolaris kernel with the kernel/OS functions and some base tools. Illumos is intended to be completely free and open source. Illumos is not a distribution but the common upstream of the main distributions NexentaCore, OpenIndiana, OmniOS and SmartOS.

    3. Use cases:

    Although there is a desktop option with OpenIndiana, Solaris was developed by Sun to be an enterprise server OS with stability and performance in first place, best for:

    NAS
    -Windows SMB fileserver (AD and ACL compatible, snaps via Windows "previous versions")

    SAN
    -NFS and FC/ iSCSI Storage

    Web
    -AMP stack (Apache, MySQL, PHP)

    Backup and Archive
    -snapshots, online file checks with data refresh, checksums

    Such systems can be used as appliances and managed remotely via browser and Web-GUI. They run on real hardware or virtualized, best on ESXi with PCI passthrough to the SAS controller and disks.


    4. Hardware:

    See my build examples
    http://www.napp-it.org/doc/downloads/napp-it_build_examples.pdf


    5. manual ZFS Server Installation

    Download the ISO or USB image, boot from it and install the OS to the boot drive.
    Use the whole boot disk. Installation is easy.
    see http://napp-it.org/doc/downloads/napp-it.pdf

    You can also install the OS as a virtualized SAN.
    (All-In-One, Virtual Server + SAN + virtual network switch in a box)
    see http://napp-it.org/doc/downloads/napp-in-one.pdf

    6. After OS setup, set up the storage appliance (CLI as root)

    wget -O - www.napp-it.org/nappit | perl
    You can now manage your NAS-appliance via http://ip:81

    That's all. Install + set up your ZFS server, ready to use, in about 30 min.

    7. napp-it to Go
    As an alternative to the manual setup, you can use preconfigured images, either a template for ESXi or system images that you can clone to a new SATA boot SSD.

    Gea
     
    #1
    Last edited: Oct 11, 2016
  2. layerbreak

    layerbreak New Member

    Joined:
    Dec 31, 2010
    Messages:
    28
    Likes Received:
    0
    Nice to see you here too. :)

    Now I have to sell my i3-560 and buy an X3440 first for ESXi.
    After changing it, I want to check this out on your website.
     
    #2
    Last edited: Dec 31, 2010
  3. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    #3
    Last edited: Jan 26, 2011
  4. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    #4
  5. PigLover

    PigLover Moderator

    Joined:
    Jan 26, 2011
    Messages:
    2,757
    Likes Received:
    1,098
    Great writeup, very helpful. I've been planning to do some experimenting and the parts I have fit your writeup perfectly. Two questions:

    I love SuperMicro MBs and IPMI. Found this tonight: http://communities.vmware.com/thread/280988. Have you had any similar problems using IPMI with ESXi?

    The CPU I have available is an L3426. Is this going to be enough CPU to do an All-in-one like you describe?
     
    #5
  6. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,425
    Likes Received:
    4,367
    If you use the Realtek NIC for IPMI 2.0 and assign another NIC for VMware management, that fixes the problem.
     
    #6
  7. fblittle

    fblittle New Member

    Joined:
    Apr 5, 2011
    Messages:
    19
    Likes Received:
    0
    I have set up a Solaris 11 NAS server with SMB shares. napp-it has truncated all my passwords to 8 characters in length. This was a problem until I discovered, using LastPass, that it had recorded the new passwords and showed that they were all 8 characters long. Is this normal? Some of my passwords are longer. I am using the console on a Win7 machine, so Windows shouldn't be the limitation. I can't find any documentation on this. Other than that, I love napp-it. It makes it so easy to administer Solaris, I almost feel guilty using it.
     
    #7
    Last edited: Apr 11, 2011
  8. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    The current napp-it nightly allows passwords of up to 16 characters.
    But if you use Nexenta, only the first 8 characters are used
    (the reason I had set the password form to 8 characters max).

    OpenIndiana and SE11 are OK with longer passwords.

    Gea
     
    #8
  9. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    Why use ZFS?

    ZFS is software RAID + RAID management + dynamic volume management (storage virtualisation).
    If you compare it to traditional RAID 5/6, look at the following:

    1.
    RAID 5/6 write-hole problem: if a power loss or crash occurs during a write, you have partly updated data. You can reduce the problem with a battery-backed controller, but you can still end up with a damaged filesystem. You then need at least an offline file check, which can last days on large storage.

    You can only solve the problem with a copy-on-write filesystem.


    2.
    Bad block problem
    You do not need a whole-drive failure. A single bad block (producing a read delay or error) can set a disk to failed. Two bad blocks can kill a RAID 5 if they happen on two disks. A good RAID controller can handle this, at the price of an offline file check that can last days.

    You can only solve the problem on the filesystem level with a self-healing filesystem that can handle a lot of bad blocks and repair them on the fly, without failing the disk on a timeout.


    3.
    Silent error problem, or disk errors due to cabling or driver problems
    At a statistical rate you encounter data errors by chance, or errors occur due to cabling or driver problems. Only some of them can be detected by an offline file check, which can last days.

    You can only solve the problem on the filesystem level with data checksums paired with regular online checks for data integrity (a scrub; sketch below).
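
    On ZFS such an online check is a scrub; a minimal sketch (the pool name tank is assumed):

    # read and verify all data against its checksums, repairing from redundancy
    zpool scrub tank
    # show progress and any repaired or unrecoverable errors
    zpool status -v tank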


    4.
    Data validity
    If you, someone else or a virus accidentally deletes or modifies data, you usually need a backup. If you discover the problem later, your backup is wrong, as is your second backup.

    You need versioning or a file history that may go back for weeks, months or years.
    The only stable way to do this is snapshots on a copy-on-write filesystem (sketch below).
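
    A minimal snapshot sketch (the filesystem name tank/data is assumed):

    # take a snapshot - instant, no initial space used
    zfs snapshot tank/data@monday
    # list snapshots
    zfs list -t snapshot
    # roll the filesystem back to the snapshot if needed
    zfs rollback tank/data@monday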


    5.
    Data is growing and hardware dies some day
    In my own environment data grows by 50% per year. I need the option to increase capacity up to the petabyte range without rebuilding RAIDs or extending RAID stripes (you know, lasts days...). I cannot allow data access to depend on a RAID controller from one special brand. RAID must be controller-independent.

    This can only be solved with software RAID and pooled storage with dynamic filesystem sizes (storage virtualisation); see the sketch below.
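
    With ZFS, growing the pool is one command; a sketch with placeholder disk names:

    # add a second raid-z2 vdev; every filesystem in the pool
    # sees the new capacity immediately, no rebuild needed
    zpool add tank raidz2 c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0
    zpool list tank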


    6. Performance
    Every NAS can deliver pure disk or RAID performance. But pure disk performance is bad.

    You need RAM caching, additional SSD caching or dedicated log devices for fast and secure sync writes.
    These features are part of ZFS. With Solaris, any otherwise unused RAM is used for caching - no RAM stays free.
    This is not because ZFS is RAM hungry; it is because your RAM is used to increase performance (cache and log devices are added as sketched below).
    If pure disk performance is enough, ZFS is stable with 1-2 GB RAM.
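
    Adding such cache and log devices is again a one-liner (the SSD device names are placeholders):

    # add an SSD as L2ARC read cache
    zpool add tank cache c3t0d0
    # add a mirrored pair of fast SSDs as a dedicated ZIL/log for sync writes
    zpool add tank log mirror c3t1d0 c3t2d0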


    Summary:
    All these problems are daily problems; they affect data security, data availability or data validity. The problems are huge with large storage in the multi-terabyte area, and they are the reason for the need for modern filesystems like ZFS - the first and most stable option - and, some day, Btrfs and ReFS. Older filesystems like ext, HFS+, NTFS or XFS cannot handle these problems at all, or not in a comparable way.

    Stay or go with ZFS: the newest and only option on Solaris/Illumos, where development is done, a stable alternative on BSD, and more and more an option on Linux.
     
    #9
  10. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    Affordable high-end storage

    You can build a nice, cheap ZFS-based Solarish home fileserver on an HP MicroServer N36/40/54,
    or a mid-size All-In-One or fileserver based on a SuperMicro X10SL7-F with an included LSI SAS 2308 HBA.


    You can also build real high-performance ZFS storage or All-In-Ones at quite minimal cost.
    One of the current best offers (mid 2013) is the following config that I built for some tests.


    Appliances based on a SuperMicro X9SRH-7TF
    with 2 x 10 GbE and a SAS 2308 in IT mode (= LSI HBA 9207) onboard, about 500 $/€


    CPU: Xeon E5-2620, 2 GHz, 6-core (optionally 4- or 8-core)
    RAM: 32 GB ECC RAM (4 x 8 GB, max 256 GB)


    Step 1:
    flash the onboard LSI 2308 to IT mode
    http://www.napp-it.org/doc/manuals/flash_x9srh-7tf_it.pdf

    Step 2:
    BIOS settings:
    - enable vt-d (Advanced > Chipset > Northbridge > I/O)

    Step 3:
    install ESXi 5.1 build 799733 onto a 50 GB SSD on SATA

    Step 4:
    enable pass-through (ESXi vSphere: advanced settings) for the 2308, reboot

    Step 5:
    upload the OmniOS ISO to the local datastore on the 50 GB SSD (use the ESXi file browser)

    Step 6:
    create the OmniSAN VM (20 GB) with an e1000 NIC,
    add the LSI 2308 as a PCI device,
    connect the CD to the uploaded ISO (the stable from May)

    boot and set up

    Step 7:
    set up the network according to:
    napp-it // webbased ZFS NAS/SAN appliance for OmniOS, OpenIndiana and Solaris (a manual ipadm sketch below)
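
    If you prefer the CLI, a sketch of a manual network setup (e1000g0 matches the VM's e1000 NIC; the addresses are examples):

    # create the IP interface and request an address via DHCP
    ipadm create-if e1000g0
    ipadm create-addr -T dhcp e1000g0/v4
    # or assign a static address instead
    ipadm create-addr -T static -a 192.168.1.10/24 e1000g0/v4
    route -p add default 192.168.1.1
    # enable DNS name resolution
    echo "nameserver 192.168.1.1" >> /etc/resolv.conf
    cp /etc/nsswitch.dns /etc/nsswitch.conf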

    Step 8:
    install napp-it via wget -O - www.napp-it.org/nappit | perl

    Step 9:
    start napp-it via http://ip:81
    - create a pool (RAID-Z2 from 6 x SanDisk Extreme II 480 GB); add a high-speed SSD with supercap (SLC or Intel S3700) or a DRAM ZIL (ZeusRAM)
    - create a filesystem /tank/vm
    - share the filesystem via NFS (the equivalent CLI commands are sketched below)
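
    Roughly the same in CLI form (disk names are placeholders; napp-it issues the equivalent commands for you):

    # raid-z2 pool from the six SSDs
    zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
    # add the supercap SSD as a dedicated ZIL/log device
    zpool add tank log c3t0d0
    # filesystem for the VMs, shared via NFS
    zfs create tank/vm
    zfs set sharenfs=on tank/vm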

    Step 10:
    connect the filesystem /tank/vm via ESXi (NFS)

    Step 11:
    - upload a Win8 ISO to the NFS datastore
    - create a new Win8 VM on the NFS datastore on OmniOS, with the DVD connected to the Win8 ISO
    - set up Win8
    - Win8: OK


    Next steps:
    - update ESXi to 5.1U1 build 1065491: OK
    - update OmniOS to the newest stable build b281e50: OK
    - install VMware tools, set up vmxnet3 (http://napp-it.org/doc/ESXi-OmniOS_Installation_HOWTO_en.pdf): OK


    my "desktop testbed"
     
    #10
    Last edited: Jul 19, 2013
  11. mrkrad

    mrkrad Well-Known Member

    Joined:
    Oct 13, 2012
    Messages:
    1,234
    Likes Received:
    49
    Can you tell us what happens when an array has a misbehaving consumer drive? Say it has a bunch of bad sectors and decides to go into a 180-second deep-cycle recovery.

    1. Does the volume mounted on the drive lag at all? If so, how long?
    2. How long does it try before giving up? A few hours? Minutes? 8 seconds?
    3. Sometimes drives take a while to spin up; does this impact the share?
    4. Modern SAS drives have PI (protection information) - SAS drives with these features can provide both error detection and partial good reads. Does ZFS support this? A drive may be able to say: this sector is good, this sector is good, this sector is known bad but here is what I've got for data. (This allows some controllers to reconstruct the bad sectors first while the remaining sectors that are not bad are still read from the damaged drive, to reduce multi-disk rebuild failure.)
    5. Does this support clustering? Two RAID controllers to SAS expanders to dual-ported drives, for no single point of failure.
    6. Is the failover (iSCSI, NFS) good enough for ESXi? It is rather picky about timing.
    7. How is the VAAI support? Any tips? The main thing is thin reclamation.
    8. What is a good hardware setup for D2D storage (inline dedupe, compression optional)?
    9. Is the SMB fully compatible with Windows? What happens with permissions and metadata?
    10. How bad is it to use a RAID controller? Can you use hardware RAID and build a giant unprotected volume?
    11. I ask this because PI and one other feature make the LSI MegaRAID do checksum-verified reads (same as ZFS) in hardware. I'm guessing this might give your VM a bit more CPU for dedupe/compress.

    12. If you had a DL180 G6, would you just throw in say 72-96 GB of RAM (ECC), a cheap 5520 CPU [or NUMA, two sockets of cheap 5520s], two LSI controllers and a 10 GbE NIC? If you had 6 Gbps to 12 SATA drives, what is a good balance of SSD? SLC is expensive, very small (X25-E) and quite slow compared to modern MLC/TLC. What's a good balance per buck?

    Can you alter the use of the SSDs, i.e. 512 GB of TLC 840 for read-only cache and perhaps a RAID-1 of 100 GB old-school Samsung SLC drives for write-back/ZIL? Does the SLC RAID-1 for the ZIL/LOG need to be a particular speed?


    I don't have an IT-mode controller, but I could use the P420/1GB FBWC HP Smart Array and just do RAID-0 of single disks to simulate JBOD.
     
    #11
  12. Jeggs101

    Jeggs101 Well-Known Member

    Joined:
    Dec 29, 2010
    Messages:
    1,441
    Likes Received:
    209
    Interesting to see you have a cooler on the 10 gbase-T NIC heatsink
     
    #12
  13. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    A lot of questions; I will try to answer.

    1.
    A consumer drive is OK with ZFS if you can accept a delay on errors until the drive responds. This can happen on enterprise disks as well, but not as often. If ZFS detects too many errors, the pool goes to a degraded state (error: too many errors) with the disk offline. A hotspare can replace the faulted disk immediately (sketch below).
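
    A hotspare sketch (the disk name is a placeholder):

    # add a hotspare; the fault management agent attaches it
    # automatically when a pool disk is faulted
    zpool add tank spare c2t6d0
    # check whether any pool shows errors
    zpool status -x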

    2. For exact times, you should ask at IRC #ZFS.
    But I had an ESXi timeout on an NFS datastore after 140 s while the pool was intact, with a few error messages in the system log. In any case, you do not need special TLER disks like with hardware RAID.

    3. The share is accessible as long as the disks are up.

    4. I do not know, but disk-internal error mechanisms are independent of ZFS. ZFS checks validity not on the disk level but end-to-end in RAM.

    5. Solaris supports multipathing on SAS disks

    6. I do not use multipathing; maybe another person can answer, but I would expect it to be similar to other solutions.

    7. VAAI is not a ZFS feature. You can use NexentaStor; they support VAAI.

    8. Too many options without knowing use cases or finances.

    9. Solaris CIFS is the only SMB server outside Windows that supports Windows ACLs, Windows SIDs, Windows "previous versions" and Windows share-level ACLs. SAMBA does not. (SMB sharing is a simple ZFS property; sketch below.)
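
    Sharing and Windows-style permissions are set directly on the filesystem; a sketch (the share name and the user winuser are made up):

    # share a filesystem via the kernel SMB server under the name "data"
    zfs set sharesmb=name=data tank/data
    # add an NFSv4 ACL entry granting an AD user full access (Solaris chmod syntax)
    /usr/bin/chmod -R A+user:winuser:full_set:fd:allow /tank/data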

    10. Very bad. You lose the self-healing features, which means ZFS can detect errors but cannot repair them. You also lose data security on sync writes due to the controller cache. ZFS needs full control of the disks.

    11. I do not know.

    12. Even with consumer SSDs you can build stable storage with RAID-Z2/3. You may need enough hotspare disks and should replace them after 3 years or so. If you need a ZIL, use the fastest you can afford, but with a supercap (e.g. ZeusRAM, Intel S3700 200 GB or a fast SLC), or it is worthless.

    For heavy write workloads, consider the Intel S3700 as well. For sync writes, always add a very fast ZIL to consumer SSDs to reduce small writes; see http://www.napp-it.org/doc/manuals/benchmarks.pdf for some basic behaviours.

    Using slow or old SSDs for the ZIL gives you worse performance than having no dedicated ZIL at all (a small sketch below).
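
    For completeness, a slog sketch (device names are placeholders):

    # mirror the log so a dying slog cannot lose in-flight sync writes
    zpool add tank log mirror c4t0d0 c4t1d0
    # a log vdev can be removed again if it turns out to be too slow
    # (use the vdev name shown by zpool status, e.g. mirror-1)
    zpool remove tank mirror-1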
     
    #13
    Last edited: Jul 19, 2013
  14. mrkrad

    mrkrad Well-Known Member

    Joined:
    Oct 13, 2012
    Messages:
    1,234
    Likes Received:
    49
    Hmm, I think you would get a "datastore lost connectivity" if latency got anywhere near 1 second! Ouch!

    Can you tune Solaris so it will drop the drive in less than a quarter of a second?
     
    #14
  15. Boris

    Boris Member

    Joined:
    May 16, 2015
    Messages:
    65
    Likes Received:
    11
    Could someone please help me?
    Which CPU will be better for a ZFS storage with RAID-Z2 or RAID-Z3 and 8/12/16 HDDs: a Socket 2011 Xeon with 8 cores but 2.2 GHz, or one with 4 cores and 3.7 GHz?
     
    #15
  16. PigLover

    PigLover Moderator

    Joined:
    Jan 26, 2011
    Messages:
    2,757
    Likes Received:
    1,098
    Just for the filesystem processing and raid calcs? They are both overkill. Massive overkill.
     
    #16
  17. Boris

    Boris Member

    Joined:
    May 16, 2015
    Messages:
    65
    Likes Received:
    11
    Oh... So only memory matters?
     
    #17
  18. Boris

    Boris Member

    Joined:
    May 16, 2015
    Messages:
    65
    Likes Received:
    11
    And one more question: a hardware LSI controller does not matter? Controller BBU and RAM cache are all useless, right?
    Is an LSI 2208 etc. in software mode good enough?
     
    #18
  19. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    An LSI 9211 / IBM M1015 (2008 chip) flashed with IT firmware,
    or the newer LSI 9207 (2308 chip, IT mode by default), are perfect.

    The LSI 2208 is a bad choice for ZFS, as it is a hardware RAID chip.
     
    #19
  20. gea

    gea Well-Known Member

    Joined:
    Dec 31, 2010
    Messages:
    2,097
    Likes Received:
    674
    Depends on your overall performance needs,
    but mostly RAM is the key to performance (read cache).
     
    #20
    MiniKnight likes this.