napp-it on OmniOS on ESXi - performance?


jjoyceiv

Member
May 31, 2016
So I'm familiar with ESXi, and if possible I'd like to use it as the hypervisor for an upcoming AIO solution for a small office (actually happening this time). What I'm wondering is whether the following hardware combination (more or less) will give me the performance I need for up to five simultaneous end users:

- 6x 6 TB HDD, connected to an LSI 9260-8i HBA in IT mode
- 2x Samsung 850 Evo, Intel 545s, or better SSD, mirrored, connected to mobo
- SATA DOM or extra SSD

I'm thinking I'd install ESXi to the last of these, then drop the napp-it ToGo VM on the same device, then assign the HBA (definitely doable) and the SSDs (not exactly sure how) to the napp-it VM. The napp-it VM would then serve the drives as a raidz2 pool and the SSDs as a mirror to the network, with ESXi using the SSDs as datastores for other VMs.

Would I see good enough performance here? Is there any downside to the "loopback" sort of configuration for the SSDs? Should I grab another HBA for the SSDs?

I'd really prefer to run ESXi, the napp-it VM, and the other VMs (which will mostly be low-traffic services, testing, and an OwnCloud or similar instance for outside sharing) on the mirrored SSDs, as low-level as possible, but I'm not seeing any good way other than hardware RAID or fakeraid to do so, and both have their disadvantages.

Thanks!

ps: I know the SSDs aren't exactly "enterprise grade", but they wouldn't exactly be hosting anything IO-demanding. The server will be behind a beefy-enough UPS that as long as I configure autoshutdown right, power loss is of little concern. If I get a set to use with ZFS for SLOG I'll go with something higher-grade. That's another thread.
 

gea

Well-Known Member
Dec 31, 2010
The LSI 9260 is a hardware RAID controller and not well suited for ZFS.
Better is an HBA with IT firmware, e.g. an LSI 9211, 9207 or 9300.

A desktop SSD works but does not offer powerloss protection, which means a crash during a write can corrupt a VM. A Samsung SM863 would be a better but more expensive choice, and it saves the extra slog (which requires powerloss protection as well).

A virtualised NAS can be nearly as fast as a barebone NAS if you compare them with similar CPU and RAM assignments.

The real power of a virtualised ZFS NAS for ESXi comes when you pass through the HBA, connect all disks to OmniOS and create two pools: a fast one from the SSDs for VMs, and one from the disks for general use. Then share a ZFS filesystem (SSD) via NFS and use it for VMs in ESXi.
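
A minimal sketch of that layout from the OmniOS side (pool, filesystem and disk names here are just examples):

Code:
# fast pool from the two mirrored SSDs, for the VMs
zpool create ssdpool mirror c2t6d0 c2t7d0
# general-use pool from the six disks
zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
# ZFS filesystem on the SSD pool, shared via NFS and used as an ESXi datastore
zfs create ssdpool/vm
zfs set sharenfs=on ssdpool/vm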
 

jjoyceiv

Member
May 31, 2016
Thanks gea. My exact hardware choices aren't anywhere close to set in stone. Sounds like I'll want either a second HBA, or a beefier primary one to handle the SSDs. I'm not sure I follow why having the VM datastore SSDs be battery-backed eliminates the need for a SLOG - what would happen to the HD pool's ZIL in the event of a power loss without a SLOG?
 

gea

Well-Known Member
Dec 31, 2010
With enough RAM, ZFS uses a write cache of up to 4 GB to improve performance for small random writes. In case of a crash or power outage during a write, the cache content has already been acknowledged to the writing application as if it were on disk, but is in reality lost.

This does not affect the consistency of ZFS itself, as an atomic write (write data + update metadata) is either done completely or not at all. But on ESXi you may have VMs with ext or ntfs filesystems. On those, a partly written transaction, e.g. data that was modified while the metadata was not updated, can result in a corrupted filesystem.

For VMs you should enable sync to log all committed writes. The logging is done either to the pool (onpool ZIL) or to an slog. In both cases you should look for devices with powerloss protection.

In former times, with older filesystems, a hardware RAID + BBU was the method to protect against this. But as a ZFS slog is faster and larger, it is better to skip the hardware RAID + BBU and use a good, cheaper and faster HBA with SSDs that have powerloss protection (Intel DC line or Samsung SM863).
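
To illustrate, sync is a per-filesystem ZFS property (the filesystem name is just an example):

Code:
# log all committed writes on the NFS-shared VM filesystem
zfs set sync=always ssdpool/vm
# verify
zfs get sync ssdpool/vm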
 

dragonme

Active Member
Apr 12, 2016
@jjoyceiv

So to clarify: for a small office serving 5 people... serving them what? Desktops, NAS data, just the OwnCloud data stores? You will need to plan out the required IOPS and network bandwidth fairly carefully for anything other than just file sharing, or you will create choke points. Anyway, I have been setting up an all-in-one at home to replace my aged media server and also run an OS X instance for security cameras. I run Plex on an Ubuntu VM, plus OS X, Observium, vCenter VCSA and others on my ESXi box, and things are starting to come together.

The hardware setup is a dual L5640 with 48 GB of RAM (you will run out of RAM before anything else in ESXi, so get the most you can afford), an LSI 9212 4e4i (I only have room for one card), and just the single mobo NIC.

For drive arrangement

BOOT ESXi: internal USB header, cheap 6 GB thumb drive (best practice, as ESXi runs from memory)

SECOND BOOT: Observium, THEN the napp-it VM, on a cheap 60 GB SSD connected to onboard SATA (an ESXi VMFS datastore).
This second boot stage is critical, as napp-it provides the NFS shares ESXi looks to for the VMs.
I would keep this on a cheap SSD, as booting from something slower sometimes causes timeouts.
ESXi is a PITA for UPS management unless you have a networked UPS, and even then it's a PITA.
I double-purpose the Observium VM to run apcupsd connected to my USB UPS to control shutdown (a sketch of the config follows below).
Therefore it HAS to boot first and shut down LAST for that to work.
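
For reference, a minimal sketch of the apcupsd side of that, in /etc/apcupsd/apcupsd.conf (the thresholds are just examples):

Code:
UPSCABLE usb
UPSTYPE usb
DEVICE
# begin a shutdown when either limit is reached
BATTERYLEVEL 20
MINUTES 5

The doshutdown event in /etc/apcupsd/apccontrol is where you would hook in whatever command tells ESXi to shut its guests down.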

Napp-it boots and gets the LSI card through passthrough, so it has hardware-level control of that card and of the 2x Intel S3500 SSDs, which run as a stripe for speed and maximum storage space for the VMs. It also gets both 8 TB hard drives passed to it via RDM from the mobo SATA ports. This works just fine and is pretty transparent. I don't know if I am taking a write-performance hit this way, as I think the queue depth of the drives via RDM is 32, but it's mostly a data pool for media, so it's mostly read-intensive, and scrubs of this pool are just as fast as raw tests on the drives before the build. Ideally another LSI card would be the way to go, but I can't fit one in the box. So now we have two pools up and running: one fast SSD-based pool for the VMs and a large data pool, keeping the IO-latency-critical stuff on the LSI, since passing RDM from SATA is a question mark. Also, since you are booting napp-it from a SATA port, you can't pass napp-it the whole SATA controller, so RDM passing of the individual drives is a requirement at this point.
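
For reference, a physical-mode RDM pointer for a local SATA disk is created from the ESXi shell roughly like this (device and datastore names are just examples):

Code:
# find the device identifier of the SATA disk
ls /vmfs/devices/disks/
# create a physical-mode RDM pointer vmdk on an existing VMFS datastore
vmkfstools -z /vmfs/devices/disks/t10.ATA_____EXAMPLE_DISK_ID /vmfs/volumes/datastore1/napp-it/hdd1-rdm.vmdk

You then add that vmdk to the napp-it VM as an existing disk.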

ESXi then mounts the NFS share from the napp-it SSD pool for the VMs, and they begin to autoboot in the sequence you specify through vCenter. Getting that working without vCenter is a crapshoot at best.
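
If you prefer the ESXi shell over the GUI for that, mounting the share looks roughly like this (the IP address and share path are placeholders):

Code:
# mount the napp-it NFS export as an ESXi datastore
esxcli storage nfs add -H 192.168.1.10 -s /ssdpool/vm -v nfs-vm
# verify
esxcli storage nfs list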

I will not get into the internal and external networking, as gea has good guides and there are a million question marks on how to optimize, so you will have to experiment. Personally, if you have VMs that need lots of network bandwidth, get a 4-port card and pass some ports directly to the VM instead of having all that traffic virtualized, but that's just an opinion.

For your drive setup: you have or will get an 8-port card and only listed 8 drives for your working sets (6x 6 TB and 2 SSDs for pools), so you could probably keep them all on the same LSI card and just use the mobo SATA for the napp-it boot SSD. You could boot ESXi from it too, but it's a waste.

Trade-offs and decisions...

One: ZFS raid/mirrors/whatever is not a backup. Let's repeat that: it's not a backup. Raidz and mirrors exist for one reason, to repair data and keep you limping along online without having to bring the datastore down for repair.

I back up everything, hence the 4e4i card: the external ports go via an 8088 cable to a 24-disk shelf running a backup storage pool, also controlled by napp-it, when I need to run backups. Therefore my online datasets are set up for speed and storage capacity while the backup array is set up for redundancy. I am comfortable running this way; you may not be. It depends on your tolerance for downtime and the criticality of the online pools between backups.

BIG QUESTION MARKS

The SSD-based VM pool on SSDs with capacitor protection

ZFS and hardware RAID are fundamentally different on sync write protection, and only ZFS guarantees data integrity. Google "hardware raid write hole": even with battery backup you can have silent data corruption.

Some say that sync writes are not needed at all. Personally I leave sync on for peace of mind, since VMDKs corrupt easily, and just hitting "power off" instead of "shut down guest" is like yanking the cord on the VM (and you will do it). So when ZFS tells your OS that the write has been committed, it's best to actually have that guarantee.

In ZFS this guarantee is provided by the ZIL. It lives in one of two places: either on the pool with the rest of your data, or on a separate log device, an slog. If you keep your ZIL on the pool, it only holds the data between commits, which is about 5 seconds worth before ZFS commits it with a regular write. So what happens is: rather than waiting until ZFS does a flush to write the data out of RAM, for a sync write it LOGS the write to the ZIL and then acknowledges it, but it later commits the write from RAM, which is faster. The ZIL/slog never gets read unless there is a power outage that (1) takes out the RAM and (2) leaves uncommitted writes in the ZIL/slog at reboot; then ZFS REPLAYS what should have been written, i.e. what was reported back to the OS as completed but died in RAM on power loss. It's a 5-second non-volatile scratchpad. More below.
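
Adding or removing an slog later is trivial, for what it's worth (pool and device names are just examples):

Code:
# add a powerloss-protected SSD as a dedicated log device
zpool add tank log c2t8d0
# or a mirrored pair
zpool add tank log mirror c2t8d0 c2t9d0
# remove it again if you change your mind
zpool remove tank c2t8d0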

So, knowing that we have to deal with sync writes, this is what I have found. ZFS is still very much a black box as far as the algorithms used, and you will find disagreement at almost every turn.

Sync writes generally write data TWICE. It was designed that way for spinning media to return the ACK for the write as quickly as possible: ZFS shotguns the initial write wherever the heads happen to be, scattered all over the platters, which makes a mess but is quick; then ZFS re-commits the writes and lays the tracks down in a more efficient manner on the second write, 5 seconds or so later, from RAM. This behavior does change based on the blocksize of the data written, and on tuning settings under the hood of ZFS, but that is generally how it works without a log device. With a log device, the first write goes to the log device, and ZFS still does the final commit from RAM to the spinning media in order to lay down bigger, more efficient tracks (unless, as stated above, there is a power failure). With spinning disks and heavy sync writes that actually hit the ZIL, an SSD slog speeds this up. But if you are on an SSD-based pool, an SSD slog doesn't really get you performance; it keeps your SSDs from writing data twice, and it also reduces Swiss-cheesing of spinning platters, since a copy-on-write ZFS filesystem doesn't rewrite in place. On a pool with a sync workload you will notice very quickly that fragmentation goes up, and you can't defrag a ZFS pool: no big deal on an SSD, not great for platters. You would need an NVMe device or a ZeusRAM to get a speed boost over an SSD pool.

Further, on an SSD-based sync-write pool with adequate UPS backup and capacitor-backed SSDs, some say that setting logbias=throughput instead of latency (the default) bypasses the ZIL altogether and only writes the data once, forcing the sync write to commit before the ACK. On striped SSDs with almost no latency, that could be faster than a separate log device. On spinning platters it might give a speed bump on large sequential writes but be REALLY BAD for small random sync writes; for SSDs it might be the way to go. Still looking at it, but for now the striped SSDs are plenty for my VMs, as they are not write-intensive (vCenter probably does the most).
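
logbias is just a per-filesystem property, so it is easy to test both ways on a dataset (names are examples):

Code:
zfs set logbias=throughput ssdpool/vm
zfs get logbias ssdpool/vm
# back to the default
zfs set logbias=latency ssdpool/vm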

So for your setup it should work

I would put your VM SSDs on the HBA and keep the data drives there too, with SATA via RDM as a second-tier solution.

Get lots of ram... get lots of ram...

You need a backup solution. ZFS is not a BACKUP. Let me say that again: you need a backup.

What little performance hit you MAY take from having the SSD VM storage inside the all-in-one, versus running it directly attached to ESXi via hardware RAID on your LSI card, is more than made up for in simplicity of use. Hardware RAID is flaky, has silent corruption via the write hole, and ESXi local storage is hard to back up without 3rd-party software.

Keep in mind that most ESXi installations get their data from external datastores over a cable of some type (Ethernet, Fibre Channel, etc.) from a separate filer. An ESXi all-in-one gets its data over an internal, virtual network connection that has far less latency and more pipe than all but the most expensive 10/40 GbE networking gear, and is far more reliable, especially if it's only one host and not a cluster. If you need a cluster, then perhaps having a separate shared resource makes sense, but a single-host all-in-one really reduces failure points.

Running ZFS this way under ESXi protects the data FAR better than hardware RAID, PERIOD, and that is not disputed; it's far superior to hardware RAID in every aspect. For example, your 6x 6 TB drives could take weeks to rebuild in a hardware RAID; if they were only 5% filled, ZFS only rebuilds DATA, not BLOCKS, so it would take minutes.

ZFS is also far more tolerant of disk/cabling issues, where a hardware RAID would fault the whole array on a hiccup.


Some issues I am having with napp-it / OmniOS as my ZFS provider:

I am running without a napp-it license, so I think I am losing the ZFS/ESXi auto/hot snaps, which in this case would be nice. ACLs and permissions are a pain without access to napp-it's ACL GUI. Monitoring is virtually non-existent without a license. OmniOS' Solaris-based ZFS lacks xattr=sa, which embeds the xattr as part of the file and which I know OS X prefers (as do most other modern OSes); I think that is really slowing things down and possibly causing me other OS X compatibility issues on my SMB shares. SMB is only 2.1 on OmniOS while the rest of the world is running 3+, again a slowdown.

For my use case I am thinking that FreeNAS, being based on FreeBSD (which is closer to OS X), would be a bit better in the compatibility department and would have all the features I am lacking without needing to buy a license. It's a hobby machine, so I don't have the cash to fork out every year for a license.

Or I could take the overhead and run ZFS on OS X, like I have for years on the non-ESXi server this replaces, use OS X for 90% of my ZFS data, and use the free napp-it only for the VM pools. But then I would need to tear down and restructure the ESXi box, and this has taken long enough to set up, so testing continues.

I have been running ZFS for 8+ years and I really can't find a better solution; ZFS still does more, and does it better, than the competition.
ZFS, however, is VERY powerful and thus very COMPLEX. It's not set-it-and-forget-it if you want to get everything out of it; that is why Sun/Solaris storage engineers get paid big consulting bucks for enterprise setups. But for our use cases, just knowing the basics is usually good enough unless you make some serious blunders.

6 drives in raidz2 will only be as fast as the slowest drive; the vdev is really treated as a single "drive" for IOPS, versus, say, doing a stripe of two 3-drive raidz1 vdevs. You get the same capacity, but now you have twice the IOPS, as it's seen as two devices, and your rebuilds will be faster (see the sketch below).
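
The two layouts as hypothetical zpool create commands (same six disks, same usable capacity, roughly double the IOPS with two vdevs):

Code:
# one 6-disk raidz2 vdev: any two disks may fail
zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
# two 3-disk raidz1 vdevs: one disk per vdev may fail, about twice the IOPS
zpool create tank raidz1 c2t0d0 c2t1d0 c2t2d0 raidz1 c2t3d0 c2t4d0 c2t5d0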

There are lots of good sites on ZFS. If you have never used ZFS before, it's really best to read first and build later; many things with a ZFS pool are set in stone at creation time and can't be changed, and a poor setup will hobble you.
 

gea

Well-Known Member
Dec 31, 2010
Thank you for your detailed explanations, which largely match my own stance.

Only a few remarks about Solaris and napp-it:
- ESXi hotsnap integration in ZFS snaps is a free feature; it requires only some SSH work.
- Most people set Solaris nfs4 ACLs (Windows ntfs-alike) via Windows.
You need napp-it with the ACL extension only for deny rules, for problems with Windows due to long paths, or for problems with owners (napp-it on Unix always has full root access, so you cannot lock it out as you can on Windows, and it takes care of rule order: Solaris respects the order of rules, while Windows does not, as Windows processes deny rules first and then allow rules).
- SMB3 or special SMB features are not tied to an OS but to the SMB server.
Any Linux/Unix, including Solaris, has SAMBA, the most feature-rich SMB server, which offers SMB3.

Solaris additionally, and per default, offers its own kernel SMB server. It has much better integration of Windows ntfs-alike ACLs, supports Windows SIDs in an AD environment, supports Windows-alike SMB groups in addition to Unix groups, has a perfect integration of snaps as Previous Versions, is multithreaded and mostly faster than SAMBA (no problem going up to 10G performance), and is perfectly integrated into the OS and ZFS as a ZFS property.
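
As an example of the ZFS property integration (filesystem and share names are just examples):

Code:
# enable the kernel SMB server and share a filesystem under the name "media"
svcadm enable smb/server
zfs set sharesmb=name=media tank/media
zfs get sharesmb tank/media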

This does not mean that SAMBA has no advantages of its own. But if you want SMB3, mostly for its mpio features, you should use a 2016 Windows server.

On OSX there are other problems. On 10.5, OSX was faster than Windows with SMB on 10G networks. Currently OSX is a disaster performance-wise. Even with SMB signing disabled as suggested, it is much slower than before. Some additions regarding Time Machine with SMB are more or less there to hinder non-Apple hardware. Best performance is with jumbo frames. Last year I saw 900 MB/s on my Mac Pros and 10G. With the same hardware I am currently at around 300 MB/s; see http://napp-it.org/doc/downloads/performance_smb2.pdf

Last word:
ZFS is not complex; it's quite easy to understand.
If you look for complexity, look at Microsoft Storage Spaces and what it takes to get fast and reliable setups there.
 

dragonme

Active Member
Apr 12, 2016
@gea

Thanks as always for weighing in, and I am in no way trying to insult napp-it or your work; you are always here and helpful!! Most of the limitations I am facing involve getting Solaris-flavored ZFS to play well with OSX, and the kernel-based SMB in Solarish is worse for me than using SMB in FreeNAS and/or OSX. I have signing turned off; on Mac-to-Mac SMB or Mac-to-FreeNAS SMB I can write at line speed, over 100 MB/s, but the best I can get going to Solaris/napp-it is about 65 MB/s and it's very bursty. I can download from napp-it to OSX at line speed, over 110 MB/s, so something is going on there.
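
(For reference, the signing tweak on the Mac side is an entry in /etc/nsmb.conf, roughly like this:)

Code:
[default]
signing_required=no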

The OSX POSIX/ACL issues elude me in my setup, and I don't think Solaris plays well with OSX extended attributes, but that could also be a permissions issue.

In your reply above you mention Windows this and Windows that. I don't run a single Windows box in the house. I have some Windows VMs that I can spin up on my workstation just to play with, but I need a server that is going to work well and play well in the Mac environment, and Solaris fights me at every turn.

I like that napp-it is lightweight and seems stable most of the time. I had some issues with freezing, but I think all that is ironed out.

I would love to have more charts and stats in the free version; it's pretty bleak compared to FreeNAS, and to be honest I can get more done more quickly just using the command line than using the GUI of the free version. Again, I ran my OSX server with ZFS on it, command-line only, for years; I was hoping to get more out of napp-it in that regard.


As for ESXi, I am probably doing something wrong then?!

Under System -> ESXi I have my server, rackable.lab.local.

Under Jobs -> ESXi-hotsnaps -> ESXi serverlist I have rackable.lab.local.

Under Jobs -> ESXi-hotsnaps -> request ESXi VM list, napp-it returns a table of all my VMs.

Under Jobs -> ESXi-hotsnaps -> edit VM list, I commented out the VMs I don't need snapped.

Click submit.

On the page that is presented I select the snap job; under action I can select "import new actions from all available VMs". What exactly do I need to fill in there? I guess that might be the weak link?


So I think I have the setup right? Can you explain how to properly set up an ESXi autosnap job? I have tried a couple of things and read the guide; I must be missing something.


Thanks gea, appreciate the help.
 

gea

Well-Known Member
Dec 31, 2010
The basic idea is to create an ESXi hotsnap remotely via an SSH command on ESXi prior to a ZFS autosnap, and to destroy it after the ZFS snap is done. The menus help to create the needed actions and to edit the list of VMs for which you want an ESXi snap for a given ZFS snap job.
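
The SSH commands behind it are the standard ESXi vim-cmd calls, roughly like this (the VM id 12 is just an example):

Code:
# list VMs and their ids on the ESXi host
vim-cmd vmsvc/getallvms
# pre: create a hot snap (with memory state) for VM id 12
vim-cmd vmsvc/snapshot.create 12 zfs-autosnap "before ZFS autosnap" 1 0
# post: remove the ESXi snapshots again after the ZFS snap is done
vim-cmd vmsvc/snapshot.removeall 12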

You can activate a pre/post function when creating a new autosnap job or when you edit one (click on the job-id in the Jobs menu).

On a home system you may also simply create an ESXi hot memory snap manually from time to time and do a ZFS snap manually to include the hot memory restore option. The synchronisation via job is mainly needed if you want this feature quite often, for example every day or more often.

About stats:
You can get all stats continuously at the console, or you can use the basic stats in the System menu. Only realtime updates with background tasks are not in the free version; I would not declare these essential. The idea behind napp-it is to offer the essentials for free and to offer support and some comfort extras non-free, which are required to finance the fun. There are enough projects that were cancelled because not enough people were paying for the work. This is different on Linux, with big companies behind it, and to a lesser degree with BSD.
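
Examples of what you can run at the console (the pool name is just an example):

Code:
# pool throughput and latency, refreshed every 5 seconds
zpool iostat -v tank 5
# per-filesystem-type activity
fsstat zfs 5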
 

dragonme

Active Member
Apr 12, 2016
Thanks gea..

So, to get this straight in my head: the process I described above, where napp-it imports a list of my VMs, does not actually add any of those entries automatically to the pre and post actions, i.e. napp-it doesn't set anything up. I have to edit the pre and post scripts, which come up with a generic ESXi snapshot command that is commented out, and write a separate command line for each VM manually using the VM numbers from the list it imports? Then why bother telling the user to comment out any unwanted VMs on the imported list if it doesn't even use said list?

I just don't follow.

As for doing it manually: yeah, I could do it manually, and could do it on any ZFS from the command line, so why use a product that doesn't provide added value? Again, I just don't get it.

If I am using a front end to accomplish a task, I should not have to use the command line; that's the reason for running a GUI, no?

And sorry, but you don't provide a single graph in the free version. Again, I can run zilstat and iostat from the command line; I don't need a GUI to give me text. I expect a GUI to provide a graphical interface to justify the overhead of running the front end in the first place.

I think this is why FreeNAS has taken off while napp-it has not. From what I can see it's essentially the same product between free and enterprise, with enterprise getting paid support contracts and/or pre-built storage hardware with it installed; they make their money from corporate support contracts, just like every other successful open-source project that flourishes, and just like Sun with Solaris. When Oracle took them over and everyone complained they were doing away with OpenSolaris, Solaris started its protracted death right there, and now OmniOS is probably headed for the grave.
 

gea

Well-Known Member
Dec 31, 2010
It is as it is.
You can select from whats on the table and there is no best for all as every solution has its advantages and disadvantages.

But why are you proclaiming the death of OmniOS or other Solaris-based solutions based only on the fact that FreeNAS has a feature for free that napp-it (my personal work, not open source) has not? There are other features that Solarish or napp-it has that BSD or FreeNAS does not. FreeNAS does not offer the same breadth of support for BSD that napp-it does for Solaris, its free fork Illumos, or even Linux. BSD and Linux do not have these features, and napp-it is mainly an effort to make an enterprise Unix like Solaris or Illumos usable for an average user. And as an Apple-only user you must know that using a non-mainstream solution restricts your options in favour of a most-wanted feature; regarding Solarish, that feature is the best-of-all ZFS experience.
 

dragonme

Active Member
Apr 12, 2016
@gea

Yes, napp-it is what it is, and I am just giving you my observations and technical issues with it. You are really not shooting for the enterprise market with this, and I doubt your install base there is very significant. If you haven't noticed, the OSX operating system is fast becoming popular, and everyone I know that used to use Windows (me included, 10+ years ago) is switching; I don't know a single Mac user going to Windows, and that should tell you something. I am not locked into one operating system. I have machines running Ubuntu, OS X, OS X Server, Ubuntu Server, Debian, and FreeBSD; I use what gets the job done. The right tool for the right job is the one that makes the process easier, not harder, and Windows hasn't made anything easy in a very, very long time.

I pronounce OmniOS dead, as it has lost its major developer and is being thrown to the wolves; time will tell if enough hobby horsepower gets behind it to make it anything worthwhile moving forward. Mainstream Solaris is dying: Oracle just announced at the beginning of the year that it is scrapping Solaris 12, and there is no roadmap. That screams death to me, and good riddance; I don't know a single data center manager who will lose sleep over it. It evolves too slowly, and Oracle was too arrogant to take requests from the field. It is almost always last to the party in supporting the latest hardware, and its decision to go its separate way with ZFS almost killed one of the finest storage systems of modern computing. It did, however, make its ZFS incompatible moving forward with ZFS on every other platform (BSD, OS X, Linux, etc.), further distancing Solaris from the mainstream.

Innovate or die: that is the watchword in IT, and when was the last time Solaris was leading from the front?
 

gea

Well-Known Member
Dec 31, 2010
Time will tell.
The current roadmap for Solaris 11.next goes up to 2022, with support promised for much longer. The free Solaris ecosystem around Illumos is fully independent, and both have their users and use cases. I do not care, and I do not share your expectations, neither for OSX nor for Solaris or its free forks.
 