ESXi 6.0 NFS with OmniOS Unstable - consistent APD on VM power off


TechIsCool

Active Member
Feb 8, 2012
263
117
43
Clinton, WA
techiscool.com
Very interesting. I have found that OmniOS is stable except for some latency spikes, but I still think that's hardware, not software, related for me. If you're not moving files all the time, I would just go iSCSI and call it good if you find that NFS is not stable on Solaris.
 

socra

Member
Feb 4, 2011
81
2
8
Well, for me NFS is stable on Oracle Solaris 11.2, but OmniOS NFS is not stable.
iSCSI...maybe I can do some testing; dunno if I have the stomach for switching from NFS to iSCSI (file vs. block based).
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
I asked for iSCSI tests repeatedly to no avail...I'm sure the gentleman is TAXED for time and has to draw the line at some point; he's spent an inordinate amount of time troubleshooting this so far. Still, it seems silly to rule out NFS, as it is a rock-solid protocol for VM storage, and a TON of us use an extremely similar if not IDENTICAL setup to what this guy is running, with high success/performance results. SMH and color me perplexed!

How about ZoL NFS-backed datastores? Ubuntu LTS w/ ZFS on Linux (ZoL) would be easy as pie to try/test.
 

gea

Well-Known Member
Dec 31, 2010
3,156
1,195
113
DE
One point to add:
I just updated OmniOS 151014 from April to July on a test machine, and after this update I saw the all-paths-down state and was not able to remount NFS. I rebooted ESXi, and after that all was OK again. Have you rebooted ESXi after switching the storage VM?
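
One thing that may be worth trying before a full ESXi reboot is removing and re-adding the NFS mount from the ESXi shell. A minimal sketch (the host IP, export path, and datastore name are placeholders; in my case only the reboot helped):

Code:
# list NFS datastores and their state (an APD mount shows as unavailable)
esxcli storage nfs list
# remove the stale mount, then re-add it
esxcli storage nfs remove -v nfs_omnios
esxcli storage nfs add -H 192.168.1.10 -s /tank/nfs -v nfs_omnios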
 

dswartz

Active Member
Jul 14, 2011
610
79
28
In general, NFS has been rock solid for me in the past on ESXi (I am using iSCSI now for other reasons), but ESXi had at least one critical APD bug in NFS in 5.5 that took *weeks* to resolve, so it isn't beyond belief that they have another in 6.0. I also had issues with OmniOS (not like this) that got me to switch away from OmniOS/NFS to Linux/iSCSI. His call, of course...
 

socra

Member
Feb 4, 2011
81
2
8
@whitey
Thanks for understanding. Like you said, I had to draw the line somewhere; if I had focused and gotten iSCSI to work, it wouldn't prove anything with regard to OmniOS, or NFS for that matter (NFS has always worked for me and still does with OI). Maybe I'll give Linux a try, but I also need a good working CIFS server, which I use heavily.
Quote from Gea:
Restrictions:
Linux is not my preferred or main platform. Many napp-it features rely on Solaris projects like
the CIFS server with Windows SID and ACL support, Comstar iSCSI or Crossbow network virtualisation.
A napp-it version with similar functionality like on OmniOS is currently not planned.
@dswartz
Can you please explain why you are using iSCSI now? Always willing to learn and get different insights. How is ZFS under Linux treating you, and what problems were you having with OmniOS?

@gea 1
I have some minor text errors for you to fix within napp-it; would you like me to email you?

@gea 2
I don't know; during one of my many tests I did reboot my ESXi first (I think). But to make sure, I'll use your latest appliance and then:
- Shut down OI
- Remove the VMs from inventory
- Unmount the datastore
- Remove the LSI from OI
- Add the LSI to the latest Gea OmniOS appliance
- Reboot ESXi
- Create a new test pool with a test hard drive attached to the LSI (see the sketch after this list)
- Try to create and start a VM
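
For the test-pool step, the OmniOS side is only a few commands (a rough sketch; the disk id, pool name, and subnet are examples, and napp-it can do the same from the GUI):

Code:
# list the disks visible on the passed-through LSI controller
echo | format
# create a throwaway pool and an NFS-shared filesystem on the test disk
zpool create testpool c3t0d0
zfs create testpool/nfs
zfs set sharenfs='rw,root=@192.168.1.0/24' testpool/nfs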

I know I'm fighting an uphill battle, because I just realized I installed ESXi 6 on a clean USB drive, so my LSI had to be configured for passthrough all over again. :(
 

dswartz

Active Member
Jul 14, 2011
610
79
28
I was having a problem with the VMware tools for OmniOS (also OI, though). If you tried to shut down the guest with tools running, the guest OS would panic. Lovely. I am using iSCSI with ESOS (a flash-drive-based iSCSI SAN appliance using SCST as the target) - see ESOS - Enterprise Storage OS for info. It works off flash, with ramdisk-based filesystems for config and such. ZFS support was added some time ago - not in official releases yet, though - so until that happens (and ZoL is more stable), I am sticking with XFS. Don't get me wrong, I love ZoL, but it's still only 0.6.4 and lots of oddities are still being experienced. I'm sure they will hammer those out over the next year or so, and then I will switch back to ZFS from XFS...
 

gea

Well-Known Member
Dec 31, 2010
3,156
1,195
113
DE
Quote from dswartz:
I was having a problem with the VMware tools for OmniOS (also OI, though). If you tried to shut down the guest with tools running, the guest OS would panic. Lovely.
I have seen the same with VMware tools 6.0, but this seems fixed (at least on my test machine) with the current VMware tools 6.0.0b.
 

socra

Member
Feb 4, 2011
81
2
8
@gea
I know, my friend; we spoke a few times through email a couple of years back :) (I reviewed your installation documentation)

@dswartz
I can confirm that with the latest VMware tools, OmniOS does a nice shutdown!

As for NFS: no luck. Complete reboot of the machine,
imported the new appliance, gave it the same IPs, added the VMs from the datastore, and tried to start one...*pow* APD.

Dropping NFS for now; going to focus on getting my VMs to local SSD and setting up backups. Stuff that should have been done already, but this problem has been controlling my life for weeks now :)

Still undecided on my pools; keeping them at v28 and going for the Gea OmniOS/napp-it appliance for CIFS and backup. I just hope that when I export/import my CIFS pool, everything will operate as before and I don't see the same issues with CIFS.
Hopefully some discussion will emerge on the OmniOS developer mailing list (or from VMware's support) and this problem can be fixed. I don't think I'll be the only one, since Solaris 11.2 works fine, but time will tell.


VOBD.LOG: vobd.log - Pastebin.com

VMKERNEL.LOG : vmkernel.log - Pastebin.com
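
For reference, the interesting lines can be grepped straight from the ESXi shell (standard 5.x/6.x log locations):

Code:
# follow APD events as they happen
tail -f /var/log/vobd.log | grep -i apd
# pull NFS- and APD-related lines out of the kernel log
grep -iE 'apd|nfs' /var/log/vmkernel.log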
 

socra

Member
Feb 4, 2011
81
2
8
The most frustrating part has been the logs: ESXi complaining it has lost the NFS share, and OmniOS not saying anything.
One thing I can try is creating a new zpool and then mounting it through Ubuntu; I was unable to test this last time.
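
Testing the export from a plain Linux client would at least take the ESXi NFS client out of the equation. A quick sketch (the server IP and export path are placeholders):

Code:
# mount the OmniOS export from an Ubuntu box
sudo apt-get install nfs-common
sudo mkdir -p /mnt/omnios-test
sudo mount -t nfs -o vers=3 192.168.1.10:/testpool/nfs /mnt/omnios-test
# push some traffic through it and see whether the share drops
sudo dd if=/dev/zero of=/mnt/omnios-test/bigfile bs=1M count=1024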
 

dswartz

Active Member
Jul 14, 2011
610
79
28
Keep in mind that this may very well be a VMware issue. Like I said, 5.5 had a showstopper bug where NFS datastores would go APD. It took them something like three months to fix it. Obviously NFS is not a priority there (not saying it should be, but...)
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Quote from dswartz:
Keep in mind that this may very well be a VMware issue. Like I said, 5.5 had a showstopper bug where NFS datastores would go APD. It took them something like three months to fix it. Obviously NFS is not a priority there (not saying it should be, but...)
@dswartz I can assure you that VMware is 'all-in' on supporting NFS (heck, NFS 4.1 w/ pNFS/session trunking/etc., although implementation and alignment between client and server still frustratingly lag). NFSv3, on the other hand, is and will continue to be a rock-solid clustered/shared storage protocol that VMware supports for the foreseeable future. That 5.5 bug was lame, and I concur it took a bit to fix; I think the vendors came right out and provided a patch/workaround, and it was linked in this thread pages back (at least the NetApp fix/workaround). I forget if it addressed all vendors/NFS implementations, but it probably did.

This poor guy just has a storage gremlin running around in his storage stack 'somehow'.

I would venture to guess that I could lay down all sorts of OmniOS VMs at varying release levels (008/010/012/014), simply mount a vdisk ZFS NFS share w/out issues on my 6.0 GA (unpatched as of yet) virtual infrastructure, and fire up VMs on top of that NFS datastore off that storage virtual appliance.

My 2 cents.
 

TechIsCool

Active Member
Feb 8, 2012
263
117
43
Clinton, WA
techiscool.com
@whitey I would venture to say the same for my setup as well. I am not sure how he is getting gremlins in his VMs, but it feels like it. Basically we have narrowed it down to being either a hardware or a hardware-compatibility issue.

@nostradamus99 You have been SHA1 (or at least MD5) checksumming your ISO/media before you install, correct? It would suck if you had a single flipped bit and all this time you've been reinstalling from the same ISO that you downloaded.
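
e.g. (hypothetical file names; compare the output against the hashes published on the download pages):

Code:
# verify the installer images before writing them to USB
sha1sum VMware-VMvisor-Installer-6.0.0.iso
md5sum OmniOS_Text_r151014.iso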
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Quote from TechIsCool:
@whitey I would venture to say the same for my setup as well. I am not sure how he is getting gremlins in his VMs, but it feels like it. Basically we have narrowed it down to being either a hardware or a hardware-compatibility issue.
@nostradamus99 You have been SHA1 (or at least MD5) checksumming your ISO/media before you install, correct? It would suck if you had a single flipped bit and all this time you've been reinstalling from the same ISO that you downloaded.
We've certainly narrowed it down, but I am still internally conflicted and torn about calling it a hardware issue when it works on Sol 11 GA and OI (I bet ZoL would work as well). To me it feels more like a software-level issue (bad bits, a misbehaving NFS client/server stack, buggy code on some data/control plane, etc.). I swear I will go stand up every iteration of OmniOS in storage virtual appliances and post results if wanted; hell, I'll even provide you my ISOs/screenshots if anyone thinks that would be beneficial. Mine would be stock GA OmniOS, 'cause that's where I prefer to stay until I can get @gea to hook up a multi-node AIO send/recv license, haha.
 

gea

Well-Known Member
Dec 31, 2010
3,156
1,195
113
DE
Besides a hardware-compatibility issue, some of the extra packages on the VM appliance may cause the problem. A basic OmniOS setup without napp-it, with napp-it, and then with ESXi tools can be compared. The only remaining difference to the VM is then TLS, to allow encrypted email.

Update
I have uploaded a new template, 15_c, where I have included boot environments for:
- OmniOS 151014 initial (April)
- Update to 151014 July
- July update with napp-it
- July update with napp-it and VMware tools 6.0.0b

Maybe one of these steps introduces the problem.
15_b includes some SSL and TLS updates that I have not added to this release.
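
Stepping between those boot environments to bisect the problem is straightforward (a sketch using the standard beadm tool; substitute the BE names that beadm lists on the template):

Code:
# list the boot environments shipped in the template
beadm list
# activate one of the earlier BEs and reboot into it
beadm activate <BE-name>
init 6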
 

whitey

Moderator
Jun 30, 2014
2,766
868
113
41
OK, had some play time tonight (restless after my Red Hat Satellite 6 training and the labs that kept me up late). Hope this provides at least some 'valued' feedback from this off-the-cuff ZoL testing against ESXi 6.0 GA (build 2494585).

High level, here is what I did (you can look at the screenshots for more detail, and the commands are sketched after the list). First time doing this; I like my ZFS on Illumos as stated before, but I figured I'd give it a whirl to check perf/supportability/stability within a vSphere env as a shared/clustered storage protocol, and to see if I could derive a sanity checkpoint on your APD NFS woes.

- Installed Ubuntu LTS 14.04.2 server (base w/ ssh) as an ESXi ZFS-on-Linux config (ZoL; simply a Linux VM acting as a storage appliance w/ the ZFS packages installed)
- Enabled the Ubuntu repo ppa:zfs-native/stable and installed ubuntu-zfs (the native kernel-space ZFS guts)
- Created a zpool and a ZFS filesystem, and configured the ZoL storage appliance VM for NFS so it could be mounted by vSphere (ESXi 6.0)
- Mounted it up to an ESXi 6.0 host and installed a Win10 VM in 9 minutes w/ good throughput (70-80 MB/sec)

Keep in mind this is inception at its finest: my AIO config w/ the HBA passed through to Illumos, the ZoL VM installed on that NFS datastore, another vdisk passed up to the ZoL storage appliance VM, and ZFS on top of that, shared out again w/ NFS...you get the idea. Long and short: AIO, ZFS within ZFS. If that doesn't make your head hurt...
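
For anyone who wants to replicate this, the ZoL side boils down to a handful of commands (a sketch; the pool/device names and subnet are from my setup):

Code:
# install ZoL from the stable PPA on Ubuntu 14.04
sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:zfs-native/stable
sudo apt-get update && sudo apt-get install -y ubuntu-zfs nfs-kernel-server
# build the pool on the second vdisk and share it out to the ESXi subnet
sudo zpool create tank /dev/sdb
sudo zfs create tank/vmstore
sudo zfs set sharenfs='rw=@192.168.1.0/24,no_root_squash' tank/vmstore
# then on the ESXi host:
#   esxcli storage nfs add -H <ZoL-VM-IP> -s /tank/vmstore -v zol-nfs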
 

Attachments


whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Sorry for the screenshot blast; thought you all might get some value out of this...seems pretty dang solid to me! No APD shenanigans at all, and I'm sure that w/ vanilla OmniOS acting as the nested storage virtual appliance I'd have just as much success. I know Gea's gift to the world is solid (AKA the bomb) and I'm certainly not trying to point any fingers here; I just wanted to peel back the layers of the onion some more and provide you some semi-relevant testing of the client/server/ZFS/NFS stack on ZoL being used as vSphere shared/clustered storage...I KNOW it works on Unix/Illumos, as that's ALL I have EVER used in the past, so I'm losing my mind here.

Please try Gea's update, as well as a stripped/base OmniOS or ZoL config w/ nothing but a headless, barebones (JEOS) setup, to try to rule SOMETHING out. Simply replicating my test would tell us whether it's a SW or HW issue, I'd assume.

I do believe that constitutes my 'first' informal tutorial/guide here on STH: 'Whitey's ZoL howto' :-D Wink wink, Patrick.
 

Attachments


whitey

Moderator
Jun 30, 2014
2,766
868
113
41
Update: OK, this is cool. sVMotioned my ZoL storage appliance VM up to my newly acquired (thx T/slimer, you ROCK!) 320GB MLC Fusion-io ioDrive datastore (still NFS-mounted while I do the Win10 install to the ZoL NFS datastore in vSphere).

CRUSHIN' IT on throughput, although I am still at roughly a 10 min install (I blame M$). So it's not that the ZoL approach can only yield 70-80 MB/s; it's just that my poor AIO ZFS hybrid pool, already running 30+ VMs, was a poor place to put the ZoL storage appliance. No limitation here as far as NFS throughput (pushing past 1G), just backend disk/IOPS bound.
 

Attachments
