Unable to get any pool information or use any pool functions in napp-it

epicurean · Dec 12, 2017

My napp-it VM recently started to behave strangely: it either goes into maintenance mode or, when it does boot, vSphere shows that it does not have VMware Tools installed (which is incorrect).
This is with the latest VM download.

And when I am in the console, whenever I use any function under "Pool", I get this error:
"errors: No known data errors --- _lib _lib/illumos/get-disk.pl &get_mydisk_format 20 <- admin.pl (eval) 833 <- admin.pl &load_lib 515 <- admin.pl &get_preloads 272 ---

exe(get-disk.pl 86): format"

I was hoping to just export the pools and start a new VM, but I cannot even do that.

Urgent help needed! Many thanks.

nle · Dec 13, 2017

I'm pretty sure you don't have to export the pool to be able to import it on a new system, even though it is best practice to do so.

Google a little bit about it, and I'm pretty confident you'll get your answer.

gea · Dec 13, 2017

It seems napp-it is hanging on the format command it uses to detect all current disks. You can cross-check this by entering format at the console (use Ctrl-C to abort after a successful listing).

This mostly happens due to a bad disk or controller.
First check System > Logs (/var/adm/messages) for the cause, or (in pass-through mode) remove all disks and try to boot. If this works, add the disks back one by one, waiting for napp-it to discover each disk, until the system hangs again due to a bad disk.
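gea's format check can also be scripted. The sketch below is a minimal, hypothetical helper (the function names are assumptions, not part of napp-it). It runs `format` with empty stdin, assuming, as with the common `echo | format` idiom, that format prints the disk list and exits when it reaches end of input, and it treats a run that exceeds a timeout as the hang described above. It also pulls the cXtYdZ device names out of the listing so the count can be compared after each disk is attached.

```python
import re
import subprocess
from typing import List, Sequence, Tuple

# Matches numbered entries in a format listing, e.g. "  0. c8t5000CCA025B48FA5d0 <...>".
# Assumption: entries use this "N. cXtYdZ" layout (the tY part is optional, e.g. c1d0).
DISK_ENTRY = re.compile(r"^\s*\d+\.\s+(c\d+(?:t[0-9A-F]+)?d\d+)\b")


def run_format(timeout_s: float = 60.0, cmd: Sequence[str] = ("format",)) -> Tuple[List[str], bool]:
    """Run `cmd` with empty stdin and return (output lines, hung_flag).

    hung_flag is True when the command did not finish within timeout_s.
    subprocess.run kills the child process before re-raising TimeoutExpired.
    """
    try:
        result = subprocess.run(
            list(cmd),
            input="",  # empty stdin: format should list the disks, then hit EOF
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return [], True
    return result.stdout.splitlines(), False


def disk_names(lines: List[str]) -> List[str]:
    """Extract device names such as c8t5000CCA025B48FA5d0 from format output."""
    names = []
    for line in lines:
        match = DISK_ENTRY.match(line)
        if match:
            names.append(match.group(1))
    return names
```

On the napp-it VM you would call `run_format()` and, if it did not hang, pass the lines to `disk_names()`; repeating this after each disk is attached gives the disk-by-disk check described above.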

epicurean · Dec 13, 2017

Hi Gea,
Thank you for responding.
This is what I get when entering format at the console:
"Monitor-Extension: Command Log --- _lib _lib _lib/interface.pl &console 408 <- admin.pl &menu 267 ---

interface.pl 894# format
--- _lib _lib _lib/interface.pl &console 408 ---

p1
--- _lib _lib _lib/interface.pl &console 408 <- admin.pl &menu 267 ---

main, /_lib/interface.pl, line 926
exe: format"

Could you explain in detail (I am not familiar with this) how to remedy this?

epicurean · Dec 13, 2017

nle said:
I'm pretty sure you don't have to export the pool to be able to import it on a new system, even though it is best practice to do so.

I did try that with a brand-new VM, but I could not even get any of the usual menus to show in the Pool section.

gea · Dec 14, 2017

Format should return a list of all currently detected disks.
In your case the system hangs after format (this is also why some disk-related menus in napp-it are not working).

So next, check the system logs, or remove all data disks and check whether the appliance works then. After that, attach the disks one by one and call format (or the Disks menu) to check whether each inserted disk is detected and the system does not hang.

epicurean · Dec 14, 2017

Hi Gea,
As you suggested, we removed one of the passed-through HBAs, which contained 8 drives. I pass through 3 HBAs to the napp-it VM.
Under Pool, this is what I got:

  pool: Hitachi
 state: DEGRADED
status: One or more devices has been REMOVED by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0h1m with 0 errors on Thu Dec 14 10:51:37 2017
config:

        NAME STATE READ WRITE CKSUM CAP Product /napp-it IOstat mess SN/LUN
        Hitachi DEGRADED 0 0 0
          raidz2-0 DEGRADED 0 0 0
            c8t5000CCA025B48FA5d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK69BGB
            c8t5000CCA0214A4F19d0 REMOVED 0 0 0
            c8t5000CCA025ADEA41d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2N1XB
            c8t5000CCA025AD7C49d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2DRWB
            c8t5000CCA025AE1D45d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2SG7B
            c8t5000CCA025A49D9Dd0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRJXJJ1B
            c8t5000CCA025A83D11d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRJZJ8TB
            c8t5000CCA025AED335d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK34KVB
          raidz2-1 ONLINE 0 0 0
            c8t5000CCA025A82C75d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRJZH5HB
            c8t5000CCA025AA0D5Dd0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK0J6AB
            c8t5000CCA025AE1D65d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2SGHB
            c8t5000CCA025ADF3A5d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2NP9B
            c8t5000CCA025AD790Dd0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2DJ6B
            c8t5000CCA0214A74A9d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PNW9YD1B
            c8t5000CCA025A83061d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRJZHELB
            c8t5000CCA025AE1899d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK2S4LB

But the lights for all of these drives seem to be on. What can I do to remedy this?
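Rather than relying on the drive lights, the device that ZFS marked as removed can be read from the status text itself. A minimal sketch, assuming the pool status output shown above is available as a string; the function name is hypothetical. The long hex part of such a device name is, under illumos multipathing, typically derived from the drive's WWN, which may help match it to a physical drive.

```python
import re
from typing import List, Tuple

# Leaf disks on illumos are named like c8t5000CCA0214A4F19d0 (controller, target, disk).
LEAF_NAME = re.compile(r"c\d+(?:t[0-9A-F]+)?d\d+")


def non_online_disks(status_text: str) -> List[Tuple[str, str]]:
    """Return (device, state) for every leaf disk whose state is not ONLINE."""
    found = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and single-word lines such as "config:"
        name, state = fields[0], fields[1]
        # Pool and vdev rows (e.g. "Hitachi", "raidz2-0") do not match LEAF_NAME.
        if LEAF_NAME.fullmatch(name) and state != "ONLINE":
            found.append((name, state))
    return found
```

Fed the output above, this returns `[("c8t5000CCA0214A4F19d0", "REMOVED")]`.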

Rand__ · Dec 15, 2017

So you took away one of the HBAs and now the format command works?
Of course the pool is degraded if 8 drives are missing.

So if you detach the drives from the currently detached HBA and reattach the HBA alone (without its drives), does format still work?
Then attach the drives one by one and find the one that breaks the format command.

epicurean · Dec 15, 2017

Thanks @Rand__. Oh boy, this will take me a looong time.

Rand__ · Dec 16, 2017

Well, the usual optimization process of course applies: add 4 drives at once and check, then split them into 2, and then into single drives.
Shouldn't take too long if you can monitor the insertion and run format after it.
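The split-in-half approach is a bisection (binary search) over the drives. A minimal sketch, assuming exactly one drive causes the hang and that you supply a `format_ok(subset)` check; that callback is hypothetical and in practice means attaching only that subset and seeing whether format finishes.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def find_bad_drive(drives: Sequence[T], format_ok: Callable[[List[T]], bool]) -> Optional[T]:
    """Bisect `drives` to find the single drive that makes format hang.

    format_ok(subset) must return True when format lists disks normally
    with only `subset` attached. Returns None if all drives together are fine.
    After the first check it needs about log2(len(drives)) checks, e.g. 3 for 8 drives.
    """
    candidates = list(drives)
    if format_ok(candidates):
        return None  # nothing to find: the full set already works
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if format_ok(half):
            candidates = candidates[len(half):]  # culprit is in the other half
        else:
            candidates = half  # culprit is in this half
    return candidates[0]
```

For 16 drives this needs at most 4 checks after the first instead of up to 16 single insertions; the trade-off is that it assumes only one drive is bad.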

gea · Dec 16, 2017

On current hardware, a hot-inserted disk is detected by the OS within a few seconds, so you can remove all disks and insert them one by one. After inserting each disk, try to detect it via napp-it Disks > Details (or format at the console). A bad or blocking disk will either not be shown or will prevent the remaining disks from being listed.

epicurean · Dec 16, 2017

The problem is that this pool has 16 HGST SAS drives, and I don't know which one is the "removed" one. All 16 drives are on 2 HBAs passed through to the napp-it VM. The other HBA (the one that was removed) holds a Z2 pool of 8 drives (a different pool), and now one of those 4TB Seagate drives has "too many errors" and needs to be replaced too.

I need to fix the HGST pool first. So should I add a hot drive and have it replace the "removed" one?

Rand__ · Dec 17, 2017

That sounds a little bit like you forgot to take an inventory before adding the drives to the chassis ... and maybe did not do a drive-to-slot map either?

Just kidding, of course, but that helps in such cases.

OK, so there are several options:

- If your chassis/backplane supports a locate command, you could run it on all working drives, leaving the dead one unblinking.
- If your chassis has activity indicators, run a longer activity on all working drives, leaving the dead one unblinking.
- Try to isolate the drive:
  - Did you follow any distribution pattern during pool assignment? E.g. all drives from HBA1 are in Pool1 and in slots 0-7. Then you just need to identify (by pulling another drive) which HBA/group you are working on.
  - Otherwise, pull an HBA to identify which block of drives is then missing, until you have the right one.
  - Or list all 15 serial numbers, power the box off and compare the numbers until you find the one that is not on your list.
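The serial-number option is a set difference: the serials read off the physical drive labels minus the serials the pool still reports. A sketch, assuming the serial is the last column of each ONLINE disk row, as in the napp-it pool output earlier in the thread; both function names are hypothetical.

```python
from typing import Iterable, List, Set


def pool_serials(status_text: str) -> Set[str]:
    """Collect serials from ONLINE disk rows of the napp-it pool status.

    Assumption: such rows look like
    'c8t...d0 ONLINE 0 0 0 450.1 GB HUC106045CSS601 S:0 H:0 T:0 PRK69BGB',
    i.e. 12 fields with the serial number as the last one.
    """
    serials = set()
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[0].startswith("c") and fields[1] == "ONLINE":
            serials.add(fields[-1])
    return serials


def missing_from_pool(physical: Iterable[str], in_pool: Set[str]) -> List[str]:
    """Serials written down from the drive labels that the pool no longer reports."""
    return sorted(set(physical) - in_pool)
```

With the 15 serials from the degraded pool and the 16 read off the drives, the single leftover serial identifies the drive to replace.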

epicurean · Dec 17, 2017

Rand__ said:
That sounds a little bit like you forgot to take an inventory before adding the drives to the chassis ... and maybe did not do a drive-to-slot map either?

Guilty as charged...
Thankfully, I did organise the drives according to the HBAs they were on, and it was pretty easy to identify the errant drive. I need to label my drive cages!

One pool down, another to go.
Thanks again everyone for the advice!
