Help with scrub error ("unrecoverable error") [Nexenta Core, ZFS, napp-it]


nle

Member
Oct 24, 2012
204
11
18

Hi all, I woke up with this in my inbox today:

Code:
-----------------------------------------------------------
NAME       SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
datapool   10.9T  3.49T  7.38T  32%  1.30x  ONLINE  -
syspool    464G   51.3G  413G   11%  1.00x  ONLINE  -
-----------------------------------------------------------


  pool: datapool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 162K in 14h38m with 0 errors on Sun Feb 17 09:38:09 2013
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     5
            c2t1d0  ONLINE       0     0     2
            c2t2d0  ONLINE       0     0     3
            c2t3d0  ONLINE       0     0     2
            c2t4d0  ONLINE       0     0     3
            c2t5d0  ONLINE       0     0     2
        cache
          c2t9d0    ONLINE       0     0     0
        spares
          c2t7d0    AVAIL

errors: No known data errors
Code:
uname -a
SunOS xxxxx 5.11 NexentaOS_134f i86pc i386 i86pc Solaris
As far as I can see, everything seems fine: logs, SMART status, etc. Nothing sticks out, so I can't really identify the problem.

When I google the problem, people usually have problems with one drive. Here it looks like all the drives in the pool have some sort of problem (see the checksum column).

Any advice on this?
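
To check how the checksum errors are spread across the pool, the CKSUM column of `zpool status` can be tallied with a few lines of awk. This is only a sketch: the function name `cksum_errors` is made up for illustration, and it assumes the five-column layout shown in the status output above.

```shell
#!/bin/sh
# cksum_errors (hypothetical helper): read `zpool status` output on stdin,
# print every device line whose CKSUM column is non-zero, then a summary.
# Counts that zpool abbreviates (for example "1.2K") are ignored here.
cksum_errors() {
    awk '
        NF == 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
        $5 ~ /^[0-9]+$/ && $5 + 0 > 0 {
            print $1, $5
            total += $5
            n++
        }
        END { printf "%d devices, %d checksum errors\n", n, total }
    '
}

# Demo on the config section posted above.
cksum_errors <<'EOF'
        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     5
            c2t1d0  ONLINE       0     0     2
            c2t2d0  ONLINE       0     0     3
            c2t3d0  ONLINE       0     0     2
            c2t4d0  ONLINE       0     0     3
            c2t5d0  ONLINE       0     0     2
        cache
          c2t9d0    ONLINE       0     0     0
        spares
          c2t7d0    AVAIL
EOF
```

On a live system the same function would be fed with `zpool status datapool | cksum_errors`. For the output above it lists all six raidz2 members and no other device, which matches the impression that the errors are not tied to a single disk.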
 

gea

Well-Known Member
Dec 31, 2010
3,177
1,199
113
DE
You can use zpool status -v
to check for affected files.

Then check power, HBA, RAM, etc.
Only shared parts like these can affect all disks at once.

Do you have a backup?
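
The checks suggested above could be gathered in one small script. This is a rough sketch, not a prescribed procedure: `pool_diag` is a made-up name, `datapool` is the pool from this thread, and `fmdump -e` and `iostat -En` are the usual Solaris-family tools for the fault-management error log and per-device error counters, assumed here to be present on the Nexenta box.

```shell
#!/bin/sh
# pool_diag (hypothetical helper): first-pass diagnostics for one pool.
pool_diag() {
    pool="${1:-datapool}"

    # -v also lists files with permanent errors, if there are any
    zpool status -v "$pool"

    # Last entries of the fault-management error log, if fmdump is installed
    if command -v fmdump >/dev/null 2>&1; then
        fmdump -e | tail -n 20
    fi

    # Soft/hard/transport error counters per device, if iostat is installed
    if command -v iostat >/dev/null 2>&1; then
        iostat -En
    fi
}
```

Calling `pool_diag datapool` as root prints the three reports in sequence. Errors that show up on every disk in the `iostat -En` counters as well would point towards a shared component such as the HBA, cabling, or power.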
 

nle

Thank you for your answer. :)

Yes, I do have a backup (offsite [100% up to date] and local [a few weeks old]). I'll update the local backup tomorrow.

I cleared the error, and I am doing another scrub to see if the problem persists.
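
For reference, the clear-and-rescrub step can be written as two small helpers. This is a sketch under assumptions: the function names are invented for illustration, `datapool` is the pool from the first post, and the progress check relies on the `scan:` line containing the text "scrub in progress" while a scrub is running.

```shell
#!/bin/sh
# clear_and_scrub (hypothetical helper): reset the error counters,
# then start a fresh scrub on the given pool.
clear_and_scrub() {
    pool="${1:-datapool}"
    zpool clear "$pool" && zpool scrub "$pool"
}

# scrub_running (hypothetical helper): exit status 0 while the pool
# still reports a scrub in progress, non-zero once it has finished.
scrub_running() {
    zpool status "${1:-datapool}" | grep "scrub in progress" >/dev/null
}
```

A possible use is `clear_and_scrub datapool`, followed by `while scrub_running datapool; do sleep 600; done; zpool status -v datapool` to print the result once the scrub has finished.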

Nothing has been done with the server in a while, so I am unsure what could be causing it. I run a scrub every Friday afternoon, and get a status mail every Monday morning. I have no physical access to the server atm, unless I get on a plane and fly there (the server is used in a small design studio in Oslo, and I am currently working from London).

I'll report back after the scrub (tomorrow some time).

EDIT:
Hopefully tomorrow. The scrub's estimated time is a lot longer than anticipated, which may not be a good sign. But it could also be because deduplication was previously enabled.

I did a dd test while the scrub was active and got this:
Code:
49152000000 bytes (49 GB) copied, 47.3839 seconds, 1.0 GB/s
so as far as I know the performance should be fine.
 

nle

So the scrub completed without errors, and I can't seem to find anything wrong anywhere else.

Code:
 scan: scrub repaired 0 in 14h45m with 0 errors on Tue Feb 19 08:06:26 2013
Speed tests:
Code:
$ time sh -c "dd if=/dev/zero of=100MB.iso bs=1000k count=10000 && sync"
10000+0 records in
10000+0 records out
10240000000 bytes (10 GB) copied, 3.24645 seconds, 3.2 GB/s

real    0m5.160s
user    0m0.012s
sys     0m3.046s
Code:
$ time sh -c "dd if=/dev/zero of=100MB.iso bs=8k count=1250000 && sync"
1250000+0 records in
1250000+0 records out
10240000000 bytes (10 GB) copied, 9.21884 seconds, 1.1 GB/s

real    0m10.920s
user    0m0.364s
sys     0m8.706s
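
The rate dd prints covers only the copy itself, while the `real` time from `time` also includes the trailing `sync`. A small helper makes the comparison explicit. This is a sketch: `gb_per_s` is a made-up name, and it uses decimal gigabytes (10^9 bytes), the unit in dd's summary line.

```shell
#!/bin/sh
# gb_per_s BYTES SECONDS (hypothetical helper): print throughput in
# decimal GB/s, rounded to one decimal place.
gb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1e9 }'
}

gb_per_s 10240000000 3.24645   # 1000k blocks, dd's own timing
gb_per_s 10240000000 5.160     # 1000k blocks, including sync (real time)
gb_per_s 10240000000 9.21884   # 8k blocks, dd's own timing
gb_per_s 10240000000 10.920    # 8k blocks, including sync (real time)
```

This prints 3.2, 2.0, 1.1 and 0.9. The first and third values match dd's own figures, while the other two show the somewhat lower rate once the sync is counted.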
 