Whole pool degraded after replacing one disk in pool

kaszanka

New Member
Jul 29, 2021
Hello,
Yesterday my /dev/sdf disk was in the DEGRADED - too many errors state.
After replacing the HDD ...

What I did:
zpool offline tank /dev/sdf
zpool clear tank /dev/sdf
zpool status
zpool online tank /dev/sdf
zpool replace tank /dev/sdf

After that, sdf started resilvering.

After some time, when I checked the status (zpool status -v), more disks had entered the DEGRADED - too many errors state.
Some time later, even more disks entered the DEGRADED - too many errors state.

Now I have this:

user@backup4:/usersfs/user# zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jul 29 11:27:20 2021
        4.14T scanned out of 42.5T at 61.1M/s, 182h40m to go
        223G resilvered, 9.75% done
config:

        NAME             STATE     READ WRITE CKSUM
        tank             DEGRADED     0     0    20
          raidz3-0       DEGRADED     0     0   105
            sdd          DEGRADED     0     0     0  too many errors
            sde          DEGRADED     0     0     0  too many errors
            replacing-2  DEGRADED     0     0     0
              old        UNAVAIL      0     0     0
              sdf        ONLINE       0     0     0  (resilvering)
            sdg          DEGRADED     0     0     0  too many errors
            sdh          DEGRADED     0     0     0  too many errors
            sdi          DEGRADED    42     0     0  too many errors (resilvering)
            sdj          DEGRADED    18     0     0  too many errors
            sdk          DEGRADED    81     0     0  too many errors (resilvering)
            sdl          DEGRADED     0     0     0  too many errors
            sdm          DEGRADED     0     0     0  too many errors
            sdn          FAULTED     14     0     0  too many errors
            sdo          DEGRADED     0     0     0  too many errors
            sdp          DEGRADED     0     0     0  too many errors
            sdq          DEGRADED    19     0     0  too many errors
            sdr          DEGRADED     0     0     0  too many errors
            sds          DEGRADED     0     0     0  too many errors
            sdt          DEGRADED     0     0     0  too many errors
            sdu          FAULTED     14     0     0  too many errors

and a list of 15 permanent errors.

What should I do, and what did I do wrong?
 

pricklypunter

Well-Known Member
Nov 10, 2015
Canada
If I were to make an educated guess, your 1 disk pulls just enough current to screw up your power supply to the array. Try a larger power supply temporarily, or if you are able, try a couple of smaller ones that between them can cope with the load :)

Change it out first, before you run any spurious ZFS commands or do anything else that might risk your data.
 

gea

Well-Known Member
Dec 31, 2010
DE
Too many errors on many disks at once usually points to a global hardware problem such as bad RAM, a bad PSU, overheating, or a bad HBA/expander.
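
To see at a glance which pool members actually carry non-zero error counters, the status output can be filtered with a small shell helper. This is only a minimal sketch: `zpool_error_devs` is a hypothetical name, it only reads text on stdin (it never touches the pool), and it assumes plain integer counters, so rows where ZFS abbreviates a large value (for example 1.2K) would be skipped.

```shell
# Hypothetical helper: print pool members whose READ/WRITE/CKSUM counters are non-zero.
# Reads `zpool status` text on stdin, so it also works on saved output.
zpool_error_devs() {
    awk '
        # Device rows look like: NAME STATE READ WRITE CKSUM [notes...]
        # Only rows with three plain-integer counters are considered.
        NF >= 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0)
                printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Usage (read-only):
#   zpool status -v tank | zpool_error_devs
```

Run against the output in the first post, this would list tank, raidz3-0, sdi, sdj, sdk, sdn, sdq and sdu. Read errors spread over six disks at the same time fit a shared cause better than six drives failing together, though that is an inference from the counters, not a diagnosis.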