Something has been hammering the pool with I/O for a few hours today. I can't seem to determine what it is.
I ran the iotop script, and supposedly PID 261 is the culprit.
But that PID is the pool itself?
Confused. Does anyone have any idea?
The pool is 100% busy/waiting mainly because disk ..327d0 is 100% busy.
I would assume that this disk is bad or weak.
Is something running, like a scrub or the realtime I/O monitoring in napp-it? If so, stop it.
Can you remove this disk or set it offline?
If the pool I/O is OK again after that, replace the disk or run
an intensive surface check, e.g. via WD Data Lifeguard, which either gives a final verdict that the disk has failed or may repair errors (use a Hiren's Boot USB stick).
Weird, now it's back to normal, disks idling.
Running a copy from the pool (read), all disks seem to be similarly busy, more or less saturating the 1 Gbit link.
Testing writes at 1 Gbit looks similar as well.
Looking at the job log again more carefully, something seems strange.
The Nov 5th scrub job is reported on Nov 11. Each job shows the result of the week before.
But surely a scrub doesn't take that long; it is reported to finish within 13h.
Really not sure what caused those reads this morning.
Normally load should be spread quite evenly across all disks of a pool. If a single disk is much worse than the others in wait or busy, this affects the I/O performance of the whole RAID. Even a simple action like a scrub can last days, as it seems here, if a single disk behaves badly. This is like a chain where the weakest link defines the overall result.
If the problem comes back, do an intensive surface check of the affected disk.
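To make the "one disk much worse than the others" check less of an eyeball job, here is a rough Python sketch. It assumes output in the column layout of `iostat -xn` on Solaris/illumos (with `%w` and `%b` as the 9th and 10th columns and the device name last). The sample rows, the device names `disk_a` to `disk_d`, and the thresholds (`factor`, `floor`) are all made up for illustration, not taken from this pool.

```python
from statistics import median

# Hypothetical sample in the column layout of `iostat -xn`;
# the numbers and device names are invented for illustration.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   41.0    3.0 5120.0   96.0  0.0  0.4    0.0    9.1   0  22 disk_a
   39.0    2.0 4992.0   64.0  0.0  0.5    0.0   11.3   0  25 disk_b
   40.0    3.0 5056.0   96.0  3.2  9.8   74.4  227.9  98 100 disk_c
   42.0    2.0 5184.0   64.0  0.0  0.4    0.0   10.2   0  24 disk_d
"""


def parse_iostat(text):
    """Return {device: (wait_pct, busy_pct)} from iostat -xn style rows."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 11 columns; skip titles, headers, blanks.
        if len(fields) != 11 or fields[0] == "r/s":
            continue
        stats[fields[10]] = (float(fields[8]), float(fields[9]))
    return stats


def find_outliers(stats, factor=2.0, floor=50.0):
    """Flag disks whose %b is at least `floor` and more than `factor`
    times the median %b of all disks (both thresholds are arbitrary)."""
    med = median(busy for _, busy in stats.values())
    return sorted(
        dev for dev, (_, busy) in stats.items()
        if busy >= floor and busy > factor * med
    )


if __name__ == "__main__":
    print(find_outliers(parse_iostat(SAMPLE)))
```

With the sample above, only `disk_c` is flagged. Comparing against the median rather than the mean keeps a single bad disk from dragging the baseline up with it.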
The thing is really robust though.
I really abused it, first disconnecting and replacing the LOG, HBA and motherboard before determining it was the CPU.
Plus, for some reason the SLOG/L2ARC VMDKs became unreadable all of a sudden.
Then, after the new CPU arrived, the spinner array wouldn't come up and was making noises. When the noise hadn't stopped after an hour, I disconnected all disks and plugged them back in one by one to see which one was making it.
After the initial panic of seeing the array faulted, I forced it to import without the LOG, and it came back up like a champ.