What's hammering my pool?


Nemesis_001

Member
Dec 24, 2022
36
3
8
Something has been hammering the pool with I/O for a few hours today, and I can't work out what it is.
I ran the iotop script, and it says PID 261 is the culprit.
But that PID is the pool itself?
Confused. Does anyone have any idea?

Thanks

[Three screenshots attached]
 

gea

Well-Known Member
Dec 31, 2010
3,172
1,197
113
DE
The pool is 100% busy/waiting, mainly because disk ..327d0 is 100% busy.
I would assume that this disk is bad or weak.
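A minimal sketch of that kind of check: compare each disk's %b (percent busy) with the rest of the pool and flag the one that sticks out. It assumes text in the illumos/OmniOS `iostat -xn` column layout; the sample rows, device names and the thresholds (`factor`, `min_busy`) are hypothetical, not taken from this system. Note that the first `iostat` report covers the time since boot, so an interval sample (e.g. the second report of `iostat -xn 5 2`) is the one that reflects current load.

```python
# Sketch: flag a disk whose %b (percent busy) is far above its pool peers.
# Assumes input in the illumos/OmniOS `iostat -xn` column layout:
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# SAMPLE is hypothetical data, not output from the system in this thread.
from statistics import median

SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   12.0    3.0  900.0   40.0  0.0  0.1    0.0    8.1   0   9 c0t5000C500AAAA0001d0
   11.5    3.0  880.0   41.0  0.0  0.1    0.0    7.9   0   8 c0t5000C500AAAA0002d0
   12.2    2.9  910.0   39.0  0.0  3.9    0.0  310.4   0 100 c0t5000C500AAAA0003d0
"""


def parse_busy(text: str) -> dict[str, int]:
    """Return {device: %b} parsed from iostat -xn style text."""
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banner lines or other non-data rows
        try:
            float(fields[0])
        except ValueError:
            continue  # the column-header row
        busy[fields[10]] = int(fields[9])
    return busy


def find_outliers(busy: dict[str, int], factor: float = 3.0,
                  min_busy: int = 50) -> list[str]:
    """Devices that are both heavily busy and far busier than the median disk.

    factor and min_busy are arbitrary illustrative thresholds.
    """
    if not busy:
        return []
    typical = max(median(busy.values()), 1)
    return [dev for dev, b in busy.items()
            if b >= min_busy and b > factor * typical]


if __name__ == "__main__":
    print(find_outliers(parse_busy(SAMPLE)))
```

With the sample above only the third disk is reported, because it sits at 100% busy while its peers idle below 10%. Comparing against the median rather than a fixed limit means a pool that is evenly busy during a big copy is not flagged.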

Is something running, like a scrub or realtime I/O monitoring in napp-it? If so, stop it.

Can you remove this disk or set it offline?
If pool I/O is then OK again, replace the disk, or run
an intensive surface check, e.g. via WD Data Lifeguard, which gives a final HD fail verdict or may repair errors (use a Hiren's boot USB stick).
 

Nemesis_001

Thanks.
I'll have a look at that disk when I get home.
The scrub usually starts on Saturday evening, but it had already completed.
 

Nemesis_001

Weird, now it's back to normal, with the disks idling.
Running a copy from the pool (read), all disks seem to be similarly busy, more or less saturating the 1 Gbit link.

[Screenshot]

Testing writes at 1 Gbit looks similar as well.

[Screenshot]

Looking at the job log again more carefully, something seems strange.
The Nov 5th scrub job is reported on Nov 11; each job shows the result of the week before.
But surely a scrub doesn't take that long, and it is reported to finish within 13 hours.

Any idea?

[Screenshot]

Really not sure what caused those reads this morning.
 

gea

Well-Known Member
Dec 31, 2010
3,172
1,197
113
DE
Normally, load should be spread quite evenly across all disks of a pool. If a single disk is much worse than the others regarding wait or busy, this affects the performance and I/O of the whole RAID. Even a simple action like a scrub can last days, as seems to be the case here, if a single disk behaves badly. It is like a chain, where the weakest link defines the overall result.
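The chain comparison can be put into rough numbers with a deliberately simplified model: assume every disk has to read an equal share of the data and that the slowest disk sets the pace for the whole scrub. This is an illustrative assumption, not how ZFS actually schedules scrub I/O, and the sizes and speeds in the example are hypothetical.

```python
# Simplified "weakest link" model of a scrub: every disk reads an equal share
# of the pool's data, and the slowest disk is assumed to set the pace for all
# of them. Illustrative assumption only, not ZFS's actual I/O scheduling.


def scrub_hours(bytes_per_disk: float, disk_mb_per_s: list[float]) -> float:
    """Estimated scrub duration in hours under the weakest-link model."""
    if not disk_mb_per_s:
        raise ValueError("need at least one disk")
    slowest = min(disk_mb_per_s)  # MB/s of the weakest disk
    seconds = bytes_per_disk / (slowest * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    share = 3e12  # hypothetical: 3 TB to read per disk
    healthy = [150.0] * 6  # hypothetical: six disks at 150 MB/s
    degraded = healthy[:-1] + [10.0]  # same pool, one disk down to 10 MB/s
    print(f"healthy:  {scrub_hours(share, healthy):.1f} h")
    print(f"degraded: {scrub_hours(share, degraded):.1f} h")
```

Under these made-up numbers the healthy pool finishes in about 5.6 hours, while a single disk dropping to 10 MB/s stretches the same scrub to about 83 hours, roughly three and a half days, even though the other five disks are fine.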

If the problem comes back do an intensive surface check of the affected disk.
 

Nemesis_001

Meanwhile, coincidentally, my CPU died at around that time (defective PCI Express).
I wonder if that could have caused the issue in the first place.
 

gea

Well-Known Member
Dec 31, 2010
3,172
1,197
113
DE
Everything dies sometimes; simply be prepared so that it is not a disaster.
 

Nemesis_001

Yeah.
The thing is really robust, though.
I really abused it, first disconnecting and replacing the LOG, HBA and motherboard before determining it was the CPU.
Plus, for some reason the SLOG/L2ARC VMDKs suddenly became unreadable.
Then, after the new CPU arrived, the spinner array wouldn't come up and was making noises. When it still hadn't stopped after an hour, I disconnected all disks and plugged them back in one by one to see which one was making the noise.

After the initial panic of seeing the array faulted, I forced it to import without the LOG, and it came back up like a champ.