What's hammering my pool?

Nemesis_001 · Nov 12, 2023

Something has been hammering the pool with I/O for a few hours today. I can't seem to determine what it is.
I ran the iotop script, and supposedly pid 261 is the culprit.
But that pid is the pool itself?
Confused. Does anyone have any idea?

Thanks

gea · Nov 12, 2023

The pool is 100% busy/waiting mainly because disk ..327d0 is 100% busy.
I would assume that this disk is bad/weak.

Is there something running, like a scrub or realtime I/O monitoring in napp-it? If so, stop it.

Can you remove this disk or set it offline?
If pool I/O is then OK again, replace the disk or run
an intensive surface check, e.g. via WD Data Lifeguard, which gives a final disk-failure verdict or may repair errors (use a Hiren's boot USB stick).

Nemesis_001 · Nov 12, 2023

Thanks.
I'll have a look at that disk when I get home.
Scrub usually starts Sat evening, but it was already complete.

Nemesis_001 · Nov 12, 2023

Weird, now it's back to normal, disks idling.
Running a copy from the pool (read), all disks seem to be similarly busy, more or less saturating the 1 Gbit link.

Testing writes at 1 Gbit seems similar as well.

Looking at the job log again more carefully, something seems strange.
The Nov 5th scrub job is reported on Nov 11. Each job shows the result of the week before.
But surely a scrub doesn't take that long, and it is reported to finish within 13h.

Any idea?

Really not sure what caused those reads this morning.

gea · Nov 12, 2023

Normally, load should be spread quite evenly across all disks of a pool. If a single disk is much worse than the others regarding wait or busy, this affects the performance and I/O situation of the whole raid. Even a simple action like a scrub can last days, as seems to be the case here, if a single disk behaves badly. This is like a chain, where the weakest link defines the overall result.

If the problem comes back do an intensive surface check of the affected disk.
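
As a rough illustration of the "one disk much worse than the others" check described above, here is a minimal Python sketch. It assumes output shaped like Solaris/illumos `iostat -xn` (ten numeric columns followed by the device name, with `%b` as the tenth column). The device names, the sample numbers, and the `factor` and `floor` thresholds are all made up for the example, not taken from this pool.

```python
from statistics import median

def parse_iostat_xn(text):
    """Parse 'iostat -xn' style rows into {device: percent_busy}.

    Assumes each data row has 10 numeric columns followed by the device name.
    """
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banner or blank line
        try:
            busy[fields[10]] = float(fields[9])  # %b column
        except ValueError:
            continue  # header row (non-numeric)
    return busy

def find_outliers(busy, factor=2.0, floor=50.0):
    """Return devices whose %b is at least `floor` and more than
    `factor` times the median %b of the other devices."""
    flagged = []
    for dev, pct in busy.items():
        others = [v for d, v in busy.items() if d != dev]
        if not others:
            continue
        if pct >= floor and pct > factor * median(others):
            flagged.append(dev)
    return flagged

# Hypothetical sample, not real output from this system.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   40.0    2.0 5000.0   10.0  0.0  0.3    0.0    7.0   0  12 disk_a
   41.0    2.0 5100.0   10.0  0.0  0.3    0.0    7.5   0  14 disk_b
   12.0    1.0 1500.0    5.0  0.0  1.0    0.0   80.0   0 100 disk_c
   39.0    2.0 4900.0   10.0  0.0  0.3    0.0    7.2   0  13 disk_d
"""

if __name__ == "__main__":
    print(find_outliers(parse_iostat_xn(SAMPLE)))
```

The median of the other disks is used as the baseline so that the slow disk itself does not drag the comparison value up. A single sample proves little; the pattern only means something if the same disk stands out over several intervals.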

Nemesis_001 · Dec 4, 2023

Meanwhile, coincidentally, my CPU died around that time. PCI Express defective.
I wonder if that could have caused the issue to begin with.

Nemesis_001 · Dec 19, 2023

Well, a disk died as well. But not the one that was busy, the one with the pending sector count. RIP.

gea · Dec 20, 2023

Everything dies sometime. Simply be prepared, so that this is not a disaster.

Nemesis_001 · Dec 20, 2023

Yeah.
The thing is really robust though.
I really abused it, first disconnecting and replacing the LOG, HBA, and motherboard before determining it was the CPU.
Plus, for some reason the SLOG/L2ARC VMDKs became unreadable all of a sudden.
Then, after the new CPU arrived, the spinner array wouldn't come up and was making noises. After it wouldn't stop for an hour, I disconnected all disks and plugged them back in one by one to see what was making the noise.

After the initial panic of seeing the array faulted, I forced it to import without the LOG, and it came back up like a champ.
