Strange MDADM RAID 6 behaviour

Notice: Page may contain affiliate links for which we may earn a small commission through services like Amazon Affiliates or Skimlinks.

Mashie

Member
Jun 26, 2020
37
9
8
I have Xubuntu 20.04 with a 10x10TB MDADM RAID 6 array.

It is working great except for the first time I do a large >7GB file write to the array after a reboot. It will hang at different places in the transfer each time such as 2.5GB/3.6GB/4.3GB/5.6GB. When it hangs the same 5 disks will be busy with activity for 3-4 minutes before everyting resumes and every consecutive file transfer is working as normal until the next reboot.

Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb1[3] sdc1[0] sdd1[1] sde1[2] sdi1[9] sdk1[4] sdh1[5] sdg1[7] sdf1[8] sda1[6]
      78129610752 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/73 pages [0KB], 65536KB chunk

unused devices: <none>
Linux 5.4.0-88-generic (IONE)   06/10/21        _x86_64_        (12 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.08    0.00    0.75    5.25    0.00   92.92

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
md0             105.00       420.00         0.00         0.00        420          0          0
nvme0n1           0.00         0.00         0.00         0.00          0          0          0
sda              21.00        84.00         0.00         0.00         84          0          0    (disk 7)
sdb              21.00        84.00         0.00         0.00         84          0          0    (disk 1)
sdc               0.00         0.00         0.00         0.00          0          0          0
sdd              21.00        84.00         0.00         0.00         84          0          0    (disk 3)
sde               0.00         0.00         0.00         0.00          0          0          0
sdf              21.00        84.00         0.00         0.00         84          0          0    (disk 9)
sdg              21.00        84.00         0.00         0.00         84          0          0    (disk 10)
sdh               0.00         0.00         0.00         0.00          0          0          0
sdi               0.00         0.00         0.00         0.00          0          0          0
sdj               0.00         0.00         0.00         0.00          0          0          0
sdk               0.00         0.00         0.00         0.00          0          0          0


Syslog 120s after the transfer hangs:
Code:
Oct  6 21:32:58 IONE kernel: [  605.733369] INFO: task jbd2/md0-8:1158 blocked for more than 120 seconds.
Oct  6 21:32:58 IONE kernel: [  605.733375]       Tainted: P           OE     5.4.0-88-generic #99-Ubuntu
Oct  6 21:32:58 IONE kernel: [  605.733377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  6 21:32:58 IONE kernel: [  605.733380] jbd2/md0-8      D    0  1158      2 0x80004000
Oct  6 21:32:58 IONE kernel: [  605.733383] Call Trace:
Oct  6 21:32:58 IONE kernel: [  605.733395]  __schedule+0x2e3/0x740
Oct  6 21:32:58 IONE kernel: [  605.733402]  ? __wake_up_common_lock+0x8a/0xc0
Oct  6 21:32:58 IONE kernel: [  605.733405]  schedule+0x42/0xb0
Oct  6 21:32:58 IONE kernel: [  605.733411]  jbd2_journal_commit_transaction+0x258/0x17f0
Oct  6 21:32:58 IONE kernel: [  605.733415]  ? __switch_to_asm+0x40/0x70
Oct  6 21:32:58 IONE kernel: [  605.733416]  ? __switch_to_asm+0x34/0x70
Oct  6 21:32:58 IONE kernel: [  605.733418]  ? __switch_to_asm+0x40/0x70
Oct  6 21:32:58 IONE kernel: [  605.733420]  ? __switch_to_asm+0x34/0x70
Oct  6 21:32:58 IONE kernel: [  605.733422]  ? __switch_to_asm+0x40/0x70
Oct  6 21:32:58 IONE kernel: [  605.733424]  ? __switch_to_asm+0x34/0x70
Oct  6 21:32:58 IONE kernel: [  605.733426]  ? __switch_to_asm+0x40/0x70
Oct  6 21:32:58 IONE kernel: [  605.733428]  ? __switch_to_asm+0x40/0x70
Oct  6 21:32:58 IONE kernel: [  605.733430]  ? __switch_to_asm+0x34/0x70
Oct  6 21:32:58 IONE kernel: [  605.733432]  ? __switch_to_asm+0x34/0x70
Oct  6 21:32:58 IONE kernel: [  605.733435]  ? wait_woken+0x80/0x80
Oct  6 21:32:58 IONE kernel: [  605.733444]  ? try_to_del_timer_sync+0x54/0x80
Oct  6 21:32:58 IONE kernel: [  605.733449]  kjournald2+0xb6/0x280
Oct  6 21:32:58 IONE kernel: [  605.733452]  ? wait_woken+0x80/0x80
Oct  6 21:32:58 IONE kernel: [  605.733458]  kthread+0x104/0x140
Oct  6 21:32:58 IONE kernel: [  605.733461]  ? commit_timeout+0x20/0x20
Oct  6 21:32:58 IONE kernel: [  605.733464]  ? kthread_park+0x90/0x90
Oct  6 21:32:58 IONE kernel: [  605.733466]  ret_from_fork+0x35/0x40
Oct  6 21:32:58 IONE kernel: [  605.733497] INFO: task pool-Thunar:4920 blocked for more than 120 seconds.
Oct  6 21:32:58 IONE kernel: [  605.733499]       Tainted: P           OE     5.4.0-88-generic #99-Ubuntu
Oct  6 21:32:58 IONE kernel: [  605.733501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  6 21:32:58 IONE kernel: [  605.733503] pool-Thunar     D    0  4920   2573 0x00000000
Oct  6 21:32:58 IONE kernel: [  605.733505] Call Trace:
Oct  6 21:32:58 IONE kernel: [  605.733509]  __schedule+0x2e3/0x740
Oct  6 21:32:58 IONE kernel: [  605.733513]  schedule+0x42/0xb0
Oct  6 21:32:58 IONE kernel: [  605.733518]  rwsem_down_read_slowpath+0x16c/0x4a0
Oct  6 21:32:58 IONE kernel: [  605.733523]  down_read+0x85/0xa0
Oct  6 21:32:58 IONE kernel: [  605.733527]  ext4_da_map_blocks.constprop.0+0x2d3/0x380
Oct  6 21:32:58 IONE kernel: [  605.733530]  ext4_da_get_block_prep+0x55/0xe0
Oct  6 21:32:58 IONE kernel: [  605.733533]  ext4_block_write_begin+0x157/0x520
Oct  6 21:32:58 IONE kernel: [  605.733536]  ? ext4_da_map_blocks.constprop.0+0x380/0x380
Oct  6 21:32:58 IONE kernel: [  605.733540]  ? __ext4_journal_start_sb+0x69/0x120
Oct  6 21:32:58 IONE kernel: [  605.733544]  ext4_da_write_begin+0x1cf/0x460
Oct  6 21:32:58 IONE kernel: [  605.733551]  generic_perform_write+0xc2/0x1c0
Oct  6 21:32:58 IONE kernel: [  605.733556]  __generic_file_write_iter+0x107/0x1d0
Oct  6 21:32:58 IONE kernel: [  605.733561]  ext4_file_write_iter+0xb9/0x360
Oct  6 21:32:58 IONE kernel: [  605.733565]  ? common_file_perm+0x5e/0x110
Oct  6 21:32:58 IONE kernel: [  605.733572]  do_iter_readv_writev+0x14f/0x1d0
Oct  6 21:32:58 IONE kernel: [  605.733575]  do_iter_write+0x84/0x1a0
Oct  6 21:32:58 IONE kernel: [  605.733577]  vfs_iter_write+0x19/0x30
Oct  6 21:32:58 IONE kernel: [  605.733584]  iter_file_splice_write+0x24d/0x390
Oct  6 21:32:58 IONE kernel: [  605.733588]  do_splice+0x23f/0x650
Oct  6 21:32:58 IONE kernel: [  605.733593]  __x64_sys_splice+0x131/0x150
Oct  6 21:32:58 IONE kernel: [  605.733599]  do_syscall_64+0x57/0x190
Oct  6 21:32:58 IONE kernel: [  605.733602]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct  6 21:32:58 IONE kernel: [  605.733605] RIP: 0033:0x7fc61952a7f3
Oct  6 21:32:58 IONE kernel: [  605.733613] Code: Bad RIP value.
Oct  6 21:32:58 IONE kernel: [  605.733615] RSP: 002b:00007fc616bbd700 EFLAGS: 00000293 ORIG_RAX: 0000000000000113
Oct  6 21:32:58 IONE kernel: [  605.733617] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007fc61952a7f3
Oct  6 21:32:58 IONE kernel: [  605.733619] RDX: 0000000000000016 RSI: 0000000000000000 RDI: 0000000000000017
Oct  6 21:32:58 IONE kernel: [  605.733620] RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000004
Oct  6 21:32:58 IONE kernel: [  605.733622] R10: 00007fc616bbd840 R11: 0000000000000293 R12: 0000000000000016
Oct  6 21:32:58 IONE kernel: [  605.733623] R13: 0000000000000000 R14: 0000000000000017 R15: 00007fc616bbd850
Any idea what is going on? SMART isn't showing errors on any of the disks.
 

Mashie

Member
Jun 26, 2020
37
9
8
I just tried to update to the latest Ubuntu kernel 5.11.0-37 but it made no difference.

Code:
Oct  7 17:06:45 IONE kernel: [  363.577784] INFO: task jbd2/md0-8:1166 blocked for more than 120 seconds.
Oct  7 17:06:45 IONE kernel: [  363.577796]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct  7 17:06:45 IONE kernel: [  363.577800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  7 17:06:45 IONE kernel: [  363.577803] task:jbd2/md0-8      state:D stack:    0 pid: 1166 ppid:     2 flags:0x00004000
Oct  7 17:06:45 IONE kernel: [  363.577811] Call Trace:
Oct  7 17:06:45 IONE kernel: [  363.577817]  __schedule+0x44c/0x8a0
Oct  7 17:06:45 IONE kernel: [  363.577826]  ? __irq_work_queue_local+0x48/0x60
Oct  7 17:06:45 IONE kernel: [  363.577832]  schedule+0x4f/0xc0
Oct  7 17:06:45 IONE kernel: [  363.577837]  jbd2_journal_commit_transaction+0x300/0x18f0
Oct  7 17:06:45 IONE kernel: [  363.577844]  ? dequeue_entity+0xd8/0x410
Oct  7 17:06:45 IONE kernel: [  363.577854]  ? wait_woken+0x80/0x80
Oct  7 17:06:45 IONE kernel: [  363.577861]  ? try_to_del_timer_sync+0x54/0x80
Oct  7 17:06:45 IONE kernel: [  363.577869]  kjournald2+0xb6/0x280
Oct  7 17:06:45 IONE kernel: [  363.577877]  ? wait_woken+0x80/0x80
Oct  7 17:06:45 IONE kernel: [  363.577881]  ? commit_timeout+0x20/0x20
Oct  7 17:06:45 IONE kernel: [  363.577886]  kthread+0x12b/0x150
Oct  7 17:06:45 IONE kernel: [  363.577895]  ? set_kthread_struct+0x40/0x40
Oct  7 17:06:45 IONE kernel: [  363.577901]  ret_from_fork+0x22/0x30
Oct  7 17:06:45 IONE kernel: [  363.577935] INFO: task pool-Thunar:4959 blocked for more than 120 seconds.
Oct  7 17:06:45 IONE kernel: [  363.577940]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct  7 17:06:45 IONE kernel: [  363.577943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  7 17:06:45 IONE kernel: [  363.577945] task:pool-Thunar     state:D stack:    0 pid: 4959 ppid:  2493 flags:0x00000000
Oct  7 17:06:45 IONE kernel: [  363.577951] Call Trace:
Oct  7 17:06:45 IONE kernel: [  363.577953]  __schedule+0x44c/0x8a0
Oct  7 17:06:45 IONE kernel: [  363.577958]  ? __mod_memcg_lruvec_state+0x25/0xe0
Oct  7 17:06:45 IONE kernel: [  363.577969]  schedule+0x4f/0xc0
Oct  7 17:06:45 IONE kernel: [  363.577973]  rwsem_down_read_slowpath+0x184/0x3c0
Oct  7 17:06:45 IONE kernel: [  363.577982]  down_read+0x43/0xa0
Oct  7 17:06:45 IONE kernel: [  363.577986]  ext4_da_map_blocks.constprop.0+0x2dc/0x380
Oct  7 17:06:45 IONE kernel: [  363.577994]  ext4_da_get_block_prep+0x55/0xe0
Oct  7 17:06:45 IONE kernel: [  363.577998]  ext4_block_write_begin+0x14a/0x530
Oct  7 17:06:45 IONE kernel: [  363.578003]  ? ext4_da_map_blocks.constprop.0+0x380/0x380
Oct  7 17:06:45 IONE kernel: [  363.578007]  ? __ext4_journal_start_sb+0x106/0x120
Oct  7 17:06:45 IONE kernel: [  363.578015]  ext4_da_write_begin+0x1de/0x460
Oct  7 17:06:45 IONE kernel: [  363.578021]  generic_perform_write+0xc2/0x1c0
Oct  7 17:06:45 IONE kernel: [  363.578031]  ext4_buffered_write_iter+0x98/0x150
Oct  7 17:06:45 IONE kernel: [  363.578039]  ext4_file_write_iter+0x53/0x220
Oct  7 17:06:45 IONE kernel: [  363.578044]  ? common_file_perm+0x72/0x170
Oct  7 17:06:45 IONE kernel: [  363.578053]  do_iter_readv_writev+0x152/0x1b0
Oct  7 17:06:45 IONE kernel: [  363.578062]  do_iter_write+0x88/0x1c0
Oct  7 17:06:45 IONE kernel: [  363.578068]  vfs_iter_write+0x19/0x30
Oct  7 17:06:45 IONE kernel: [  363.578074]  iter_file_splice_write+0x276/0x3c0
Oct  7 17:06:45 IONE kernel: [  363.578084]  do_splice_from+0x21/0x40
Oct  7 17:06:45 IONE kernel: [  363.578089]  do_splice+0x2e8/0x650
Oct  7 17:06:45 IONE kernel: [  363.578095]  __do_splice+0xde/0x160
Oct  7 17:06:45 IONE kernel: [  363.578101]  __x64_sys_splice+0x99/0x110
Oct  7 17:06:45 IONE kernel: [  363.578107]  do_syscall_64+0x38/0x90
Oct  7 17:06:45 IONE kernel: [  363.578112]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct  7 17:06:45 IONE kernel: [  363.578120] RIP: 0033:0x7f0573f837f3
Oct  7 17:06:45 IONE kernel: [  363.578124] RSP: 002b:00007f0571616700 EFLAGS: 00000293 ORIG_RAX: 0000000000000113
Oct  7 17:06:45 IONE kernel: [  363.578130] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f0573f837f3
Oct  7 17:06:45 IONE kernel: [  363.578133] RDX: 0000000000000016 RSI: 0000000000000000 RDI: 0000000000000017
Oct  7 17:06:45 IONE kernel: [  363.578136] RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000004
Oct  7 17:06:45 IONE kernel: [  363.578139] R10: 00007f0571616840 R11: 0000000000000293 R12: 0000000000000016
Oct  7 17:06:45 IONE kernel: [  363.578142] R13: 0000000000000000 R14: 0000000000000017 R15: 00007f0571616850
 

UhClem

just another Bozo on the bus
Jun 26, 2012
435
249
43
NH, USA
First thing I'd do is make sure that /dev/md0 is clean and consistent.
Code:
umount /dev/md0
e2fsck -f -n /dev/md0
[ If cool, then (re-)mount it. ]

Also, a look at that message file, leading up to the task timeout, might shed light.
 
Last edited:

MrCalvin

IT consultant, Denmark
Aug 22, 2016
87
15
8
51
Denmark
www.wit.dk
Some benchmarks point out that R5 & R6 performs best with a chunk-size of 64K, your have 512K, but that might might depend on use-case (louwrentius.com) and it shouldn't result in stalling anyway.

What about stripe-cache, try to set it to 32MB if not already (echo 32768 > /sys/block/md0/md/stripe_cache_size).

Are you using your array as storage for VMs? I've never been able to NOT get disk-stall like yours when using R5 as virtualization storage. It seems R5/R6 performance is depending on optimal host-page-cache use, and somehow KVM/QEMU prevent optimal use.
 

Mashie

Member
Jun 26, 2020
37
9
8
First thing I'd do is make sure that /dev/md0 is clean and consistent.
Code:
umount /dev/md0
e2fsck -f -n /dev/md0
[ If cool, then (re-)mount it. ]

Also, a look at that message file, leading up to the task timeout, might shed light.
I will run that check later as the system is in use right now. As for the log entry, there was nothing at all going on until the array stalled and this block of text appeared. It will stall regardless what application is doing the file writing Thunar for file copying or MKVToolNix if I do a remux.

Some benchmarks point out that R5 & R6 performs best with a chunk-size of 64K, your have 512K, but that might might depend on use-case (louwrentius.com) and it shouldn't result in stalling anyway.

What about stripe-cache, try to set it to 32MB if not already (echo 32768 > /sys/block/md0/md/stripe_cache_size).

Are you using your array as storage for VMs? I've never been able to NOT get disk-stall like yours when using R5 as virtualization storage. It seems R5/R6 performance is depending on optimal host-page-cache use, and somehow KVM/QEMU prevent optimal use.
The array is mainly used for storing a large media library. Any VMs are running from a SSD.

The performance is fine once the stalling has ended. It's just that it always will stall first thing after a reboot and that is driving me up the wall at the moment.

SMR drives?
It is Seagate IronWolf non-Pro CMR drives. I know you only are supposed to have a max of 8 of those but I can't imagine that would cause this issue. People are successfully using chucked drives all over the place without problems.
 

Mashie

Member
Jun 26, 2020
37
9
8
First thing I'd do is make sure that /dev/md0 is clean and consistent.
Code:
umount /dev/md0
e2fsck -f -n /dev/md0
I ran the command and this is the output:
Code:
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 23998/1220775936 files (3.8% non-contiguous), 17772851170/19532402688 blocks
 

Stephan

Well-Known Member
Apr 21, 2017
920
698
93
Germany
Power supply sufficient? What controller are those drives attached to? You should never see that kernel message.
 

MrCalvin

IT consultant, Denmark
Aug 22, 2016
87
15
8
51
Denmark
www.wit.dk
The kernel message "/md0-8:1166 blocked for more than 120 seconds." smell like it's related to the STCERC time-out problem.
"When a disk experince problems writting/reading data it will retry and perform other actions to mitigate the problem.
It will keep trying until ít meets the disk-firmware--timeout. This timeout can be very long, especially on consumer-drives (up-to 180 sec.).
On enterpise/RAID specific drives this timer-out should be set to a few seconds be setting the SCTERC parameter on the drive. Only enterprise class disks support this feater.."

I've tested an IronWolf SSD (non-Pro) and this drive did NOT support SCTERC even though it's sold as a NAS drive, so your can't be sure!
Also the Micron 5300 Pro drive, again a NAS/RAID drive, doesn't support SCTERC either! And as always Micron is silence when you contact them.

Remember, even though the drive do support SCTERC it's often not enabled/configured. You must set the timeout to e.g. a few seconds. Default is typically 120-180 seconds (here is the smell)

To retrieve SCTERC support from drive: smartctl -l scterc /dev/sda

To set SCTERC: smartctl --log=scterc,70,70 /dev/sda (timeout is set to 7 seconds)

But if you do actually see a drive stalling because of some errors on the drive I would expect to see some SMART errors.....so maybe not?
Unless it's a cable issue! Sometimes these error don't show up! Especially on AMD CPUs how strange it might sound.
 

Mashie

Member
Jun 26, 2020
37
9
8
Power supply sufficient? What controller are those drives attached to? You should never see that kernel message.
This is the full setup:

Intel Xeon E5-1650 V3
Gigabyte MW50-SV0
4x 16GB Crucial DDR4 ECC Registered
250GB Samsung 850 EVO mSATA SSD (boot)
WD Black 1TB NVMe (system)
10x Seagate IronWolf 10TB (RAID6 storage)
Corsair AX1200
Lian-Li PC-777B
Xubuntu 20.04

Originally all 10 drives were connected to the SATA headers on the motherboard.

Earlier this week I added an LSI 9305-24i and have migrated the first eight of the drives across without any improvements. I will do the last two drives later today. I got this card for two reasons, one was to let me upgrade to a new system in the future without a dozen of onboard SATA headers and the second to hopefully resolve this issue.

The PSU itself is overkill for this build but I had it laying around so use dit. When I migrate the last two drives later I will check that the drives are spread between the power rails.


The kernel message "/md0-8:1166 blocked for more than 120 seconds." smell like it's related to the STCERC time-out problem.
"When a disk experince problems writting/reading data it will retry and perform other actions to mitigate the problem.
It will keep trying until ít meets the disk-firmware--timeout. This timeout can be very long, especially on consumer-drives (up-to 180 sec.).
On enterpise/RAID specific drives this timer-out should be set to a few seconds be setting the SCTERC parameter on the drive. Only enterprise class disks support this feater.."

I've tested an IronWolf SSD (non-Pro) and this drive did NOT support SCTERC even though it's sold as a NAS drive, so your can't be sure!
Also the Micron 5300 Pro drive, again a NAS/RAID drive, doesn't support SCTERC either! And as always Micron is silence when you contact them.

Remember, even though the drive do support SCTERC it's often not enabled/configured. You must set the timeout to e.g. a few seconds. Default is typically 120-180 seconds (here is the smell)

To retrieve SCTERC support from drive: smartctl -l scterc /dev/sda

To set SCTERC: smartctl --log=scterc,70,70 /dev/sda (timeout is set to 7 seconds)

But if you do actually see a drive stalling because of some errors on the drive I would expect to see some SMART errors.....so maybe not?
Unless it's a cable issue! Sometimes these error don't show up! Especially on AMD CPUs how strange it might sound.
Good call but all 10 of the drives are reporting 7 seconds already:

Code:
sudo smartctl -l scterc /dev/sda
[sudo] password for mashie:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-37-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
 
Last edited:

Mashie

Member
Jun 26, 2020
37
9
8
All drives are now hanging off the 9305-24i, the behaviour remains the same. The power is looking fine as well, no extensions used anywhere.

The bootup log contains some errors but Google doesn't offer much insight in what is going on:

Code:
Oct  9 10:09:25 IONE systemd[1]: systemd-fsckd.service: Succeeded.
Oct  9 10:09:25 IONE kernel: [   35.097827] mpt3sas_cm0: port enable: SUCCESS
Oct  9 10:09:25 IONE kernel: [   35.302934] scsi 3:0:0:0: Direct-Access     ATA      ST10000VN0004-1Z SC60 PQ: 0 ANSI: 6
Oct  9 10:09:25 IONE kernel: [   35.302950] scsi 3:0:0:0: SATA: handle(0x001d), sas_addr(0x300062b202991498), phy(22), device_name(0x0000000000000000)
Oct  9 10:09:25 IONE kernel: [   35.302959] scsi 3:0:0:0: enclosure logical id (0x500062b202991482), slot(12) 
Oct  9 10:09:25 IONE kernel: [   35.302967] scsi 3:0:0:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:25 IONE kernel: [   35.303025] scsi 3:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:25 IONE kernel: [   35.303035] scsi 3:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:25 IONE kernel: [   35.316369]  end_device-3:0: add: handle(0x001d), sas_addr(0x300062b202991498)
Oct  9 10:09:25 IONE kernel: [   35.364030] scsi 3:0:1:0: Direct-Access     ATA      ST10000VN0004-1Z SC60 PQ: 0 ANSI: 6
Oct  9 10:09:25 IONE kernel: [   35.364050] scsi 3:0:1:0: SATA: handle(0x0019), sas_addr(0x300062b202991494), phy(18), device_name(0x0000000000000000)
Oct  9 10:09:25 IONE kernel: [   35.364053] scsi 3:0:1:0: enclosure logical id (0x500062b202991482), slot(8) 
Oct  9 10:09:25 IONE kernel: [   35.364056] scsi 3:0:1:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:25 IONE kernel: [   35.364099] scsi 3:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:25 IONE kernel: [   35.364102] scsi 3:0:1:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:25 IONE kernel: [   35.377532]  end_device-3:1: add: handle(0x0019), sas_addr(0x300062b202991494)
Oct  9 10:09:25 IONE kernel: [   35.480419] scsi 3:0:2:0: Direct-Access     ATA      ST10000VN0004-2G SC60 PQ: 0 ANSI: 6
Oct  9 10:09:25 IONE kernel: [   35.480435] scsi 3:0:2:0: SATA: handle(0x001a), sas_addr(0x300062b202991495), phy(19), device_name(0x0000000000000000)
Oct  9 10:09:25 IONE kernel: [   35.480444] scsi 3:0:2:0: enclosure logical id (0x500062b202991482), slot(9) 
Oct  9 10:09:25 IONE kernel: [   35.480451] scsi 3:0:2:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:25 IONE kernel: [   35.480509] scsi 3:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:25 IONE kernel: [   35.480518] scsi 3:0:2:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:25 IONE kernel: [   35.492653]  end_device-3:2: add: handle(0x001a), sas_addr(0x300062b202991495)
Oct  9 10:09:26 IONE kernel: [   35.678613] scsi 3:0:3:0: Direct-Access     ATA      ST10000VN0004-1Z SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   35.678620] scsi 3:0:3:0: SATA: handle(0x001b), sas_addr(0x300062b202991496), phy(20), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   35.678623] scsi 3:0:3:0: enclosure logical id (0x500062b202991482), slot(15) 
Oct  9 10:09:26 IONE kernel: [   35.678626] scsi 3:0:3:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   35.678670] scsi 3:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   35.678673] scsi 3:0:3:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   35.692155]  end_device-3:3: add: handle(0x001b), sas_addr(0x300062b202991496)
Oct  9 10:09:26 IONE kernel: [   35.808341] scsi 3:0:4:0: Direct-Access     ATA      ST10000VN0004-1Z SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   35.808357] scsi 3:0:4:0: SATA: handle(0x001c), sas_addr(0x300062b202991497), phy(21), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   35.808366] scsi 3:0:4:0: enclosure logical id (0x500062b202991482), slot(14) 
Oct  9 10:09:26 IONE kernel: [   35.808373] scsi 3:0:4:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   35.808431] scsi 3:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   35.808440] scsi 3:0:4:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   35.821852]  end_device-3:4: add: handle(0x001c), sas_addr(0x300062b202991497)
Oct  9 10:09:26 IONE kernel: [   35.926753] scsi 3:0:5:0: Direct-Access     ATA      ST10000VN0004-1Z SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   35.926769] scsi 3:0:5:0: SATA: handle(0x001e), sas_addr(0x300062b202991499), phy(23), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   35.926778] scsi 3:0:5:0: enclosure logical id (0x500062b202991482), slot(13) 
Oct  9 10:09:26 IONE kernel: [   35.926786] scsi 3:0:5:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   35.926845] scsi 3:0:5:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   35.926853] scsi 3:0:5:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   35.940341]  end_device-3:5: add: handle(0x001e), sas_addr(0x300062b202991499)
Oct  9 10:09:26 IONE systemd[1]: systemd-hostnamed.service: Succeeded.
Oct  9 10:09:26 IONE kernel: [   35.980110] scsi 3:0:6:0: Direct-Access     ATA      ST10000VN0008-2J SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   35.980127] scsi 3:0:6:0: SATA: handle(0x001f), sas_addr(0x300062b202991489), phy(7), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   35.980136] scsi 3:0:6:0: enclosure logical id (0x500062b202991482), slot(5) 
Oct  9 10:09:26 IONE kernel: [   35.980143] scsi 3:0:6:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   35.980204] scsi 3:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   35.980213] scsi 3:0:6:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   35.991475]  end_device-3:6: add: handle(0x001f), sas_addr(0x300062b202991489)
Oct  9 10:09:26 IONE systemd[1]: blueman-mechanism.service: Succeeded.
Oct  9 10:09:26 IONE kernel: [   36.105201] scsi 3:0:7:0: Direct-Access     ATA      ST10000VN0008-2J SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   36.105217] scsi 3:0:7:0: SATA: handle(0x0020), sas_addr(0x300062b202991488), phy(6), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   36.105226] scsi 3:0:7:0: enclosure logical id (0x500062b202991482), slot(4) 
Oct  9 10:09:26 IONE kernel: [   36.105234] scsi 3:0:7:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   36.105293] scsi 3:0:7:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   36.105302] scsi 3:0:7:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   36.116924]  end_device-3:7: add: handle(0x0020), sas_addr(0x300062b202991488)
Oct  9 10:09:26 IONE kernel: [   36.240805] scsi 3:0:8:0: Direct-Access     ATA      ST10000VN0004-1Z SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   36.240821] scsi 3:0:8:0: SATA: handle(0x0021), sas_addr(0x300062b202991492), phy(16), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   36.240830] scsi 3:0:8:0: enclosure logical id (0x500062b202991482), slot(11) 
Oct  9 10:09:26 IONE kernel: [   36.240837] scsi 3:0:8:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   36.240895] scsi 3:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   36.240904] scsi 3:0:8:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   36.254244]  end_device-3:8: add: handle(0x0021), sas_addr(0x300062b202991492)
Oct  9 10:09:26 IONE kernel: [   36.355090] scsi 3:0:9:0: Direct-Access     ATA      ST10000VN0008-2J SC60 PQ: 0 ANSI: 6
Oct  9 10:09:26 IONE kernel: [   36.355096] scsi 3:0:9:0: SATA: handle(0x0022), sas_addr(0x300062b202991493), phy(17), device_name(0x0000000000000000)
Oct  9 10:09:26 IONE kernel: [   36.355099] scsi 3:0:9:0: enclosure logical id (0x500062b202991482), slot(10) 
Oct  9 10:09:26 IONE kernel: [   36.355102] scsi 3:0:9:0: enclosure level(0x0000), connector name(     )
Oct  9 10:09:26 IONE kernel: [   36.355144] scsi 3:0:9:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct  9 10:09:26 IONE kernel: [   36.355147] scsi 3:0:9:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct  9 10:09:26 IONE kernel: [   36.366615]  end_device-3:9: add: handle(0x0022), sas_addr(0x300062b202991493)
Oct  9 10:09:26 IONE kernel: [   36.367012] sd 3:0:0:0: Attached scsi generic sg2 type 0
Oct  9 10:09:26 IONE kernel: [   36.367104] sd 3:0:0:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.367276] sd 3:0:1:0: Attached scsi generic sg3 type 0
Oct  9 10:09:26 IONE kernel: [   36.367315] sd 3:0:1:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.367472] sd 3:0:2:0: Attached scsi generic sg4 type 0
Oct  9 10:09:26 IONE kernel: [   36.367606] sd 3:0:2:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.367796] sd 3:0:3:0: Attached scsi generic sg5 type 0
Oct  9 10:09:26 IONE kernel: [   36.367827] sd 3:0:3:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.368077] sd 3:0:4:0: Attached scsi generic sg6 type 0
Oct  9 10:09:26 IONE kernel: [   36.368110] sd 3:0:4:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.368336] sd 3:0:5:0: Attached scsi generic sg7 type 0
Oct  9 10:09:26 IONE kernel: [   36.368449] sd 3:0:5:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.368616] sd 3:0:6:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.368718] sd 3:0:6:0: Attached scsi generic sg8 type 0
Oct  9 10:09:26 IONE kernel: [   36.368986] sd 3:0:7:0: Attached scsi generic sg9 type 0
Oct  9 10:09:26 IONE kernel: [   36.369084] sd 3:0:7:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.369268] sd 3:0:8:0: Attached scsi generic sg10 type 0
Oct  9 10:09:26 IONE kernel: [   36.369348] sd 3:0:8:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.369545] sd 3:0:9:0: Attached scsi generic sg11 type 0
Oct  9 10:09:26 IONE kernel: [   36.369655] sd 3:0:9:0: Power-on or device reset occurred
Oct  9 10:09:26 IONE kernel: [   36.379301] sd 3:0:9:0: [sdk] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.379307] sd 3:0:9:0: [sdk] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.380139] sd 3:0:7:0: [sdi] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.380144] sd 3:0:7:0: [sdi] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.380166] sd 3:0:6:0: [sdh] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.380170] sd 3:0:6:0: [sdh] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.380181] sd 3:0:2:0: [sdd] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.380184] sd 3:0:2:0: [sdd] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.380276] sd 3:0:4:0: [sdf] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.380280] sd 3:0:4:0: [sdf] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.384185] sd 3:0:2:0: [sdd] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.384189] sd 3:0:2:0: [sdd] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.385360] sd 3:0:5:0: [sdg] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.385370] sd 3:0:5:0: [sdg] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.388275] sd 3:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.390428] sd 3:0:8:0: [sdj] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.390433] sd 3:0:8:0: [sdj] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.390689] sd 3:0:1:0: [sdc] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.390693] sd 3:0:1:0: [sdc] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.391221] sd 3:0:3:0: [sde] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.391226] sd 3:0:3:0: [sde] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.391534] sd 3:0:0:0: [sdb] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
Oct  9 10:09:26 IONE kernel: [   36.391546] sd 3:0:0:0: [sdb] 4096-byte physical blocks
Oct  9 10:09:26 IONE kernel: [   36.407163] sd 3:0:9:0: [sdk] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.407167] sd 3:0:9:0: [sdk] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.408024] sd 3:0:7:0: [sdi] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.408027] sd 3:0:7:0: [sdi] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.408052] sd 3:0:6:0: [sdh] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.408057] sd 3:0:6:0: [sdh] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.410281] sd 3:0:4:0: [sdf] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.410284] sd 3:0:4:0: [sdf] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.411019] sd 3:0:9:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.411792] sd 3:0:1:0: [sdc] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.411798] sd 3:0:1:0: [sdc] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.411891] sd 3:0:7:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.411958] sd 3:0:6:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.412145] sd 3:0:5:0: [sdg] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.412152] sd 3:0:5:0: [sdg] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.414770] sd 3:0:8:0: [sdj] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.414777] sd 3:0:8:0: [sdj] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.414806] sd 3:0:4:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.416368] sd 3:0:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.416719] sd 3:0:5:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.419310] sd 3:0:8:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.421335] sd 3:0:0:0: [sdb] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.421339] sd 3:0:0:0: [sdb] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.422154] sd 3:0:3:0: [sde] Write Protect is off
Oct  9 10:09:26 IONE kernel: [   36.422161] sd 3:0:3:0: [sde] Mode Sense: 9b 00 10 08
Oct  9 10:09:26 IONE kernel: [   36.425891] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.426714] sd 3:0:3:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
Oct  9 10:09:26 IONE kernel: [   36.516316]  sdd: sdd1
Oct  9 10:09:26 IONE kernel: [   36.532556]  sdi: sdi1
Oct  9 10:09:26 IONE kernel: [   36.536800]  sdk: sdk1
Oct  9 10:09:26 IONE kernel: [   36.540255]  sdf: sdf1
Oct  9 10:09:26 IONE kernel: [   36.542376]  sdh: sdh1
Oct  9 10:09:26 IONE kernel: [   36.547645]  sde: sde1
Oct  9 10:09:26 IONE kernel: [   36.554059]  sdb: sdb1
Oct  9 10:09:26 IONE kernel: [   36.558801]  sdg: sdg1
Oct  9 10:09:26 IONE kernel: [   36.578438]  sdc: sdc1
Oct  9 10:09:26 IONE kernel: [   36.581614] sd 3:0:2:0: [sdd] Attached SCSI disk
Oct  9 10:09:26 IONE kernel: [   36.587389]  sdj: sdj1
Oct  9 10:09:26 IONE kernel: [   36.619298] sd 3:0:4:0: [sdf] Attached SCSI disk
Oct  9 10:09:26 IONE kernel: [   36.631008] sd 3:0:7:0: [sdi] Attached SCSI disk
Oct  9 10:09:26 IONE kernel: [   36.638554] sd 3:0:6:0: [sdh] Attached SCSI disk
Oct  9 10:09:26 IONE kernel: [   36.638793] sd 3:0:0:0: [sdb] Attached SCSI disk
Oct  9 10:09:26 IONE kernel: [   36.639635] sd 3:0:3:0: [sde] Attached SCSI disk
Oct  9 10:09:26 IONE kernel: [   36.646629] sd 3:0:9:0: [sdk] Attached SCSI disk
Oct  9 10:09:27 IONE kernel: [   36.654464] sd 3:0:5:0: [sdg] Attached SCSI disk
Oct  9 10:09:27 IONE kernel: [   36.670784] sd 3:0:1:0: [sdc] Attached SCSI disk
Oct  9 10:09:27 IONE kernel: [   36.673670] sd 3:0:8:0: [sdj] Attached SCSI disk
Oct  9 10:09:27 IONE udisksd[785]: The function 'bd_md_examine' called, but not implemented!
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/degraded': Failed to open file “/sys/devices/virtual/block/md0/md/degraded”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_action': Failed to open file “/sys/devices/virtual/block/md0/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE udisksd[785]: Error reading sysfs attr `/sys/devices/virtual/block/md0/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md0/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Oct  9 10:09:27 IONE kernel: [   36.901314] md/raid:md0: device sdj1 operational as raid disk 5
Oct  9 10:09:27 IONE kernel: [   36.901327] md/raid:md0: device sdc1 operational as raid disk 4
Oct  9 10:09:27 IONE kernel: [   36.901330] md/raid:md0: device sde1 operational as raid disk 0
Oct  9 10:09:27 IONE kernel: [   36.901333] md/raid:md0: device sdb1 operational as raid disk 3
Oct  9 10:09:27 IONE kernel: [   36.901335] md/raid:md0: device sdg1 operational as raid disk 2
Oct  9 10:09:27 IONE kernel: [   36.901338] md/raid:md0: device sdk1 operational as raid disk 7
Oct  9 10:09:27 IONE kernel: [   36.901341] md/raid:md0: device sdh1 operational as raid disk 8
Oct  9 10:09:27 IONE kernel: [   36.901345] md/raid:md0: device sdf1 operational as raid disk 1
Oct  9 10:09:27 IONE kernel: [   36.901348] md/raid:md0: device sdi1 operational as raid disk 9
Oct  9 10:09:27 IONE kernel: [   36.901350] md/raid:md0: device sdd1 operational as raid disk 6
Oct  9 10:09:27 IONE systemd[1]: Started Timer to wait for more drives before activating degraded array md0..
Oct  9 10:09:27 IONE kernel: [   36.904297] md/raid:md0: raid level 6 active with 10 out of 10 devices, algorithm 2
Oct  9 10:09:27 IONE kernel: [   36.939664] md0: detected capacity change from 0 to 156259221504
Oct  9 10:09:27 IONE systemd[1]: Found device /dev/md0.
Oct  9 10:09:27 IONE systemd[1]: Finished Helper to synchronize boot up for ifupdown.
Oct  9 10:09:27 IONE systemd[1]: mdadm-last-resort@md0.timer: Succeeded.
Oct  9 10:09:27 IONE systemd[1]: Stopped Timer to wait for more drives before activating degraded array md0..
Oct  9 10:09:27 IONE systemd[1]: Started MD array scrubbing - continuation.
Oct  9 10:09:27 IONE systemd[1]: Started MD array scrubbing.
Oct  9 10:09:27 IONE systemd[1]: Started Reminder for degraded MD arrays.
Oct  9 10:09:27 IONE systemd[1]: Mounting /mnt/storage...
Oct  9 10:09:27 IONE systemd[1]: Started MD array monitor.
It is always the same five suspects that show activity during the stalling:

Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdj1[6] sdc1[4] sde1[0] sdb1[3] sdg1[2] sdk1[7] sdh1[9] sdf1[1] sdi1[8] sdd1[5]
      78129610752 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/73 pages [0KB], 65536KB chunk

unused devices: <none>
Linux 5.11.0-37-generic (IONE)  09/10/21        _x86_64_        (12 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.67    0.00    0.25    8.17    0.00   90.91

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
md0             191.00       764.00         0.00         0.00        764          0          0
nvme0n1           0.00         0.00         0.00         0.00          0          0          0
sda               0.00         0.00         0.00         0.00          0          0          0
sdb              38.00       152.00         0.00         0.00        152          0          0   disk 1
sdc               0.00         0.00         0.00         0.00          0          0          0
sdd               0.00         0.00         0.00         0.00          0          0          0
sde               0.00         0.00         0.00         0.00          0          0          0
sdf              38.00       152.00         0.00         0.00        152          0          0   disk 3
sdg               0.00         0.00         0.00         0.00          0          0          0
sdh               0.00         0.00         0.00         0.00          0          0          0
sdi              39.00       156.00         0.00         0.00        156          0          0   disk 9
sdj              38.00       152.00         0.00         0.00        152          0          0   disk 7
sdk              38.00       152.00         0.00         0.00        152          0          0   disk 8
The kernel error in the syslog during the stall is pretty much identical every time:

Code:
Oct  9 10:18:55 IONE kernel: [  605.755095] INFO: task jbd2/md0-8:1163 blocked for more than 120 seconds.
Oct  9 10:18:55 IONE kernel: [  605.755109]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct  9 10:18:55 IONE kernel: [  605.755117] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  9 10:18:55 IONE kernel: [  605.755123] task:jbd2/md0-8      state:D stack:    0 pid: 1163 ppid:     2 flags:0x00004000
Oct  9 10:18:55 IONE kernel: [  605.755135] Call Trace:
Oct  9 10:18:55 IONE kernel: [  605.755144]  __schedule+0x44c/0x8a0
Oct  9 10:18:55 IONE kernel: [  605.755157]  schedule+0x4f/0xc0
Oct  9 10:18:55 IONE kernel: [  605.755166]  jbd2_journal_commit_transaction+0x300/0x18f0
Oct  9 10:18:55 IONE kernel: [  605.755177]  ? sysvec_irq_work+0x48/0x90
Oct  9 10:18:55 IONE kernel: [  605.755188]  ? asm_sysvec_irq_work+0x12/0x20
Oct  9 10:18:55 IONE kernel: [  605.755205]  ? wait_woken+0x80/0x80
Oct  9 10:18:55 IONE kernel: [  605.755217]  ? try_to_del_timer_sync+0x54/0x80
Oct  9 10:18:55 IONE kernel: [  605.755231]  kjournald2+0xb6/0x280
Oct  9 10:18:55 IONE kernel: [  605.755242]  ? wait_woken+0x80/0x80
Oct  9 10:18:55 IONE kernel: [  605.755250]  ? commit_timeout+0x20/0x20
Oct  9 10:18:55 IONE kernel: [  605.755260]  kthread+0x12b/0x150
Oct  9 10:18:55 IONE kernel: [  605.755272]  ? set_kthread_struct+0x40/0x40
Oct  9 10:18:55 IONE kernel: [  605.755283]  ret_from_fork+0x22/0x30
Oct  9 10:18:55 IONE kernel: [  605.755322] INFO: task pool-Thunar:4975 blocked for more than 120 seconds.
Oct  9 10:18:55 IONE kernel: [  605.755330]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct  9 10:18:55 IONE kernel: [  605.755337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  9 10:18:55 IONE kernel: [  605.755342] task:pool-Thunar     state:D stack:    0 pid: 4975 ppid:  2368 flags:0x00000000
Oct  9 10:18:55 IONE kernel: [  605.755352] Call Trace:
Oct  9 10:18:55 IONE kernel: [  605.755357]  __schedule+0x44c/0x8a0
Oct  9 10:18:55 IONE kernel: [  605.755366]  schedule+0x4f/0xc0
Oct  9 10:18:55 IONE kernel: [  605.755374]  rwsem_down_read_slowpath+0x184/0x3c0
Oct  9 10:18:55 IONE kernel: [  605.755388]  down_read+0x43/0xa0
Oct  9 10:18:55 IONE kernel: [  605.755396]  ext4_da_map_blocks.constprop.0+0x2dc/0x380
Oct  9 10:18:55 IONE kernel: [  605.755409]  ext4_da_get_block_prep+0x55/0xe0
Oct  9 10:18:55 IONE kernel: [  605.755419]  ext4_block_write_begin+0x14a/0x530
Oct  9 10:18:55 IONE kernel: [  605.755427]  ? ext4_da_map_blocks.constprop.0+0x380/0x380
Oct  9 10:18:55 IONE kernel: [  605.755437]  ? __ext4_journal_start_sb+0x106/0x120
Oct  9 10:18:55 IONE kernel: [  605.755449]  ext4_da_write_begin+0x1de/0x460
Oct  9 10:18:55 IONE kernel: [  605.755460]  generic_perform_write+0xc2/0x1c0
Oct  9 10:18:55 IONE kernel: [  605.755477]  ext4_buffered_write_iter+0x98/0x150
Oct  9 10:18:55 IONE kernel: [  605.755489]  ext4_file_write_iter+0x53/0x220
Oct  9 10:18:55 IONE kernel: [  605.755499]  ? common_file_perm+0x72/0x170
Oct  9 10:18:55 IONE kernel: [  605.755513]  do_iter_readv_writev+0x152/0x1b0
Oct  9 10:18:55 IONE kernel: [  605.755526]  do_iter_write+0x88/0x1c0
Oct  9 10:18:55 IONE kernel: [  605.755537]  vfs_iter_write+0x19/0x30
Oct  9 10:18:55 IONE kernel: [  605.755547]  iter_file_splice_write+0x276/0x3c0
Oct  9 10:18:55 IONE kernel: [  605.755564]  do_splice_from+0x21/0x40
Oct  9 10:18:55 IONE kernel: [  605.755573]  do_splice+0x2e8/0x650
Oct  9 10:18:55 IONE kernel: [  605.755585]  __do_splice+0xde/0x160
Oct  9 10:18:55 IONE kernel: [  605.755596]  __x64_sys_splice+0x99/0x110
Oct  9 10:18:55 IONE kernel: [  605.755607]  do_syscall_64+0x38/0x90
Oct  9 10:18:55 IONE kernel: [  605.755615]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct  9 10:18:55 IONE kernel: [  605.755626] RIP: 0033:0x7f66b6f777f3
Oct  9 10:18:55 IONE kernel: [  605.755634] RSP: 002b:00007f66af7fd700 EFLAGS: 00000293 ORIG_RAX: 0000000000000113
Oct  9 10:18:55 IONE kernel: [  605.755644] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f66b6f777f3
Oct  9 10:18:55 IONE kernel: [  605.755651] RDX: 0000000000000016 RSI: 0000000000000000 RDI: 0000000000000017
Oct  9 10:18:55 IONE kernel: [  605.755656] RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000004
Oct  9 10:18:55 IONE kernel: [  605.755662] R10: 00007f66af7fd840 R11: 0000000000000293 R12: 0000000000000016
Oct  9 10:18:55 IONE kernel: [  605.755668] R13: 0000000000000000 R14: 0000000000000017 R15: 00007f66af7fd850
Any other ideas what to try (besides replacing hardware or rebuilding the array that is)?
 

Tinkerer

Member
Sep 5, 2020
83
28
18
I remember I had something similar a long time ago, with hung tasks etc. I don't remember exactly all the details though, it's 8 years ago so ... I think it ended up being a kernel issue with debian. I moved to arch for that server.

You could try booting from a live usb stick with another distro, rocky Linux, Debian 11. Something with another kernel.
 

UhClem

just another Bozo on the bus
Jun 26, 2012
435
249
43
NH, USA
Any other ideas what to try ...?
After a reboot, try to provoke the issue with:
Code:
dd if=/dev/nvme0n1 of=FOO bs=1M count=20k iflag=direct oflag=direct
where FOO is some-name on your md file-system.
If that succeeds, delete FOO and try it again, BUT eliminate the oflag=direct .
And, if that succeeds, delete FOO and try it again, BUT eliminate both the oflag=direct and the iflag=direct .
 
Last edited:
  • Like
Reactions: MrCalvin

Mashie

Member
Jun 26, 2020
37
9
8
After a reboot, try to provoke the issue with:
Code:
dd if=/dev/nvme0n1 of=FOO bs=1M count=20k iflag=direct oflag=direct
where FOO is some-name on your md file-system.
If that succeeds, delete FOO and try it again, BUT eliminate the oflag=direct .
And, if that succeeds, delete FOO and try it again, BUT eliminate both the oflag=direct and the iflag=direct .
All three options got the system stalled however the amount of logs generated varied a lot. The last option caused the system to stall over 4 minutes.

Code:
mashie@IONE:~$ sudo dd if=/dev/nvme0n1 of=/mnt/storage/test.txt bs=1M count=20k iflag=direct oflag=direct
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 557.337 s, 38.5 MB/s
mashie@IONE:~$


Oct 10 12:40:45 IONE kernel: [  363.767047] INFO: task jbd2/md0-8:1167 blocked for more than 120 seconds.
Oct 10 12:40:45 IONE kernel: [  363.767059]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 12:40:45 IONE kernel: [  363.767063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 12:40:45 IONE kernel: [  363.767066] task:jbd2/md0-8      state:D stack:    0 pid: 1167 ppid:     2 flags:0x00004000
Oct 10 12:40:45 IONE kernel: [  363.767074] Call Trace:
Oct 10 12:40:45 IONE kernel: [  363.767080]  __schedule+0x44c/0x8a0
Oct 10 12:40:45 IONE kernel: [  363.767090]  schedule+0x4f/0xc0
Oct 10 12:40:45 IONE kernel: [  363.767094]  jbd2_journal_commit_transaction+0x300/0x18f0
Oct 10 12:40:45 IONE kernel: [  363.767102]  ? dequeue_entity+0xd8/0x410
Oct 10 12:40:45 IONE kernel: [  363.767112]  ? wait_woken+0x80/0x80
Oct 10 12:40:45 IONE kernel: [  363.767118]  ? try_to_del_timer_sync+0x54/0x80
Oct 10 12:40:45 IONE kernel: [  363.767127]  kjournald2+0xb6/0x280
Oct 10 12:40:45 IONE kernel: [  363.767134]  ? wait_woken+0x80/0x80
Oct 10 12:40:45 IONE kernel: [  363.767138]  ? commit_timeout+0x20/0x20
Oct 10 12:40:45 IONE kernel: [  363.767144]  kthread+0x12b/0x150
Oct 10 12:40:45 IONE kernel: [  363.767153]  ? set_kthread_struct+0x40/0x40
Oct 10 12:40:45 IONE kernel: [  363.767159]  ret_from_fork+0x22/0x30
Code:
mashie@IONE:~$ sudo dd if=/dev/nvme0n1 of=/mnt/storage/test.txt bs=1M count=20k iflag=direct
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 311.656 s, 68.9 MB/s
mashie@IONE:~$ 


Oct 10 12:59:49 IONE kernel: [  363.915473] INFO: task jbd2/md0-8:1170 blocked for more than 120 seconds.
Oct 10 12:59:49 IONE kernel: [  363.915485]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 12:59:49 IONE kernel: [  363.915489] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 12:59:49 IONE kernel: [  363.915491] task:jbd2/md0-8      state:D stack:    0 pid: 1170 ppid:     2 flags:0x00004000
Oct 10 12:59:49 IONE kernel: [  363.915499] Call Trace:
Oct 10 12:59:49 IONE kernel: [  363.915505]  __schedule+0x44c/0x8a0
Oct 10 12:59:49 IONE kernel: [  363.915515]  schedule+0x4f/0xc0
Oct 10 12:59:49 IONE kernel: [  363.915519]  jbd2_journal_commit_transaction+0x300/0x18f0
Oct 10 12:59:49 IONE kernel: [  363.915527]  ? dequeue_entity+0xd8/0x410
Oct 10 12:59:49 IONE kernel: [  363.915537]  ? wait_woken+0x80/0x80
Oct 10 12:59:49 IONE kernel: [  363.915544]  ? try_to_del_timer_sync+0x54/0x80
Oct 10 12:59:49 IONE kernel: [  363.915553]  kjournald2+0xb6/0x280
Oct 10 12:59:49 IONE kernel: [  363.915560]  ? wait_woken+0x80/0x80
Oct 10 12:59:49 IONE kernel: [  363.915564]  ? commit_timeout+0x20/0x20
Oct 10 12:59:49 IONE kernel: [  363.915570]  kthread+0x12b/0x150
Oct 10 12:59:49 IONE kernel: [  363.915578]  ? set_kthread_struct+0x40/0x40
Oct 10 12:59:49 IONE kernel: [  363.915585]  ret_from_fork+0x22/0x30
Oct 10 12:59:49 IONE kernel: [  363.915628] INFO: task dd:4761 blocked for more than 120 seconds.
Oct 10 12:59:49 IONE kernel: [  363.915632]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 12:59:49 IONE kernel: [  363.915635] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 12:59:49 IONE kernel: [  363.915638] task:dd              state:D stack:    0 pid: 4761 ppid:  4760 flags:0x00000000
Oct 10 12:59:49 IONE kernel: [  363.915643] Call Trace:
Oct 10 12:59:49 IONE kernel: [  363.915646]  __schedule+0x44c/0x8a0
Oct 10 12:59:49 IONE kernel: [  363.915650]  ? obj_cgroup_charge+0x9e/0x160
Oct 10 12:59:49 IONE kernel: [  363.915657]  schedule+0x4f/0xc0
Oct 10 12:59:49 IONE kernel: [  363.915661]  rwsem_down_read_slowpath+0x184/0x3c0
Oct 10 12:59:49 IONE kernel: [  363.915670]  down_read+0x43/0xa0
Oct 10 12:59:49 IONE kernel: [  363.915675]  ext4_da_map_blocks.constprop.0+0x2dc/0x380
Oct 10 12:59:49 IONE kernel: [  363.915682]  ext4_da_get_block_prep+0x55/0xe0
Oct 10 12:59:49 IONE kernel: [  363.915687]  ext4_block_write_begin+0x14a/0x530
Oct 10 12:59:49 IONE kernel: [  363.915691]  ? ext4_da_map_blocks.constprop.0+0x380/0x380
Oct 10 12:59:49 IONE kernel: [  363.915696]  ? __ext4_journal_start_sb+0x106/0x120
Oct 10 12:59:49 IONE kernel: [  363.915703]  ext4_da_write_begin+0x1de/0x460
Oct 10 12:59:49 IONE kernel: [  363.915709]  generic_perform_write+0xc2/0x1c0
Oct 10 12:59:49 IONE kernel: [  363.915720]  ext4_buffered_write_iter+0x98/0x150
Oct 10 12:59:49 IONE kernel: [  363.915727]  ext4_file_write_iter+0x53/0x220
Oct 10 12:59:49 IONE kernel: [  363.915733]  new_sync_write+0x117/0x1b0
Oct 10 12:59:49 IONE kernel: [  363.915742]  vfs_write+0x1ca/0x280
Oct 10 12:59:49 IONE kernel: [  363.915748]  ksys_write+0x67/0xe0
Oct 10 12:59:49 IONE kernel: [  363.915756]  __x64_sys_write+0x1a/0x20
Oct 10 12:59:49 IONE kernel: [  363.915762]  do_syscall_64+0x38/0x90
Oct 10 12:59:49 IONE kernel: [  363.915768]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 10 12:59:49 IONE kernel: [  363.915775] RIP: 0033:0x7f056d4911e7
Oct 10 12:59:49 IONE kernel: [  363.915780] RSP: 002b:00007ffd04e53b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct 10 12:59:49 IONE kernel: [  363.915785] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f056d4911e7
Oct 10 12:59:49 IONE kernel: [  363.915788] RDX: 0000000000100000 RSI: 00007f056cd0e000 RDI: 0000000000000001
Oct 10 12:59:49 IONE kernel: [  363.915791] RBP: 00007f056cd0e000 R08: 00007f056cd0d010 R09: 0000000000000000
Oct 10 12:59:49 IONE kernel: [  363.915794] R10: 00007f056cd0e000 R11: 0000000000000246 R12: 0000000000000000
Oct 10 12:59:49 IONE kernel: [  363.915796] R13: 0000000000000000 R14: 0000000000100000 R15: 00007f056cd0e000
Code:
mashie@IONE:~$ sudo dd if=/dev/nvme0n1 of=/mnt/storage/test.txt bs=1M count=20k
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 335.385 s, 64.0 MB/s
mashie@IONE:~$


Oct 10 13:08:45 IONE kernel: [  243.307955] INFO: task jbd2/md0-8:1166 blocked for more than 120 seconds.
Oct 10 13:08:45 IONE kernel: [  243.307970]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 13:08:45 IONE kernel: [  243.307974] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 13:08:45 IONE kernel: [  243.307976] task:jbd2/md0-8      state:D stack:    0 pid: 1166 ppid:     2 flags:0x00004000
Oct 10 13:08:45 IONE kernel: [  243.307984] Call Trace:
Oct 10 13:08:45 IONE kernel: [  243.307990]  __schedule+0x44c/0x8a0
Oct 10 13:08:45 IONE kernel: [  243.308000]  schedule+0x4f/0xc0
Oct 10 13:08:45 IONE kernel: [  243.308004]  jbd2_journal_commit_transaction+0x300/0x18f0
Oct 10 13:08:45 IONE kernel: [  243.308011]  ? dequeue_entity+0xd8/0x410
Oct 10 13:08:45 IONE kernel: [  243.308021]  ? wait_woken+0x80/0x80
Oct 10 13:08:45 IONE kernel: [  243.308027]  ? try_to_del_timer_sync+0x54/0x80
Oct 10 13:08:45 IONE kernel: [  243.308036]  kjournald2+0xb6/0x280
Oct 10 13:08:45 IONE kernel: [  243.308043]  ? wait_woken+0x80/0x80
Oct 10 13:08:45 IONE kernel: [  243.308047]  ? commit_timeout+0x20/0x20
Oct 10 13:08:45 IONE kernel: [  243.308052]  kthread+0x12b/0x150
Oct 10 13:08:45 IONE kernel: [  243.308060]  ? set_kthread_struct+0x40/0x40
Oct 10 13:08:45 IONE kernel: [  243.308067]  ret_from_fork+0x22/0x30
Oct 10 13:08:45 IONE kernel: [  243.308106] INFO: task dd:4735 blocked for more than 120 seconds.
Oct 10 13:08:45 IONE kernel: [  243.308110]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 13:08:45 IONE kernel: [  243.308114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 13:08:45 IONE kernel: [  243.308116] task:dd              state:D stack:    0 pid: 4735 ppid:  4734 flags:0x00000000
Oct 10 13:08:45 IONE kernel: [  243.308122] Call Trace:
Oct 10 13:08:45 IONE kernel: [  243.308124]  __schedule+0x44c/0x8a0
Oct 10 13:08:45 IONE kernel: [  243.308129]  schedule+0x4f/0xc0
Oct 10 13:08:45 IONE kernel: [  243.308133]  rwsem_down_read_slowpath+0x184/0x3c0
Oct 10 13:08:45 IONE kernel: [  243.308141]  down_read+0x43/0xa0
Oct 10 13:08:45 IONE kernel: [  243.308146]  ext4_da_map_blocks.constprop.0+0x2dc/0x380
Oct 10 13:08:45 IONE kernel: [  243.308153]  ext4_da_get_block_prep+0x55/0xe0
Oct 10 13:08:45 IONE kernel: [  243.308158]  ext4_block_write_begin+0x14a/0x530
Oct 10 13:08:45 IONE kernel: [  243.308162]  ? ext4_da_map_blocks.constprop.0+0x380/0x380
Oct 10 13:08:45 IONE kernel: [  243.308167]  ? __ext4_journal_start_sb+0x106/0x120
Oct 10 13:08:45 IONE kernel: [  243.308174]  ext4_da_write_begin+0x1de/0x460
Oct 10 13:08:45 IONE kernel: [  243.308179]  generic_perform_write+0xc2/0x1c0
Oct 10 13:08:45 IONE kernel: [  243.308190]  ext4_buffered_write_iter+0x98/0x150
Oct 10 13:08:45 IONE kernel: [  243.308197]  ext4_file_write_iter+0x53/0x220
Oct 10 13:08:45 IONE kernel: [  243.308203]  new_sync_write+0x117/0x1b0
Oct 10 13:08:45 IONE kernel: [  243.308212]  vfs_write+0x1ca/0x280
Oct 10 13:08:45 IONE kernel: [  243.308218]  ksys_write+0x67/0xe0
Oct 10 13:08:45 IONE kernel: [  243.308226]  __x64_sys_write+0x1a/0x20
Oct 10 13:08:45 IONE kernel: [  243.308232]  do_syscall_64+0x38/0x90
Oct 10 13:08:45 IONE kernel: [  243.308237]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 10 13:08:45 IONE kernel: [  243.308244] RIP: 0033:0x7fbaabe4c1e7
Oct 10 13:08:45 IONE kernel: [  243.308249] RSP: 002b:00007ffd6fcb44b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct 10 13:08:45 IONE kernel: [  243.308254] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007fbaabe4c1e7
Oct 10 13:08:45 IONE kernel: [  243.308258] RDX: 0000000000100000 RSI: 00007fbaab6c9000 RDI: 0000000000000001
Oct 10 13:08:45 IONE kernel: [  243.308261] RBP: 00007fbaab6c9000 R08: 00007fbaab6c8010 R09: 0000000000000000
Oct 10 13:08:45 IONE kernel: [  243.308263] R10: 00007fbaab6c9000 R11: 0000000000000246 R12: 0000000000000000
Oct 10 13:08:45 IONE kernel: [  243.308266] R13: 0000000000000000 R14: 0000000000100000 R15: 00007fbaab6c9000
Oct 10 13:08:45 IONE kernel: [  243.308271] INFO: task gvfsd-trash:4745 blocked for more than 120 seconds.
Oct 10 13:08:45 IONE kernel: [  243.308275]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 13:08:45 IONE kernel: [  243.308278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 13:08:45 IONE kernel: [  243.308280] task:gvfsd-trash     state:D stack:    0 pid: 4745 ppid:  2904 flags:0x00004000
Oct 10 13:08:45 IONE kernel: [  243.308285] Call Trace:
Oct 10 13:08:45 IONE kernel: [  243.308288]  __schedule+0x44c/0x8a0
Oct 10 13:08:45 IONE kernel: [  243.308292]  schedule+0x4f/0xc0
Oct 10 13:08:45 IONE kernel: [  243.308296]  wait_transaction_locked+0x8e/0xd0
Oct 10 13:08:45 IONE kernel: [  243.308304]  ? wait_woken+0x80/0x80
Oct 10 13:08:45 IONE kernel: [  243.308308]  add_transaction_credits+0xdb/0x2b0
Oct 10 13:08:45 IONE kernel: [  243.308315]  ? dx_probe+0x4b/0x4b0
Oct 10 13:08:45 IONE kernel: [  243.308322]  start_this_handle+0xff/0x4f0
Oct 10 13:08:45 IONE kernel: [  243.308329]  ? _cond_resched+0x19/0x30
Oct 10 13:08:45 IONE kernel: [  243.308333]  ? kmem_cache_alloc+0x38c/0x440
Oct 10 13:08:45 IONE kernel: [  243.308341]  jbd2__journal_start+0xff/0x1e0
Oct 10 13:08:45 IONE kernel: [  243.308345]  __ext4_journal_start_sb+0x106/0x120
Oct 10 13:08:45 IONE kernel: [  243.308350]  ext4_dirty_inode+0x42/0x80
Oct 10 13:08:45 IONE kernel: [  243.308355]  __mark_inode_dirty+0x169/0x2f0
Oct 10 13:08:45 IONE kernel: [  243.308361]  generic_update_time+0x9d/0xc0
Oct 10 13:08:45 IONE kernel: [  243.308365]  update_time+0x25/0x30
Oct 10 13:08:45 IONE kernel: [  243.308369]  touch_atime+0xb3/0x100
Oct 10 13:08:45 IONE kernel: [  243.308374]  iterate_dir+0x10f/0x1c0
Oct 10 13:08:45 IONE kernel: [  243.308380]  __x64_sys_getdents64+0x81/0x120
Oct 10 13:08:45 IONE kernel: [  243.308386]  ? verify_dirent_name+0x40/0x40
Oct 10 13:08:45 IONE kernel: [  243.308392]  do_syscall_64+0x38/0x90
Oct 10 13:08:45 IONE kernel: [  243.308395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 10 13:08:45 IONE kernel: [  243.308401] RIP: 0033:0x7f58d964f43b
Oct 10 13:08:45 IONE kernel: [  243.308404] RSP: 002b:00007fffd3476188 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Oct 10 13:08:45 IONE kernel: [  243.308408] RAX: ffffffffffffffda RBX: 0000564be953f650 RCX: 00007f58d964f43b
Oct 10 13:08:45 IONE kernel: [  243.308411] RDX: 0000000000008000 RSI: 0000564be953f650 RDI: 0000000000000009
Oct 10 13:08:45 IONE kernel: [  243.308414] RBP: ffffffffffffff80 R08: 0000000000000030 R09: 00007f58d9759c20
Oct 10 13:08:45 IONE kernel: [  243.308416] R10: 0000564be94ea010 R11: 0000000000000293 R12: 0000564be953f624
Oct 10 13:08:45 IONE kernel: [  243.308419] R13: 0000000000000000 R14: 0000564be953f620 R15: 0000000000000000
Oct 10 13:10:46 IONE kernel: [  364.138011] INFO: task jbd2/md0-8:1166 blocked for more than 241 seconds.
Oct 10 13:10:46 IONE kernel: [  364.138023]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 13:10:46 IONE kernel: [  364.138027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 13:10:46 IONE kernel: [  364.138030] task:jbd2/md0-8      state:D stack:    0 pid: 1166 ppid:     2 flags:0x00004000
Oct 10 13:10:46 IONE kernel: [  364.138037] Call Trace:
Oct 10 13:10:46 IONE kernel: [  364.138044]  __schedule+0x44c/0x8a0
Oct 10 13:10:46 IONE kernel: [  364.138053]  schedule+0x4f/0xc0
Oct 10 13:10:46 IONE kernel: [  364.138057]  jbd2_journal_commit_transaction+0x300/0x18f0
Oct 10 13:10:46 IONE kernel: [  364.138065]  ? dequeue_entity+0xd8/0x410
Oct 10 13:10:46 IONE kernel: [  364.138075]  ? wait_woken+0x80/0x80
Oct 10 13:10:46 IONE kernel: [  364.138081]  ? try_to_del_timer_sync+0x54/0x80
Oct 10 13:10:46 IONE kernel: [  364.138090]  kjournald2+0xb6/0x280
Oct 10 13:10:46 IONE kernel: [  364.138098]  ? wait_woken+0x80/0x80
Oct 10 13:10:46 IONE kernel: [  364.138101]  ? commit_timeout+0x20/0x20
Oct 10 13:10:46 IONE kernel: [  364.138107]  kthread+0x12b/0x150
Oct 10 13:10:46 IONE kernel: [  364.138116]  ? set_kthread_struct+0x40/0x40
Oct 10 13:10:46 IONE kernel: [  364.138122]  ret_from_fork+0x22/0x30
Oct 10 13:10:46 IONE kernel: [  364.138158] INFO: task dd:4735 blocked for more than 241 seconds.
Oct 10 13:10:46 IONE kernel: [  364.138162]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 13:10:46 IONE kernel: [  364.138165] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 13:10:46 IONE kernel: [  364.138167] task:dd              state:D stack:    0 pid: 4735 ppid:  4734 flags:0x00000000
Oct 10 13:10:46 IONE kernel: [  364.138173] Call Trace:
Oct 10 13:10:46 IONE kernel: [  364.138176]  __schedule+0x44c/0x8a0
Oct 10 13:10:46 IONE kernel: [  364.138181]  schedule+0x4f/0xc0
Oct 10 13:10:46 IONE kernel: [  364.138184]  rwsem_down_read_slowpath+0x184/0x3c0
Oct 10 13:10:46 IONE kernel: [  364.138193]  down_read+0x43/0xa0
Oct 10 13:10:46 IONE kernel: [  364.138198]  ext4_da_map_blocks.constprop.0+0x2dc/0x380
Oct 10 13:10:46 IONE kernel: [  364.138205]  ext4_da_get_block_prep+0x55/0xe0
Oct 10 13:10:46 IONE kernel: [  364.138210]  ext4_block_write_begin+0x14a/0x530
Oct 10 13:10:46 IONE kernel: [  364.138214]  ? ext4_da_map_blocks.constprop.0+0x380/0x380
Oct 10 13:10:46 IONE kernel: [  364.138219]  ? __ext4_journal_start_sb+0x106/0x120
Oct 10 13:10:46 IONE kernel: [  364.138226]  ext4_da_write_begin+0x1de/0x460
Oct 10 13:10:46 IONE kernel: [  364.138232]  generic_perform_write+0xc2/0x1c0
Oct 10 13:10:46 IONE kernel: [  364.138243]  ext4_buffered_write_iter+0x98/0x150
Oct 10 13:10:46 IONE kernel: [  364.138250]  ext4_file_write_iter+0x53/0x220
Oct 10 13:10:46 IONE kernel: [  364.138257]  new_sync_write+0x117/0x1b0
Oct 10 13:10:46 IONE kernel: [  364.138267]  vfs_write+0x1ca/0x280
Oct 10 13:10:46 IONE kernel: [  364.138273]  ksys_write+0x67/0xe0
Oct 10 13:10:46 IONE kernel: [  364.138281]  __x64_sys_write+0x1a/0x20
Oct 10 13:10:46 IONE kernel: [  364.138287]  do_syscall_64+0x38/0x90
Oct 10 13:10:46 IONE kernel: [  364.138292]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 10 13:10:46 IONE kernel: [  364.138299] RIP: 0033:0x7fbaabe4c1e7
Oct 10 13:10:46 IONE kernel: [  364.138304] RSP: 002b:00007ffd6fcb44b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct 10 13:10:46 IONE kernel: [  364.138309] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007fbaabe4c1e7
Oct 10 13:10:46 IONE kernel: [  364.138313] RDX: 0000000000100000 RSI: 00007fbaab6c9000 RDI: 0000000000000001
Oct 10 13:10:46 IONE kernel: [  364.138315] RBP: 00007fbaab6c9000 R08: 00007fbaab6c8010 R09: 0000000000000000
Oct 10 13:10:46 IONE kernel: [  364.138318] R10: 00007fbaab6c9000 R11: 0000000000000246 R12: 0000000000000000
Oct 10 13:10:46 IONE kernel: [  364.138321] R13: 0000000000000000 R14: 0000000000100000 R15: 00007fbaab6c9000
Oct 10 13:10:46 IONE kernel: [  364.138326] INFO: task gvfsd-trash:4745 blocked for more than 241 seconds.
Oct 10 13:10:46 IONE kernel: [  364.138331]       Tainted: P           OE     5.11.0-37-generic #41~20.04.2-Ubuntu
Oct 10 13:10:46 IONE kernel: [  364.138334] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 13:10:46 IONE kernel: [  364.138336] task:gvfsd-trash     state:D stack:    0 pid: 4745 ppid:  2904 flags:0x00004000
Oct 10 13:10:46 IONE kernel: [  364.138341] Call Trace:
Oct 10 13:10:46 IONE kernel: [  364.138343]  __schedule+0x44c/0x8a0
Oct 10 13:10:46 IONE kernel: [  364.138347]  schedule+0x4f/0xc0
Oct 10 13:10:46 IONE kernel: [  364.138351]  wait_transaction_locked+0x8e/0xd0
Oct 10 13:10:46 IONE kernel: [  364.138361]  ? wait_woken+0x80/0x80
Oct 10 13:10:46 IONE kernel: [  364.138365]  add_transaction_credits+0xdb/0x2b0
Oct 10 13:10:46 IONE kernel: [  364.138372]  ? dx_probe+0x4b/0x4b0
Oct 10 13:10:46 IONE kernel: [  364.138381]  start_this_handle+0xff/0x4f0
Oct 10 13:10:46 IONE kernel: [  364.138388]  ? _cond_resched+0x19/0x30
Oct 10 13:10:46 IONE kernel: [  364.138392]  ? kmem_cache_alloc+0x38c/0x440
Oct 10 13:10:46 IONE kernel: [  364.138401]  jbd2__journal_start+0xff/0x1e0
Oct 10 13:10:46 IONE kernel: [  364.138405]  __ext4_journal_start_sb+0x106/0x120
Oct 10 13:10:46 IONE kernel: [  364.138411]  ext4_dirty_inode+0x42/0x80
Oct 10 13:10:46 IONE kernel: [  364.138416]  __mark_inode_dirty+0x169/0x2f0
Oct 10 13:10:46 IONE kernel: [  364.138423]  generic_update_time+0x9d/0xc0
Oct 10 13:10:46 IONE kernel: [  364.138428]  update_time+0x25/0x30
Oct 10 13:10:46 IONE kernel: [  364.138432]  touch_atime+0xb3/0x100
Oct 10 13:10:46 IONE kernel: [  364.138438]  iterate_dir+0x10f/0x1c0
Oct 10 13:10:46 IONE kernel: [  364.138445]  __x64_sys_getdents64+0x81/0x120
Oct 10 13:10:46 IONE kernel: [  364.138450]  ? verify_dirent_name+0x40/0x40
Oct 10 13:10:46 IONE kernel: [  364.138457]  do_syscall_64+0x38/0x90
Oct 10 13:10:46 IONE kernel: [  364.138460]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 10 13:10:46 IONE kernel: [  364.138466] RIP: 0033:0x7f58d964f43b
Oct 10 13:10:46 IONE kernel: [  364.138469] RSP: 002b:00007fffd3476188 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Oct 10 13:10:46 IONE kernel: [  364.138473] RAX: ffffffffffffffda RBX: 0000564be953f650 RCX: 00007f58d964f43b
Oct 10 13:10:46 IONE kernel: [  364.138476] RDX: 0000000000008000 RSI: 0000564be953f650 RDI: 0000000000000009
Oct 10 13:10:46 IONE kernel: [  364.138478] RBP: ffffffffffffff80 R08: 0000000000000030 R09: 00007f58d9759c20
Oct 10 13:10:46 IONE kernel: [  364.138481] R10: 0000564be94ea010 R11: 0000000000000293 R12: 0000564be953f624
Oct 10 13:10:46 IONE kernel: [  364.138484] R13: 0000000000000000 R14: 0000564be953f620 R15: 0000000000000000
 

UhClem

just another Bozo on the bus
Jun 26, 2012
435
249
43
NH, USA
I can think of 3 possibilities:

Of course, #3 is "something else".

1. maybe there's a glitch/inconsistency within the array itself [or the array's journal?].

2. maybe you've actually discovered/provoked a kernel bug.

--------------------------
Note that all of your (primary) stack traces [from task-timeout] have the routine jbd2_journal_commit_transaction() as top-level [that routine's call to schedule() must be the mechanism to "set" the timeout]. jbd2_j...() is called from kjournald2() which is called from kthread().
-------------

Regarding #1:
===========
It doesn't look like you're using an array write-journal. Wouldn't that have been indicated in the "Personalities:..." in the OP? [vs "bitmap"]

Even if there were parity/data inconsistencies, I can't imagine how they could get us "to here".

Regarding #2:
===========
[Superficially, at least,] there appears to have been a similar bug (from 3 years ago [fixed]) :
[From LINK]

jbd2_hung.jpg

======================
Course of action: ??????? !!!!
If it's #1, that means a complete rebuild, which is drastic/onerous to the extreme. And, without knowing WHY it happened, there's the agonizing concern that it will happen again.

If it's #2, what CAN you do? [You don't have a support contract, right?] Regardless, unless someone else can replicate the problem, how can they debug it?
=======================

Meanwhile ... I'm really intrigued by the "premise" of the issue.
It is working great except for the first time I do a large >7GB file write to the array after a reboot. It will hang at different places in the transfer each time such as 2.5GB/3.6GB/4.3GB/5.6GB. When it hangs the same 5 disks will be busy with activity for 3-4 minutes before everyting resumes and every consecutive file transfer is working as normal until the next reboot.
Minor clarification, please?
How do you "see" the hang? Do your transfer tools have progress indicators? Or ???

As frustrating, and confounding, as this issue is, be thankful that it is reliably reproducible, AND "one and done" [rinse/reboot, repeat]:),

One argument against it being a bug is: The md raid stuff is very mature (not bug-free to be sure, but it's been hammered hard); what's so unique about your array? especially in conjunction with the non-random "premise" (quoted above).

In any case, it would be comforting to know if it is safe to kluge a "workaround". Consider putting something like one of those dd;rm commands into the appropriate /etc/.../rc... file. "Ignorance is (close to) bliss."

For now, it might be helpful if we could create a timeline of "normal" transfer activity, followed by "hung" (just 5 slow), followed by "resumed normal", so we can correlate it to the timestamp of the 120-second kernel-message timeout()? I can envision a date;iostat...1 command covering the duration of a (provoking) dd command.

[Enough of my ramblings (for now)]
 

Mashie

Member
Jun 26, 2020
37
9
8
Thank you for your response.

1. The setup is default MDADM RAID6, I changed the stripe-cache from 256 to 16384 yesterday but it made no improvement for the stalling. I think this issue may have started after the array was expanded from 8 to 10 drives. The last two drives are not showing any errors at all though which doesn't help. I was hoping to buy 4 more drives today but the stock ETA has been pushed out another week so no idea when I will get hold of them. Once I have those I will fail drive 9 and 10 and have drive 11 and 12 pick up their duty.

2. All I have is helpful people online and Google. I have been searching on the error output and there have been posts about this bug all the way throughout kernel 4 as well as the first few builds of 5.0. After that the discussion died out without a conclusion.

3. It is very easy to know when the stalling is happening, I can hear it with the same 5 drives (1, 3, 7, 8 and 9) always rattling away like no tomorrow for about 2.5 to 4.5 minutes. My wife is referring to the computer as burping when it happens. You can see it in the output with the personalities above where there are 5 drives busy reading at 100-200k/s, that is what they are doing throughout the duration of the stall, no writing at all until it starts to work again. Any file transfer will just freeze until the stall is over, hence why I can see how far it has progressed. The provoking dd command made it fail at 1MB. During the stall you can't open the file browser (Thunar) nor read/write anything from the array.

I'm happy to try things as long as the array isn't nuked in the process. I have 67TB of Movies/TV shows on there which would be very painful to put back on.

Unless the array is stalling no syslog messages are generated about the IO system.
 

UhClem

just another Bozo on the bus
Jun 26, 2012
435
249
43
NH, USA
1.
I think this issue may have started after the array was expanded from 8 to 10 drives.
That's an interesting bit of info, and possibly a valid causal explanation. This elusive glitch of yours had to start somewhere, and that long and intensive -grow operation would be a candidate. By chance, can you recall anything "unusual" that happened during that operation? Power fail, brown-out ... [I assume that you had no reason to check (much less save) system logs from that duration.]
Once I have those I will fail drive 9 and 10 and have drive 11 and 12 pick up their duty.
I think that is a good plan, since anything (that can safely be done) might rectify the status quo. If that doesn't change things, maybe the next step will ... At 67/80 full, you're probably planning another expansion. Going from a 10- to a 12-drive array (cleanly) could do it.

2.
... I have been searching on the error output and there have been posts about this bug ...
(No offense, but) It is unlikely that you are able to determine that equivalence. (This is really hairy stuff. I'm a (very!) old Unix hack, so ... well, "trust me" :).) Of course, I could be mistaken--and I'd appreciate a link to what you feel is the most on-point example.

3.
... The provoking dd command made it fail at 1MB. ...
(Just to be sure) Did the stall start directly following (ie, within 1-2 seconds) of issuing the dd command? (vs the 2-5GB [cf: your first post])
 

Mashie

Member
Jun 26, 2020
37
9
8
1.

That's an interesting bit of info, and possibly a valid causal explanation. This elusive glitch of yours had to start somewhere, and that long and intensive -grow operation would be a candidate. By chance, can you recall anything "unusual" that happened during that operation? Power fail, brown-out ... [I assume that you had no reason to check (much less save) system logs from that duration.]

I think that is a good plan, since anything (that can safely be done) might rectify the status quo. If that doesn't change things, maybe the next step will ... At 67/80 full, you're probably planning another expansion. Going from a 10- to a 12-drive array (cleanly) could do it.
I have thankfully not had any incidents while expanding the array over the years, which I have done a few times now as I add 1-2 drives at a time. I started off with an array of 3 as a test and then did an expansion to 4 before putting any data at risk just to try the process. After that it was quickly expanded to 6 and kept on going. The original array was built on Xubuntu 18.04.

I have no logs from any of those times.

Good idea to try and just go from 10 to 12 drives first and see if that will fix it, saves me a few days of drive rebuilding if that is the case. :)

2.

(No offense, but) It is unlikely that you are able to determine that equivalence. (This is really hairy stuff. I'm a (very!) old Unix hack, so ... well, "trust me" :).) Of course, I could be mistaken--and I'd appreciate a link to what you feel is the most on-point example.
None taken, it was this particular thread that caught my eye while trying to figure out what was going on and the behaviour sounds very similar:


3.

(Just to be sure) Did the stall start directly following (ie, within 1-2 seconds) of issuing the dd command? (vs the 2-5GB [cf: your first post])
Correct, it was an instant stall and Thunar reported 1MB written to file until the stall ended.
 

Mashie

Member
Jun 26, 2020
37
9
8
I was looking at top while the stalling happened today and spotted this little process pop up for pretty much the entire duration of the stalling:

Code:
    303 root      20   0       0      0      0 D   1.0   0.0   0:00.56 kworker/u24:11+flush-9:0
Some googling later and one explanation was that this could be caused by fragmentation, even on ext4 partitions. One comment pointed out that a defragmentation of freshly expanded RAID arrays was a very good idea so here we are, e4defrag is running for the next week or so...

In other news, I managed to order 4 more drives so we shall see if they get here before or after the defragmentation is done.
 

UhClem

just another Bozo on the bus
Jun 26, 2012
435
249
43
NH, USA
!!! EDIT !!! I didn't see your defrag posting till after posting the below. (Had this thread opened, then switched to analog life, and neglected to refresh upon return. :)) Will be interesting to see the effect.
... go from 10 to 12 drives first and see if that will fix it ...
I do think that is the best next-step. [And, since the 8==>10 grow was not your first such, and the previous one(s) were (most likely) problem-free, be sure to follow the earlier procedure; just in case you (might have) deviated in the 8==>10.]
it was this particular thread that caught my eye while trying to figure out what was going on and the behaviour sounds very similar:
[sarcasm]very similar?? You mean like tangerines and tomatoes? [They are both fruits.][/sarcasm];) It ain't easy being a stranger in a strange land ...
Correct, it was an instant stall and Thunar reported 1MB written to file until the stall ended.
That is actually very good to hear.

I don't want to be pessimistic, but there's an ~even chance that the 8==>10 won't change anything (except capacity). In either case, just so you don't feel pressured into doing the -grow asap, it might be useful to devise a "coping plan" so that this wart has minimal impact on life-and-wife. E.g., you can minimize the frequency of shutdown/boot-up by seeing what procedures for Hibernate are available in Ubuntu.

And, we can force the stall-event to happen at the last stage of boot-up, and avoid the surprise annoyance, as currently. This might also avoid the rattling of those 5 disks, but that is just a "theory" I have. The 120 seconds of timeout is probably unavoidable. Details on this forcing can wait till later--just something to ponder in the meantime.

Next time you reboot (normally--no need to force the event), try to cause the stall with the following command:
Code:
dd if=/dev/zero of=/mnt/storage/40MBz bs=8M count=5 oflag=direct
 
Last edited: