odd mdadm behavior


jbeyer

New Member
Aug 3, 2023
mdadm is behaving in a way that I don't fully understand. I have a seven-drive RAID 6 array of 6TB drives (drives are sda - sdg) and a hot spare (sdh). So with two parity drives for RAID 6, that is five drives of storage, or 30 TB. More on that in a moment.

Drive sdc showed some read errors (pasted at the bottom) that seem to have triggered the spare to be brought into the array. However, after the rebuild was complete (I think it was a 'rebuild'), mdadm doesn't show any drive in a degraded state, and sdh is now just an active drive. I'm fairly certain the size of the array grew to 36 TB as well.

Note that I didn't issue any commands to mdadm, other than mdadm -D to monitor the progress of what was going on.

Does this make any sense to anyone? It seems like mdadm should show a degraded drive if it's going to activate the spare. And it also makes little sense to me that the array size grew from 30 to 36 TB.

It almost feels like mdadm decided to move sdh from a spare into the RAID array, but I certainly didn't ask it to do that.

Finally, dmesg shows some odd messages for sdh about power-on and device resets (relevant log messages are again at the bottom).

Thanks in advance for any thoughts you have!




/dev/md124:
Version : 1.2
Creation Time : Mon Apr 20 00:08:20 2015
Raid Level : raid6
Array Size : 35162339328 (32.75 TiB 36.01 TB)
Used Dev Size : 5860389888 (5.46 TiB 6.00 TB)
Raid Devices : 8
Total Devices : 8
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Wed Aug 2 14:46:52 2023
State : clean
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Consistency Policy : bitmap

Name : localhost:export
UUID : 94dfc16f:5ba9e1a2:e31dda07:482141e3
Events : 3174657

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
8 8 49 3 active sync /dev/sdd1
4 8 65 4 active sync /dev/sde1
5 8 81 5 active sync /dev/sdf1
6 8 97 6 active sync /dev/sdg1
7 8 113 7 active sync /dev/sdh1



sdc errors in dmesg
[Sun Jul 30 11:32:23 2023] sd 0:0:2:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=4s
[Sun Jul 30 11:32:23 2023] sd 0:0:2:0: [sdc] tag#1 Sense Key : Medium Error [current] [descriptor]
[Sun Jul 30 11:32:23 2023] sd 0:0:2:0: [sdc] tag#1 Add. Sense: Unrecovered read error
[Sun Jul 30 11:32:23 2023] sd 0:0:2:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 01 bd 96 6e 00 00 00 01 00 00 00
[Sun Jul 30 11:32:23 2023] blk_update_request: critical medium error, dev sdc, sector 7475719856
[Sun Jul 30 13:55:37 2023] sd 0:0:2:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
[Sun Jul 30 13:55:37 2023] sd 0:0:2:0: [sdc] tag#1 Sense Key : Medium Error [current] [descriptor]
[Sun Jul 30 13:55:37 2023] sd 0:0:2:0: [sdc] tag#1 Add. Sense: Unrecovered read error
[Sun Jul 30 13:55:37 2023] sd 0:0:2:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 02 24 c9 a5 00 00 00 01 00 00 00
[Sun Jul 30 13:55:37 2023] blk_update_request: critical medium error, dev sdc, sector 9207129536
[Sun Jul 30 13:55:41 2023] sd 0:0:2:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
[Sun Jul 30 13:55:41 2023] sd 0:0:2:0: [sdc] tag#0 Sense Key : Medium Error [current] [descriptor]
[Sun Jul 30 13:55:41 2023] sd 0:0:2:0: [sdc] tag#0 Add. Sense: Unrecovered read error
[Sun Jul 30 13:55:41 2023] sd 0:0:2:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 02 24 c9 a5 c0 00 00 00 40 00 00
[Sun Jul 30 13:55:41 2023] blk_update_request: critical medium error, dev sdc, sector 9207129536
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127488 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127496 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127504 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127512 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127520 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127528 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127536 on sdc1)
[Sun Jul 30 13:55:41 2023] md/raid:md124: read error corrected (8 sectors at 9207127544 on sdc1)
[Sun Jul 30 13:55:45 2023] sd 0:0:2:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=4s
[Sun Jul 30 13:55:45 2023] sd 0:0:2:0: [sdc] tag#1 Sense Key : Medium Error [current] [descriptor]
[Sun Jul 30 13:55:45 2023] sd 0:0:2:0: [sdc] tag#1 Add. Sense: Unrecovered read error
[Sun Jul 30 13:55:45 2023] sd 0:0:2:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 02 24 c9 a8 00 00 00 01 00 00 00
[Sun Jul 30 13:55:45 2023] blk_update_request: critical medium error, dev sdc, sector 9207130176
[Sun Jul 30 13:55:49 2023] sd 0:0:2:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=4s
[Sun Jul 30 13:55:49 2023] sd 0:0:2:0: [sdc] tag#0 Sense Key : Medium Error [current] [descriptor]
[Sun Jul 30 13:55:49 2023] sd 0:0:2:0: [sdc] tag#0 Add. Sense: Unrecovered read error


sdh errors in dmesg
[Sat Jul 29 09:50:05 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 09:57:45 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 09:57:46 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:05:21 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:05:22 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:05:23 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:05:24 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:05:39 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:05:39 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:15:50 2023] sd 0:0:7:0: Power-on or device reset occurred
[Sat Jul 29 10:15:50 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:38:24 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:38:24 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:39:39 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:39:40 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:43:28 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:43:28 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:43:30 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:43:30 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:52:17 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:52:17 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:52:45 2023] sd 0:0:7:0: Power-on or device reset occurred
[Wed Aug 2 13:52:45 2023] sd 0:0:7:0: Power-on or device reset occurred
 

jbeyer

New Member
Aug 3, 2023
To add a few more data points: it looks like sdh (which was the spare) had issues a few months ago, and then when the server was rebooted, that is what triggered sdh to be put into the array rather than continuing as the spare. Does this make any sense?
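For reference, a rough way to check what role md has recorded for a member (just using the device names from this thread):
Code:
# show the md superblock on the member partition; the "Device Role" line
# says whether md considers it an active slot or a spare
mdadm --examine /dev/sdh1

# /proc/mdstat marks spares explicitly, e.g. "sdh1[7](S)"
cat /proc/mdstat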



May 23 07:23:11 apocalypse kernel: sd 0:0:7:0: [sdh] Synchronizing SCSI cache
May 23 07:23:11 apocalypse kernel: md: md124 still in use.
May 23 07:23:11 apocalypse kernel: md/raid:md124: Disk failure on sdh1, disabling device.#012md/raid:md124: Operation continuing on 7 devices.
May 23 07:23:12 apocalypse kernel: mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
May 23 07:23:12 apocalypse kernel: mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
May 23 07:23:12 apocalypse kernel: mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
May 23 07:23:12 apocalypse kernel: mpt2sas0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] CDB: Read(16) 88 00 00 00 00 00 c7 e1 d4 00 00 00 00 a8 00 00
May 23 07:23:12 apocalypse kernel: blk_update_request: I/O error, dev sdh, sector 3353465856
May 23 07:23:12 apocalypse kernel: blk_update_request: I/O error, dev sdh, sector 0
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] CDB: Write(16) 8a 00 00 00 00 00 8c 3e a9 b8 00 00 01 00 00 00
May 23 07:23:12 apocalypse kernel: blk_update_request: I/O error, dev sdh, sector 2352916920
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] CDB: Read(16) 88 00 00 00 00 00 f3 2a a4 28 00 00 00 d0 00 00
May 23 07:23:12 apocalypse kernel: blk_update_request: I/O error, dev sdh, sector 4079658024
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] CDB: Write(16) 8a 00 00 00 00 00 8c 3e a8 d0 00 00 00 e8 00 00
May 23 07:23:12 apocalypse kernel: blk_update_request: I/O error, dev sdh, sector 2352916688
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] CDB: Read(16) 88 00 00 00 00 00 c7 e1 d4 a8 00 00 00 80 00 00
May 23 07:23:12 apocalypse kernel: blk_update_request: I/O error, dev sdh, sector 3353466024
May 23 07:23:12 apocalypse kernel: sd 0:0:7:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 23 07:23:12 apocalypse kernel: mpt2sas0: removing handle(0x0010), sas_addr(0x4433221107000000)
May 23 07:23:50 apocalypse kernel: scsi 0:0:8:0: Direct-Access ATA WDC WD60EFRX-68M 0A82 PQ: 0 ANSI: 6
May 23 07:23:50 apocalypse kernel: scsi 0:0:8:0: SATA: handle(0x0010), sas_addr(0x4433221107000000), phy(7), device_name(0x0000000000000000)
May 23 07:23:50 apocalypse kernel: scsi 0:0:8:0: SATA: enclosure_logical_id(0x500605b002c88570), slot(4)
May 23 07:23:50 apocalypse kernel: scsi 0:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
May 23 07:23:50 apocalypse kernel: scsi 0:0:8:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
May 23 07:23:50 apocalypse kernel: sd 0:0:8:0: Attached scsi generic sg7 type 0
May 23 07:23:50 apocalypse kernel: sd 0:0:8:0: [sdm] 11721045168 512-byte logical blocks: (6.00 TB/5.45 TiB)
May 23 07:23:50 apocalypse kernel: sd 0:0:8:0: [sdm] 4096-byte physical blocks
May 23 07:23:50 apocalypse kernel: sd 0:0:8:0: [sdm] Write Protect is off
May 23 07:23:50 apocalypse kernel: sd 0:0:8:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA
May 23 07:23:50 apocalypse kernel: sdm: sdm1
May 23 07:23:50 apocalypse kernel: sd 0:0:8:0: [sdm] Attached SCSI disk
 

Pete.S.

Member
Feb 6, 2019
Well, it doesn't look like you created a 7-drive RAID 6 with one hot spare. It looks like you created an 8-drive RAID 6. Then you had I/O errors on two of the drives (sdc and sdh).

If you did create a hot spare, you should have been able to find in the log files when it was activated.
Something like
Code:
mdadm: /dev/md124: hot spare /dev/sdh1 activated
You should also be able to find in the log files when the resync process was started and completed.
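A rough way to dig for those events (journalctl on a systemd box, otherwise the plain syslog files):
Code:
# kernel messages about this array for the current boot: disk failures,
# spare activation, recovery start/finish (use -b -1, -b -2, ... for earlier boots)
journalctl -k | grep -i md124

# on a syslog-based system
grep -iE 'md124|spare|recovery' /var/log/messages* /var/log/syslog* 2>/dev/null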
 

jbeyer

New Member
Aug 3, 2023
Thanks Pete. That was the conclusion that I had basically reached on my own. I am not an mdadm expert, but I'm sort of surprised that I would have accidentally added the eighth drive into the array, as opposed to adding it as a spare. So it took me a while to admit to myself that I screwed that up so royally.

If I HAD properly set it up as a 7-drive RAID 6 with a hot spare, which should have been 30 TB, it makes no sense that mdadm would, on its own, turn it into a 36 TB array, right?
 

Pete.S.

Member
Feb 6, 2019
jbeyer said:
Thanks Pete. That was the conclusion that I had basically reached on my own. I am not an mdadm expert, but I'm sort of surprised that I would have accidentally added the eighth drive into the array, as opposed to adding it as a spare. So it took me a while to admit to myself that I screwed that up so royally.

If I HAD properly set it up as a 7-drive RAID 6 with a hot spare, which should have been 30 TB, it makes no sense that mdadm would, on its own, turn it into a 36 TB array, right?
No, it wouldn't make sense.
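Just as a sketch of why the sizes differ (using the device names from your output; these are creation examples, not commands to run against a live array):
Code:
# 7 members + 1 hot spare: (7 - 2) x 6 TB = 30 TB usable
mdadm --create /dev/md124 --level=6 --raid-devices=7 --spare-devices=1 \
      /dev/sd[a-g]1 /dev/sdh1

# 8 members, no spare: (8 - 2) x 6 TB = 36 TB usable, which matches your --detail output
mdadm --create /dev/md124 --level=6 --raid-devices=8 /dev/sd[a-h]1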

One thing that can really mess with your head is when you play around with adding drives and spares, removing drives, re-adding drives, etc. There is a signature on each drive so that md knows which drives belong to which array. I managed to get strange combinations when inserting an "empty" drive that actually had a raid signature on it. So it is best to remove raid signatures from empty drives with wipefs or similar.
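Something like this, obviously only on a drive that really is supposed to be empty (sdX is just a placeholder):
Code:
# remove just the md superblock from a partition you are about to reuse
mdadm --zero-superblock /dev/sdX1

# or wipe all known signatures (md, filesystem, partition table) from the whole drive
wipefs --all /dev/sdX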
 

jbeyer

New Member
Aug 3, 2023
Thanks Pete.

You seem like you know more about mdadm than I do. While I have your attention, maybe I could ask you one related question that I haven't found a good answer to through Google, SO, etc. I have read that with modern, larger disks, it is fairly common that in rebuilding an array, you hit a read error on one drive in a sector that hasn't been accessed in a while, and the whole rebuild just fails. I'm scared that I'll end up here when I do a rebuild to pull out /dev/sdc and then /dev/sdh.
What is the solution if that occurs? And is there a term that I should be searching for to better understand this corner case? The disks have huge video files, so I'd like a rebuild to just "move on" if it encounters a bad sector. Junk data on one sector after the rebuild is LIKELY fine. It just means one of many video files is corrupt. But I'm sort of terrified to start a rebuild, have a bad read somewhere, have the rebuild fail, and end up in some weird purgatory state. And lest you think it's pirated media, this is media for a video production company!

Any thoughts or experience here?

Many thanks in advance!
 

Pete.S.

Member
Feb 6, 2019
One other thing that is good to do is to take a look at your /etc/mdadm/mdadm.conf file.
You create an array and everything looks good but after a reboot things have changed.

md (the device driver in the kernel) is what actually runs your raid array and mdadm is the management tool.

Sometimes the auto-assemble process of the md driver doesn't do what you want. That's why you want to check out mdadm.conf.
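A rough sketch of bringing it back in sync with the arrays as they are currently running (the initramfs step is Debian/Ubuntu specific, other distros differ):
Code:
# append ARRAY lines describing the arrays as assembled right now
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# on Debian/Ubuntu, rebuild the initramfs so early-boot assembly picks up the change
update-initramfs -u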
 

Pete.S.

Member
Feb 6, 2019
jbeyer said:
Thanks Pete.

You seem like you know more about mdadm than I do. While I have your attention, maybe I could ask you one related question that I haven't found a good answer to through Google, SO, etc. I have read that with modern, larger disks, it is fairly common that in rebuilding an array, you hit a read error on one drive in a sector that hasn't been accessed in a while, and the whole rebuild just fails. I'm scared that I'll end up here when I do a rebuild to pull out /dev/sdc and then /dev/sdh.
What is the solution if that occurs? And is there a term that I should be searching for to better understand this corner case? The disks have huge video files, so I'd like a rebuild to just "move on" if it encounters a bad sector. Junk data on one sector after the rebuild is LIKELY fine. It just means one of many video files is corrupt. But I'm sort of terrified to start a rebuild, have a bad read somewhere, have the rebuild fail, and end up in some weird purgatory state. And lest you think it's pirated media, this is media for a video production company!

Any thoughts or experience here?

Many thanks in advance!
Thanks, but my mdadm skills are only so-so and I haven't kept up with the latest developments. You'll find the real experts on the mdadm developers mailing list, called linux-raid. This is their wiki: Linux Raid Wiki

To your question though: first, most distros set up a periodically occurring cron job that scrubs the data on the disks, basically checking all drives for errors. That detects most of those kinds of errors because all the data is actually read on all drives.
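The same scrub can also be started or checked by hand through sysfs, roughly:
Code:
# start a read-only consistency check of the whole array
echo check > /sys/block/md124/md/sync_action

# watch progress
cat /proc/mdstat

# number of inconsistencies found by the last check
cat /sys/block/md124/md/mismatch_cnt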

But you could potentially have a real I/O error in some situations. On RAID-6 it could happen if two drives fail completely and then one of the remaining drives has I/O errors on some sector. This can happen when failures go unnoticed.

The solution in these scenarios is to shut down the array completely, take the almost-working drive, and clone it to a new drive using dd with the conv=noerror,sync option. That creates a copy of everything that can still be read from the drive. Then you replace the faulty drive with the clone, and now you can rebuild.
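Roughly, with sdc as the failing source and sdX as a placeholder for the new drive (a small block size keeps the zero-padding around an unreadable spot small; GNU ddrescue does the same job with retries and a log, if it's available):
Code:
# copy everything readable; unreadable blocks are padded with zeros instead of aborting
dd if=/dev/sdc of=/dev/sdX bs=4096 conv=noerror,sync status=progress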
 