Linux md raid1 with SSDs


BLinux

cat lover server enthusiast
Jul 7, 2016
2,672
1,081
113
artofserver.com
I was wondering what other people do about using Linux md RAID1 on SSDs. On CentOS there's a weekly scheduled job that runs a check on the RAID1, and although I thought check operations were mostly read-and-compare, it seems to instead do a 'resync' from the 1st SSD to the 2nd. As I've found out, this makes for imbalanced wear on the SSDs: the 2nd SSD gets several orders of magnitude more writes than the 1st (a rough way to measure this is sketched after the questions). So, a few questions:

1) What do you do to avoid this kind of wear? (I've chosen to schedule the check less frequently.)

2) Why does it mostly write to the 2nd SSD during this operation? Why can't it just read, compare, and report on error? And how does Linux md RAID decide which copy of the data is correct when it detects inconsistencies in a mirror set?

3) Is it possible to have it alternate between the 1st and 2nd SSD, to even out the write wear?
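
A rough way to compare writes to the two mirror members (a sketch; it assumes the members are sda and sdb, so substitute your own devices):

Code:
# sectors written to each member since boot -- field 10 of /proc/diskstats
awk '$3 == "sda" || $3 == "sdb" { print $3, "sectors written:", $10 }' /proc/diskstats

# lifetime writes as counted by the SSDs themselves (attribute names vary by vendor)
smartctl -A /dev/sda | grep -i -e lbas_written -e wear
smartctl -A /dev/sdb | grep -i -e lbas_written -e wear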
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,394
511
113
On Debian at least, the scheduled check for mdadm is a cron job that triggers /usr/share/mdadm/checkarray. On RAID1 devices this doesn't trigger a resync from one drive t'other, but compares the blocks and reports any mismatches at the end.

Mdadm checkarray - Thomas-Krenn-Wiki

What is it that your OS is doing that makes you think it's doing something different...?
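
One way to see for yourself which operation md is actually running during the scheduled pass (a sketch; md0 assumed as the array name):

Code:
# what md is doing right now on the array: "check", "resync", "repair" or "idle"
cat /sys/block/md0/md/sync_action
# the progress line in mdstat also names the operation, e.g. "check = 12.3%"
cat /proc/mdstat
# after a check completes, the count of mismatched sectors it found
cat /sys/block/md0/md/mismatch_cnt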
 

Locutis

New Member
Oct 26, 2016
8
4
3
47
I have been using RAID1 with spinning drives since 2005. This year I decided to migrate my servers to Intel DC S3610 drives, still running RAID1, though I haven't executed the plan yet. I use Xeon E3-1265L v3 chips on Supermicro boards with 32GB ECC RAM, presently on WD RE4 drives.

I plan on using the 200GB Intel DC S3610 and keeping it at least 25-50% under-used to allow for enough spare NAND. I plan on keeping Debian 7 as my main OS. I didn't know about the RAID check you mention above; I'll have to look into it when I replace my server later this year/early next year.
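
One way to leave that spare area (a sketch only, and destructive, so for a fresh drive; sizes assume the 200GB model):

Code:
# DESTROYS all data: trim the whole drive so the controller knows every cell is free
blkdiscard /dev/sda
# then partition only ~75% of it; the unpartitioned tail acts as extra spare NAND
parted --script /dev/sda mklabel gpt
parted --script /dev/sda mkpart primary 1MiB 150GiB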

I use ext4 as my filesystem, along with mdadm. I have never had any reason to use anything else. Why would I want BTRFS or ZFS over this setup?

When running mdadm checkarray on a RAID1, if there's a hot spare and an error is encountered during the check, will the hot spare automatically come online and be rebuilt from a good drive? I'm fortunate to have never run into mdadm RAID issues or disk failures, but then I usually run a machine for 3-4 years and then replace it with all new hardware.

I have a new server that I set up in another city earlier this year, and I'm going to check on it next weekend. It has a Kingston enterprise SSD and an Intel consumer SSD in RAID1, along with a hot spare. It's a relatively light-use file server (4 users). I'll run checkarray once I'm there and see what it finds/does. I don't want to "try" it on my main server here, which is still running spinning rust.
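
For running it by hand while watching (a sketch; md0 assumed):

Code:
# kick off a manual check of one array and watch the progress
/usr/share/mdadm/checkarray /dev/md0
watch -n 5 cat /proc/mdstat
# cancel it early if it's slowing the file server down
/usr/share/mdadm/checkarray --cancel /dev/md0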

To go back to the OP's question: is checkarray different on CentOS vs Debian? I'll check what happens when I execute it and report back what I find (if I can remember).

Locutis
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
1,394
511
113
I didn't know about the RAID check you mention above; I'll have to look into it when I replace my server later this year/early next year.
If you're already using Debian, it should already be scheduling this check automagically:

Code:
effrafax@wug:~# cat /etc/cron.d/mdadm
#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft <madduck@madduck.net>
# distributed under the terms of the Artistic Licence 2.0
#

# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
# changed to run at 0057 on a monday morning instead
57 0 * * 1 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
Pretty sure the checkarray script is part of mdadm and thus should also be part of CentOS/RHEL, but I'm not at work at the moment (it's Sunday evening and thus pub o'clock!), so no doubt someone else can comment on that (or 5s of google-fu could also manage).
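
From what I remember, CentOS/RHEL actually ships its own wrapper rather than Debian's checkarray script: /etc/cron.d/raid-check runs /usr/sbin/raid-check weekly, configured via /etc/sysconfig/raid-check, roughly like this (values from memory, so verify on your own box):

Code:
# /etc/sysconfig/raid-check on CentOS/RHEL
ENABLED=yes
CHECK=check       # "check" = read-and-compare; "repair" rewrites mismatches
NICE=low          # priority for the check process
CHECK_DEVS=""     # empty means all md devices
SKIP_DEVS=""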

If you've got a hot spare in an array, then when a drive fails mdadm should automatically start bringing the spare into the fold. I have on occasion run into scenarios where mdadm hasn't recognised a failing (but not yet fully "failed") drive and you get shitty array performance until you manually kibosh the drive.
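
The manual kibosh, for reference (a sketch; device names are examples):

Code:
# mark the limping drive as failed; a hot spare, if present, is pulled in automatically
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 --remove /dev/sda1
# after swapping the disk, add the new one back (it rebuilds, or becomes the new spare)
mdadm /dev/md0 --add /dev/sda1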
 