Bad mdraid0 performance

fossxplorer

Active Member
Mar 17, 2016
Oslo, Norway
I'm wondering why I get such low numbers from my mdraid0 setup. I've set up two mdraid0 arrays, each from 2x SSD partitions, and both perform like a single SSD in IOPS, though sequential reads scale better. Here are some numbers:
First RAID0 of 2x Intel S3700 400GB, using only 100GiB partitions for mdraid0:
[root@node02 ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 10 13:52:43 2017
Raid Level : raid0
Array Size : 209582080 (199.87 GiB 214.61 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Fri Mar 10 13:52:43 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : node02.linuxwonders.com:0 (local to host node02.linuxwonders.com)
UUID : eacd032d:297e40c1:e0eb2a9a:7f7439bd
Events : 0
Number Major Minor RaidDevice State
0 8 209 0 active sync /dev/sdn1
1 8 177 1 active sync /dev/sdl1

[root@node02 ~]# hdparm -t /dev/sd{n,l}1
/dev/sdn1:
Timing buffered disk reads: 1332 MB in 3.00 seconds = 443.96 MB/sec
/dev/sdl1:
Timing buffered disk reads: 1226 MB in 3.01 seconds = 407.96 MB/sec
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/sdn1 | egrep '(^ read)'
read : io=2274.1MB, bw=232881KB/s, iops=58220, runt= 10003msec
[root@node02 ~]#
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md0 | egrep '(^ read)'
read : io=2675.5MB, bw=273854KB/s, iops=68463, runt= 10004msec

68,463 IOPS is terrible, just a bit above single-disk performance.
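One way to tell whether md or the controller path is the ceiling is to take md out of the picture and hit both member partitions at once; fio accepts colon-separated filenames, and each job then round-robins across the listed devices. A sketch, assuming the same device names as above:

```shell
# Random 4K reads against both member partitions directly, no md involved.
# Each of the 8 jobs round-robins across /dev/sdn1 and /dev/sdl1.
fio --name=randread-both --ioengine=libaio --iodepth=32 --rw=randread \
    --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 \
    --group_reporting --filename=/dev/sdn1:/dev/sdl1
```

If this also tops out around ~68k IOPS, the bottleneck is below md (HBA, driver, or interrupt handling); if it scales to roughly 2x a single disk, the md layer or its configuration deserves a closer look.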
Second RAID0 of 2x Samsung PM853T 960GB/IMZ7GE960HMHP using only 100GiB partitions for mdraid0:
[root@node02 ~]# mdadm --detail /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Fri Mar 10 13:54:12 2017
Raid Level : raid0
Array Size : 209582080 (199.87 GiB 214.61 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Fri Mar 10 13:54:12 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : node02.linuxwonders.com:1 (local to host node02.linuxwonders.com)
UUID : 9fb4b518:8c69c399:8fe36006:1cb9f430
Events : 0
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
1 8 193 1 active sync /dev/sdm1

[root@node02 ~]# hdparm -t /dev/sd{g,m}1
/dev/sdg1:
Timing buffered disk reads: 1092 MB in 3.00 seconds = 363.67 MB/sec
/dev/sdm1:
Timing buffered disk reads: 1370 MB in 3.00 seconds = 456.11 MB/sec

Notice that /dev/sdg performs much worse than /dev/sdm, which is the exact same model!


[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/sdg1 | egrep '(^ read)'
read : io=2376.8MB, bw=243307KB/s, iops=60826, runt= 10003msec
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md1 | egrep '(^ read)'
read : io=2661.3MB, bw=272379KB/s, iops=68094, runt= 10005msec

Same bad IOPS here. Is the M1015 to blame? Since this is a 2-socket AMD NUMA node, should I consider IRQ affinity etc.?
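A first look at the interrupt distribution costs nothing. A sketch, assuming stock IR firmware (the card then uses the megaraid_sas driver and shows up as "megasas" in /proc/interrupts; a card crossflashed to IT mode appears as mpt2sas instead) and a placeholder PCI address:

```shell
# Per-CPU interrupt counts for the HBA; if one CPU column carries nearly
# all of them, the IRQs are not being spread across the sockets.
grep -iE 'megasas|mpt2sas' /proc/interrupts

# NUMA node the controller is attached to (look up the real PCI address
# with `lspci | grep -i lsi`; 0000:01:00.0 is only a placeholder).
cat /sys/bus/pci/devices/0000:01:00.0/numa_node
```

Re-running fio pinned to that node (e.g. prefixing it with `numactl --cpunodebind=N`) is a quick way to see whether cross-node traffic is part of the problem.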
I'm using a ServeRAID M1015 with the following firmware; both RAID sets are on this controller:
Versions
================
Product Name : ServeRAID M1015 SAS/SATA Controller
Serial No : SP20109963
FW Package Build: 20.10.1-0052
..
..
Default Settings
================
Phy Polarity : 0
Phy PolaritySplit : 0
Background Rate : 30
Strip Size : 64kB
Flush Time : 4 seconds
Write Policy : WT
Read Policy : None
Cache When BBU Bad : Disabled
Cached IO : No
SMART Mode : Mode 6
Alarm Disable : No
Coercion Mode : 1GB
ZCR Config : Unknown

As I don't use the M1015's RAID functionality, should I worry about the Write Policy being set to write-through (WT)?
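For reference, if MegaCli is installed, the per-logical-drive cache settings can be dumped directly (a sketch; the binary may be named MegaCli64 or live under /opt/MegaRAID/MegaCli depending on the install):

```shell
# Show cache policy (write-through/write-back, read-ahead, direct vs
# cached IO) for every logical drive on every adapter.
MegaCli64 -LDGetProp -Cache -LALL -aALL
```

Write-through only affects writes going through the controller cache, so it should not explain poor random-read IOPS.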
By contrast, at home on an HP Compaq Elite 8300 MT (Intel Core i7, 16GB RAM), I have an mdraid0 of 2x TOSHIBA THNSNJ200PCSZ that performs very well. It's connected to onboard SATA 3.
A single-disk fio run yields ~100,000 IOPS, so 185,272 for the RAID0 device is perfectly fine:

[root@localhost ~]# fio --name=randread --ioengine=libaio --iodepth=8 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md0 | egrep '(^ read)'
read : io=4000.0MB, bw=741089KB/s, iops=185272, runt= 5527msec

@BackupProphet maybe? :)
 

_alex

Active Member
Jan 28, 2016
Bavaria / Germany
Can you use onboard SATA in this system? That would be the easiest way to see if the M1015 is to blame. Also, putting the Intel drives into the other system could help narrow down where the problem originates. I have several mdadm RAID 10 arrays that scale nearly linearly with the number of striped mirrors (up to 4 mirrors / 8 SSDs total) via LSI 9207s.

For sure, checking for misaligned partitions is maybe the first thing to do.
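The alignment check is easy to script: a partition created on a 1 MiB boundary (the default for modern partitioning tools) has a start sector divisible by 2048. A small sketch, assuming 512-byte sectors as reported by /sys (the device names are the ones from the posts above):

```shell
# A partition is 1 MiB-aligned if its start sector (512-byte units)
# is divisible by 2048.
aligned() {
    if [ $(( $1 % 2048 )) -eq 0 ]; then echo aligned; else echo misaligned; fi
}

# Check the real start sectors of the md member partitions, if present:
for part in sdn1 sdl1 sdg1 sdm1; do
    disk=${part%1}
    start_file=/sys/block/$disk/$part/start
    [ -r "$start_file" ] && echo "$part: $(aligned "$(cat "$start_file")")"
done

aligned 2048    # prints "aligned"
aligned 63      # prints "misaligned" (classic DOS-era start sector)
```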