I'm wondering why I get such low numbers from my mdraid0 setup. I've set up two mdraid0 arrays, each over 2x SSD partitions, and both perform like a single SSD when it comes to IOPS, though sequential reads do scale. Here are some numbers:
First RAID0 of 2x Intel S3700 400GB, using only 100GiB partitions for mdraid0:
[root@node02 ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 10 13:52:43 2017
Raid Level : raid0
Array Size : 209582080 (199.87 GiB 214.61 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Fri Mar 10 13:52:43 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : node02.linuxwonders.com:0 (local to host node02.linuxwonders.com)
UUID : eacd032d:297e40c1:e0eb2a9a:7f7439bd
Events : 0
Number Major Minor RaidDevice State
0 8 209 0 active sync /dev/sdn1
1 8 177 1 active sync /dev/sdl1
[root@node02 ~]# hdparm -t /dev/sd{n,l}1
/dev/sdn1:
Timing buffered disk reads: 1332 MB in 3.00 seconds = 443.96 MB/sec
/dev/sdl1:
Timing buffered disk reads: 1226 MB in 3.01 seconds = 407.96 MB/sec
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/sdn1 | egrep '(^ read)'
read : io=2274.1MB, bw=232881KB/s, iops=58220, runt= 10003msec
[root@node02 ~]#
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md0 | egrep '(^ read)'
read : io=2675.5MB, bw=273854KB/s, iops=68463, runt= 10004msec
68,463 IOPS is terrible, only a bit above single-disk performance.
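As a cross-check that the controller/path, and not md itself, caps the IOPS, one could run fio against both member partitions at the same time, bypassing /dev/md0 entirely. This is only a sketch of the idea (same options as above, split into two parallel runs); if the summed IOPS still lands around ~68k, the bottleneck is upstream of mdraid0:
# Hit both md0 members in parallel, bypassing the md layer
fio --name=sdn1 --filename=/dev/sdn1 --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=4 --runtime=10 --group_reporting &
fio --name=sdl1 --filename=/dev/sdl1 --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=4 --runtime=10 --group_reporting &
wait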
Second RAID0 of 2x Samsung PM853T 960GB/IMZ7GE960HMHP using only 100GiB partitions for mdraid0:
[root@node02 ~]# mdadm --detail /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Fri Mar 10 13:54:12 2017
Raid Level : raid0
Array Size : 209582080 (199.87 GiB 214.61 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Fri Mar 10 13:54:12 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : node02.linuxwonders.com:1 (local to host node02.linuxwonders.com)
UUID : 9fb4b518:8c69c399:8fe36006:1cb9f430
Events : 0
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
1 8 193 1 active sync /dev/sdm1
[root@node02 ~]# hdparm -t /dev/sd{g,m}1
/dev/sdg1:
Timing buffered disk reads: 1092 MB in 3.00 seconds = 363.67 MB/sec
/dev/sdm1:
Timing buffered disk reads: 1370 MB in 3.00 seconds = 456.11 MB/sec
Notice that /dev/sdg performs much worse than /dev/sdm, even though they are the exact same disk model!
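(A quick sketch to rule out run-to-run noise before blaming the disk itself: repeat the measurement a few times on both members and compare.)
for i in 1 2 3; do hdparm -t /dev/sdg1; hdparm -t /dev/sdm1; done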
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/sdg1 | egrep '(^ read)'
read : io=2376.8MB, bw=243307KB/s, iops=60826, runt= 10003msec
[root@node02 ~]# fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md1 | egrep '(^ read)'
read : io=2661.3MB, bw=272379KB/s, iops=68094, runt= 10005msec
Same bad IOPS here. Is the M1015 to blame? Since this is a 2-socket AMD NUMA node, should I look at IRQ affinity, etc.?
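A rough way to look at the IRQ/NUMA side (just a sketch, assuming the M1015 runs the megaraid_sas driver; the PCI address below is a placeholder that would need to be replaced with the real one from lspci):
# How are the HBA's interrupts spread across CPUs?
grep -i megaraid /proc/interrupts
# Which NUMA node is the HBA attached to? (placeholder PCI address)
cat /sys/bus/pci/devices/0000:01:00.0/numa_node
# Re-run the same fio test pinned to that node's CPUs and memory
numactl --cpunodebind=0 --membind=0 fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md0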
I'm using a ServeRAID M1015 with the following firmware; both RAID sets are on this controller:
Versions
================
Product Name : ServeRAID M1015 SAS/SATA Controller
Serial No : SP20109963
FW Package Build: 20.10.1-0052
..
..
Default Settings
================
Phy Polarity : 0
Phy PolaritySplit : 0
Background Rate : 30
Strip Size : 64kB
Flush Time : 4 seconds
Write Policy : WT
Read Policy : None
Cache When BBU Bad : Disabled
Cached IO : No
SMART Mode : Mode 6
Alarm Disable : No
Coercion Mode : 1GB
ZCR Config : Unknown
As I don't use hardware RAID on the M1015, should I worry about the Write Policy being set to write-through (WT)?
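(Since these fio runs are read-only, the controller write policy shouldn't affect them; as a side note, the drives' own write caches can be checked independently of the controller policy. Just a sketch using the device names from above, and querying through the M1015 may or may not work depending on ATA passthrough support:)
hdparm -W /dev/sdn /dev/sdl /dev/sdg /dev/sdm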
By contrast, at home on an HP Compaq Elite 8300 MT (Intel Core i7, 16 GB RAM) I have an mdraid0 of 2x TOSHIBA THNSNJ200PCSZ that performs very well; the disks are connected to the onboard SATA 3 ports.
A single-disk fio run yields ~100,000 IOPS, so 185,272 for the RAID0 device is perfectly fine:
[root@localhost ~]# fio --name=randread --ioengine=libaio --iodepth=8 --rw=randread --bs=4K --direct=1 --size=500M --numjobs=8 --runtime=10 --group_reporting --filename=/dev/md0 | egrep '(^ read)'
read : io=4000.0MB, bw=741089KB/s, iops=185272, runt= 5527msec
@BackupProphet maybe?