I have a 4U server with 24 drives (36 actually, including a second chassis, but for testing and simplicity's sake I'm working with just the 24 in the main chassis for now). I've had a number of hardware configurations in this system: motherboards, CPUs, HBA controller cards, RAID controllers, SAS expanders, etc. The current configuration is:
Motherboard: Intel S2600CP2J
CPU: Two Intel Xeon E5-2670's
RAM: 64GB Samsung ECC PC3-8500R, 16x4GB
HBA: Three LSI SAS9201-16e
HDD: 24 mixed SATA drives, each ranging from 2TB to 6TB
Each HBA is installed in a full PCI-Express x8 slot (one of the motherboard slots operates at x4 under certain conditions; I'm not using that one).
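For reference, a quick back-of-envelope check suggests the host interface shouldn't be the limit here. This assumes the SAS9201-16e presents a PCIe 2.0 x8 interface (as its spec sheet indicates) and uses the usual ~500 MB/s usable-per-lane figure for PCIe 2.0 after 8b/10b encoding:

```python
# Back-of-envelope PCIe bandwidth check.
# Assumption: each SAS9201-16e is PCIe 2.0 x8 (per the product spec sheet).
PCIE2_MB_PER_LANE = 500   # ~500 MB/s usable per PCIe 2.0 lane after 8b/10b
lanes_per_hba = 8
num_hbas = 3

per_hba = PCIE2_MB_PER_LANE * lanes_per_hba   # MB/s available to one card
total = per_hba * num_hbas                    # aggregate across all three HBAs

print(f"Per HBA:   {per_hba:,} MB/s")   # 4,000 MB/s per card
print(f"Aggregate: {total:,} MB/s")     # 12,000 MB/s across three cards
```

So even one HBA's slot has roughly 4x the headroom of the ~1.3GBps ceiling I'm hitting, and the three cards together should have ~12x.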
In running various tests using simultaneous disk reads, I am unable to exceed a total combined read speed of a little over 1GBps, and that has held across every one of my past hardware configurations. My primary benchmarking/testing software is a combination of IOMeter, HDTune, and a Java tool written by the developer of FlexRAID. The Java tool is the easiest to use in my opinion, and its results roughly mirror what I have seen to date with IOMeter, so I'm fairly confident it's a solid tool for this purpose. Individually, the slowest disk's read performance is 73MBps and the fastest's is 201MBps (as tested repeatedly with the Java tool).
Realistically, I don't expect the simultaneous combined read speed to be 100% equal to the sum of the individual speeds (which works out to roughly 3GBps), but I also don't expect this level of drop-off. Each test reads a 5GB random-data file (true random data, not an empty file) from each disk simultaneously, starting with 2 disks and adding one more disk per test, up to the final test reading from all 24 at once. Performance ramps up at essentially a 1:1 ratio up to ~12 simultaneous disk reads, at which point the total combined read speed is 1.3GBps, more or less equal to the sum of those drives' individual speeds. For every disk added beyond that, instead of improving the combined read speed, each disk's read slows down such that the final combined speed never breaks 1.3GBps. Between each test, the Standby List cache is cleared using the tool 'EmptyStandbyList.exe' from the developer of Process Hacker; if this step is skipped, the test files are read near-instantaneously from cache instead of from the HDDs.
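In case it helps anyone reproduce the shape of the test without the Java tool, here's a minimal sketch of what each run does: one thread per drive, each sequentially reading its test file, with the combined throughput computed over the whole run. The paths are hypothetical placeholders for one test file per drive, and this sketch does not clear the standby cache, so EmptyStandbyList.exe still has to be run between passes as described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_file(path, block_size=1 << 20):
    """Sequentially read a file in 1 MiB blocks; return total bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                return total
            total += len(chunk)

def combined_read_speed(paths):
    """Read every file simultaneously (one thread per drive) and
    return the combined throughput in bytes per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    return total_bytes / (time.perf_counter() - start)

# Hypothetical test-file locations, one per drive letter:
# print(combined_read_speed([r"D:\test.bin", r"E:\test.bin", r"F:\test.bin"]))
```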
Here's some raw data of the results from my most recent test (combined read speed in KBps while reading n drives simultaneously):
2=297,296
3=425,171
4=534,397
5=668,829
6=785,337
7=894,563
8=1,027,959
9=1,144,467
10=1,259,989
11=1,340,224
12=1,414,289
13=1,440,753
14=1,433,548
15=1,450,578
16=1,424,043
17=1,412,460
18=1,384,576
19=1,389,126
20=1,367,885
21=1,290,037
22=1,331,943
23=1,378,285
24=1,378,545
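To make the plateau easier to see, here's the same data reduced to the marginal gain from each added drive (numbers copied verbatim from the table above):

```python
# Combined read speed (KBps) at n simultaneous drives, from the table above.
data = {2: 297_296, 3: 425_171, 4: 534_397, 5: 668_829, 6: 785_337,
        7: 894_563, 8: 1_027_959, 9: 1_144_467, 10: 1_259_989,
        11: 1_340_224, 12: 1_414_289, 13: 1_440_753, 14: 1_433_548,
        15: 1_450_578, 16: 1_424_043, 17: 1_412_460, 18: 1_384_576,
        19: 1_389_126, 20: 1_367_885, 21: 1_290_037, 22: 1_331_943,
        23: 1_378_285, 24: 1_378_545}

# Print the throughput change contributed by each additional drive.
for n in range(3, 25):
    delta = data[n] - data[n - 1]
    print(f"{n:2d} drives: {data[n]:>9,} KBps ({delta:+,} vs {n - 1} drives)")
```

The marginal gain is ~100-130MBps per drive up through 12, drops to ~26MBps for the 13th, and goes negative or near-zero from the 14th onward.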
During these tests, not even one CPU core/thread is maxed out, and the total usage across all cores doesn't exceed roughly 35%. I'm hoping that somebody out there can help me either find my bottleneck, or assist in optimizing my system to improve performance. This is not purely for academic purposes, my end-goal is to improve the performance of operations in SnapRAID, which access all drives simultaneously. If there's any more information I can provide, or specific tests I should try, please let me know!