A couple of points:
- PCI Express 2.0 8x is capped at about 3.2GB/s = 25.6Gbit/s in practice. Each lane runs at 5GT/s raw, and the inefficient 8b/10b encoding leaves 4Gbit/s of usable bandwidth per lane, so 8 lanes = 32Gbit/s = 4GB/s theoretical. Protocol overhead (packet headers, flow control) eats roughly another fifth, which is exactly why you have never seen more than 3.2GB/s = 25.6Gbit/s in any test of a RAID controller or NIC hooked to PCI Express 2.0 8x (see the bandwidth sketch after this list).
- PCI Express 3.0 4x is in much the same position. Thanks to the more efficient 128b/130b encoding, the practical ceiling only shifts from about 3.2GB/s to about 3.6GB/s.
- PCI Express 3.0 8x tops out at about 6.4GB/s = 51.2Gbit/s in practice. Once again, you will hardly see any RAID controller go faster than 6.4GB/s no matter how many SSDs are hooked to it. Seagate claims the fastest SSD in the world (nobody has seen it yet) at 10GB/s throughput precisely because it uses a PCI Express 3.0 16x connection.
- beware where you place your cards; this was correctly said already.
- beware how your motherboard is set up in the BIOS! It is entirely possible that your 8x slots really run at 4x, and there is your limit. Even out of a true 25.6Gbit/s you will only squeeze about 21Gbit/s in tests, because you have to generate that much data every second and every tiny bit of latency (despite not showing in CPU utilization) adds up; happening millions of times per second, it becomes a significant factor. (The lspci sketch after this list shows how to check the negotiated lane width.)
- beware of any M.2 or U.2 usage. Quite often some PCIe slots run at degraded speed when non-SATA storage is plugged into the motherboard, because those slots share lanes with the M.2/U.2 ports. Disable them in the BIOS if you can.
- you will NOT achieve 1200MB/s with the EIGHT spinning drives you have. Those disks approach 200MB/s on the outermost (longest) tracks, i.e. when they are empty, but fall to ~120MB/s on the innermost, shortest tracks once filled. You have 8 of them, so in the worst case you will be getting 8x 120MB/s = 960MB/s, which is a far cry from your required target; see the worst-case math after this list. And that is before the parallel operations from 2 workstations, which we will get to in a moment.
- ZFS. How good are you at optimizing ZFS? That is a science, believe me, especially for writes, of which you will have a ton.
- RAID overhead. Don't even think you will get the summed performance of all disks out of a logical RAID. If you create RAID0 = a striped pool, we might get close to simple linear scaling. With RAID5 (RAID-Z) or RAID6 (RAID-Z2), forget it instantly. Apologies for using the RAID5/RAID6 naming convention; we all know it's not strictly correct with ZFS.
- you require TWO PARALLEL operations on those spinning disks. There is no way in the world you will get a 50/50 performance split. Say your disk subsystem can deliver 3000MB/s for reads or for writes. The moment you try to read AND write AT THE SAME TIME from the same subsystem, you will not get 1500MB/s of reads plus 1500MB/s of writes, no matter what RAID controllers, operating system, cache, RAM or network adapters you have. I realistically ESTIMATE you would see something close to 200MB/s of reads and 150MB/s of writes; yes, the hit is that large (the toy model after this list shows why). Spinning drives are EXTREMELY bad at handling anything non-linear and non-sequential. And ZFS fragmentation, due to the copy-on-write mechanism, WILL play an EXTREME role once the filesystem has been used for a while and has filled up a bit. Please note this is not American-lawyer-style "may have" speech; it is guaranteed to have a performance impact.
The reason? Seek times. Spinning drives have to move their heads back and forth, so forget linear scaling or a linear split in performance. Even SSDs do not scale that well: a model rated at 100,000 IOPS read and 50,000 IOPS write will not deliver 75,000 IOPS on a combined read/write load (see any Intel DC SSD specification; mixed performance is simply not the mathematical average, because of the latencies involved).
- introducing two or more separate pools: an extremely clever idea. You will not achieve your goals without them. Eight disks per pool seems on the poor side to me; given the performance characteristics I would not go with fewer than 12, probably 16, and even with a dozen you will run into the cruel laws of scalability and failure-tolerance design. Split those disks evenly across your controllers so their CPUs and PCIe links are loaded equally. Don't put the "read array" on controller1 and the "write array" on controller2; mix them: half of the "read array" disks on controller1, the other half on controller2, and the same for the "write array" disks (see the layout sketch after this list).
- carefully choose your disks. I wouldn't go with Seagate, and especially not with the DM001 models you have (that is a DESKTOP MODEL), but that's my personal choice only. Reasons? Performance, longevity, no rating for 24x7 operation, heat/shock resistance (yes, snapping 12 disks into an enclosure is not the same as having three in a desktop), and a few others.
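
Since several of the points above lean on the same arithmetic, here is a minimal sketch of the PCIe bandwidth math from the first three bullets. The per-lane rates and encodings are standard PCIe 2.0/3.0 figures; the 20% protocol-overhead factor is my own rough assumption chosen to approximate the real-world caps quoted above, not a measured constant:

```python
# Back-of-the-envelope PCIe bandwidth calculator.
GEN = {
    # generation: (raw GT/s per lane, encoding efficiency)
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}
PROTOCOL_EFFICIENCY = 0.80     # rough guess: TLP headers, flow control, ACKs

def pcie_throughput(gen: str, lanes: int) -> tuple[float, float]:
    """Return (theoretical, realistic) throughput in GB/s for a PCIe link."""
    gt_per_lane, encoding = GEN[gen]
    theoretical = gt_per_lane * encoding * lanes / 8   # usable Gbit/s -> GB/s
    return theoretical, theoretical * PROTOCOL_EFFICIENCY

for gen, lanes in [("2.0", 8), ("3.0", 4), ("3.0", 8), ("3.0", 16)]:
    theo, real = pcie_throughput(gen, lanes)
    print(f"PCIe {gen} x{lanes}: {theo:.1f} GB/s theoretical, ~{real:.1f} GB/s realistic")
# Note the x16 line: Seagate's claimed 10GB/s SSD fits under that ceiling only.
```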
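
For the BIOS point, here is a best-effort way to verify the negotiated lane width on Linux. It assumes lspci is available and parses its -vv output (run as root for the full capability dump); the output format varies between pciutils versions, so treat the parsing as an assumption, not gospel:

```python
# Flag devices whose negotiated link width (LnkSta) is below their
# capability (LnkCap) -- the classic "x8 slot secretly running at x4" case.
import re
import subprocess

out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

device, cap_width = None, None
for line in out.splitlines():
    if line and not line[0].isspace():           # a new device header line
        device, cap_width = line.split(" ", 1)[0], None
    m = re.search(r"Lnk(Cap|Sta):.*Width x(\d+)", line)
    if m:
        if m.group(1) == "Cap":
            cap_width = int(m.group(2))          # what the link could do
        elif cap_width and int(m.group(2)) < cap_width:
            print(f"{device}: capable of x{cap_width}, "
                  f"negotiated x{m.group(2)} -- check the BIOS slot settings")
```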
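
The worst-case math from the eight-drive bullet, spelled out. The per-drive figures are the ones quoted there; the RAID-Z2 line adds my own assumption that only the data spindles count on a streaming write:

```python
# Streaming throughput bounds for eight typical 7200rpm desktop drives.
DRIVES, OUTER, INNER, TARGET = 8, 200, 120, 1200   # MB/s

print(f"best case  (empty, pure stripe): {DRIVES * OUTER} MB/s")        # 1600
print(f"worst case (full,  pure stripe): {DRIVES * INNER} MB/s")        #  960
print(f"shortfall vs {TARGET} MB/s target: {TARGET - DRIVES * INNER} MB/s")
# With RAID-Z2, only 6 of the 8 spindles carry data on a streaming write:
print(f"RAID-Z2 worst-case data rate:    {(DRIVES - 2) * INNER} MB/s")  #  720
```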
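
And a toy model of why the concurrent read+write bullet is so pessimistic: every switch between the read stream and the write stream costs a head seek. All numbers are my assumptions for a typical 7200rpm desktop drive, not measurements:

```python
# How much sequential speed survives when a drive alternates between
# a read stream and a write stream, paying a seek on every switch.
SEQ_MBPS = 150     # sequential rate of one drive, mid-platter (assumed)
SEEK_MS = 12.0     # average seek + rotational latency (assumed)
CHUNK_MB = 1.0     # data moved before the scheduler services the other stream

transfer_ms = CHUNK_MB / SEQ_MBPS * 1000           # time actually transferring
efficiency = transfer_ms / (transfer_ms + SEEK_MS)
print(f"transfer {transfer_ms:.1f} ms + seek {SEEK_MS:.1f} ms "
      f"=> {efficiency:.0%} of sequential speed")
print(f"one drive, both streams combined: ~{SEQ_MBPS * efficiency:.0f} MB/s")
# Across 8 drives that is roughly 8 * ~54 = ~430 MB/s for both streams
# combined, the same order of magnitude as the 200+150 MB/s estimate above.
```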
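
Finally, a sketch of the controller layout suggested in the pools bullet: interleave each pool's members round-robin across the HBAs so that neither controller carries a whole pool. The controller and disk names here are hypothetical:

```python
# Spread each pool's disks evenly over two HBAs.
from itertools import cycle

controllers = ["hba0", "hba1"]                     # hypothetical HBA names
read_pool  = [f"disk{i}" for i in range(8)]        # members of the "read" pool
write_pool = [f"disk{i}" for i in range(8, 16)]    # members of the "write" pool

layout = {c: [] for c in controllers}
for pool in (read_pool, write_pool):
    for controller, disk in zip(cycle(controllers), pool):
        layout[controller].append(disk)

for controller, disks in layout.items():
    print(controller, "->", ", ".join(disks))
# Each HBA ends up with half of each pool, so saturating either pool loads
# both controllers' CPUs and PCIe links evenly.
```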
Executive summary: as configured now, your project CANNOT achieve the requested 1200MB/s of reads plus any amount of write throughput at the same time, no matter how you try, because of the disk subsystem and the number of physical disks you have. Even with two or more separate pools it will be hard to achieve long-term, but it is doable with careful planning (and lots of low-level horsepower, i.e. physical disks).
I'm very sorry to deliver this bad news.