Yeah, these tests should see massive write combining for large transfers. Even when I give it the easiest possible target, a 1MiB write, it doesn't appear to help, and that would be true even if they were synchronous writes. At the filesystem layer the data should be allocated and then passed down to the md layer as one large transfer. The md layer should then break it up into stripes and write-combine them before writing to the devices. A lot of splitting and coalescing should be happening for these transfers, and I don't see any evidence of that. I don't believe that.
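A rough model of the splitting step, for illustration only. It assumes a RAID5-style layout where each stripe holds `data_disks` chunks of `chunk_size` bytes (parity ignored), and both values are hypothetical. The real md raid456 code works through its stripe cache and is more involved. The point is just which pieces of a write cover a whole stripe, since those are the ones that can have parity computed without reading old data first.

```python
def stripe_split(offset, length, chunk_size, data_disks):
    """Split a write into per-stripe pieces (simplified, hypothetical model).

    Returns a list of (stripe_index, start_in_stripe, end_in_stripe, is_full)
    where is_full means the piece covers the entire stripe.
    """
    stripe_bytes = chunk_size * data_disks  # data bytes per full stripe
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        stripe = pos // stripe_bytes
        stripe_start = stripe * stripe_bytes
        stripe_end = stripe_start + stripe_bytes
        piece_end = min(end, stripe_end)
        is_full = pos == stripe_start and piece_end == stripe_end
        pieces.append((stripe, pos - stripe_start, piece_end - stripe_start, is_full))
        pos = piece_end
    return pieces


KiB, MiB = 1024, 1024 * 1024
# Hypothetical geometry: 64KiB chunks, 4 data disks -> 256KiB full stripe.
aligned = stripe_split(0, 1 * MiB, 64 * KiB, 4)
misaligned = stripe_split(128 * KiB, 1 * MiB, 64 * KiB, 4)
print(len(aligned), all(p[3] for p in aligned))  # 4 True
print([p[3] for p in misaligned])  # partial pieces at both ends
```

With that geometry an aligned 1MiB write maps onto four full stripes. Shifting it by half a stripe gives five pieces, and the two partial ones at the ends would need read-modify-write (or reconstruct-write) to update parity.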
Based on single-device performance, I believe I should conservatively be seeing >2GB/sec, and possibly >3GB/sec, for favorable workloads like 1MB writes.
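A back-of-the-envelope ceiling for that kind of estimate, under simplifying assumptions. It assumes full-stripe writes only (no read-modify-write), every member device streaming at the same rate, and no CPU, parity-computation, or bus bottleneck. The device count and per-device figure in the example are hypothetical and not measurements from this setup.

```python
def full_stripe_write_ceiling(n_devices, per_device_mb_s, parity_devices=1):
    """Upper bound on user-visible write bandwidth for full-stripe writes.

    Each device writes at per_device_mb_s; only data chunks count toward
    throughput, so parity devices' bandwidth is excluded.
    """
    if n_devices <= parity_devices:
        raise ValueError("need more devices than parity devices")
    return (n_devices - parity_devices) * per_device_mb_s


# Hypothetical: 6-device RAID5 whose members each sustain 500 MB/s.
print(full_stripe_write_ceiling(6, 500))  # 2500
```

Real numbers will come in below this, but if measured throughput is a small fraction of it for aligned 1MB writes, that points to the writes not being combined into full stripes.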