UPDATES (for @UhClem, @NateS, @ectoplasmosis and others)
To test the hypothesis that the reduced performance when testing three Gen4 drives simultaneously vs. one at a time is due to inefficiencies/overhead in Windows and/or CrystalDiskMark, I ran very similar tests on the same box using CentOS 8.4 and fio 3.19.
For each run, I ran 8 fio tests configured to match, as closely as possible, the 8 tests that CrystalDiskMark runs. In fact, when I ran one drive at a time, the CentOS/fio tests produced results reasonably close (+/-10%) to the Windows/CDM tests. The one notable exception was the RND4K Q32T16 write test, where Linux was 40-42% faster.
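For reference, a fio job file approximating the RND4K Q32T16 write test looks roughly like this (a sketch, not my exact job file; the device path and the short runtime are assumptions, and CDM's internal defaults may differ slightly):

```
# WARNING: writing to a raw device destroys its contents
[rnd4k-q32t16-write]
filename=/dev/nvme0n1
ioengine=libaio
direct=1
rw=randwrite
bs=4k
iodepth=32
numjobs=16
runtime=5
time_based
group_reporting
```

Swapping ioengine=libaio for ioengine=io_uring gives the other variant tested below.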
With CentOS/fio, the deltas between one-at-a-time and three-at-a-time averages were:
Using xfs and libaio:
Read tests: -6% -9% -3% -12%
Write tests: +1% +1% -17% +1%
Using xfs and io_uring:
Read tests: -1% 0% +2% +2%
Write tests: -1% -1% -25% +2%
For comparison, Windows/CDM one-at-a-time vs three-at-a-time:
Read tests: -13% -31% -9% 0%
Write tests: -1% 0% -10% -42%
(Negative numbers indicate decrease in performance when going from one-at-a-time to three-at-a-time, positive numbers indicate increases.)
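The deltas above were computed the obvious way; as a concrete example (with made-up throughput numbers, not my actual results):

```shell
# Hypothetical MB/s figures -- not the real test data.
one=7000      # one-at-a-time average
three=6090    # three-at-a-time average
# delta = (three - one) / one * 100, rounded and signed
delta=$(awk -v a="$one" -v b="$three" \
    'BEGIN { printf "%+.0f%%", (b - a) / a * 100 }')
echo "$delta"
```

With these inputs the script prints "-13%", i.e. a 13% drop going from one-at-a-time to three-at-a-time.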
On Linux, the 3rd write test (RND4K Q32T16) was the biggest challenge when running three-at-a-time using libaio or io_uring. I could see the real-time performance on the individual drives bouncing around quite a bit during the test. There was enough variability that I wouldn't read too much into the fact that io_uring performed worse than libaio here. If I had more time, I would have run more tests for longer periods. However, since the Linux RND4K Q32T16 write performance was so much better than on Windows, the Linux three-at-a-time performance was still better than the Windows one-at-a-time performance.
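For anyone wanting to reproduce the three-at-a-time runs, the shape of the invocation is roughly this (a dry-run sketch: the device names are assumptions, "echo" just prints the commands instead of running them, and in a real run you would background each fio with "&" and then "wait" for all three):

```shell
# Dry run: prints one fio command per drive. Remove "echo" (and append "&",
# followed by a final "wait") to actually launch the three jobs in parallel.
for dev in /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1; do
    echo fio --name=rnd4k-q32t16-write --filename="$dev" \
        --ioengine=io_uring --direct=1 --rw=randwrite --bs=4k \
        --iodepth=32 --numjobs=16 --runtime=60 --time_based \
        --group_reporting
done > fio_cmds.txt
cat fio_cmds.txt
```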
BOTTOM LINE -- The Linux tests make me feel a lot more comfortable that this system and the Linkreal bifurcation card can handle multiple Gen4 NVMe SSDs under heavy load simultaneously.
Footnotes:
1) To get io_uring to work, I had to use the 5.13 Linux kernel instead of the 4.x one that comes with CentOS 8.4.
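(io_uring landed in kernel 5.1, so any 4.x kernel is out. A quick version gate, sketched with a hard-coded version string; substitute "$(uname -r)" on a live system:)

```shell
kver="5.13.0"   # substitute "$(uname -r)" on a live system
status=$(echo "$kver" | awk -F. \
    '{ s = ($1 > 5 || ($1 == 5 && $2 >= 1)) ? "io_uring-capable" : "too old"; print s }')
echo "$status"
```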
2) I tried to get PCIe Advanced Error Reporting (AER) working but am not sure I was successful. After enabling it in the BIOS and booting the kernel in PCIe Native Mode, the boot-time log gives conflicting messages about AER (some say it is not supported, others say it is enabled). I found a tool that allows one to inject dummy AER errors but I couldn't get it to work. Anyway, FWIW, I re-ran the RND4K Q32T16 write tests but never saw any AER messages in the system log.
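(The checks I used were along these lines; the grep demo below runs against a made-up sample log line, since actual dmesg contents vary by system and may require root:)

```shell
# On a live system:
#   dmesg | grep -i aer
#   lspci -vvv | grep -i 'Advanced Error Reporting'
# Demo of the filter against a fabricated example line:
sample="pcieport 0000:00:01.1: AER: Corrected error received"
hits=$(echo "$sample" | grep -ci 'aer')
echo "$hits"
```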