PCIe Gen 4 bifurcation risers and NVMe SSDs


lunadesign

Active Member
Aug 7, 2013
256
34
28
UPDATES (for @UhClem, @NateS, @ectoplasmosis and others)

To test the hypothesis that the reduced performance when testing three Gen4 drives simultaneously (vs one at a time) is due to inefficiencies/overhead in Windows and/or CrystalDiskMark, I ran very similar tests on the same box using CentOS 8.4 and fio 3.19.

For each run, I ran 8 fio tests that are as close as possible to the 8 tests that CrystalDiskMark runs. In fact, when I ran one-drive-at-a-time, the CentOS/fio tests produced results that were reasonably close (+/-10%) to the Windows/CDM tests. The one notable exception was the RND4K Q32T16 write test where Linux was 40-42% faster.
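
(For anyone wanting to set up a similar run, here's a rough sketch of driving 8 CDM-like workloads from fio. The workload mix matches CrystalDiskMark's default profile, but the target path, ioengine, size and runtime are illustrative assumptions, not necessarily the exact options behind the numbers in this post.)

```python
# Sketch of 8 fio workloads approximating CrystalDiskMark's default profile
# (SEQ1M Q8T1, SEQ1M Q1T1, RND4K Q32T16, RND4K Q1T1, read and write).
# The target path, ioengine, size and runtime are illustrative assumptions.
import subprocess

TARGET = "/mnt/nvme0/fio.test"   # hypothetical test file on one SSD's filesystem
ENGINE = "libaio"                # or "io_uring" on a new enough kernel/fio build

# (label, fio rw pattern, block size, iodepth, numjobs)
WORKLOADS = [
    ("SEQ1M Q8T1",   "read",      "1M", 8,  1),
    ("SEQ1M Q1T1",   "read",      "1M", 1,  1),
    ("RND4K Q32T16", "randread",  "4k", 32, 16),
    ("RND4K Q1T1",   "randread",  "4k", 1,  1),
    ("SEQ1M Q8T1",   "write",     "1M", 8,  1),
    ("SEQ1M Q1T1",   "write",     "1M", 1,  1),
    ("RND4K Q32T16", "randwrite", "4k", 32, 16),
    ("RND4K Q1T1",   "randwrite", "4k", 1,  1),
]

for label, rw, bs, qd, jobs in WORKLOADS:
    subprocess.run([
        "fio", "--name=cdm_like", f"--filename={TARGET}",
        f"--rw={rw}", f"--bs={bs}", f"--iodepth={qd}", f"--numjobs={jobs}",
        "--direct=1", f"--ioengine={ENGINE}",
        "--size=1G", "--runtime=30", "--time_based",
        "--group_reporting", "--output-format=json",
    ], check=True)
    print(f"finished {label} {rw}")
```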

With CentOS/fio, the deltas between one-at-a-time and three-at-a-time averages were:

Using xfs and libaio:
Read tests: -6% -9% -3% -12%
Write tests: +1% +1% -17% +1%

Using xfs and io_uring:
Read tests: -1% 0% +2% +2%
Write tests: -1% -1% -25% +2%

For comparison, Windows/CDM one-at-a-time vs three-at-a-time:
Read tests: -13% -31% -9% 0%
Write tests: -1% 0% -10% -42%

(Negative numbers indicate a decrease in performance when going from one-at-a-time to three-at-a-time; positive numbers indicate an increase.)
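
(For reference, a minimal sketch of how a delta like these is presumably computed, using placeholder throughput values rather than the actual measurements:)

```python
# How a delta like the ones above is presumably computed: percent change
# from the one-at-a-time result to the three-at-a-time per-drive average.
# The throughput numbers are placeholders, not the actual measurements.
def delta_pct(one_at_a_time, three_at_a_time_avg):
    return round((three_at_a_time_avg / one_at_a_time - 1) * 100)

one_drive_mb_s = 6800        # placeholder single-drive result
three_drive_avg_mb_s = 6392  # placeholder per-drive average with all three loaded
print(f"{delta_pct(one_drive_mb_s, three_drive_avg_mb_s):+d}%")  # -> -6%
```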

On Linux, the 3rd write test (RND4K Q32T16) was the biggest challenge when running three-at-a-time using libaio or io_uring. I could see the real-time performance on the individual drives bouncing around quite a bit during the test. There was enough variability that I wouldn't read too much into the fact that io_uring performed worse than libaio here. If I had more time, I would have run more tests for longer periods. However, since the Linux RND4K Q32T16 write performance was so much better than on Windows, the Linux three-at-a-time performance was still better than the Windows one-at-a-time performance.

BOTTOM LINE -- The Linux tests make me feel a lot more comfortable that this system and the Linkreal bifurcation card can handle multiple Gen4 NVMe SSDs under heavy load simultaneously.

Footnotes:

1) To get io_uring to work, I had to use the 5.13 Linux kernel instead of the 4.x one that comes with CentOS 8.4.

2) I tried to get PCIe Advanced Error Reporting (AER) working but I'm not sure I was successful. After enabling it in the BIOS and booting the kernel in PCIe Native Mode, the boot-time kernel messages are contradictory about AER (some say it is not supported, others say it is enabled). I found a tool that lets one inject dummy AER errors, but I couldn't get it to work. Anyway, FWIW, I re-ran the RND4K Q32T16 write tests and never saw any AER messages in the system log.
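
(A minimal sketch of that log check, assuming a systemd-based install where journalctl is available:)

```python
# Scan the kernel log for PCIe AER reports after a heavy fio run.
# Assumes a systemd-based install where journalctl is available; this is
# just an automated version of eyeballing the system log for "AER" lines.
import subprocess

log = subprocess.run(["journalctl", "-k", "--no-pager"],
                     capture_output=True, text=True).stdout
aer_lines = [line for line in log.splitlines() if "AER" in line]
if aer_lines:
    print("Possible AER events:")
    print("\n".join(aer_lines))
else:
    print("No AER messages found in the kernel log.")
```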
 

ectoplasmosis

Active Member
Jul 28, 2021
117
53
28
Thank you for this, it’s very helpful!

I have a couple of Supermicro Gen4 Retimer cards on order, will update this thread with results of my testing when they arrive.
 

lunadesign

Active Member
Aug 7, 2013
256
34
28
Thank you for this, it’s very helpful!

I have a couple of Supermicro Gen4 Retimer cards on order, will update this thread with results of my testing when they arrive.
Yes, please do!

BTW, what do you plan to connect those retimer cards to? IE, what cables/backplanes/etc?
 

ectoplasmosis

Active Member
Jul 28, 2021
117
53
28
Yes, please do!

BTW, what do you plan to connect those retimer cards to? IE, what cables/backplanes/etc?
I’ve ordered two AOC-SLG4-4E4T-O cards and four CBL-SAST-0953 50cm SlimSAS 8i to 2x NVMe SFF-8639 cables.

The cables will connect straight to Samsung PM9A3 U.2 Gen4 SSDs, no backplane/caddy.

I’ll be trialling RAIDIX software RAID0 in a 4-drive array, hoping for ~12GB/s sequential write and ~20GB/s sequential read on an EPYC 7443P storage server, used for ingesting and hosting uncompressed video.
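
(Back-of-envelope on those targets, using assumed per-drive figures rather than official specs:)

```python
# Back-of-envelope check of those targets. The per-drive figures are rough
# assumptions for a Gen4 U.2 drive (not official PM9A3 specs), and RAID0
# scaling is taken as ideal, which a real array won't quite reach.
PER_DRIVE_READ_GBS = 5.0    # assumed sequential read per drive, GB/s
PER_DRIVE_WRITE_GBS = 3.0   # assumed sequential write per drive, GB/s
DRIVES = 4

print(f"ideal RAID0 read:  ~{DRIVES * PER_DRIVE_READ_GBS:.0f} GB/s")   # ~20 GB/s
print(f"ideal RAID0 write: ~{DRIVES * PER_DRIVE_WRITE_GBS:.0f} GB/s")  # ~12 GB/s
```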
 

TrumanHW

Active Member
Sep 16, 2018
253
34
28
Cards like the HighPoint SSD 7580 have a PCIe switch chip on them, so you should ask yourself what the goal for all this is.
Just for clarity's sake, I have no need for PCIe 4.0 right now, and since I only have NVMe 1.2 drives, I have no need for more than x4 lanes per drive. I explicitly said I was going to be using the Supermicro retimer card (if I did say I was intent on using the 7580, I apologize, but I don't think I did).

I was just sharing info on a product option in a category of products with very few options.
(But I'm still grateful for any info, because even though I have no need for it now, I don't know much about this and might some day, so thanks.)

Connecting all of those drives at x4 each would be challenging and would run into a PCIe bottleneck with only 40 lanes per CPU.
If I may ask ...
How many x4 NVMe drives CAN be used (or safely used) with 80 lanes (2 CPUs x 40 lanes)?
How many x4 NVMe drives IS too many for 2 CPUs x 40 lanes?

Thanks.
 

uldise

Active Member
Jul 2, 2020
209
72
28
How many x4 NVMe drives CAN be used (or safely used) with 80 lanes (2 CPUs x 40 lanes)?
How many x4 NVMe drives IS too many for 2 CPUs x 40 lanes?
It really depends on the motherboard used: how many PCIe slots it has, and which ones.
For example, look at this Supermicro system (though it's for 2x 48-lane CPUs): 2029UZ-TN20R25M | 2U | SuperServers | Products | Super Micro Computer, Inc.
It takes 20 x4 NVMe drives = 80 lanes without any PCIe switches, and there is one additional x8 PCIe slot available too. The system has 96 lanes in total, so 96 - 88 = 8 lanes remain for everything else, like the Ethernet controller. You can look at the system manual to see how they connect it all.
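
To make the lane arithmetic explicit, here's a minimal sketch (the reserved-lane values are illustrative assumptions):

```python
# Simple PCIe lane budget, following the reasoning above. The reserved-lane
# values are illustrative assumptions; check your board's block diagram,
# since NICs, M.2 slots, the chipset/BMC, etc. all consume lanes too.
def max_x4_drives(total_lanes, reserved_lanes):
    """How many x4 NVMe drives fit after reserving lanes for other devices."""
    return (total_lanes - reserved_lanes) // 4

# 2x 40-lane CPUs, reserving 8 lanes for a NIC and other onboard devices
print(max_x4_drives(80, 8))      # -> 18 drives
# 2x 48-lane CPUs as in the 2029UZ-TN20R25M example: reserve the x8 slot
# plus 8 lanes for everything else, leaving room for the 20 drives it ships with
print(max_x4_drives(96, 8 + 8))  # -> 20 drives
```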
 

lunadesign

Active Member
Aug 7, 2013
256
34
28
I’ve ordered two AOC-SLG4-4E4T-O cards and four CBL-SAST-0953 50cm SlimSAS 8i to 2x NVMe SFF-8639 cables.

The cables will connect straight to Samsung PM9A3 U.2 Gen4 SSDs, no backplane/caddy.

I’ll be trialling RAIDIX software RAID0 in a 4-drive array, hoping for ~12GB/s sequential write and ~20GB/s sequential read on an EPYC 7443P storage server, used for ingesting and hosting uncompressed video.
Nice! Please keep us posted on how it works out!
 

UhClem

just another Bozo on the bus
Jun 26, 2012
438
252
63
NH, USA
Just got the cards in.

They have 2x 8-lane Gen4 Montage M88RT40816 retimer chips per card: M88RT40816 | Montage Technology
...
Very good of you to uncover, detail, and make "public" the primary component on the card.
The net gets a new useful tidbit.

Interesting aside: After seeing this, I googled "M88RT40816". The first 4 hits were normal; the 5th was an intel.com PDF titled "PCI Express (PCIe) 4.0 Retimer Supplemental Features" [Link]. (A nice 53-page Intel technical doc.) But there is no mention of Montage or the part number in the doc. [DWIM: a 50+ year old hacker acronym, Do What I Mean]
 
  • Like
Reactions: ectoplasmosis

ectoplasmosis

Active Member
Jul 28, 2021
117
53
28
Forgot to update this thread.

2x AOC-SLG4-4E4T-O driving 8x Samsung PM9A3 1.92TB U.2 Gen4 drives, in an EPYC 7443P 24-core system with 256GB 8-channel DDR4.

Currently running as an 8-way RAID0 mdadm array on a Proxmox host. Quick FIO direct=1 sequential benchmark showing ~50GB/s read, ~20GB/s write.
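
(For reference, a sequential fio pass over such an array might look roughly like the sketch below; every option shown is an assumption rather than the exact command used for these numbers.)

```python
# Rough reconstruction of that kind of run: direct, large-block sequential
# fio against the md RAID0 device. Device name, ioengine, block size, queue
# depth, job count and runtime are assumptions, not the exact options used.
import subprocess

MD_DEVICE = "/dev/md0"   # hypothetical 8-way RAID0 of the U.2 drives

for rw in ("read", "write"):
    # NOTE: the write pass destroys any data/filesystem on the raw device.
    subprocess.run([
        "fio", f"--name=seq_{rw}", f"--filename={MD_DEVICE}",
        f"--rw={rw}", "--bs=1M", "--iodepth=32", "--numjobs=8",
        "--direct=1", "--ioengine=io_uring",
        "--runtime=60", "--time_based", "--group_reporting",
    ], check=True)
```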
 


  • Like
Reactions: vcc3

jpmomo

Active Member
Aug 12, 2018
531
192
43
Those are pretty good results. I am assuming this is just SW-based RAID that relies on the CPU. What did you set NPS= to for the 7443P? I was testing with some HighPoint 7505 cards using 8 M.2 NVMe Gen4 SSDs; these RAID cards also rely on the CPU. There was a fairly large boost in performance when switching to NPS=4. This was with Rome-based CPUs (7502 and 7452). My tests used 2 of the 7505 HighPoint cards in what they refer to as cross-sync mode; the NPS=4 setting was recommended by a doc on the HighPoint site. Normally, you would try to pin the CPU to the NUMA node (CCD or CCX) that the card is also using, but there was no explicit pinning that I was able to do with the card's drivers.

I was able to get a little over 300Gbps for both read and write. Your read numbers are much better, and I'm still wondering how you were able to do that with just SW RAID. Did you get a chance to test with RAIDIX yet?
 

alexhaj

New Member
Jan 12, 2018
12
2
3
Hi guys,

I just got this card and hooked it up to an ASRock X570D4I-2T. In the BIOS I set slot 7 PCIe bifurcation to 4x4x4x4, but my U.2 SSDs are not showing up in Proxmox or in /dev/. Can anyone help me?
 

ectoplasmosis

Active Member
Jul 28, 2021
117
53
28
Those are pretty good results. I am assuming this is just SW-based RAID that relies on the CPU. What did you set NPS= to for the 7443P? I was testing with some HighPoint 7505 cards using 8 M.2 NVMe Gen4 SSDs; these RAID cards also rely on the CPU. There was a fairly large boost in performance when switching to NPS=4. This was with Rome-based CPUs (7502 and 7452). My tests used 2 of the 7505 HighPoint cards in what they refer to as cross-sync mode; the NPS=4 setting was recommended by a doc on the HighPoint site. Normally, you would try to pin the CPU to the NUMA node (CCD or CCX) that the card is also using, but there was no explicit pinning that I was able to do with the card's drivers.

I was able to get a little over 300Gbps for both read and write. Your read numbers are much better, and I'm still wondering how you were able to do that with just SW RAID. Did you get a chance to test with RAIDIX yet?
NPS set to Auto, equivalent to 1. Disabling CPU and DF C-States, enabling 'Power' determinism control, max cTDP, x2APIC and a few other tweaks had a large effect on throughput.

I've also found EPYC Milan to perform better than Rome with NVME arrays.

I've not tested with RAIDIX as I'm more than happy with the performance of simple mdadm RAID, so cannot justify the license cost for the time being.
 

jpmomo

Active Member
Aug 12, 2018
531
192
43
NPS set to Auto, equivalent to 1. Disabling CPU and DF C-States, enabling 'Power' determinism control, max cTDP, x2APIC and a few other tweaks had a large effect on throughput.

I've also found EPYC Milan to perform better than Rome with NVME arrays.

I've not tested with RAIDIX as I'm more than happy with the performance of simple mdadm RAID, so cannot justify the license cost for the time being.
Your testing encouraged me to take another shot at improving the results. This time I used 8x Seagate M.2 FireCuda 530 2TB drives with the 2 HighPoint 7505 cards. This was with a single Intel 8352S Ice Lake CPU.

[Attached screenshot: 1644897419424.png]
 
  • Like
Reactions: ectoplasmosis