Multi-NVMe (M.2, U.2) adapters that do not require bifurcation


abq

Active Member
May 23, 2015
https://www.aliexpress.us/item/3256805484706547.html for the US, anyway, but AliExpress tends to vary products and prices a lot by region.

Edit: I'm tempted to grab one even though my boards all support bifurcation and I already have a spare 8 port switch...
 

potrebitel

Member
May 8, 2020
Thank you for posting the link to this great deal! I see you can purchase it with 4 cables for $174.14 (~$30 per cable, so a bit pricey). Is there a cheaper source for these semi-unique U.2 drive NVMe cables?
$20+ per cable is a normal price for SFF-8654 8i to 2x SFF-8639 (U.2)
 

panchovix

Member
Nov 11, 2025
Just a heads up that some PLX88XXX switches from this store are quite cheap. The store seems to have a good reputation so far, and there are also some offers and coupons for the summer/winter sale right now.

https://es.aliexpress.com/item/1005010379686637.html PLX88024 (X8 4.0 to 4*X4 4.0) for 108 USD.

https://es.aliexpress.com/item/1005010385039092.html PLX88096 (X16 4.0 SlimSAS uplink to 4*X16 4.0 + 1*X16 4.0 SlimSAS downstream) for 409 USD.

https://es.aliexpress.com/item/1005010386414574.html PLX88096 (X16 4.0 to 5*X16 4.0 SlimSAS downlink, or 10*X8 4.0, or 20*X4 4.0 SlimSAS downstream) for 386 USD.

https://es.aliexpress.com/item/1005010379841235.html Older PLX8747 (X8 3.0 to 4*X4 3.0) for 49 USD, quite cheap.
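Worth keeping in mind with all of these: the uplink is shared, so the downstream ports are oversubscribed when used simultaneously. A rough back-of-envelope sketch (assuming nominal ~2 GB/s per Gen4 lane and ~1 GB/s per Gen3 lane, ignoring protocol overhead):

```python
# Rough oversubscription math for PCIe switches like the ones above.
# Assumed nominal per-lane throughput: ~2 GB/s (Gen4), ~1 GB/s (Gen3).
LANE_GBPS = {3: 1.0, 4: 2.0}

def oversubscription(gen, uplink_lanes, downstream_ports, lanes_per_port):
    """Return (uplink GB/s, aggregate downstream GB/s, ratio)."""
    up = uplink_lanes * LANE_GBPS[gen]
    down = downstream_ports * lanes_per_port * LANE_GBPS[gen]
    return up, down, down / up

# PLX88024-style: x8 Gen4 uplink feeding 4 x4 Gen4 ports
print(oversubscription(4, 8, 4, 4))   # (16.0, 32.0, 2.0) -> 2:1 oversubscribed

# PLX8747-style: x8 Gen3 uplink feeding 4 x4 Gen3 ports
print(oversubscription(3, 8, 4, 4))   # (8.0, 16.0, 2.0)
```

So with all four drives hammered at once you get roughly half the per-port bandwidth, which is fine for most NVMe workloads but worth knowing before buying.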
 

panchovix

Member
Nov 11, 2025
Finally connected all the switches and GPUs on my PC.

That took a while (like 3 days lol), multiple cables and such, but it works!

Basically it is like this:

Code:
PM50100 Switch (01:00.0)
├── Port 02.0 → GPU2 (5090) direct
├── Port 03.0 → PLX88096 (cascaded)
│   └── Complex internal structure:
│       ├── GPU0 (4090)
│       ├── GPU1 (4090)
│       ├── GPU4 (A40)
│       ├── GPU5 (A6000)
│       └── GPU6 (3090)
└── Port 04.0 → GPU3 (5090) direct
Topology looks like this (GPU 0 and 1: 4090s, 2 and 3: 5090s, 4, 5 and 6: A40, A6000 and 3090):

Code:
pancho@fedora:~/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PXB     PXB     PXB     PXB     PXB     PIX     PHB     0-23    0               N/A
GPU1    PXB      X      PXB     PXB     PXB     PXB     PXB     PHB     0-23    0               N/A
GPU2    PXB     PXB      X      PIX     PXB     PXB     PXB     PHB     0-23    0               N/A
GPU3    PXB     PXB     PIX      X      PXB     PXB     PXB     PHB     0-23    0               N/A
GPU4    PXB     PXB     PXB     PXB      X      PIX     PXB     PHB     0-23    0               N/A
GPU5    PXB     PXB     PXB     PXB     PIX      X      PXB     PHB     0-23    0               N/A
GPU6    PIX     PXB     PXB     PXB     PXB     PXB      X      PHB     0-23    0               N/A
NIC0    PHB     PHB     PHB     PHB     PHB     PHB     PHB      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx4_0
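If you want to pull the switch-local (PIX) pairs out of a matrix like the one above programmatically, here's a rough sketch; the parser and the trimmed sample matrix are just for illustration and assume the plain-text layout that `nvidia-smi topo -m` prints:

```python
# Sketch: find GPU pairs connected through at most a single PCIe bridge
# (PIX), i.e. pairs hanging off the same switch, from `nvidia-smi topo -m`
# style text. Sample trimmed from the output in this post.
TOPO = """\
        GPU0 GPU1 GPU2 GPU3
GPU0     X   PXB  PXB  PXB
GPU1    PXB   X   PXB  PXB
GPU2    PXB  PXB   X   PIX
GPU3    PXB  PXB  PIX   X
"""

def pix_pairs(topo_text):
    lines = topo_text.strip().splitlines()
    header = lines[0].split()
    pairs = []
    for row in lines[1:]:
        cells = row.split()
        name, links = cells[0], cells[1:]
        for col, link in zip(header, links):
            if link == "PIX" and name < col:   # dedupe symmetric entries
                pairs.append((name, col))
    return pairs

print(pix_pairs(TOPO))   # [('GPU2', 'GPU3')]
```

Run against the full 7-GPU matrix above, this would report the three switch-local groups: (0, 6), (2, 3) and (4, 5).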
Bandwidth and latency look like this:

Code:
pancho@fedora:~/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest$ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 4090, pciBusID: e, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 4090, pciBusID: 11, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA GeForce RTX 5090, pciBusID: 5, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA GeForce RTX 5090, pciBusID: 18, pciDeviceID: 0, pciDomainID:0
Device: 4, NVIDIA A40, pciBusID: d, pciDeviceID: 0, pciDomainID:0
Device: 5, NVIDIA RTX A6000, pciBusID: 12, pciDeviceID: 0, pciDomainID:0
Device: 6, NVIDIA GeForce RTX 3090, pciBusID: a, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CANNOT Access Peer Device=2
Device=0 CANNOT Access Peer Device=3
Device=0 CANNOT Access Peer Device=4
Device=0 CANNOT Access Peer Device=5
Device=0 CANNOT Access Peer Device=6
Device=1 CAN Access Peer Device=0
Device=1 CANNOT Access Peer Device=2
Device=1 CANNOT Access Peer Device=3
Device=1 CANNOT Access Peer Device=4
Device=1 CANNOT Access Peer Device=5
Device=1 CANNOT Access Peer Device=6
Device=2 CANNOT Access Peer Device=0
Device=2 CANNOT Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CANNOT Access Peer Device=4
Device=2 CANNOT Access Peer Device=5
Device=2 CANNOT Access Peer Device=6
Device=3 CANNOT Access Peer Device=0
Device=3 CANNOT Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CANNOT Access Peer Device=4
Device=3 CANNOT Access Peer Device=5
Device=3 CANNOT Access Peer Device=6
Device=4 CANNOT Access Peer Device=0
Device=4 CANNOT Access Peer Device=1
Device=4 CANNOT Access Peer Device=2
Device=4 CANNOT Access Peer Device=3
Device=4 CAN Access Peer Device=5
Device=4 CAN Access Peer Device=6
Device=5 CANNOT Access Peer Device=0
Device=5 CANNOT Access Peer Device=1
Device=5 CANNOT Access Peer Device=2
Device=5 CANNOT Access Peer Device=3
Device=5 CAN Access Peer Device=4
Device=5 CAN Access Peer Device=6
Device=6 CANNOT Access Peer Device=0
Device=6 CANNOT Access Peer Device=1
Device=6 CANNOT Access Peer Device=2
Device=6 CANNOT Access Peer Device=3
Device=6 CAN Access Peer Device=4
Device=6 CAN Access Peer Device=5

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1     2     3     4     5     6
     0       1     1     0     0     0     0     0
     1       1     1     0     0     0     0     0
     2       0     0     1     1     0     0     0
     3       0     0     1     1     0     0     0
     4       0     0     0     0     1     1     1
     5       0     0     0     0     1     1     1
     6       0     0     0     0     1     1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1036.83  16.32  24.58  24.58  16.28  16.28  10.68
     1  16.33 999.68  24.58  24.58  16.28  16.28  10.67
     2  23.32  23.32 1783.68  33.13  23.17  23.17  14.15
     3  23.33  23.33  33.01 1775.57  23.16  23.17  14.14
     4  16.32  16.33  24.35  24.37 643.80  16.29  10.69
     5  16.32  16.32  24.39  24.39  16.27 765.93  10.71
     6  10.66  10.94  14.85  15.02  10.64  10.60 903.70
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1039.59  26.36  24.59  24.60  16.28  16.28  10.65
     1  26.36 1017.25  24.57  24.58  16.28  16.28  10.68
     2  23.25  23.33 1763.54  57.28  23.16  23.20  14.16
     3  23.26  23.33  57.25 1763.61  23.18  23.20  14.06
     4  16.30  16.33  24.37  24.36 644.86  26.36  26.36
     5  16.29  16.32  24.39  24.39  26.36 766.68  26.36
     6  10.98  10.79  14.70  15.00  26.37  26.36 904.75
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1047.25  18.94  29.60  29.62  18.76  18.95  11.90
     1  18.94 1002.25  29.55  29.66  18.68  18.92  11.88
     2  27.33  27.36 1763.45  34.63  27.23  27.21  19.40
     3  27.36  27.40  34.45 1777.52  27.27  27.27  19.38
     4  18.84  18.89  29.51  29.48 647.53  18.95  11.81
     5  18.78  18.91  29.49  29.56  18.82 770.84  11.78
     6  11.97  11.87  19.84  19.67  11.82  11.74 910.28
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1046.55  52.17  29.51  29.60  18.95  18.96  11.88
     1  52.18 995.22  29.56  29.62  18.87  18.83  11.87
     2  27.31  27.41 1761.46 110.85  27.23  27.20  19.49
     3  27.28  27.37 110.85 1753.56  27.24  27.21  19.41
     4  18.73  18.84  29.45  29.57 647.53  52.18  52.18
     5  18.83  18.92  29.49  29.56  52.17 770.65  52.19
     6  11.93  11.92  19.77  19.62  52.19  52.16 909.75
P2P=Disabled Latency Matrix (us)
   GPU     0      1      2      3      4      5      6
     0   1.42  16.46  14.35  14.35  16.65  15.06  15.14
     1  14.52   1.36  14.43  14.43  15.82  14.46  15.18
     2  14.34  14.35   2.07  14.37  14.36  14.35  14.44
     3  14.41  14.41  14.35   2.07  14.35  14.35  14.37
     4  14.71  14.97  14.34  14.38   1.77  16.56  14.26
     5  14.25  14.36  14.49  14.39  14.25   1.79  15.17
     6  15.45  17.45  14.34  14.62  14.26  15.48   1.67

   CPU     0      1      2      3      4      5      6
     0   1.42   4.25   4.16   4.14   3.97   4.15   4.14
     1   4.21   1.37   4.13   4.12   3.93   4.12   4.14
     2   4.23   4.14   1.55   4.12   3.92   4.13   4.16
     3   4.18   4.11   4.11   1.57   3.93   4.14   4.14
     4   4.04   4.01   4.01   4.00   1.30   4.01   4.01
     5   4.13   4.12   4.10   4.11   3.91   1.37   4.11
     6   4.10   4.11   4.10   4.11   3.89   4.12   1.35
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1      2      3      4      5      6
     0   1.41   1.42  14.38  14.56  15.09  14.26  14.34
     1   1.42   1.42  14.72  14.42  17.54  14.25  14.33
     2  14.34  14.34   2.07   0.36  14.35  14.36  14.36
     3  14.34  14.33   0.36   2.07  14.35  14.35  14.37
     4  15.66  15.73  14.36  14.36   1.74   1.60   1.64
     5  15.26  14.44  14.39  14.49   1.59   1.72   1.59
     6  15.18  14.24  14.38  14.38   1.54   1.53   1.64

   CPU     0      1      2      3      4      5      6
     0   1.41   1.11   4.17   4.13   3.94   4.13   4.13
     1   1.18   1.38   4.16   4.12   3.92   4.11   4.12
     2   4.19   4.15   1.58   1.09   3.93   4.08   4.11
     3   4.17   4.13   1.11   1.58   3.94   4.12   4.14
     4   4.03   3.99   3.99   4.03   1.31   1.02   1.02
     5   4.20   4.14   4.15   4.15   1.11   1.37   1.09
     6   4.12   4.10   4.11   4.12   1.08   1.09   1.38
TL;DR:

5090 ↔ 5090: 110.85 GB/s (via PM50100 switch)
4090 ↔ 4090: 52.18 GB/s (via PLX88096 switch connected to the PM50100 switch)
Ampere Trio A40 ↔ A6000 ↔ 3090: 52.19 GB/s (via PLX88096 switch connected to the PM50100 switch)

Latency:
  • Same-gen P2P: 0.35-1.62 μs
  • Cross-gen (CPU bounce): 14-16 μs
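Those speedups can be sanity-checked straight from the bidirectional matrices above; a quick sketch using the numbers from this post:

```python
# Sanity-check the TL;DR speedups from the bidirectional bandwidth
# matrices above (GB/s, values copied from this post).
p2p_off = {"5090-5090": 34.63, "4090-4090": 18.94, "A40-A6000": 18.95}
p2p_on  = {"5090-5090": 110.85, "4090-4090": 52.17, "A40-A6000": 52.18}

for pair in p2p_off:
    speedup = p2p_on[pair] / p2p_off[pair]
    print(f"{pair}: {speedup:.1f}x faster with P2P enabled")
```

So P2P over the switches is worth roughly 2.8-3.2x over bouncing through the host, on top of the much lower latency.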

Despite being on AM5, I had to use an MCIO retimer (I used this one: https://es.aliexpress.com/item/1005010427621047.html), otherwise some GPUs would drop.
 