Multi-NVMe (m.2, u.2) adapters that do not require bifurcation


abq

Active Member
May 23, 2015
729
231
43
https://www.aliexpress.us/item/3256805484706547.html for the US, anyway, but AliExpress tends to vary products and prices a lot by region.

Edit: I'm tempted to grab one even though my boards all support bifurcation and I already have a spare 8 port switch...
Thank you for posting the link to this great deal! I see you can purchase it with 4 cables for $174.14 (~$30 per cable, so a bit pricey). Is there a cheaper source for these semi-unique U.2 NVMe drive cables?
 

potrebitel

Member
May 8, 2020
39
21
8
Thank you for posting the link to this great deal! I see you can purchase it with 4 cables for $174.14 (~$30 per cable, so a bit pricey). Is there a cheaper source for these semi-unique U.2 NVMe drive cables?
$20+ per cable is a normal price for SFF-8654 8i to 2x SFF-8639 (U.2)
 

panchovix

Member
Nov 11, 2025
63
17
8
Just a heads-up that some PLX88XXX switches from this store are quite cheap. They seem to have a good reputation so far. Also, there are some offers and coupons for the summer/winter sale right now.

https://es.aliexpress.com/item/1005010379686637.html PLX88024 (X8 4.0 to 4*X4 4.0) for 108 USD.

https://es.aliexpress.com/item/1005010385039092.html PLX88096 (X16 4.0 SlimSAS uplink to 4*X16 4.0 + 1*X16 4.0 SlimSAS downstream) for 409 USD.

https://es.aliexpress.com/item/1005010386414574.html PLX88096 (X16 4.0 to 5*X16 4.0 SlimSAS downlink, or 10*X8 4.0, or 20*X4 4.0 SlimSAS downstream) for 386 USD.

https://es.aliexpress.com/item/1005010379841235.html Older PLX8747 (X8 3.0 to 4*X4 3.0) for 49USD, quite cheap.
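
For a rough sanity check on the port configurations listed above, the lanes on each card can be added up against the switch chip's total lane count. This is only a sketch: the lane totals (24 for the PLX88024, 96 for the PLX88096, 48 for the PLX8747) are assumptions based on the part numbers and are not stated in the listings, and all names below are made up for illustration.

```python
# Lane-budget sketch. TOTAL_LANES values are assumptions, not taken from the listings.
TOTAL_LANES = {"PLX88024": 24, "PLX88096": 96, "PLX8747": 48}

# (chip, uplink width, downstream widths) as described in the post above.
CARDS = [
    ("PLX88024", 8, [4] * 4),           # x8 uplink, 4 * x4 downstream
    ("PLX88096", 16, [16] * 4 + [16]),  # x16 uplink, 4 * x16 + 1 * x16 SlimSAS
    ("PLX88096", 16, [16] * 5),         # x16 uplink, 5 * x16 SlimSAS
    ("PLX8747", 8, [4] * 4),            # x8 uplink, 4 * x4 downstream
]


def lanes_used(uplink, downstream):
    """Total lanes consumed by the uplink plus all downstream ports."""
    return uplink + sum(downstream)


def fits(chip, uplink, downstream):
    """True if the port layout stays within the chip's assumed lane count."""
    return lanes_used(uplink, downstream) <= TOTAL_LANES[chip]


for chip, up, down in CARDS:
    used = lanes_used(up, down)
    print(f"{chip}: {used}/{TOTAL_LANES[chip]} lanes used, fits={fits(chip, up, down)}")
```

If the assumed totals are right, both PLX88096 cards use every lane on the chip, which would explain why the 5 * x16 downstream layout is the ceiling.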
 

panchovix

Member
Nov 11, 2025
63
17
8
Finally connected all the switches and GPUs on my PC.

That took a while (like 3 days lol), multiple cables and such, but it works!

Basically it is like this:

Code:
PM50100 Switch (01:00.0)
├── Port 02.0 → GPU2 (5090) direct
├── Port 03.0 → PLX88096 (cascaded)
│   └── Complex internal structure:
│       ├── GPU0 (4090)
│       ├── GPU1 (4090)
│       ├── GPU4 (A40)
│       ├── GPU5 (A6000)
│       └── GPU6 (3090)
└── Port 04.0 → GPU3 (5090) direct
Topology looks like this (GPU 0 and 1: 4090s, 2 and 3: 5090s, 4, 5 and 6: A40, A6000 and 3090):

Code:
pancho@fedora:~/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PXB     PXB     PXB     PXB     PXB     PIX     PHB     0-23    0               N/A
GPU1    PXB      X      PXB     PXB     PXB     PXB     PXB     PHB     0-23    0               N/A
GPU2    PXB     PXB      X      PIX     PXB     PXB     PXB     PHB     0-23    0               N/A
GPU3    PXB     PXB     PIX      X      PXB     PXB     PXB     PHB     0-23    0               N/A
GPU4    PXB     PXB     PXB     PXB      X      PIX     PXB     PHB     0-23    0               N/A
GPU5    PXB     PXB     PXB     PXB     PIX      X      PXB     PHB     0-23    0               N/A
GPU6    PIX     PXB     PXB     PXB     PXB     PXB      X      PHB     0-23    0               N/A
NIC0    PHB     PHB     PHB     PHB     PHB     PHB     PHB      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx4_0
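
As a small illustration of reading the matrix above, here is a sketch that pulls out the GPU pairs sitting behind a single PCIe bridge (PIX). The GPU-to-GPU cells are copied from the output above with the NIC and affinity columns dropped; the function names are made up for this example. Note that these are nvidia-smi indices, which may not match the CUDA device numbering used by the benchmark.

```python
from itertools import combinations

# GPU-to-GPU part of the `nvidia-smi topo -m` output above.
TOPO = """
GPU0 X   PXB PXB PXB PXB PXB PIX
GPU1 PXB X   PXB PXB PXB PXB PXB
GPU2 PXB PXB X   PIX PXB PXB PXB
GPU3 PXB PXB PIX X   PXB PXB PXB
GPU4 PXB PXB PXB PXB X   PIX PXB
GPU5 PXB PXB PXB PXB PIX X   PXB
GPU6 PIX PXB PXB PXB PXB PXB X
"""


def parse_topo(text):
    """Return {(row_gpu, col_gpu): link_code} from the whitespace-separated matrix."""
    rows = [line.split() for line in text.strip().splitlines()]
    names = [row[0] for row in rows]
    return {
        (row[0], names[col]): code
        for row in rows
        for col, code in enumerate(row[1:])
    }


def pairs_with(links, code):
    """List each unordered GPU pair whose link type equals `code`."""
    names = sorted({a for a, _ in links})
    return [(a, b) for a, b in combinations(names, 2) if links[(a, b)] == code]


links = parse_topo(TOPO)
print("PIX pairs:", pairs_with(links, "PIX"))
```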
Bandwidth and latency look like this:

Code:
pancho@fedora:~/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest$ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 4090, pciBusID: e, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 4090, pciBusID: 11, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA GeForce RTX 5090, pciBusID: 5, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA GeForce RTX 5090, pciBusID: 18, pciDeviceID: 0, pciDomainID:0
Device: 4, NVIDIA A40, pciBusID: d, pciDeviceID: 0, pciDomainID:0
Device: 5, NVIDIA RTX A6000, pciBusID: 12, pciDeviceID: 0, pciDomainID:0
Device: 6, NVIDIA GeForce RTX 3090, pciBusID: a, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CANNOT Access Peer Device=2
Device=0 CANNOT Access Peer Device=3
Device=0 CANNOT Access Peer Device=4
Device=0 CANNOT Access Peer Device=5
Device=0 CANNOT Access Peer Device=6
Device=1 CAN Access Peer Device=0
Device=1 CANNOT Access Peer Device=2
Device=1 CANNOT Access Peer Device=3
Device=1 CANNOT Access Peer Device=4
Device=1 CANNOT Access Peer Device=5
Device=1 CANNOT Access Peer Device=6
Device=2 CANNOT Access Peer Device=0
Device=2 CANNOT Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CANNOT Access Peer Device=4
Device=2 CANNOT Access Peer Device=5
Device=2 CANNOT Access Peer Device=6
Device=3 CANNOT Access Peer Device=0
Device=3 CANNOT Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CANNOT Access Peer Device=4
Device=3 CANNOT Access Peer Device=5
Device=3 CANNOT Access Peer Device=6
Device=4 CANNOT Access Peer Device=0
Device=4 CANNOT Access Peer Device=1
Device=4 CANNOT Access Peer Device=2
Device=4 CANNOT Access Peer Device=3
Device=4 CAN Access Peer Device=5
Device=4 CAN Access Peer Device=6
Device=5 CANNOT Access Peer Device=0
Device=5 CANNOT Access Peer Device=1
Device=5 CANNOT Access Peer Device=2
Device=5 CANNOT Access Peer Device=3
Device=5 CAN Access Peer Device=4
Device=5 CAN Access Peer Device=6
Device=6 CANNOT Access Peer Device=0
Device=6 CANNOT Access Peer Device=1
Device=6 CANNOT Access Peer Device=2
Device=6 CANNOT Access Peer Device=3
Device=6 CAN Access Peer Device=4
Device=6 CAN Access Peer Device=5

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1     2     3     4     5     6
     0       1     1     0     0     0     0     0
     1       1     1     0     0     0     0     0
     2       0     0     1     1     0     0     0
     3       0     0     1     1     0     0     0
     4       0     0     0     0     1     1     1
     5       0     0     0     0     1     1     1
     6       0     0     0     0     1     1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1036.83  16.32  24.58  24.58  16.28  16.28  10.68
     1  16.33 999.68  24.58  24.58  16.28  16.28  10.67
     2  23.32  23.32 1783.68  33.13  23.17  23.17  14.15
     3  23.33  23.33  33.01 1775.57  23.16  23.17  14.14
     4  16.32  16.33  24.35  24.37 643.80  16.29  10.69
     5  16.32  16.32  24.39  24.39  16.27 765.93  10.71
     6  10.66  10.94  14.85  15.02  10.64  10.60 903.70
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1039.59  26.36  24.59  24.60  16.28  16.28  10.65
     1  26.36 1017.25  24.57  24.58  16.28  16.28  10.68
     2  23.25  23.33 1763.54  57.28  23.16  23.20  14.16
     3  23.26  23.33  57.25 1763.61  23.18  23.20  14.06
     4  16.30  16.33  24.37  24.36 644.86  26.36  26.36
     5  16.29  16.32  24.39  24.39  26.36 766.68  26.36
     6  10.98  10.79  14.70  15.00  26.37  26.36 904.75
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1047.25  18.94  29.60  29.62  18.76  18.95  11.90
     1  18.94 1002.25  29.55  29.66  18.68  18.92  11.88
     2  27.33  27.36 1763.45  34.63  27.23  27.21  19.40
     3  27.36  27.40  34.45 1777.52  27.27  27.27  19.38
     4  18.84  18.89  29.51  29.48 647.53  18.95  11.81
     5  18.78  18.91  29.49  29.56  18.82 770.84  11.78
     6  11.97  11.87  19.84  19.67  11.82  11.74 910.28
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6
     0 1046.55  52.17  29.51  29.60  18.95  18.96  11.88
     1  52.18 995.22  29.56  29.62  18.87  18.83  11.87
     2  27.31  27.41 1761.46 110.85  27.23  27.20  19.49
     3  27.28  27.37 110.85 1753.56  27.24  27.21  19.41
     4  18.73  18.84  29.45  29.57 647.53  52.18  52.18
     5  18.83  18.92  29.49  29.56  52.17 770.65  52.19
     6  11.93  11.92  19.77  19.62  52.19  52.16 909.75
P2P=Disabled Latency Matrix (us)
   GPU     0      1      2      3      4      5      6
     0   1.42  16.46  14.35  14.35  16.65  15.06  15.14
     1  14.52   1.36  14.43  14.43  15.82  14.46  15.18
     2  14.34  14.35   2.07  14.37  14.36  14.35  14.44
     3  14.41  14.41  14.35   2.07  14.35  14.35  14.37
     4  14.71  14.97  14.34  14.38   1.77  16.56  14.26
     5  14.25  14.36  14.49  14.39  14.25   1.79  15.17
     6  15.45  17.45  14.34  14.62  14.26  15.48   1.67

   CPU     0      1      2      3      4      5      6
     0   1.42   4.25   4.16   4.14   3.97   4.15   4.14
     1   4.21   1.37   4.13   4.12   3.93   4.12   4.14
     2   4.23   4.14   1.55   4.12   3.92   4.13   4.16
     3   4.18   4.11   4.11   1.57   3.93   4.14   4.14
     4   4.04   4.01   4.01   4.00   1.30   4.01   4.01
     5   4.13   4.12   4.10   4.11   3.91   1.37   4.11
     6   4.10   4.11   4.10   4.11   3.89   4.12   1.35
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1      2      3      4      5      6
     0   1.41   1.42  14.38  14.56  15.09  14.26  14.34
     1   1.42   1.42  14.72  14.42  17.54  14.25  14.33
     2  14.34  14.34   2.07   0.36  14.35  14.36  14.36
     3  14.34  14.33   0.36   2.07  14.35  14.35  14.37
     4  15.66  15.73  14.36  14.36   1.74   1.60   1.64
     5  15.26  14.44  14.39  14.49   1.59   1.72   1.59
     6  15.18  14.24  14.38  14.38   1.54   1.53   1.64

   CPU     0      1      2      3      4      5      6
     0   1.41   1.11   4.17   4.13   3.94   4.13   4.13
     1   1.18   1.38   4.16   4.12   3.92   4.11   4.12
     2   4.19   4.15   1.58   1.09   3.93   4.08   4.11
     3   4.17   4.13   1.11   1.58   3.94   4.12   4.14
     4   4.03   3.99   3.99   4.03   1.31   1.02   1.02
     5   4.20   4.14   4.15   4.15   1.11   1.37   1.09
     6   4.12   4.10   4.11   4.12   1.08   1.09   1.38
TL;DR (bidirectional bandwidth with P2P enabled):

5090 ↔ 5090: 110.85 GB/s (via PM50100 switch)
4090 ↔ 4090: 52.18 GB/s (via PLX88096 switch connected to the PM50100 switch)
Ampere Trio A40 ↔ A6000 ↔ 3090: 52.19 GB/s (via PLX88096 switch connected to the PM50100 switch)

Latency:
  • Same-gen P2P: 0.36-1.64 μs
  • Cross-gen (CPU bounce): roughly 14-18 μs
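
The grouping in the summary above can be reproduced from the benchmark's P2P connectivity matrix. The sketch below is just an illustration: it copies the connectivity matrix and a few bidirectional bandwidth cells from the output above, finds the groups of devices that can reach each other via P2P, and computes how much enabling P2P helped within each group. Function and variable names are invented for this example.

```python
# P2P connectivity matrix copied from the p2pBandwidthLatencyTest output above.
P2P = [
    [1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
]

# Bidirectional GB/s for one pair per group: (P2P disabled, P2P enabled).
BIDIR = {(0, 1): (18.94, 52.17), (2, 3): (34.63, 110.85), (4, 5): (18.95, 52.18)}


def p2p_groups(matrix):
    """Connected components of the P2P graph, as sorted lists of device indices."""
    seen, groups = set(), []
    for start in range(len(matrix)):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            dev = stack.pop()
            group.append(dev)
            for peer, ok in enumerate(matrix[dev]):
                if ok and peer not in seen:
                    seen.add(peer)
                    stack.append(peer)
        groups.append(sorted(group))
    return groups


def speedup(disabled, enabled):
    """Ratio of P2P-enabled to P2P-disabled bandwidth."""
    return enabled / disabled


print("P2P groups:", p2p_groups(P2P))
for pair, (off, on) in BIDIR.items():
    print(f"devices {pair}: {off} -> {on} GB/s, x{speedup(off, on):.2f}")
```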

Despite being on AM5, I had to use an MCIO retimer (I used this one: https://es.aliexpress.com/item/1005010427621047.html), otherwise some GPUs would drop out.
 

unphased

Active Member
Jun 9, 2022
192
43
28
Time for me to visit this again. I had great success last night expanding my NAS node, a 5600G on my ASRock ITX board. I took an x8/x4/x4 riser bifurcation card ($20 class) and can now connect my HBA at x8, my 905P Optane 960GB, and my boot NVMe directly to the CPU. Then I hung the CX3 card off the PCIe slot brought out from the backside M.2. This leaves the front M.2 slot ready to accept one more component. Being a NAS, it will likely be another HBA, though a solution for mounting more than 15 drives in my case is not forthcoming; it could at least help offload 4 more disks from the mobo's SATA. On the other hand, I should probably just upgrade that HBA to a 16-port one and leave the remaining 4 lanes for some other shenanigans. Second Optane for a special vdev?

Neat reddit thread detailing a few gen 4 and gen 5 options for us: https://www.reddit.com/r/homelab/comments/1pt0g6n
I found this which is interesting price-wise ($230): link

So far the cheapest PEX88096 I found looks to be this at $333: link

It looks like PEX88024-based items would've been right up my alley for attaching more 4-lane devices like NVMe drives, HBAs, and NICs to a given machine without breaking the bank. It helps a lot for building poor man's HEDT setups. The ability of GPUs to access each other through these switches is very compelling as well, as @panchovix demonstrates above, and you'll want the x16-capable ones for that.

With the 88024 starting at about $170 and the 88048 available at about $230, the pricing pushes you toward the higher-end parts just to be futureproof. I wonder if idle power usage is significant on these switches.
 

potrebitel

Member
May 8, 2020
39
21
8
Idle power consumption of the switch sounds irrelevant next to total power consumption if you need such adapters.
In your case you mentioned at least a second HBA (~10W+), and the drives attached to that HBA are 5W+ each.

So does it really matter if it's a 200W or a 210W machine?

Or having 4x GPUs and then caring about 10 more watts?

These are very specific hardware components that usually go into powerful systems (servers, high-compute machines, etc.), so talking about power consumption is wild :)

Also there are PLX88096 cards for $270: https://www.ebay.com/itm/136730781261 if you need SFF-8654 sockets instead of M.2 and dealing with M.2-to-PCIe or M.2-to-SFF-8639 adapters.
 

unphased

Active Member
Jun 9, 2022
192
43
28
Thank you. If PEX88096 is $270, 88048 is $230, and 88024 is $170, that is almost a perfect masterclass-level pricing ladder that would talk me into buying at the top end and calling it a day...
 

Phence

Active Member
May 16, 2024
121
73
28
Thank you. If PEX88096 is $270, 88048 is $230, and 88024 is $170, that is almost a perfect masterclass-level pricing ladder that would talk me into buying at the top end and calling it a day...
Do you remember that G series APUs on AM4 only do PCI-e gen3? :( Even if the board is gen 4.
 

unphased

Active Member
Jun 9, 2022
192
43
28
Do you remember that G series APUs on AM4 only do PCI-e gen3? :( Even if the board is gen 4.
Indeed, but for my NAS this is completely acceptable, as the hardware I need on it is an LSI HBA, a ConnectX-3, a pair of Optanes for a ZFS special vdev (I have a 375GB P4600X AIC coming, and a 905P 960GB U.2 I snagged a while back), and a boot NVMe (a lowly one), none of which are bottlenecked by being limited to gen 3. Not having to plug a discrete GPU into the NAS is a massive win for power efficiency, and I am so glad I had the 5600G hanging around to deploy this way. At the end of the day I could make do with a fully headless NAS, but this is even more streamlined. Being an ITX board, and with the CPU only supporting x8/x4/x4 bifurcation instead of x4/x4/x4/x4, it limits me to 5 PCIe devices hanging off it, and that turns out to be the perfect amount. Giving the full x8 to the HBA is pretty clean for the primary role as a NAS too; the ConnectX would also like a full x8, but it's okay for now. But yes, I think I should review whether I could make my X99 Xeon setup the NAS instead, just to squeeze that last bit of I/O out of it. Anyway, previously my 5950X dual 3090 rig also hosted all the drives, and idling at 200 watts was unpleasant. Now my triple 3090 5950X system only boots up when I need to experiment on it.

Meanwhile I have picked up two more 5060Ti 16GB cards for $425 each. It would be very nice if these were x16, but it is what it is, and it truly makes no sense to give them 64GB/s in the usual gaming use case. I hope x8 doesn't hobble them too much. I imagine the architecture and efficiency will keep them relevant as AI fabric for a little while. In the back of my mind I wonder whether I should have tried to scale 3060 12GB cards at $140 instead, but those just feel too Fisher-Price for me, even though they would actually give full 32GB/s P2P speeds for now on a gen 4 PLX switch. I have to imagine they do not have the compute power to leverage that much bandwidth, but I may be wrong. I will need to wait for a gen 5 switch, so a long wait, before I can cost-effectively bring a cluster of 5060Tis online, but I hope 16GB/s P2P across 8 of them can at least eventually deliver some really nice LLM throughput.
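
The 64 / 32 / 16 GB/s figures above are the usual rounded per-direction PCIe numbers. A minimal sketch of the arithmetic, assuming the standard per-lane signalling rates (8, 16 and 32 GT/s for gen 3, 4 and 5) and 128b/130b encoding, and ignoring protocol overhead; the function name is made up for this example:

```python
# Raw per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}
# Gen 3 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s (no protocol overhead)."""
    return GT_PER_LANE[gen] * ENCODING_EFFICIENCY / 8 * lanes


for label, gen, lanes in [
    ("gen 5 x16", 5, 16),
    ("gen 4 x16 (3060 on a gen 4 switch)", 4, 16),
    ("gen 4 x8 (5060Ti on a gen 4 switch)", 4, 8),
]:
    print(f"{label}: ~{pcie_gb_per_s(gen, lanes):.1f} GB/s per direction")
```

The results come out slightly under the round numbers because of the encoding overhead, and real transfers land lower still: the 26.36 GB/s unidirectional P2P figure in the benchmark above is for a gen 4 x16 pair.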

It is now time to acquire a PEX88096. Given the recent development of leveraging GPU P2P, it appears my setup needs to switch to maximizing P2P bandwidth. So I will probably put my 3x 3090s on my 1950X Threadripper gen 3 rig under a PEX88096, trading off a bit of GPU-CPU bandwidth there to gain full gen 4 x16 (32GB/s) P2P bandwidth between the cards, pair them with NVLink if possible, and put my (still just 2) 5060Ti cards at gen 4 x8 on the 5950X for now.

The big question mark I have at this point is how to predict when a board will refuse to boot because too many GPUs are connected. It could be as simple as system RAM capacity having to be larger than combined VRAM for the BARs to be mapped during boot. Hopefully that is not too bad, as I do have oodles of DDR4 at least.
 

unphased

Active Member
Jun 9, 2022
192
43
28
@panchovix what is your host platform (Edit: I saw elsewhere you mentioned it is a 9900X, thanks), and have you had to deal with issues booting the machine due to too much VRAM being connected? Luckily you have a wealth of PLX switches to mix and match and troubleshoot with.

I realized a good topology for my hardware now would be to have 3x 3090 and 2x 5060Ti 16GB off a single 88096, because it turns out I do not actually have enough gen 4 motherboards to give the 5060Tis the 4.0 x8 that they really need (otherwise I'd be hobbling them to a quarter of their capable bandwidth by running them on 3.0), and the 88096 can provide up to 5 x16 ports downstream. However, I was thrown for a loop on discovering I need to spend $550 or so for a full setup. $270 (the cost of a lowlier 6-port 88096 card that can only host 3 x16 GPUs) I can definitely justify for this coolness, but $550 gives me pause. We're so, so close though. All the reports of these PCIe switch products working flawlessly out of the box are really exciting. But they are still too expensive to have folks picking up cheap 16GB GPUs in droves. I still think it might happen...
 

unphased

Active Member
Jun 9, 2022
192
43
28
I'm gonna grab a Ryzen 3600 to unlock gen 4 and make a small, efficient (still hobbled, just not as badly) dual 5060Ti LLM rig. The fact that I'm going to downgrade from the 5600G to a 3600 to gain a massive LLM inference tensor-parallel performance unlock is absolute cinema.
 

7up

New Member
Jun 9, 2026
1
0
1
I'm probably looking for a ghost but here goes....
I know they are no longer made; however, is there any chance of finding a PLX 8747 or 8750 PCIe switch card that splits PCIe x16 into two x8 slots, not a quad M.2 version?
 

RolloZ170

Well-Known Member
Apr 24, 2016
10,085
3,231
113
germany
I know they are no longer made; however, is there any chance of finding a PLX 8747 or 8750 PCIe switch card that splits PCIe x16 into two x8 slots, not a quad M.2 version?
I guess it will be cheaper to swap the motherboard for one with bifurcation support.