PCIE x16 breakout for ML Workstation


Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
Hi,
I am building an ML startup and am in dire need of GPU workstations that are not too expensive.

Goal:

The goal is to build relatively inexpensive ML workstations with a fast CPU and as many GPUs as possible (GeForce cards to start with).

The difficulty:

There is a small selection of CPUs with many PCIe lanes available, but my problem is that the PCIe slots are either not spaced widely enough for double- or triple-slot cards, or there are fewer slots on the board.

Therefore, I need a way to mount my GPUs in a different part of the case. For the first machine, I used an “Inter-Tech 4W2” mining case and connected each GPU with two 30 cm x16 riser cables (the last one 90-degree angled). That looks and feels janky as hell. I looked into converting the PCIe connectors to OCuLink or SFF-8654 8i, but all the PCIe boards that I found are from suspicious vendors and look like fire hazards.

Does anybody know a not-so-expensive PCIe “backplane” from a reputable vendor?

Thanks a lot
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,645
2,062
113
You should quantify "not so expensive" when GPU servers can be built online for $50,000-100,000 on the "cheap" end and hundreds of thousands for the bigger ones, and used single-card systems can be $30,000.

Are you talking about building a consumer-GPU filled system? Budget of $5000? $1000?

As you know, GPUs and GPU servers are priced all over.

It also sounds like you may want to use this as your own workstation but do ML on it too? Is that the case?

You can buy old Supermicro servers made for GPUs, where the PCIe slots are spaced out; they're not dirt cheap, but they're VERY CHEAP compared to new or to what they cost before, and it's older-gen hardware -- but maybe not what you're after?
 

Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
Thank you for your replies.
I looked at servers, but I am concerned that there will be airflow issues when I put GeForce cards in them.
For the first machines, I need faster CPUs, as we are still in a prototyping phase and use the machines both as workstations and as training servers. All further machines don't need fast CPUs, just as many lanes as possible :)
Regarding budget: I'd rather spend the money on GPUs and CPUs than on the platform, since we are pretty limited. $1,000-2,000 for the platform is doable.
 

mrpasc

Well-Known Member
Jan 8, 2022
493
261
63
Munich, Germany
If GPU x1 connections are sufficient: why not grab some of those retired mining Gigabyte GPU servers with the embedded Epyc 3151 CPU? They offer 10 x1 connections for up to 10 two-slot GPUs and are dirt cheap.
Not sure if they are entering the US market as well.
 
  • Like
Reactions: Yves_

Wasmachineman_NL

Wittgenstein the Supercomputer FTW!
Aug 7, 2019
1,883
621
113
If GPU x1 connections are sufficient: why not grab some of those retired mining Gigabyte GPU servers with the embedded Epyc 3151 CPU? They offer 10 x1 connections for up to 10 two-slot GPUs and are dirt cheap.
Not sure if they are entering the US market as well.
Fill one of these with Tesla P40s and you've got a very good budget AI server!
 

bayleyw

Active Member
Jan 8, 2014
305
102
43
How fast of a CPU do you need? If you're trying to compete against modern consumer CPUs, really your only choice is Sapphire Rapids-WS (Xeon W-2400/3400); stuff like the 14900K shipping at 6 GHz sets a high standard that even regular server CPUs can't match (TR 7000 is also a viable choice, but AMD has no low-end SKU in the lineup). Otherwise, if you can deal with bad single-threaded performance, you can get something like this Supermicro X10DRG-Q Dual LGA2011-3 E5v4 E5v3 Motherboard 5x PCIe 16x i350 Xeon | eBay (or its Chinese equivalent if you want reproducible nodes), which will let you dangle four cards off of risers.

The real issue is the reductions across GPUs during training, but I'll leave that as an exercise to the OP :) If you're lucky your weights + gradient states fit in 24 GB and you don't have to worry about such nonsense...
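To make that 24 GB remark concrete, here is a rough back-of-the-envelope sketch (my own illustrative numbers, not a rule): 16-bit weights and gradients plus Adam's two fp32 moment buffers, with activations and framework overhead ignored.

```python
# Rough VRAM estimate for training -- a sketch, not a definitive rule.
# Model sizes below are hypothetical examples; activations, fp32 master
# weights, framework overhead, and fragmentation are all ignored here.

def training_vram_gb(n_params: float, bytes_per_param: int = 2, adam: bool = True) -> float:
    """Estimate GB needed for weights + gradients (+ Adam moment buffers)."""
    weights = n_params * bytes_per_param      # fp16/bf16 weights
    grads = n_params * bytes_per_param        # gradients, same dtype
    optimizer = n_params * 4 * 2 if adam else 0  # two fp32 moment buffers
    return (weights + grads + optimizer) / 1e9

for n in (1.5e9, 3e9, 7e9):                   # hypothetical model sizes
    gb = training_vram_gb(n)
    print(f"{n/1e9:.1f}B params -> ~{gb:.0f} GB "
          f"(24 GB card: {'maybe' if gb <= 24 else 'no'})")
```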
 
  • Like
Reactions: Yves_

Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
If GPU x1 connections are sufficient: why not grab some of those retired mining Gigabyte GPU servers with the embedded Epyc 3151 CPU? They offer 10 x1 connections for up to 10 two-slot GPUs and are dirt cheap.
Not sure if they are entering the US market as well.
I actually got one of those; they are pretty cheap. Unfortunately, one lane per GPU is not enough for my use case, since distributed workloads involve a lot of data transfer...
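For a rough sense of why x1 hurts (hedged, illustrative numbers only): data-parallel training all-reduces roughly the full gradient tensor every step, and a ring all-reduce moves about 2 * (N-1)/N of those bytes per GPU, so dividing by the link bandwidth gives a floor on communication time per step.

```python
# Back-of-the-envelope gradient traffic per step in data-parallel training.
# All numbers are illustrative assumptions, not measurements.

GRAD_BYTES = 1.5e9 * 2     # hypothetical 1.5B-param model, fp16 gradients
LINKS_GBPS = {"PCIe 3.0 x1": 1.0, "PCIe 3.0 x4": 4.0, "PCIe 3.0 x16": 16.0}  # approx GB/s

# A ring all-reduce moves roughly 2 * (N-1)/N of the gradient bytes per GPU per step.
N_GPUS = 4
traffic = 2 * (N_GPUS - 1) / N_GPUS * GRAD_BYTES

for name, gbps in LINKS_GBPS.items():
    print(f"{name}: ~{traffic / (gbps * 1e9):.2f} s of gradient traffic per step")
```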
 

Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
How fast of a CPU do you need? If you're trying to compete against modern consumer CPUs, really your only choice is Sapphire Rapids-WS (Xeon W-2400/3400); stuff like the 14900K shipping at 6 GHz sets a high standard that even regular server CPUs can't match (TR 7000 is also a viable choice, but AMD has no low-end SKU in the lineup). Otherwise, if you can deal with bad single-threaded performance, you can get something like this Supermicro X10DRG-Q Dual LGA2011-3 E5v4 E5v3 Motherboard 5x PCIe 16x i350 Xeon | eBay (or its Chinese equivalent if you want reproducible nodes), which will let you dangle four cards off of risers.

The real issue is the reductions across GPUs during training, but I'll leave that as an exercise to the OP :) If you're lucky your weights + gradient states fit in 24 GB and you don't have to worry about such nonsense...
Thanks, that board might be a really good option, not sure how many options there are for cases with 11 slots :)
 

bayleyw

Active Member
Jan 8, 2014
305
102
43
Thanks, that board might be a really good option, not sure how many options there are for cases with 11 slots :)
You probably want something closer to a mining rig anyway, if you're on a budget and are using consumer cards. If you can disclose what model architecture you are training it will help us help you since different models have different requirements.
 

zachj

Active Member
Apr 17, 2019
161
106
43
If you want to be like Google when they were a kid, you'd homebrew and just ensure you've got a fire suppression system…


You could get any suitable motherboard supporting PCIe bifurcation and pair it with as many quad-NVMe x16 adapters as it'll fit; realistically I would say 2nd-gen Epyc Rome (Supermicro H11SSL revision 2) or Cascade Lake Xeon.

Grab a fistful of NVMe-to-OCuLink adapter cables.

Grab a fistful of OCuLink-to-PCIe x16 adapters.

Cable lengths are going to restrict you in terms of case support; even 1 meter is probably way too long so you’re going to need to keep the GPUs very close to the motherboard. These things are going to be so hot nobody is going to want to sit next to them regardless of how beautiful you can make them, so in my opinion trying to find a case is a fool’s errand; plan to run them in the open air with a motherboard tray and then GPUs on separate trays above and below.

You're going to need an air conditioner to keep these puppies cool. People use mini-split heat pumps for this to great effect.

I’d get a couple of PSUs dedicated to the GPUs and do the simple cable mod to turn all of them on when you press the power button on the board.

Assuming you can live with PCIe x4 bandwidth, that would get you somewhere between 8 and…a shitload of GPUs per motherboard :)
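If anyone goes this route, it's worth checking that each card actually negotiated the expected link width and speed behind all the adapters. A small sketch for a Linux host, assuming `lspci` is installed; it parses the standard LnkCap/LnkSta lines, and the NVIDIA filtering is just an example:

```python
# Sanity check that each NVIDIA GPU negotiated the PCIe speed/width you expect
# behind the bifurcation adapters. Uses only standard `lspci -vv` output
# (LnkCap = what the link can do, LnkSta = what it is doing right now).
# Run as root for full link details on some distros.
import re
import subprocess

out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

device, is_gpu = None, False
for line in out.splitlines():
    if line and not line[0].isspace():                  # new device header
        device = line
        is_gpu = "NVIDIA" in line and ("VGA" in line or "3D controller" in line)
    elif is_gpu and ("LnkCap:" in line or "LnkSta:" in line):
        m = re.search(r"Speed ([^,]+), Width (x\d+)", line)
        if m:
            kind = "capable of" if "LnkCap" in line else "running at"
            print(f"{device.split()[0]} {kind} {m.group(1)} {m.group(2)}")
```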
 

zachj

Active Member
Apr 17, 2019
161
106
43
If you need PCIe x8 bandwidth, then you could do these:


If you go this direction, I'd say you almost certainly need Epyc/Xeon in order to get enough PCIe lanes.
 

SDletmk

New Member
Dec 30, 2023
9
0
1
I'm interested in making something similar with a group of Tesla GPUs, each running Stable Diffusion. Would I be better off with one of those servers as linked above, or using an X79, X99, or X299 setup like a mining rack with risers to spread out the GPUs? As far as I understand it, PCIe bandwidth really only matters when loading/unloading data, and the GPUs could work with 4x or 1x risers without too much of a negative impact, right?
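One way to sanity-check that assumption is to measure how long a batch actually takes to copy to the GPU and compare it against the per-step compute time. A minimal sketch, assuming PyTorch and a CUDA-capable card; the batch shape is a made-up example:

```python
# Rough check of how much host-to-device copy time a batch actually costs.
# A sketch assuming PyTorch and a CUDA GPU; the batch shape is hypothetical.
import time
import torch

assert torch.cuda.is_available()
batch = torch.empty(16, 3, 512, 512, dtype=torch.float16).pin_memory()  # ~25 MB

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    batch.to("cuda", non_blocking=True)
torch.cuda.synchronize()
dt = (time.perf_counter() - t0) / 100

mb = batch.numel() * batch.element_size() / 1e6
print(f"~{mb:.0f} MB batch, {dt*1e3:.2f} ms per copy, ~{mb/1e3/dt:.1f} GB/s")
# Compare that per-copy time against your per-image generation time to judge
# whether an x4 or x1 link would actually be the bottleneck.
```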