PCIE x16 breakout for ML Workstation


Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
Hi,
I am building an ML startup and am in dire need of GPU workstations that are not too expensive.

Goal:

The goal is to build relatively inexpensive ML workstations with a fast CPU and as many GPUs as possible (GeForce cards to start with).

The difficulty:

There is a small selection of CPUs with many PCIe lanes available, but my problem is that the PCIe slots are either not spaced widely enough for double- or triple-slot cards, or there are fewer slots on the board.

Therefore, I need a way to mount my GPUs in a different part of the case. For the first machine, I used an “Inter-Tech 4W2” mining case and connected each GPU with two 30 cm x16 riser cables (the last one 90-degree angled). That looks and feels janky as hell. I looked into converting the PCIe connectors to OCuLink or SFF-8654 8i, but all the PCIe boards that I found are from suspicious vendors and look like fire hazards.

Does anybody know a not-so-expensive PCIe “backplane” from a reputable vendor?

Thanks a lot
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,645
2,062
113
You should quantify "not so expensive" when GPU servers can be built online for $50,000-100,000 on the "cheap" end and hundreds of thousands for the bigger ones, and used single-card systems can be $30,000.

Are you talking about building a consumer-GPU filled system? Budget of $5000? $1000?

As you know, GPUs and GPU servers are priced all over.

It also sounds like you may want to use this as your own workstation but do ML on it too? Is that the case?

You can buy old Supermicro servers made for GPUs, where the PCIe slots are spaced out; they're not dirt cheap, but they're VERY CHEAP compared to new or to what they cost before, and it's older-gen hardware -- but maybe not what you're after?
 

Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
Thank you for your replies.
I looked at servers, but I am concerned that there will be airflow issues when I put GeForce cards in them.
For the first machines, I need faster CPUs, as we are still in a prototyping phase and use the machines both as workstations and as training servers. All further machines don't need fast CPUs, just as many lanes as possible :)
Regarding budget: I'd rather spend the money on GPUs and CPUs than on the platform, since we are pretty limited. $1,000-2,000 for the platform is doable.
 

mrpasc

Well-Known Member
Jan 8, 2022
493
261
63
Munich, Germany
If GPU x1 connections are sufficient: why not grab some of those retired mining Gigabyte GPU servers with the embedded Epyc 3151 CPU? They offer 10 x1 connections for up to 10 two-slot GPUs and are dirt cheap.
Not sure if they are entering the US market as well.
 
  • Like
Reactions: Yves_

Wasmachineman_NL

Wittgenstein the Supercomputer FTW!
Aug 7, 2019
1,883
621
113
If GPU x1 connections are sufficient: why not grab some of those retired mining Gigabyte GPU servers with the embedded Epyc 3151 CPU? They offer 10 x1 connections for up to 10 two-slot GPUs and are dirt cheap.
Not sure if they are entering the US market as well.
Fill one of these with Tesla P40s and you've got a very good budget AI server!
 

bayleyw

Active Member
Jan 8, 2014
305
102
43
How fast of a CPU do you need? If you're trying to compete against modern consumer CPUs, really your only choice is Sapphire Rapids-WS (Xeon W-2400/3400); stuff like the 14900K shipping at 6 GHz sets a high standard that even regular server CPUs can't match (TR 7000 is also a viable choice, but AMD has no low-end SKU in the lineup). Otherwise, if you can deal with bad single-threaded performance, you can get something like this Supermicro X10DRG-Q Dual LGA2011-3 E5v4 E5v3 Motherboard 5x PCIe 16x i350 Xeon | eBay (or its Chinese equivalent if you want reproducible nodes), which will let you dangle four cards off of risers.

The real issue is the reductions across GPUs during training, but I'll leave that as an exercise to the OP :) If you're lucky your weights + gradient states fit in 24 GB and you don't have to worry about such nonsense...
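To make that 24 GB remark concrete, here is a rough back-of-the-envelope sketch (my own illustrative numbers, not a rule): 16-bit weights and gradients plus Adam's two fp32 moment buffers, with activations and framework overhead ignored.

```python
# Rough VRAM estimate for training -- a sketch, not a definitive rule.
# Model sizes below are hypothetical examples; activations, fp32 master
# weights, framework overhead, and fragmentation are all ignored here.

def training_vram_gb(n_params: float, bytes_per_param: int = 2, adam: bool = True) -> float:
    """Estimate GB needed for weights + gradients (+ Adam moment buffers)."""
    weights = n_params * bytes_per_param      # fp16/bf16 weights
    grads = n_params * bytes_per_param        # gradients, same dtype
    optimizer = n_params * 4 * 2 if adam else 0  # two fp32 moment buffers
    return (weights + grads + optimizer) / 1e9

for n in (1.5e9, 3e9, 7e9):                   # hypothetical model sizes
    gb = training_vram_gb(n)
    print(f"{n/1e9:.1f}B params -> ~{gb:.0f} GB "
          f"(24 GB card: {'maybe' if gb <= 24 else 'no'})")
```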
 
  • Like
Reactions: Yves_

Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
If GPU x1 connections are sufficient: why not grab some of those retired mining Gigabyte GPU servers with the embedded Epyc 3151 CPU? They offer 10 x1 connections for up to 10 two-slot GPUs and are dirt cheap.
Not sure if they are entering the US market as well.
I actually got one of those; they are pretty cheap. Unfortunately, one lane per GPU is not enough for my use case, since distributed workloads involve a lot of data transfer...
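For a rough sense of why x1 hurts (hedged, illustrative numbers only): data-parallel training all-reduces roughly the full gradient tensor every step, and a ring all-reduce moves about 2 * (N-1)/N of those bytes per GPU, so dividing by the link bandwidth gives a floor on communication time per step.

```python
# Back-of-the-envelope gradient traffic per step in data-parallel training.
# All numbers are illustrative assumptions, not measurements.

GRAD_BYTES = 1.5e9 * 2     # hypothetical 1.5B-param model, fp16 gradients
LINKS_GBPS = {"PCIe 3.0 x1": 1.0, "PCIe 3.0 x4": 4.0, "PCIe 3.0 x16": 16.0}  # approx GB/s

# A ring all-reduce moves roughly 2 * (N-1)/N of the gradient bytes per GPU per step.
N_GPUS = 4
traffic = 2 * (N_GPUS - 1) / N_GPUS * GRAD_BYTES

for name, gbps in LINKS_GBPS.items():
    print(f"{name}: ~{traffic / (gbps * 1e9):.2f} s of gradient traffic per step")
```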
 

Yves_

New Member
Dec 15, 2023
4
0
1
Vienna, Austria
How fast of a CPU do you need? If you're trying to compete against modern consumer CPUs, really your only choice is Sapphire Rapids-WS (Xeon W-2400/3400); stuff like the 14900K shipping at 6 GHz sets a high standard that even regular server CPUs can't match (TR 7000 is also a viable choice, but AMD has no low-end SKU in the lineup). Otherwise, if you can deal with bad single-threaded performance, you can get something like this Supermicro X10DRG-Q Dual LGA2011-3 E5v4 E5v3 Motherboard 5x PCIe 16x i350 Xeon | eBay (or its Chinese equivalent if you want reproducible nodes), which will let you dangle four cards off of risers.

The real issue is the reductions across GPUs during training, but I'll leave that as an exercise to the OP :) If you're lucky your weights + gradient states fit in 24 GB and you don't have to worry about such nonsense...
Thanks, that board might be a really good option, not sure how many options there are for cases with 11 slots :)
 

bayleyw

Active Member
Jan 8, 2014
305
102
43
Thanks, that board might be a really good option, not sure how many options there are for cases with 11 slots :)
You probably want something closer to a mining rig anyway, if you're on a budget and are using consumer cards. If you can disclose what model architecture you are training it will help us help you since different models have different requirements.
 

zachj

Active Member
Apr 17, 2019
161
106
43
If you want to be like Google when they were a kid, you'd homebrew and just ensure you've got a fire suppression system…


You could get any suitable motherboard supporting PCIe bifurcation and pair it with as many quad-NVMe x16 adapters as it'll fit; realistically I would say 2nd-gen Epyc Rome (Supermicro H11SSL revision 2) or Cascade Lake Xeon.

Grab a fistful of NVMe-to-OCuLink adapter cables.

Grab a fistful of OCuLink-to-PCIe x16 adapters.

Cable lengths are going to restrict you in terms of case support; even 1 meter is probably way too long so you’re going to need to keep the GPUs very close to the motherboard. These things are going to be so hot nobody is going to want to sit next to them regardless of how beautiful you can make them, so in my opinion trying to find a case is a fool’s errand; plan to run them in the open air with a motherboard tray and then GPUs on separate trays above and below.

You're going to need an air conditioner to keep these puppies cool. People use mini-split heat pumps for this to great effect.

I’d get a couple of PSUs dedicated to the GPUs and do the simple cable mod to turn all of them on when you press the power button on the board.

Assuming you can live with PCIe x4 bandwidth, that would get you somewhere between 8 and…a shitload of GPUs per motherboard :)
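If anyone goes this route, it's worth checking that each card actually negotiated the expected link width and speed behind all the adapters. A small sketch for a Linux host, assuming `lspci` is installed; it parses the standard LnkCap/LnkSta lines, and the NVIDIA filtering is just an example:

```python
# Sanity check that each NVIDIA GPU negotiated the PCIe speed/width you expect
# behind the bifurcation adapters. Uses only standard `lspci -vv` output
# (LnkCap = what the link can do, LnkSta = what it is doing right now).
# Run as root for full link details on some distros.
import re
import subprocess

out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

device, is_gpu = None, False
for line in out.splitlines():
    if line and not line[0].isspace():                  # new device header
        device = line
        is_gpu = "NVIDIA" in line and ("VGA" in line or "3D controller" in line)
    elif is_gpu and ("LnkCap:" in line or "LnkSta:" in line):
        m = re.search(r"Speed ([^,]+), Width (x\d+)", line)
        if m:
            kind = "capable of" if "LnkCap" in line else "running at"
            print(f"{device.split()[0]} {kind} {m.group(1)} {m.group(2)}")
```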
 

zachj

Active Member
Apr 17, 2019
161
106
43
If you need PCIe x8 bandwidth, then you could do these:


If you go this direction, I'd say you almost certainly need Epyc/Xeon in order to get enough PCIe lanes.
 

SDletmk

New Member
Dec 30, 2023
9
0
1
I'm interested in making something similar with a group of Tesla GPUs, each running Stable Diffusion. Would I be better off with one of those servers as linked above, or using an X79, X99, or X299 setup like a mining rack with risers to spread out the GPUs? As far as I understand it, PCIe bandwidth really only matters when loading/unloading data, and the GPUs could work with 4x or 1x risers without too much of a negative impact, right?
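One way to sanity-check that assumption is to measure how long a batch actually takes to copy to the GPU and compare it against the per-step compute time. A minimal sketch, assuming PyTorch and a CUDA-capable card; the batch shape is a made-up example:

```python
# Rough check of how much host-to-device copy time a batch actually costs.
# A sketch assuming PyTorch and a CUDA GPU; the batch shape is hypothetical.
import time
import torch

assert torch.cuda.is_available()
batch = torch.empty(16, 3, 512, 512, dtype=torch.float16).pin_memory()  # ~25 MB

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    batch.to("cuda", non_blocking=True)
torch.cuda.synchronize()
dt = (time.perf_counter() - t0) / 100

mb = batch.numel() * batch.element_size() / 1e6
print(f"~{mb:.0f} MB batch, {dt*1e3:.2f} ms per copy, ~{mb/1e3/dt:.1f} GB/s")
# Compare that per-copy time against your per-image generation time to judge
# whether an x4 or x1 link would actually be the bottleneck.
```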