GPU Server Build for ~$50k


wrenlab

New Member
May 9, 2023
Hello everyone,

I am constructing a multi-GPU server for a small lab and am looking for good suggestions for vendors and specific chassis models.

There are a few constraints that we are under:
1) We are limited on price, so buying a fully populated system is not possible (we want to start with just 1x GPU, a few RAM sticks, and the CPUs)
2) Needs to *support* 8x GPUs (A100s or H100s) - i.e. the future goal (when more funds come later)
3a) If the motherboard uses PCIe card slots for GPUs, it *must* be water-cooled (the cards will likely melt with 8x consumer grade PCIe card GPUs lined up)
3b) If the motherboard uses SXM sockets rather than PCIe slots, fan cooling is acceptable (but [1] still applies)
4) Unfortunately, the team does not feel confident setting up liquid cooling systems, so the liquid cooling solution needs to be as 'plug-and-play' as possible (like a Corsair H100i CPU cooler, and preferably without taking the GPU apart)

These requirements have felt pretty restrictive, as this seems like an area where most people simply buy 12x DGX Superpods for $mega-dollars and go about their business. A single DGX A100 80GB or a DGX H100 is pretty much what we are looking for, but with a few GPUs missing to lower the price to what we can afford.

I know that the SXM GPUs come more 'chip-like' (without the fan/radiator/PCIe bracket on them), which *may* make a closed-loop GPU cooler easier to use, but we don't have any experience with these yet.

The [STH video about liquid cooling](https://www.youtube.com/watch?v=4Np1HnWiHb4) was intriguing because the heat exchange occurs *outside* the server (freeing up space and keeping any unexpected issues away from the server components).

There are several chassis options that look decent (see the list below), but we have not found how to *actually purchase* any of these items directly.

Reference chassis systems:
1) Gigabyte G492-ZL2 (rev. A00)
- Liquid cooled, but *pre-setup*
2) Gigabyte G492-ZD0 (rev. 100, from 2020)
- Air cooled, but with SXM socket
3) Microway Navion 4U HGX A100 (8x NVIDIA A100 with NVLink)
- Air cooled, but with SXM socket

These are just examples, and we're open to any suggestions.
I like (1) since it seems that components could be purchased and simply added without much fuss, and it keeps the far superior cooling system, which should give large gains in performance.

However, does anyone know of a good vendor to purchase *just the bare-bones chassis* for example?
 

NablaSquaredG

Layer 1 Magician
Aug 17, 2020
3a) If the motherboard uses PCIe card slots for GPUs, it *must* be water-cooled (the cards will likely melt with 8x consumer grade PCIe card GPUs lined up)
"Consumer grade": are you actually planning to buy consumer-grade cards?

Water cooling is going to be absolutely awful or very expensive if you don't do it yourself.

So, assuming passively cooled GPUs are an option, you could look for something like a used Supermicro 4124GS-TNR.
 

Patriot

Moderator
Apr 18, 2011
However, does anyone know of a good vendor to purchase *just the bare-bones chassis* for example?
You cannot buy SXM servers barebones, because installing the GPUs yourself means you will crush the HBM memory.

Go get a used one off a failed startup.

You can't afford new, and the learning curve is steep on this hardware. I am not trying to be mean; I have simply seen a contractor crush the HBM stacks on ten 10k GPUs in an afternoon with a hand screwdriver that lacked a torque limiter. I believe the torque spec is 8 in-lb.
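
Many torque-limiting drivers are marked in metric units, so here is a minimal sketch that converts the figure quoted above. The 8 in-lb value is recalled from memory in the post ("I believe"), so it is treated here as an unverified input, not a confirmed spec; the function and variable names are made up for illustration. Check the vendor's service documentation for the actual value before touching an SXM module.

```python
# Convert a torque value from inch-pounds (in-lbf) to newton-metres.
# 1 lbf = 4.4482216152605 N and 1 in = 0.0254 m (both exact by definition).
NEWTONS_PER_LBF = 4.4482216152605
METRES_PER_INCH = 0.0254


def inch_lbf_to_newton_metres(torque_in_lbf: float) -> float:
    """Return the torque in N·m for a value given in in-lbf."""
    return torque_in_lbf * NEWTONS_PER_LBF * METRES_PER_INCH


# Assumption: the figure quoted from memory in the thread, not a verified spec.
quoted_spec_in_lbf = 8.0
torque_nm = inch_lbf_to_newton_metres(quoted_spec_in_lbf)

# Print in N·m and cN·m, the units commonly found on small torque drivers.
print(f"{quoted_spec_in_lbf} in-lb = {torque_nm:.2f} N·m ({torque_nm * 100:.0f} cN·m)")
```

Under that assumption, 8 in-lb works out to roughly 0.90 N·m, which is very little torque and easy to exceed with an ordinary hand screwdriver.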