Choosing a server/chassis for GPU workload

fragar

Member
Feb 4, 2019
32
0
6
Aha, got it. So basically the Supermicro motherboards consolidate those eight traditionally scattered connectors into a single 8x2 connector, and if you use a Supermicro motherboard with a non-Supermicro chassis, those eight individual connectors from the chassis will all just happen to connect next to each other on the motherboard. Cool.
 

tinco

New Member
Apr 22, 2020
27
1
3
Thanks for the update, just found this thread because of it. I'm designing a 4-GPU system and arrived at the same Supermicro case, so it's good to know it probably really is the most attractive case out there right now. Our software requires a high-clock-speed CPU in addition to the GPUs, so instead of EPYC we'll be going for Threadripper. The GPUs are probably going to be 2080 Supers.

Are you going to be ordering soon? I think we'll pitch the investors this week and hopefully order the systems sometime next month. Have you also looked into having a company build the machines for you? I found a couple of companies that would build these kinds of systems (with consumer GPUs and even Threadrippers). Some even do custom water cooling for rack servers.

Super interesting that you're underclocking your GPUs, that might be useful for us as well.
 

fragar

Interesting.

My workload also has a CPU component, so Threadripper would make sense for me as well, but I want IPMI and no sTRX4 boards currently support it. Also, Epycs are basically underclocked Threadrippers, and that might not be such a bad thing (see below).

I will be ordering my first system this week. Haven't really considered getting a company to build it, I want to do this myself and learn more about it, and save money. I've built several home servers over the years.

Watercooling sounds flaky and unnecessary.

Re. GPU underclocking, the 2080 Ti seems to have a clear sweet spot at 160 Watts:

Watts           Training speed (samples per second)
150 (-6.67%)    808 (-6.68%)
155 (-3.22%)    837 (-2.99%)
160             862
170 (+6.25%)    880 (+2.09%)
180 (+5.88%)    893 (+1.48%)
200 (+11.1%)    911 (+2.02%)
220 (+10.0%)    926 (+1.65%)
240 (+9.09%)    936 (+1.08%)

This is all on Ubuntu 18.04, with the limit set via "nvidia-smi -i 0 --power-limit=160", etc. Going up from 160 Watts gives only small gains in performance, while going below 160 Watts hits a cliff of some kind. These numbers are for training, but I've found the same thing for inference, on two different cards. Not sure why those cards are clocked so high by default; the cost of a 2080 Ti is dominated by power consumption.
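To make the sweet spot a bit more concrete, here is a small sketch that computes throughput per watt and the marginal gain per extra watt from the measurements in the table. The numbers are copied from the table as-is; the function and variable names are only for illustration.

```python
# Power limit (W) -> training speed (samples/s), copied from the table above.
measurements = [
    (150, 808), (155, 837), (160, 862), (170, 880),
    (180, 893), (200, 911), (220, 926), (240, 936),
]

def efficiency(points):
    """Samples/s per watt (i.e. samples per joule) at each power limit."""
    return [(watts, speed / watts) for watts, speed in points]

def marginal_gain(points):
    """Extra samples/s gained per extra watt between consecutive power limits."""
    return [
        (w0, w1, (s1 - s0) / (w1 - w0))
        for (w0, s0), (w1, s1) in zip(points, points[1:])
    ]

for watts, eff in efficiency(measurements):
    print(f"{watts:>3} W: {eff:.2f} samples/s per watt")

for w0, w1, gain in marginal_gain(measurements):
    print(f"{w0}->{w1} W: {gain:.2f} extra samples/s per extra watt")
```

On these numbers each extra watt buys roughly 5 to 6 samples/s below 160 Watts but less than 2 above it, and samples per watt stays around 5.4 up to 160 Watts before falling off.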

I also have a Threadripper 2990WX on my training server but haven't tried underclocking that. GPU power consumption will be the primary cost of the inference servers.
 

tinco

Cool! GPUs are actually not much of a bottleneck for us. We run multiple instances of a photogrammetry package, and it has very spiky use of the GPU. Unfortunately the spikes usually coincide (because our datasets are usually chunked and then submitted at once), so we need multiple GPUs to handle that. Having them underclocked might save us quite some money.

The water cooling isn't interesting for the 4-GPU systems, but this builder could fit 2 GPUs and a 3970X in a 2U slot with it, that's pretty cool if rackspace is at a premium :)
 

fragar

Stuffing a 3970X into 2U seems pretty crazy, that's a 280 Watt CPU. How much is that builder charging?

For your use case GPU underclocking might not be so clear-cut, you'll need to run the numbers. Underclocking makes more sense when your workload is steady.

For example, if you run a 2080 Ti at 260 Watts 24/7 for five years in a data center, you'll pay €1100 for the GPU and then something like €2500 for the power. If you're only running it 20% of the time, though, then you'll still pay €1100 for the GPU but only €500 for power, so it makes more sense to run it at a higher speed.
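A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The electricity price of €0.22 per kWh is an assumption, back-solved from the €2500 figure above rather than taken from any actual tariff, and the function name is made up for illustration. Idle power draw is ignored.

```python
HOURS_PER_YEAR = 24 * 365

def power_cost_eur(watts, years, duty_cycle=1.0, eur_per_kwh=0.22):
    """Electricity cost of running a card at `watts` for `years`.

    eur_per_kwh=0.22 is an assumed all-in price, not a quoted tariff.
    duty_cycle is the fraction of time the card actually draws `watts`;
    idle draw is ignored for simplicity.
    """
    kwh = watts / 1000 * HOURS_PER_YEAR * years * duty_cycle
    return kwh * eur_per_kwh

gpu_price_eur = 1100  # purchase price from the example above

for duty in (1.0, 0.2):
    power = power_cost_eur(260, years=5, duty_cycle=duty)
    print(f"duty {duty:.0%}: GPU €{gpu_price_eur}, power €{power:.0f}")
```

With those assumptions the power bill comes out at about €2500 for 24/7 use and about €500 at 20% duty, so the card price only starts to dominate once the machine sits idle most of the time.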
 

tinco

Ah that makes total sense. I'll have to run the numbers to see if it makes sense at all.

I slightly misremembered: the 2U case for the 3970X with water cooling has 3 PCIe slots vertically stacked, so only 1 GPU will fit. The 3U version does have support for 2 GPUs, but it is not water cooled (or at least, not necessarily). You can check them out here:


Since we might do business with them I don't want to share the exact quote here, but I was pleasantly surprised. There's only a couple hundred quid difference between the air cooled and the watercooled version.
 

fragar

That's pretty cool, it's a 280 Watt CPU in 2U at 50 dB. It's for CPU-centric workloads though.
 

tinco

We ordered a first test system. We didn't end up with a full-service seller, just a parts seller that has an assembly service (saving money where we can). They're doing basic tests now to see if it boots, and they sent a picture:

[Image: IMG_8584.JPG]

Looks like it all goes together easily. It seems you might even fit a fifth card in there if you've got an insanely large motherboard.

Got no complaints about compatibility, even though the motherboard is an E-ATX non-Supermicro one (ASRock TRX40 Creator iirc).
 

liam.gigabyte

New Member
Jun 30, 2020
4
0
1
I'm going to go with the Supermicro AS-747TQ and Gigabyte MZ01-CE1. This is the only good server-grade solution that is currently available and fits into the racks at my intended data center, and it happens to be the cheapest of the prospective solutions and has the best ratio of capital costs to operating costs.

On the motherboard side, the Gigabyte MZ01-CE1 and the ASRock Rack EPYCD8-2T both seem perfectly adequate; that's close to a coin flip.

It seems that the Supermicro 4124GS-TNR would fit in the 71 cm racks in my data center, since they have 25 cm extra in front and 5 cm in back. It's not available though, and even if it were I'd prefer the self-assembled 747TQ.

Thanks for the help, especially to BlueFox.

So what did you decide?