Choosing a server/chassis for GPU workload

fragar · Apr 22, 2020

I am about to build some inference GPU servers. This will be my first time dealing with server-grade hardware and I have some questions.

Requirements and background info:

1. The servers should support Epyc CPUs and multiple double-slot blower-style Nvidia consumer GPUs (ie 2080 Ti).
2. Price is a major factor.
3. Storage and networking requirements are modest by server standards (a drive or two and 10 GbE will be enough).
4. The servers will be colocated (after a trial run at home).
5. There will eventually be many servers over many years.
6. I live and run my business in Europe (Hungary).
7. I have browsed around a bunch but have not started to negotiate with any vendors yet.
8. I've built several compute-intensive home servers with consumer parts over the years.
9. I don't plan to get any "official" support, and will just RMA anything that fails and deal with it myself (for cost reasons).

The server I like best is the Gigabyte G242-Z10 (G242-Z10 (rev. 100) | High Performance Computing System - GIGABYTE Global). I like the density, the extra-wide GPU slots, and the airflow. However, two things give me pause:

a. I like the idea of going with Supermicro for future-proofing (modular approach, company exclusively dedicated to the server space) and quality (based on various internet comments), but Supermicro doesn't currently seem to have great options for Epyc GPU servers. The only two Supermicro Epyc GPU servers which look reasonable are the 2023US-TR4 (2u, 2p, 2 GPUs) and the 4124GS-TNR (4u, 2p, 8 GPUs), but the former only supports two double-slot GPUs per server and the latter isn't available yet and seems to require a hump for consumer GPU power delivery.

b. The price of the Gigabyte G242-Z10 seems a bit high (based on browsing the internet, not contacting anyone). The lowest price I've found online is around $2200 (w/o VAT).

The server I like second-best is the Tyan TN83-B8215 (https://www.tyan.com/Barebones_TN83B8251_B8251T83E8HR-2T-N), but it has the same drawbacks as the Gigabyte G242-Z10. The lowest list price I've found for the Tyan TN83-B8215 is $2800 (w/o VAT).

The cheapest suitable server I've found online is the HPE ProLiant DL385 Gen 10. This wouldn't be my first choice, but it would be good enough and some of the prices I'm seeing for this server are crazy low:

$980 for 500 Watt PSU and an Epyc 7251: PROVANTAGE: HPE 878712-B21 HPE Proliant DL385 Gen10 EPYC 7251 8-Core 16GB 8LFF E208i-A 500W 3-Year
$2790 for 800 Watt PSU and a Epyc 7452: PROVANTAGE: HPE P16693-B21 HPE Proliant DL385 Gen10 AMD EPYC 7452 32-Core 16GB 24SFF P408i-A 800W 3-Year

These HPE ProLiant DL 385 Gen 10 versions don't have the Gen 10+ improvements (ie PCIe 4) but that's ok.

My questions:

1. How much should I optimize for getting the type of server I want now versus trying to pick a vendor for the long haul.

2. How good a long-term vendors are Gigabyte and Tyan. I've found a ton of info about HPE, Dell and Supermicro but very little about Gigabyte and Tyan as server vendors.

3. Am I missing something with those HP ProLiant DL385 deals? They seem incredibly good. Can I negotiate something comparable for the Gigabyte G242-Z10 or Tyan TN83-B8215?

4. How willing should I be to purchase servers in the US and have them shipped to Europe? I've done this many times with consumer hardware and use Stackry for a forwarding address. Would this be too cumbersome for this kind of hardware?

5. Another server I have my eyes on is the Gigabyte G292-Z42 (G292-Z42 (rev. 100) | High Performance Computing System - GIGABYTE Global), which packs 8 double-width GPU slots into 2u. This would actually be my first choice due to the incredible density and cost per GPU but I suspect that the cooling requires passive server-grade GPUs and just wouldn't be able to handle eight 200+ Watt consumer blower-style GPUs. Is there any way to find out if this is a feasible option?

6. Another idea I've been toying around with is building the servers myself, crypto-mining-rig style, by getting deep 4u cases and motherboards like the Supermicro H11SSL and running PCIe x8 and x16 extenders off those motherboard to the GPUs in the front of the cases. This would save some money up front, although with higher colocation costs. Does this sound reasonable?

7. I'm also seeing some really good deals for engineering sample Epyc CPUs on ebay. Often the seller states very specific motherboard compatibility. Should those usually work in other motherboards? Is there anything I should be wary of with such offers?

8. Any other feedback is welcome.

Ok, that's a lot of questions. Thanks in advance for any help!

BlueFox · Apr 22, 2020

While I can't answer most of your questions, when it comes to shipping stuff from the US, it's going to get quite expensive. GPU chassis are going to be way too large to go via the post office (USPS), so you're stuck with FedEx/UPS. The last time I shipped a Supermicro 4U chassis to Europe, it cost close to $1000 (Denmark, but Hungary shouldn't be any different).

I will warn you that the main issue you're going to encounter running consumer GPUs instead of server ones is power plug placement. I'm only familiar with Supermicro's products, where you can make it work in some chassis, but it's still a pain.

For your colocation, is space a concern? If you can use a 4U chassis instead of 2U, you could save a fair bit of money piecing everything together yourself. If you want server grade, maybe a Supermicro 747TQ-R1620B with whatever motherboard you want (doesn't have to be Supermicro). If not: Rosewill RSV-L4000C - 4U Rackmount Server Case / Chassis for Bitcoin Mining Mach 840951126622 | eBay

fragar · Apr 23, 2020

BlueFox said:
While I can't answer most of your questions, when it comes to shipping stuff from the US, it's going to get quite expensive. GPU chassis are going to be way too large to go via the post office (USPS), so you're stuck with FedEx/UPS. The last time I shipped a Supermicro 4U chassis to Europe, it cost close to $1000 (Denmark, but Hungary shouldn't be any different).

I will warn you that the main issue you're going to encounter running consumer GPUs instead of server ones is power plug placement. I'm only familiar with Supermicro's products, where you can make it work in some chassis, but it's still a pain.

For your colocation, is space a concern? If you can use a 4U chassis instead of 2U, you could save a fair bit of money piecing everything together yourself. If you want server grade, maybe a Supermicro 747TQ-R1620B with whatever motherboard you want (doesn't have to be Supermicro). If not: Rosewill RSV-L4000C - 4U Rackmount Server Case / Chassis for Bitcoin Mining Mach 840951126622 | eBay

Thanks for the comments.

Stackry charged me $226 to ship a 44 lb package in 2018. Vendors should be able to ship cheaper though, if they're willing to ship to Europe at all.

My intended data center here charges a bit under €200 per 42u per month, so using 4u instead of 2u for each server would cost an extra €120 per server per year, or let's say €600 extra TCO per server. OTOH the 4u servers would probably use less electricity due to lower fan speeds

The Supermicro 747TQ looks good but getting even 4 GPUs in there seems a bit tricky. None of the Supermicro Epyc motherboards have 4 x16 slots, and I'm not sure about clearance between the RAM and the first GPU with something like the Asrock Rack RomeD8 (ASRock Rack > ROMED8-2T). I'm also not sure there are really any big cost savings going this route - you spend €600 for the motherboard, €600 for a used Supermicro 747TQ chassis, €600 on bigger colocation costs, maybe another few hundred Euros to bump up the power supply, and you're back at €2000.

The Rosewill case would cut the most costs. I could stuff 6 or 7 GPUs in there with riser cables and get almost the same GPU density as with the Gigabyte G242-H10. My main concerns with that are reliability and tinkering time.

Cixelyn · Apr 23, 2020

We have a G242-Z10 as well as several ESC8000 G4/10Gs in production w/ dual-width consumer stuff.

If cost is a huge concern, then increasing the GPU density is definitely recommended as the cost you're trying to optimize is total system cost per GPU. I would definitely recommend skipping the G242-Z10 and moving to one of your listed 8 or 10-GPU chassis. We really like the G242 as a rack-mounted test-bench but with only 4 GPUs per system, your total supporting system cost will roughly double so its not the cheapest platform.

Also, you mentioned that this is an inference platform? 2080 Ti might be massive overkill; you can probably get away with less. Do you know what the exact workload is? A T4 is maybe 1.5-1.8x the price of a 2080Ti, but you can shove 20 of those in a single chassis. This would remove an entire second system meaning that your cost/GPU might end up the same as two individual 2080 Ti system builds. This would also come with huge benefits too:

Datacenter-grade GPU so you're not worried about temperatures + stability (esp. as this is going into a live production environment you need good monitoring + health checks and dcgmi is hugely gimped on the consumer stuff)
The GPU is within 75W thermal envelope, so both easier install (no 6pin/8pin power) and your colocation costs go down significantly due to less power used.

Obviously the math is vastly different for a training platform, but this is just some 2c on the inference side.

The DL385 is not like any other of the chassis you listed -- for one you can _maybe_ shove 2 GPUs in there, and you're also gonna need a lot of extra parts to even get started (extra riser card, extra CPU, GPU enablement kits for the power cables) which means you barebones costs to the first few systems are actually a lot closer than you think.

fragar · Apr 23, 2020

Cixelyn said:
We have a G242-Z10 as well as several ESC8000 G4/10Gs in production w/ dual-width consumer stuff.

If cost is a huge concern, then increasing the GPU density is definitely recommended as the cost you're trying to optimize is total system cost per GPU. I would definitely recommend skipping the G242-Z10 and moving to one of your listed 8 or 10-GPU chassis. We really like the G242 as a rack-mounted test-bench but with only 4 GPUs per system, your total supporting system cost will roughly double so its not the cheapest platform.

Also, you mentioned that this is an inference platform? 2080 Ti might be massive overkill; you can probably get away with less. Do you know what the exact workload is? A T4 is maybe 1.5-1.8x the price of a 2080Ti, but you can shove 20 of those in a single chassis. This would remove an entire second system meaning that your cost/GPU might end up the same as two individual 2080 Ti system builds. This would also come with huge benefits too:

Datacenter-grade GPU so you're not worried about temperatures + stability (esp. as this is going into a live production environment you need good monitoring + health checks and dcgmi is hugely gimped on the consumer stuff)

The GPU is within 75W thermal envelope, so both easier install (no 6pin/8pin power) and your colocation costs go down significantly due to less power used.

Obviously the math is vastly different for a training platform, but this is just some 2c on the inference side.

The DL385 is not like any other of the chassis you listed -- for one you can _maybe_ shove 2 GPUs in there, and you're also gonna need a lot of extra parts to even get started (extra riser card, extra CPU, GPU enablement kits for the power cables) which means you barebones costs to the first few systems are actually a lot closer than you think.

I do actually need some CPU power and those 8-10 GPU systems with two sockets will have higher CPU costs per GPU, so my total cost would actually be about the same with the 4u 8-GPU Gigabyte G482-Z52 as with the 2u 4-GPU Gigabyte G242-Z10.

The cheapest server-grade solution would be the 2u 8-GPU Gigabyte G292-Z42: G292-Z42 (rev. 100) | High Performance Computing System - GIGABYTE Global

This looks like a great idea and a great product but I am not sure if it will work with consumer graphics cards. Aside from the power connectors possibly not fitting without modding the cards, it seems like the cards in the front are oriented "backwards" so that they face towards the front of the chassis, which would rule out blower cards. Does anyone know if this is really the case? I also wonder if it would be possible to strip the fans and shroud "backs" from those four front-facing consumer blower cards and run them in that chassis passively.

FWIW, I will underclock my cards. I've been running my 2080 Ti's at 160 Watts, that's much more power efficient and somewhat more cost efficient than the default 260 Watts. Maybe the cooling in the G292-Z42 would be good enough for that.

Re. 2080 Ti: my inference workload can be easily split among multiple cards and doesn't need much GPU RAM. I just need the best net performance per Euro. The T4 would be great for the reasons you mentioned but it works out to being a lot more expensive overall. The power efficiency of the T4 can be matched by an underclocked 2080 Ti and with the T4 you're getting a lot less CUDA cores per €. It's close between a 2080 Ti and a 2070 Super, I'll run the exact numbers on that once I've settled on a chassis. The 2080 Ti is about 2.15x more expensive than the 2070 Super and has 1.7x as many CUDA cores, and that higher cost per core seems to just about offset the greater density. Weaker cards than the 2070 Super are only marginally cheaper per CUDA core.

Good point about the ProLiant 385DL. That server actually takes 3 GPUs but indeed, there are a bunch of extra costs when you use all three slots. Some of those parts are very expensive, for example the "high performance" fan wall costs something like €150. That server is not a good fit for my needs and I'm kind of suspicious of "pure enterprise" solutions. Still, the price is incredibly good.

BlueFox · Apr 23, 2020

Going by the photos on the G292-Z42, the front four slots are indeed reversed, so that won't work with consumer cards. One other thing I forgot to mention is, can your colo handle very long chassis? The G292-Z42 for example is 80cm long. The colo rack I have isn't that deep.

fragar · Apr 23, 2020

BlueFox said:
Going by the photos on the G292-Z42, the front four slots are indeed reversed, so that won't work with consumer cards. One other thing I forgot to mention is, can your colo handle very long chassis? The G292-Z42 for example is 80cm long. The colo rack I have isn't that deep.

Yeah, the photo below shows that pretty clearly. Does it makes any sense to hack the cards somehow (strip the fan/shroud, etc)? Lots of consumer cards have grills that run the length of the card.

EDIT: here is an idea - take a blower consumer card, remove the fan, and cut out the part of the shroud on the other side of where the air normally leaves. This would only need to be done to the four cards in the front of the chassis, which have the better ventilation anyway. These cards could be underclocked 2070 Supers, running at something like 140 Watts.

EDIT 2: actually, this mod would only need to be done to the two cards in the top slots at the front. The four cards in the rear could be normal blowers and the two cards at the bottom front could be axial cards with grills that run the length of the card. So only two modded cards (and two voided warranties, etc) per chassis.

This might all seem a bit crazy but I really like this chassis. Aside from being cheap and compact, it uses the case fans really well. Every fan is right against a GPU. The 4u 8-GPU chassis like the Gigabyte G482-Z50 and the Supermicro 4124GS-TNR just seem kind of wasteful with their airflow.

Will ask about rack depth.

BlueFox · Apr 23, 2020

That could potentially work with the kind of airflow the chassis provides, though it's uncharted territory for me, so I cannot tell you with any certainty unfortunately.

fragar · Apr 23, 2020

BlueFox said:
That could potentially work with the kind of airflow the chassis provides, though it's uncharted territory for me, so I cannot tell you with any certainty unfortunately.

Fair enough.

I'm tempted to try it, worst case only two axial GPUs can go in the front and you run with 6 GPUs. Even that might be better than the alternatives.

Those case fans are 80x80x38 mm and go up to 16,300 RPM.

Cixelyn · Apr 23, 2020

Judging from the service manual diagram, the tops of the GPU seem very close to the outside edge of the case.

You might want to ask for the GPU QVL and see if there are any GPUs w/ top power connectors in that list; it might be a pretty tight fit.

BlueFox · Apr 23, 2020

There won't be. All server GPUs have the power connector on the rear.

fragar · Apr 23, 2020

Yeah, that looks really close. It looks like there is maybe 11 or 12 mm between the edge of the chassis and the power connectors on the typical consumer graphics card, pics below.

There is one 2080 Ti blower with power connectors on the back: Gigabyte GeForce RTX 2080 Ti Turbo 11G [Rev 2.0] ab € 1149,00 (2020) | Preisvergleich geizhals.eu EU

Removing the "intake" wall of a blower GPU actually seems reasonable. It wouldn't even be necessary to remove the fan, a radial fan has no notion of air direction anyway and only blows air out the back because that's where the hole is. It also wouldn't be necessary to remove the intake wall entirely, just drilling a bunch of holes in it should be enough. This might be a good idea anyway with strong case airflow, even for blowers which face in the "correct" direction.

fragar · Apr 23, 2020

I found this: Nor-Tech’s Expertly Integrated 8 GPU Server with NVIDIA RTX 2080TI

It's the same server, just a different name. So this can work.

The photo below is supposedly of a 2080 Ti of some sort. Maybe it's best to just remove the shroud and fan entirely.

fragar · Apr 23, 2020

The data center got back to me. They have two types of racks, 71 cm and 75 cm (distance between the mount rails). Only the 71 cm racks can be rented by the rack.

There are other data centers in Budapest of course but they're either significantly more expensive or can handle much fewer Watts per rack.

The depths of the potential servers are:

1. Gigabyte G482-Z52 - 880 mm
2. Gigabyte G292-Z42 - 800 mm
3. Gigabyte G241-Z10 - 820 mm
4. Tyan TN83-B8251 - 831
5. Tyan GA88B8021 - 880 mm
6. ProLiant DL385 - 730 mm
7. Supermicro 4124GS-TNR - 737 mm
8. Supermicro 2023US-TR4 - 723 mm
9. Supermicro 4023S-TRT - 647 mm

I've asked about the Supermicro 4124GS-TNR (737 mm) possibly going into those 71 cm racks.

BlueFox · Apr 23, 2020

Not your first choice, but the 747TQ-R1620B is 673mm. It should work with the ASRock board you mentioned and you'll get 4 GPUs in. I know power supplies were a concern for you, but 1620W should be sufficient.

I don't believe the 4124GS-TNR is available yet and it's not going to be cheap. The Intel version (4029GP-TRT) costs ~$5000. Be sure to look into how the PCIe topology options will work for you. There is also a bracket that's going to cover any side power connectors. See: https://www.servethehome.com/superm...19/supermicro-as-4124gs-tnr-at-sc19-gpu-view/

The 4023S-TRT only officially supports 2 GPUs. I would ensure that it has sufficient GPU power cables for your setup if you go with it. The 1280W power supplies wouldn't be adequate for more than 3 GPUs anyway.

fragar · Apr 23, 2020

BlueFox said:
Not your first choice, but the 747TQ-R1620B is 673mm. It should work with the ASRock board you mentioned and you'll get 4 GPUs in. I know power supplies were a concern for you, but 1620W should be sufficient.

I don't believe the 4124GS-TNR is available yet and it's not going to be cheap. The Intel version (4029GP-TRT) costs ~$5000. Be sure to look into how the PCIe topology options will work for you. There is also a bracket that's going to cover any side power connectors. See: https://www.servethehome.com/superm...19/supermicro-as-4124gs-tnr-at-sc19-gpu-view/

The 4023S-TRT only officially supports 2 GPUs. I would ensure that it has sufficient GPU power cables for your setup if you go with it. The 1280W power supplies wouldn't be adequate for more than 3 GPUs anyway.

Yeah, I'm starting to think that this is the best option. Even aside from the 800 mm depth, those hacks I wrote about for the G292-Z42 were probably just too convoluted.

The 4124GS-TNR would make some sense if it was available soon. PCIe topology is no problem for me (my GPUs don't talk to each other and aren't that latency-sensitive) and the bracket which blocks the top-facing GPU power connectors can surely be moved. Supermicro actually advertises an optional "0.5U LID" for "GTX card support" on the product page (4124GS-TNR | 4U | A+ Servers | Products | Super Micro Computer, Inc.):

The main advantage of the 747TQ + 4-slot board combo is cost and flexibility. The TCO seems a bit lower and the costs are more deferred - I'd rather pay the data center over five years than Supermicro now. I'm also finding some pretty decent deals on used 747TQ's, in the <$500 range.

Another advantage of the 747TQ + 4-slot board combo is that I can run this at home for a few weeks at first while the 4124GS-TNR would probably need to go into the data center ASAP. It seems better to start with a few 747TQ's and then maybe later move up to the 4124GS-TNR's than go in the other direction.

The 4023S-TRT is not a great solution, it's too expensive per GPU.

Thanks for all of the comments!

fragar · Apr 24, 2020

I'm going to go with the Supermicro AS-747TQ and Gigabyte MZ01-CE1. This is the only good server-grade solution which is currently available and fits into the racks at my intended data center, and it happens to be the cheapest of the prospective solutions and has the best ratio of capital costs to operation costs.

On the motherboard side, the Gigabyte MZ01-CE1 and the ASUS ASRock Rack EPYCD8-2T both seem perfectly adequate, that's close to a coin flip.

It seems that the Supermicro 4124GS-TNR would fit in the 71cm racks in my data center, they have 25 cm extra in front and 5 cm in back. It's not available though, and even if it was I'd prefer the self-assembled 747TQ.

Thanks for the help, especially to BlueFox.

BlueFox · Apr 24, 2020

No problem. Don't forget to get one of these since you won't be using a Supermicro motherboard in the chassis: Supermicro 15cm CBL-0084 16-Pin Front Control Panel Split / Extension Cable

fragar · Apr 24, 2020

Thanks for the heads-up. That's weird that this isn't included with the case.

I guess this part from the 747TQ part list must do the equivalent for the Supermicro boards:

Supermicro 75cm CBL-0071 16-Pin to 16-Pin Front Control Panel Cable

BlueFox · Apr 24, 2020

You'll actually still be using that cable. Normally that would go from the front panel to the motherboard headers. With a non-Supermicro motherboard, the cable will plug into the front panel on one end and the splitter on the other. The splitter is then obviously plugged into the motherboard.

Choosing a server/chassis for GPU workload

Member

Legendary Member Spam Hunter Extraordinaire

Member

Researcher

Member

Legendary Member Spam Hunter Extraordinaire

Member

Legendary Member Spam Hunter Extraordinaire

Member

Researcher

Legendary Member Spam Hunter Extraordinaire

Member

Member

Member

Legendary Member Spam Hunter Extraordinaire

Member

Member

Legendary Member Spam Hunter Extraordinaire

Member

Legendary Member Spam Hunter Extraordinaire