need advice about quad GPU build for cuDNN

fragar

Member
Feb 4, 2019
32
0
6
This is my first post here and my first computer build (as will soon become apparent :)). I work in deep learning and, after renting various servers for several years, have built the following machine for a specific workflow centered around cuDNN:

Build’s Name: Server-1
Operating System/ Storage Platform: Ubuntu Server 18.04
CPU: Threadripper 2990WX
Motherboard: X399 Taichi
Chassis: ?? (need advice)
Drives: Samsung SSD 970 EVO 1 TB M.2
RAM: Corsair Vengeance LPX 4x16 GB 3200 MHz
Graphics Cards: Quad (4x) GigaByte RTX 2080 Ti WindForce
Power Supply: Corsair AX1600i

For now I have it running on an open test bench in my office (pictures attached). My plan is to put it in a rack mount case and move it to a co-location facility once everything is done, but I have not yet purchased the rack mount case or case fans.

My problem is with cooling the graphics cards. I made the mistake of getting "open-air"-style graphics cards instead of blower-style and now I am experiencing significant thermal throttling and would like some advice about what to do.
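For anyone who wants to check for this kind of throttling on their own cards, here is a minimal sketch that polls `nvidia-smi` and compares each GPU's current SM clock to its maximum. The helper names (`parse_gpu_csv`, `query_gpus`, `clock_ratio`) are made up for illustration, and the clock ratio is only a rough indicator, since a healthy card does not always sit at its maximum clock either.

```python
import csv
import io
import subprocess

# Fields requested through nvidia-smi's --query-gpu option.
FIELDS = ["index", "temperature.gpu", "clocks.sm", "clocks.max.sm", "power.draw"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts.

    Assumes every field is numeric (a card reporting "[N/A]" would need
    extra handling).
    """
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue
        raw = dict(zip(FIELDS, (v.strip() for v in rec)))
        rows.append({
            "index": int(raw["index"]),
            "temp_c": float(raw["temperature.gpu"]),
            "sm_mhz": float(raw["clocks.sm"]),
            "sm_max_mhz": float(raw["clocks.max.sm"]),
            "power_w": float(raw["power.draw"]),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_csv(out)


def clock_ratio(row):
    """Current SM clock as a fraction of the card's maximum SM clock."""
    return row["sm_mhz"] / row["sm_max_mhz"]
```

Calling `query_gpus()` in a loop while the training job runs (say, once a second) and printing `temp_c` and `clock_ratio` per card should show which GPUs are hot and clocking down.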

When I run the graphics cards with their fans and shields on, and point an external 16" house fan running at max speed over the open-air case, I get the following performance:

GPU 0 (on the end, fans facing open air, backplate facing GPU 1): 100% (i.e. same as running just one GPU)
GPU 1 (surrounded on both sides): 71% of one GPU
GPU 2 (surrounded on both sides): 82%
GPU 3 (on the end, fans facing GPU 2, backplate open): 91%

When I remove the fans and shields of GPUs 1-3, as shown in the pictures, and use the same external house fan to get air flow, I get the following improved performance:

GPU 0: 100%
GPU 1: 77%
GPU 2: 89%
GPU 3: 96%

So, removing the fans and shields from the graphics cards clearly helps. The performance still isn't great, but it's (I guess) acceptable. However, I am not sure if the airflow will be as good inside a case.
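To put the two sets of numbers side by side, here is a quick back-of-the-envelope calculation. It uses only the percentages listed above; the function name `summarize` is just for illustration.

```python
# Per-GPU throughput from the two tests, as a percentage of one unthrottled GPU.
fans_on = [100, 71, 82, 91]    # stock fans and shields, house fan
fans_off = [100, 77, 89, 96]   # fans and shields removed on GPUs 1-3, house fan


def summarize(rel):
    """Return (equivalent full-speed GPUs, average fractional loss)."""
    total = sum(rel) / 100
    loss = 1 - sum(rel) / (100 * len(rel))
    return total, loss


for label, rel in (("fans on", fans_on), ("fans off", fans_off)):
    total, loss = summarize(rel)
    print(f"{label}: {total:.2f} GPU-equivalents, {loss:.1%} average loss")
```

That works out to roughly 3.44 GPUs' worth of throughput with the stock coolers versus 3.62 with them removed, i.e. an average loss of about 14% versus 9.5%.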

My questions are:

1. What rack mount server case would be best for the above components?
2. Is there anything worth doing to try to improve the GPU cooling?

It seems to me that I have the following options:

1. Put this system inside something like a Rosewill RSV-L4500 case, get the best case fans (maybe Delta PFB1212UHE-F00 ??), and accept a 10% or so loss of performance.
2. Split out the graphics cards using a PCI-e x16 extender, put the WindForce GPU fans back on, and move the cards to a separate section of the case where they have more spacing.
3. Keep the graphics cards on the motherboard (where they have dual-slot spacing) but replace the GPU cooling with either water cooling, a better passive solution, or a custom blower.

I am leaning towards #1 but am not sure. I also haven't been able to find any aftermarket passive or blower-style cooling solutions for the RTX 2080 Ti.
 


MiniKnight

Well-Known Member
Mar 30, 2012
2,999
909
113
NYC

maze

Active Member
Apr 27, 2013
558
84
28
Have you looked at Spotwoods open rig mining setups?

An alternative would be to remove the fans, remove the PCI slot plates (or make them a lot more open), and simply get a 3U or so case that you can fit with a few Gentle Typhoons or similar super-high-output fans to remove the heat by pure airflow.

Edit:

https://www.amazon.com/RAIJINTEK-MORPHEUS-Superior-High-end-Cooler/dp/B071VZ7M4K
- could be an option, but the fins do run the wrong way :/

Or
https://www.arctic.ac/eu_en/accelero-s3.html
You could try to see if it's possible to make this fit with some copper heatsinks on the chips. With enough airflow it could be possible.
 

fragar

MiniKnight said:
That is really hard. I think getting the big 4U is your best bet because using cables and mounting the x16 GPUs will be harder.

If you can find a clean way to do this, perhaps in a mining case, then doing that with case fans and the fans on the GPU is best.
https://www.amazon.com/Hydra-Server-Mining-Case-Ready/dp/B07B4QHDPJ/
or
Rosewill RSV-L4000C - 4U Rackmount Server Case / Chassis for Bitcoin Mining Machine, Supports 6~8 Graphic Cards - Newegg.com

I think the Rosewill may be better here.

Thanks for the comments.

Those solutions would need PCIe extenders, like this one:

Thermaltake - TT Premium PCI-E 3.0 Extender – 600mm

I wasn't able to find any really clear reports online about deep learning builds which use this approach. There were a few people discussing it in forums, but without clear conclusions. It's not clear if those cables can be routed through those cases, or how long the cables need to be (600mm may not be enough), or how good the performance will be.

This approach is widely used by crypto miners, but those workloads are much less bandwidth-constrained, which allows the use of more flexible x1-to-x16 adapters.
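As a rough illustration of the gap, here is a sketch of the theoretical PCIe 3.0 numbers (8 GT/s per lane with 128b/130b encoding). These are per-direction peaks that ignore protocol overhead, and the 1 GB transfer size is a made-up example, not a measurement from my workload.

```python
GT_PER_S_PER_LANE = 8.0     # PCIe 3.0 raw signalling rate per lane
ENCODING = 128 / 130        # 128b/130b line-encoding efficiency


def pcie3_bandwidth_gb_s(lanes):
    """Theoretical peak bandwidth in GB/s, per direction, for a PCIe 3.0 link."""
    return GT_PER_S_PER_LANE * ENCODING / 8 * lanes


def transfer_seconds(gigabytes, lanes):
    """Best-case time to move `gigabytes` of data over the link."""
    return gigabytes / pcie3_bandwidth_gb_s(lanes)


payload_gb = 1.0  # hypothetical amount of data moved per step
for lanes in (1, 16):
    bw = pcie3_bandwidth_gb_s(lanes)
    t = transfer_seconds(payload_gb, lanes)
    print(f"x{lanes}: {bw:.2f} GB/s peak, {t * 1000:.0f} ms per {payload_gb:g} GB")
```

So an x1 link tops out just under 1 GB/s while a full x16 link is close to 16 GB/s. That is why I would only consider extenders that carry all 16 lanes.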

Can anyone comment on the viability of PCIe x16 extenders for deep learning builds?
 

fragar

maze said:
Have you looked at Spotwoods open rig mining setups?

An alternative would be to remove the fans, remove the PCI slot plates (or make them a lot more open), and simply get a 3U or so case that you can fit with a few Gentle Typhoons or similar super-high-output fans to remove the heat by pure airflow.

Edit:

https://www.amazon.com/RAIJINTEK-MORPHEUS-Superior-High-end-Cooler/dp/B071VZ7M4K
- could be an option, but the fins do run the wrong way :/

Or
https://www.arctic.ac/eu_en/accelero-s3.html
You could try to see if it's possible to make this fit with some copper heatsinks on the chips. With enough airflow it could be possible.

Indeed, something like this would be my preferred approach. Strong front-to-back airflow can no doubt be achieved through a normal server case (like the Rosewill RSV-L4000) with a boatload of 120mm Delta 9000 rpm fans.

In fact, this is pretty close to what I am doing now. I removed the fans and shielding from the GigaByte 2080 Ti WindForce cards and am blowing air over them with a large room fan. Photo attached.

The problem with my approach, and with the Raijintek Morpheus, is indeed that the heatsink grooves run top-to-bottom. The S3 solves that problem, but it also has some drawbacks:

1. It doesn’t officially support the 2080 Ti.
2. It’s advertised as only dissipating up to 135W.
3. It’s also not really aimed at server builds (e.g. the bumps on the backplate run top-to-bottom).

Can anyone recommend any good passive coolers that can be installed on these cards for use in a server (i.e. with strong front-to-back airflow)?
 
