4 GPU EPYC deep learning box


AmusedGoose

New Member
Mar 16, 2019
I built a system based on an EPYC 7441P with 128 GB of 2666 MHz RAM. The motherboard is an ASRock Rack EPYCD8-2T, equipped with 4x 1080 Ti.

The PCIe risers are 50 cm 3M extenders, which have given zero issues so far.

Benchmarks so far show 85-90% scaling efficiency for multi-GPU training (ResNet-50, ResNet-152, Inception-v4), even though traffic has to pass through the CPU. Testing was done in a virtual machine allocated 40 vCPUs (10 per NUMA node) and 120 GB of RAM, using TensorFlow 1.13 with a parameter server. Not perfect, but not bad either.
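For anyone wanting to reproduce the 85-90% figure: scaling efficiency here just means measured multi-GPU throughput divided by the ideal linear speedup over a single GPU. A minimal sketch (the images/sec numbers below are hypothetical placeholders, not from this build):

```python
def scaling_efficiency(throughput_1gpu, throughput_ngpu, n):
    """Fraction of ideal linear speedup achieved by n GPUs.

    1.0 means perfect scaling; 0.85-0.90 matches the range
    reported above for 4x 1080 Ti.
    """
    return throughput_ngpu / (n * throughput_1gpu)

# Hypothetical ResNet-50 numbers in images/sec; substitute your own
# single-GPU and 4-GPU measurements.
eff = scaling_efficiency(210.0, 740.0, 4)
print(f"{eff:.1%}")  # falls in the 85-90% band
```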

As far as I can tell, crossing NUMA nodes has little influence, which was my main concern with AMD's EPYC CPUs. However, this machine will usually serve four virtual machines with one GPU each, so the point is moot in this case anyway.
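For anyone who does want to keep a training process on a single NUMA node, a minimal Linux-only sketch using `os.sched_setaffinity`. The contiguous 6-cores-per-node layout is an assumption for a 24-core 7441P (4 dies); verify the real mapping with `numactl --hardware` or `lscpu` before relying on it:

```python
import os

def numa_node_cores(node, cores_per_node=6):
    """Core IDs for a NUMA node, assuming contiguous numbering
    (node 0 -> cores 0-5, node 1 -> cores 6-11, ...).

    This layout is an assumption; check `numactl --hardware`
    on the actual machine.
    """
    start = node * cores_per_node
    return set(range(start, start + cores_per_node))

# Pin the current process to NUMA node 0's cores (Linux only).
if hasattr(os, "sched_setaffinity"):
    wanted = numa_node_cores(0) & os.sched_getaffinity(0)
    if wanted:
        os.sched_setaffinity(0, wanted)
```

`numactl --cpunodebind=0 --membind=0 <cmd>` does the same from the shell and also pins memory allocation, which matters as much as CPU placement for NUMA-sensitive workloads.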

Can anyone confirm whether GPUDirect RDMA works for GTX cards? I'm convinced it doesn't, but I find it strange/amazing that some teams manage to get excellent efficiency on GTX clusters.
 

e97

Active Member
Jun 3, 2015
Neat! Got links for those PCI-E extenders?

Why go for the 1080 Ti vs the Radeon VII or 2080 Ti?
 

AmusedGoose

New Member
Mar 16, 2019
Thanks, the extenders are made by 3M.
I chose 1080 Tis since I had those lying around; otherwise the 2080 Ti would be the choice.
No Radeon VII as of yet, since many common libraries don't fully support AMD's ROCm, though that seems to be improving lately. Hopefully it will keep getting better and ROCm support will become standard in stable release builds of TensorFlow and the like.
 

Spotswood

Active Member
Would you ever consider a case designed to be 8 or 9U tall, with the GPUs mounted above the motherboard slots (sorta like some of the "old" GPU mining rigs, but a little more user-friendly for the wider extender cables)?
 

AmusedGoose

New Member
Mar 16, 2019
For that you'd be better off looking at something like the ASUS ESC8000 G4 or similar. I'm not too fond of machines that tall, as they are expensive to run in the datacenter. Even this 4-GPU 4U machine is not really worth it in co-location, since the density is low.