I built a system based on an EPYC 7441P with 128 GB of DDR4-2666 RAM. The motherboard is an ASRock EPYCD8-2T, equipped with 4x 1080 Ti.
The PCIe extenders are 3M 50 cm models, which have given zero issues so far.
Benchmarks so far show 85-90% scaling efficiency for multi-GPU training (ResNet-50, ResNet-152, Inception-v4), even though traffic has to pass through the CPU. Testing was done in a virtual machine allocated 40 vCPUs (10 per NUMA node) and 120 GB of RAM, using TensorFlow 1.13 with a parameter server. Not perfect, but not bad either.
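For reference, the scaling-efficiency figure above is just multi-GPU throughput divided by the ideal linear throughput. A minimal sketch, with hypothetical images/sec numbers that are not my actual measurements:

```python
# Illustrative only: the per-GPU throughput numbers below are hypothetical
# placeholders, not the benchmark results from the post.

def scaling_efficiency(single_gpu_ips, multi_gpu_ips, n_gpus):
    """Multi-GPU throughput relative to ideal linear scaling."""
    return multi_gpu_ips / (single_gpu_ips * n_gpus)

# e.g. one 1080 Ti at 200 images/sec, four GPUs together at 700 images/sec
eff = scaling_efficiency(200.0, 700.0, 4)
print(f"{eff:.0%}")  # → 88%
```

Anything in the 85-90% range means you lose roughly half a GPU's worth of throughput out of four to communication overhead.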
As far as I can tell, crossing NUMA nodes has little influence, which was my main concern with AMD's EPYC CPUs. This machine will usually serve four virtual machines with one GPU each, however, so the point is moot in this case anyway.
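If you do want to rule out cross-node traffic when testing, you can pin a training process to one NUMA node's CPUs. A minimal Linux sketch using only the standard library; the assumption that the first 10 OS CPU ids belong to node 0 is hypothetical, so check the real mapping with `numactl --hardware` first:

```python
import os

def pin_to_cpus(cpu_ids):
    """Restrict the current process to the given CPU set (Linux only)."""
    os.sched_setaffinity(0, cpu_ids)       # 0 = this process
    return os.sched_getaffinity(0)         # read back the effective set

available = sorted(os.sched_getaffinity(0))
# Assumption: the first 10 CPUs map to NUMA node 0 ("10 vCPUs per node").
node0 = set(available[:10]) if len(available) >= 10 else set(available)
print(sorted(pin_to_cpus(node0)))
```

The same effect can be had from the shell with `numactl --cpunodebind=0 --membind=0 <command>`, which also keeps memory allocations local to the node.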
Can anyone confirm whether GPUDirect RDMA works on GTX cards? I'm convinced it does not, but I find it strange/amazing that some teams manage to get excellent efficiency on GTX clusters.