Single-root vs dual-root complex and E5 vs SP for multi-GPU deep learning


Jonathan Le Roux

New Member
Sep 23, 2019
3
0
1
We're looking to purchase two 4U nodes, one with 8 or 10 RTX 2080 Ti GPUs and the other with 8 or 10 Quadro RTX 6000 GPUs for deep learning, and I'm having trouble figuring out which architecture to go with.
We're debating between single-root and dual-root complex systems, and between E5 and Cascade Lake SP CPUs. From reading on the site, it sounds like going with a single-root complex with E5 CPUs is the best option, but I'd like to make sure we're not giving up on other advantages provided by the more recent SP CPUs, especially as the pages I found discussing the merits of the various systems are from 2018, and things may have changed since.

Some notes:
- the 2080 Ti node will host 10 cards that we already own (Zotac, blower type), so a 10-GPU node would be preferable, but we could consider an 8-GPU configuration if it is significantly faster.
- we're planning to use NVLink bridges on each pair of GPUs in both systems.
- we are looking to buy machines based on Supermicro 4028 or 4029 systems.
 

Jonathan Le Roux

Bumping this up.
Maybe I should have simplified my question:
- For an 8- or 10-GPU Quadro RTX 6000 system, which is the better choice, E5 or Cascade Lake SP, in a system like Supermicro's SYS-4029GP?
- Does the answer change with RTX 2080 Ti?
 

Jonathan Le Roux

I found an answer for the RTX 2080 Ti cards: GeForce Turing cards such as the RTX 2080 Ti or the Titan RTX do not support peer-to-peer (P2P) communication except over NVLink, and even then only between the two cards of a bridged pair. So there's no real reason to fixate on single- vs dual-root complex for them, and thus no reason to use the older E5 generation of CPUs.
See for example: P2P peer-to-peer on NVIDIA RTX 2080Ti vs GTX 1080Ti GPUs

For the Quadro RTX cards, P2P is supported, but the page above claims the performance improvement is limited, although it only tests 2-GPU systems. Comparisons on 4- or 8-GPU systems are still needed.
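The reasoning in the last post can be captured as a toy model of which GPU pairs get direct P2P. This is only a sketch under stated assumptions, not a measurement. The function name `p2p_pairs` is hypothetical. NVLink bridges are assumed on pairs (0,1), (2,3), and so on. GeForce Turing cards are assumed to get P2P only across their NVLink bridge, while Quadro cards are assumed to also get P2P over PCIe. In a dual-root layout, the first half of the GPUs is assumed to sit on CPU0 and the second half on CPU1, with no P2P across the CPU interconnect. On a real machine, `nvidia-smi topo -m` shows the actual topology.

```python
from itertools import combinations


def p2p_pairs(num_gpus, pcie_p2p, single_root, nvlink_pairs):
    """Hypothetical model: return the GPU pairs (i, j), i < j, that can use direct P2P.

    Assumptions (not measured):
    - an NVLink-bridged pair always gets P2P over the bridge;
    - pcie_p2p=False models GeForce Turing (no P2P over PCIe),
      pcie_p2p=True models Quadro RTX (P2P over PCIe allowed);
    - in a dual-root layout, GPUs 0..num_gpus//2-1 sit on CPU0 and the rest
      on CPU1, and PCIe P2P is assumed unavailable across the two CPUs.
    """
    bridged = {frozenset(p) for p in nvlink_pairs}
    half = num_gpus // 2

    def same_root(i, j):
        # Single root: every GPU sits under one CPU's root complex.
        return single_root or (i < half) == (j < half)

    pairs = set()
    for i, j in combinations(range(num_gpus), 2):
        if frozenset((i, j)) in bridged:
            pairs.add((i, j))  # direct link over the NVLink bridge
        elif pcie_p2p and same_root(i, j):
            pairs.add((i, j))  # PCIe P2P under a shared root complex
    return pairs


# Assumed bridging for a 10-GPU node: (0,1), (2,3), ..., (8,9).
bridges = [(k, k + 1) for k in range(0, 10, 2)]

geforce_single = p2p_pairs(10, pcie_p2p=False, single_root=True, nvlink_pairs=bridges)
geforce_dual = p2p_pairs(10, pcie_p2p=False, single_root=False, nvlink_pairs=bridges)
quadro_single = p2p_pairs(10, pcie_p2p=True, single_root=True, nvlink_pairs=bridges)
quadro_dual = p2p_pairs(10, pcie_p2p=True, single_root=False, nvlink_pairs=bridges)

print(len(geforce_single), len(geforce_dual), len(quadro_single), len(quadro_dual))
# prints: 5 5 45 21
```

Under these assumptions, the GeForce node gets the same 5 P2P pairs whether it is single- or dual-root, which matches the conclusion for the 2080 Ti. Only the Quadro node's P2P reachability depends on the root-complex choice (45 pairs vs 21 in this model). Actual bandwidth on 4- or 8-GPU configurations would still have to be measured.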