DeepLearning12 NVLink

Patrick

Administrator
Staff member
Dec 21, 2010
11,908
4,871
113
I have been getting restless to do another deep learning build. Today I invested in some Tesla P100 16GB GPUs.

Instead of going with PCIe cards, I decided SXM2 with NVLink.

Next items:
1. Need to do some research on whether I can put P100s in V100 trays. The V100 I believe has 6x 50GB/s links. The P100 was 4x 40GB/s. That is a big difference, but if the trays work with both, I will want the newer V100 trays.
2. I think this is going to be Skylake based. The E5 generations look less expensive mainly because both the CPUs and the motherboards cost less.
3. Skylake is somewhat strange here. If you want system RAM at 2x GPU memory, and each P100 has 16GB, that is 64GB of GPU memory in a 4x GPU system or 128GB in an 8x GPU system, so the 2x targets are 128GB or 256GB of system RAM. With Skylake's six memory channels per socket, the natural dual-socket options are really 96GB, 192GB, or 384GB. With E5's four channels, 128GB or 256GB would be easier.
4. CPUs. What to use?

Many questions. Likely a few weeks from answers.
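Rough napkin math on items 1 and 3, for anyone following along (link counts and channel counts are the standard published figures; the 2x rule of thumb is just my sizing heuristic):

```python
# Back-of-the-envelope sizing for the build above.
# NVLink: (number of links, bidirectional GB/s per link).
NVLINK = {"P100": (4, 40), "V100": (6, 50)}

def aggregate_nvlink_gbps(gpu):
    links, per_link = NVLINK[gpu]
    return links * per_link

def ram_target_gb(num_gpus, gpu_mem_gb=16, multiple=2):
    # "System RAM at 2x total GPU memory" rule of thumb.
    return num_gpus * gpu_mem_gb * multiple

def skylake_configs(dimm_sizes=(8, 16, 32), channels=6, sockets=2):
    # One DIMM per channel on a dual-socket Skylake-SP board -> 12 DIMMs.
    dimms = channels * sockets
    return sorted(size * dimms for size in dimm_sizes)

print(aggregate_nvlink_gbps("P100"))  # 160
print(aggregate_nvlink_gbps("V100"))  # 300
print(ram_target_gb(8))               # 256
print(skylake_configs())              # [96, 192, 384]
```

So the 8x GPU target of 256GB lands between the natural Skylake steps of 192GB and 384GB, which is exactly the awkwardness in item 3.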
 
  • Like
Reactions: William and vv111y

Patrick

Administrator
Staff member
Dec 21, 2010
11,908
4,871
113
@Jaket are you doing single root or dual root? Tried 10x 1080 Ti yet?
 
  • Like
Reactions: Jaket

Jaket

Member
Jan 4, 2017
76
15
8
Seattle, New York
purevoltage.com
We haven't tried running 10x 1080 Tis yet; it's mostly for one of our clients, and they've only requested 8 cards so far. We have mostly used SM for them; however, this is the next system we will be building out.
G481-HA0 (rev. 100) | High Performance Computing System - GIGABYTE B2B Service

All of the storage options in this system seem like a great fit for their requirements.

Have you found it being a big advantage using 10 cards over 8? Might be interesting to bring up to our client.
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,908
4,871
113
The major benefit is that you save 15-20% per GPU on the initial installation, plus some of the ongoing costs, since you are amortizing the chassis over more GPUs.
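A sketch of that amortization argument (the dollar figures below are made up for illustration, not quotes):

```python
# Why more GPUs per chassis lowers cost per GPU: the fixed chassis/CPU/RAM
# cost gets spread over more cards. All prices hypothetical.

def cost_per_gpu(fixed_chassis_cost, gpu_cost, num_gpus):
    return fixed_chassis_cost / num_gpus + gpu_cost

eight = cost_per_gpu(10_000, 700, 8)   # 1950.0
ten = cost_per_gpu(10_000, 700, 10)    # 1700.0
print(f"saving per GPU going 8 -> 10: {1 - ten / eight:.1%}")
```

The exact percentage obviously depends on how expensive the GPUs are relative to the chassis.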

I am really interested in the build-out of that Gigabyte server. It is a dual root design so are you planning to use 2x Mellanox cards and avoid the NUMA penalty?
 

cactus

Moderator
Jan 25, 2011
828
76
28
CA
I am really interested in the build-out of that Gigabyte server. It is a dual root design so are you planning to use 2x Mellanox cards and avoid the NUMA penalty?
The block diagram shows you are stuck with a built-in X550-AT2 on CPU1, and there is only a non-GPU x16 slot off CPU0. The spec page suggests it's designed for dual Omni-Path CPUs.
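For anyone wanting to sanity-check GPU/NIC locality on a dual-root box, a rough sketch (on a real system the NUMA node for a PCI device comes from `/sys/bus/pci/devices/<addr>/numa_node` or `nvidia-smi topo -m`; the addresses below are invented):

```python
# Pair each GPU with a NIC on the same NUMA node so RDMA traffic avoids
# crossing the inter-socket link. Addresses and node mappings are made up.

def pair_by_numa(gpu_nodes, nic_nodes):
    """gpu_nodes/nic_nodes: dict of PCI address -> NUMA node."""
    pairs = {}
    for gpu, node in gpu_nodes.items():
        local_nics = [nic for nic, n in nic_nodes.items() if n == node]
        pairs[gpu] = local_nics[0] if local_nics else None  # None = NUMA penalty
    return pairs

gpus = {"0000:1a:00.0": 0, "0000:1b:00.0": 0, "0000:b2:00.0": 1, "0000:b3:00.0": 1}
nics = {"0000:3b:00.0": 0, "0000:d8:00.0": 1}  # 2x Mellanox, one per root
print(pair_by_numa(gpus, nics))
```

With only one usable slot off CPU0 as on this board, half the GPUs would come back with no local NIC.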
 
  • Like
Reactions: Patrick

Revrnd

Member
Jan 2, 2018
30
2
8
AU
If you had a really good use case and some extra cash lying around, you could always opt for one of these...

Nvidia DGX-2

On a side note, I'd love to see how these would go rendering some really intense scenes like in Ready Player One or some other CGI intense movie.

But on a serious note, just out of interest, do you guys hire these things out? Or do you use them for data analytics etc?

Love your work on the other Deep Learning machines though Patrick. Keep up the good work.
 

Revrnd

Member
Jan 2, 2018
30
2
8
AU
Mmmm.. that's nice seeing over 80 TFLOPS there. Would be great to see how something like that would go with some well rounded benchmarks.

Keep up the good work Patrick, looking forward to seeing the article when you get this machine up and running.

Would it be possible to run OctaneBench 3.x on your Deep Learning machines for a comparison by any chance? Would be great for those with an interest in 3D Rendering and render farms.
 

gigatexal

I'm here to learn
Nov 25, 2012
2,747
524
113
Portland, Oregon
alexandarnarayan.com
I have been getting restless to do another deep learning build. Today I invested in some Tesla P100 16GB GPUs.
All this hardware porn I’m feeling guilty subbing this thread ;)
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,908
4,871
113
Running like crazy this week with travel, but an update: we have an 8x SXM2 server confirmed. It is being produced and should hopefully ship and arrive here next week.

GPUs: Check
CPUs: Check
RAM: Check
NVMe SSDs: Check
Boot SSDs: Check
Mellanox 100Gb: Check
Server: Inbound!
 
  • Like
Reactions: Marsh and Revrnd

nrtc

New Member
Dec 3, 2015
15
2
3
50
The major benefits are that you save 15-20% on the initial installation per GPU and some on the ongoing costs since you are using more GPUs per chassis.
Our supplier didn't want to deliver the SYS-4028GR-TRT2 with 10x GPU, since they said it was not a configuration supported by SM.

In any case, I'm looking forward to DeepLearning12 and performance of the V100's. What DL frameworks and benchmarks do you intend to run?
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,908
4,871
113
Our supplier didn't want to deliver the SYS-4028GR-TRT2 with 10x GPU, since they said it was not a configuration supported by SM.

In any case, I'm looking forward to DeepLearning12 and performance of the V100's. What DL frameworks and benchmarks do you intend to run?
Likely MLPerf, but we may do our Keras + TF GAN as well.

Who was the vendor BTW? Feel free to PM.
 

nrtc

New Member
Dec 3, 2015
15
2
3
50
Likely MLPerf, but we may do our Keras + TF GAN as well.
Does mlperf support multi-gpu benchmarking? It'd be interesting to see how much NVLink helps in scaling up learning. TensorFlow's benchmarks are actually quite straightforward and quick to run, although image-classification centric.
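The number to watch in those multi-GPU runs is scaling efficiency versus linear. A tiny sketch of the usual metric (the throughput figures here are invented, not measurements):

```python
# Scaling efficiency from measured training throughput (e.g. images/sec).
# NVLink's benefit shows up as efficiency staying close to 100% at 4-8 GPUs.

def scaling_efficiency(throughputs_by_gpus, baseline_gpus=1):
    """throughputs_by_gpus: dict of GPU count -> measured throughput."""
    base = throughputs_by_gpus[baseline_gpus]
    return {n: t / (base * n / baseline_gpus)
            for n, t in throughputs_by_gpus.items()}

measured = {1: 220.0, 2: 430.0, 4: 840.0, 8: 1610.0}  # hypothetical images/sec
for n, e in sorted(scaling_efficiency(measured).items()):
    print(f"{n} GPU(s): {e:.0%} of linear")
```

Comparing those curves between a PCIe box and the NVLink box would be the interesting result.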

Who was the vendor BTW? Feel free to PM.
A vendor in Europe. (pm sent)
 
  • Like
Reactions: Tha_14 and Patrick

ideabox

Member
Dec 11, 2016
69
25
18
33
I cannot find a SXM2 server barebone ;(

Looking forward to this .. Does single/dual root matter with NVLINK?
 

Patrick

Administrator
Staff member
Dec 21, 2010
11,908
4,871
113
I cannot find a SXM2 server barebone ;(

Looking forward to this .. Does single/dual root matter with NVLINK?
More on this Wednesday/Thursday on STH for single root servers.

SXM2 installation is borderline scary. Servers are only sold with Teslas. You can sometimes get them with 4 of 8 populated.

Also, an update on the project coming Thursday.
 
  • Like
Reactions: Fzdog2 and Revrnd