DeepLearning12 NVLink

Discussion in 'Machine Learning, Deep Learning, and AI' started by Patrick, Jul 1, 2018.

  1. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
    I have been getting restless to do another deep learning build. Today I invested in some Tesla P100 16GB GPUs.

    Instead of going with PCIe cards, I decided SXM2 with NVLink.

    Next items:
1. Need to do some research on whether I can put P100s in V100 trays. The V100, I believe, has 6x 50GB/s links. The P100 has 4x 40GB/s. That is a big difference, but if the trays work with both, I will want the newer V100 trays.
2. I think this is going to be Skylake based. The E5 generation would have been less expensive, since both the CPUs and the motherboards cost less.
3. Skylake memory sizing is somewhat strange. A common rule of thumb is 2x total GPU memory for system RAM. Each P100 has 16GB, so that is 64GB in a 4x GPU system or 128GB in an 8x GPU system, which at 2x means 128GB or 256GB of system RAM. With Skylake's six memory channels per socket, the realistic options are 96GB, 192GB, or 384GB. With E5's four channels, 128GB or 256GB would be easier.
    4. CPUs. What to use?

    Many questions. Likely a few weeks from answers.
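The RAM sizing tension in point 3 is easy to see with a quick sketch. This assumes the common 2x-GPU-memory rule of thumb and standard dual-socket DIMM populations; the capacities are illustrative, not a vendor spec:

```python
# Sketch of the system RAM sizing from point 3, assuming the common
# rule of thumb of 2x total GPU memory for system RAM.

def target_system_ram(num_gpus, gpu_mem_gb=16, multiplier=2):
    """Total GPU memory times the rule-of-thumb multiplier."""
    return num_gpus * gpu_mem_gb * multiplier

def nearest_config(target_gb, options):
    """Smallest standard capacity that meets or exceeds the target."""
    return min((o for o in options if o >= target_gb), default=max(options))

# Dual-socket Skylake-SP: 12 channels total, so 12 identical DIMMs
# give 96/192/384GB with 8/16/32GB modules.
skylake = [96, 192, 384]
# Dual-socket E5: 8 channels, so 128/256GB fall out naturally.
e5 = [128, 256]

for gpus in (4, 8):
    t = target_system_ram(gpus)  # 128GB for 4x P100, 256GB for 8x
    print(gpus, t, nearest_config(t, skylake), nearest_config(t, e5))
```

On Skylake you end up buying 192GB to cover a 128GB target, where E5 hits it exactly.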
     
    #1
    William and vv111y like this.
  2. MiniKnight

    MiniKnight Well-Known Member

    Joined:
    Mar 30, 2012
    Messages:
    2,760
    Likes Received:
    780
    popcorn time
     
    #2
  3. Jaket

    Jaket Member

    Joined:
    Jan 4, 2017
    Messages:
    62
    Likes Received:
    10
I would love to see how AMD CPUs work with AI, deep learning, etc. We've been building a lot of Intel systems with 8x 1080 Ti's; however, nothing with AMD as of yet.
     
    #3
  4. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
@Jaket are you doing single root or dual root? Tried 10x 1080 Ti yet?
     
    #4
    Jaket likes this.
  5. Jaket

    Jaket Member

    Joined:
    Jan 4, 2017
    Messages:
    62
    Likes Received:
    10
We haven't tried running 10x 1080 Ti's yet; it's mostly for one of our clients, and they've only requested 8 cards so far. We have mostly used SM for them; however, this is the next system we will be building out.
    G481-HA0 (rev. 100) | High Performance Computing System - GIGABYTE B2B Service

All of the storage options in this system seem like a great fit for their requirements.

Have you found a big advantage using 10 cards over 8? Might be interesting to bring up to our client.
     
    #5
  6. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
The major benefits are that you save 15-20% per GPU on the initial installation, and some on the ongoing costs, since you are amortizing the chassis across more GPUs.
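That 15-20% figure is just chassis amortization arithmetic. A quick sketch with made-up dollar amounts (neither the platform nor the GPU price below is a real quote):

```python
# Back-of-the-envelope amortization behind the per-GPU savings claim.
# All dollar amounts are hypothetical, for illustration only.

def per_gpu_cost(chassis_cost, gpu_cost, num_gpus):
    """Shared platform cost amortized across the GPUs in the chassis."""
    return gpu_cost + chassis_cost / num_gpus

platform, gpu = 20000.0, 700.0          # hypothetical numbers
cost8 = per_gpu_cost(platform, gpu, 8)   # 3200.0 per GPU
cost10 = per_gpu_cost(platform, gpu, 10) # 2700.0 per GPU
savings = 1 - cost10 / cost8             # 0.15625, i.e. ~15.6% per GPU
```

The bigger the fixed platform cost relative to the card cost, the bigger the per-GPU saving from packing in two more cards.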

    I am really interested in the build-out of that Gigabyte server. It is a dual root design so are you planning to use 2x Mellanox cards and avoid the NUMA penalty?
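The "2x Mellanox to avoid the NUMA penalty" idea is just keeping each GPU's network traffic on its own CPU's PCIe root. A minimal sketch, assuming a hypothetical device-to-NUMA-node mapping (on a real box this would come from `nvidia-smi topo -m` or sysfs, and the `mlx5_*` names are placeholders):

```python
# Sketch of why dual root systems want one NIC per root: map each GPU
# to a NIC on the same CPU's PCIe root so traffic avoids the
# inter-socket (NUMA) hop. The mapping below is hypothetical.

gpu_numa = {f"gpu{i}": 0 if i < 4 else 1 for i in range(8)}  # 4 GPUs per root
nic_numa = {"mlx5_0": 0, "mlx5_1": 1}  # one 100Gb NIC per socket

def local_nic(gpu):
    """Pick the NIC sharing a NUMA node with this GPU."""
    node = gpu_numa[gpu]
    return next(nic for nic, n in nic_numa.items() if n == node)

print(local_nic("gpu0"))  # mlx5_0: same root, no QPI/UPI crossing
print(local_nic("gpu7"))  # mlx5_1
```

With only one NIC, the four GPUs on the other socket would pay the inter-socket hop on every transfer.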
     
    #6
  7. cactus

    cactus Moderator

    Joined:
    Jan 25, 2011
    Messages:
    798
    Likes Received:
    67
The block diagram shows you are stuck with a built-in X550-AT2 on CPU1, and only one non-GPU x16 slot off CPU0. The spec page suggests it's designed for dual Omni-Path CPUs.
     
    #7
    Patrick likes this.
  8. Revrnd

    Revrnd New Member

    Joined:
    Jan 2, 2018
    Messages:
    27
    Likes Received:
    1
    If you had a really good use case and some extra cash laying around you could always opt for one of these...

    Nvidia DGX-2

    On a side note, I'd love to see how these would go rendering some really intense scenes like in Ready Player One or some other CGI intense movie.

    But on a serious note, just out of interest, do you guys hire these things out? Or do you use them for data analytics etc?

    Love your work on the other Deep Learning machines though Patrick. Keep up the good work.
     
    #8
  9. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
    @Revrnd we are testing allowing people to hire the big GPU systems
     
    #9
    Revrnd likes this.
  10. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
    DeepLearning12 update 8x NVIDIA SXM2 16GB 800px.jpg
     
    #10
    William, ideabox, Tha_14 and 5 others like this.
  11. Revrnd

    Revrnd New Member

    Joined:
    Jan 2, 2018
    Messages:
    27
    Likes Received:
    1
Mmm, that's nice seeing over 80 TFLOPS there. Would be great to see how something like that goes with some well-rounded benchmarks.

    Keep up the good work Patrick, looking forward to seeing the article when you get this machine up and running.

    Would it be possible to run OctaneBench 3.x on your Deep Learning machines for a comparison by any chance? Would be great for those with an interest in 3D Rendering and render farms.
     
    #11
  12. gigatexal

    gigatexal I'm here to learn

    Joined:
    Nov 25, 2012
    Messages:
    2,472
    Likes Received:
    437
    All this hardware porn I’m feeling guilty subbing this thread ;)
     
    #12
  13. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
Running like crazy this week with travel. Update: we have an 8x SXM2 server confirmed. It is in production and should hopefully ship and arrive here next week.

    GPUs: Check
    CPUs: Check
    RAM: Check
    NVMe SSDs: Check
    Boot SSDs: Check
    Mellanox 100Gb: Check
    Server: Inbound!
     
    #13
    Marsh and Revrnd like this.
  14. nrtc

    nrtc New Member

    Joined:
    Dec 3, 2015
    Messages:
    15
    Likes Received:
    2
Our supplier didn't want to deliver the SYS-4028GR-TRT2 with 10x GPUs, since they said it was not a configuration supported by SM.

    In any case, I'm looking forward to DeepLearning12 and performance of the V100's. What DL frameworks and benchmarks do you intend to run?
     
    #14
  15. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
Likely MLPerf, but we may do our Keras + TF GAN as well.

    Who was the vendor BTW? Feel free to PM.
     
    #15
  16. nrtc

    nrtc New Member

    Joined:
    Dec 3, 2015
    Messages:
    15
    Likes Received:
    2
Does MLPerf support multi-GPU benchmarking? It'd be interesting to see how much NVLink helps when scaling up training. TensorFlow's benchmarks are actually quite straightforward and quick to run, although image-classification centric.
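One simple way to frame "how much NVLink helps" is scaling efficiency: measured multi-GPU throughput divided by ideal linear scaling of the single-GPU number. A sketch with placeholder throughputs (the images/sec figures are not measured results):

```python
# Scaling-efficiency calculation for comparing multi-GPU benchmark runs,
# e.g. NVLink vs PCIe all-reduce. Throughput numbers are placeholders.

def scaling_efficiency(single_gpu_ips, n_gpus, multi_gpu_ips):
    """Fraction of ideal linear scaling actually achieved."""
    return multi_gpu_ips / (single_gpu_ips * n_gpus)

# Hypothetical images/sec for a ResNet-style run on 8 GPUs.
eff = scaling_efficiency(220.0, 8, 1540.0)
print(eff)  # 1540 / 1760 = 0.875, i.e. 87.5% of linear
```

Running the same workload with NCCL over NVLink vs over PCIe and comparing this one number would make the interconnect's contribution easy to see.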

    A vendor in Europe. (pm sent)
     
    #16
    Tha_14 and Patrick like this.
  17. ideabox

    ideabox Member

    Joined:
    Dec 11, 2016
    Messages:
    68
    Likes Received:
    23
I cannot find an SXM2 server barebone ;(

Looking forward to this. Does single/dual root matter with NVLink?
     
    #17
  18. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
More on this Wednesday/Thursday on STH for single root servers.

    SXM2 installation is borderline scary. Servers are only sold with Teslas. You can sometimes get them with 4 of 8 populated.

    Also, an update on the project coming Thursday.
     
    #18
    Fzdog2 and Revrnd like this.
  19. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,156
    Likes Received:
    4,114
Well, time to get started.

DeepLearning12 Box in Data Center.jpg
     
    #19
    Fzdog2, Tha_14, William and 3 others like this.
  20. MiniKnight

    MiniKnight Well-Known Member

    Joined:
    Mar 30, 2012
    Messages:
    2,760
    Likes Received:
    780
    What system is that?
     
    #20