Xeon E-2xxx vs Xeon D-2xxx for ML mule

Discussion in 'Machine Learning, Deep Learning, and AI' started by Styp, Nov 13, 2018.

  1. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    6
    Likes Received:
    0
    Hey,

     I am planning a hardware upgrade and have come to the conclusion that I need an ML training mule separate from my current workstation.

     Can anyone explain the advantages and disadvantages of either platform for this use case? I am planning to run the two 1080 Tis that are currently in my workstation.

    Thanks!

    Martin
     
    #1
  2. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,520
    Likes Received:
    4,450
     If I were doing 2x 1080 Ti I would look at Intel Xeon E5 (V4).

     The Intel Xeon E-2100 series does not have enough PCIe lanes for two PCIe 3.0 x16 slots. The Xeon D-2100 does. Intel Xeon E5 will also let you put both GPUs on a single root complex, which is good for NVIDIA NCCL when you distribute training across more than one GPU. Xeon E5 pricing is good since you can pick up used gear. You can also fit more RAM and still have PCIe lanes left over for NICs.
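
     If you want to sanity-check the topology on a given box, `nvidia-smi topo -m` prints the GPU interconnect matrix (this assumes the NVIDIA driver is installed; output will vary with your hardware):

```shell
# Print the GPU/PCIe interconnect matrix. In the legend:
#   PIX/PXB/PHB = both GPUs sit under the same CPU's PCIe root complex
#   SYS         = peer traffic crosses the CPU interconnect (QPI/UPI)
# Single-root (PIX/PXB/PHB) is what you want for NCCL peer-to-peer.
nvidia-smi topo -m
```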
     
    #2
  3. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    6
    Likes Received:
    0
     Interestingly, I never considered PCIe lanes while looking at the different platforms.
     How big is the impact of 2x x8 vs. 2x x16? And how much CPU do you usually recommend in your builds?
     
    #3
  4. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,520
    Likes Received:
    4,450
     It can be a decent-sized impact. Using an E5 V3/V4 would let you keep both GPUs on a single PCIe root complex. That means you can use NCCL for your two GPUs, which would yield a ~20-30% speedup over two GPUs on a Xeon D platform.

     On 2x x8 vs. 2x x16, what you will run into is passing data from one GPU to the other, so I would recommend x16 if possible. Also consider that you may want to add a network card and/or NVMe storage.
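
     For rough numbers on the link widths (back-of-the-envelope theoretical peaks, not benchmarks): PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, so usable bandwidth works out to roughly:

```python
# Back-of-the-envelope PCIe 3.0 throughput per link width.
# 8 GT/s per lane with 128b/130b line coding -> ~0.985 GB/s usable per lane.
GTS = 8.0                           # gigatransfers per second per lane
ENCODING = 128 / 130                # 128b/130b line-code efficiency
per_lane_gbs = GTS * ENCODING / 8   # bits -> bytes

for lanes in (8, 16):
    print(f"x{lanes}: ~{lanes * per_lane_gbs:.1f} GB/s per direction")
```

     So dropping from x16 to x8 halves the peak copy bandwidth between host and GPU (and between GPUs over PCIe), which is exactly what you feel when gradients get exchanged every step.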
     
    #4
  5. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    6
    Likes Received:
    0
    You might move this to the DL section...

     How can I check whether NCCL is being used with TensorFlow? I am using an E5 V3 quad-GPU rig at work and often do multi-GPU training, but I never questioned the optimization potential. Async augmentation and the like, yes, but only as far as I can control it from the software engineering side.

     I ask because the two 1080 Tis could be replaced with one 2080 Ti down the road. I don't want the fastest rig at home; I just need a 'little' compute for personal research...
     
    #5
  6. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    6
    Likes Received:
    0
     To come back to this topic, @Patrick: do you have a recommendation for a GPU-focused mainboard? I don't need tons of storage, but an E5 V4 would be nice. I'll have to opt for a PCIe NVMe drive for dataset handling.

    Cheers!
    Martin
     
    #6
  7. Deslok

    Deslok Well-Known Member

    Joined:
    Jul 15, 2015
    Messages:
    1,001
    Likes Received:
    106
     Supermicro has the X10SRA and X10SRL, which would both offer you a ton of PCIe.
     The SRA offers up to x16/x8/x8/x8 of PCIe 3.0 bandwidth, or x16/x16/x/x8 (use the last slot for NVMe?). The SRL unfortunately only does x8 to any of its slots, but it offers more slots in total depending on how you approach NVMe storage. (Unfortunately, neither offers M.2, but carrier cards are cheap as long as you don't need a switch chip on them.) Cooling-wise, the SRL is better suited to a server chassis, the SRA to a desktop.
    X10SRA | Motherboards | Products | Super Micro Computer, Inc.
    X10SRL-F | Motherboards | Products | Super Micro Computer, Inc.
     
    #7