Xeon E-2xxx vs Xeon D-2xxx for ML mule

Discussion in 'Machine Learning, Deep Learning, and AI' started by Styp, Nov 13, 2018.

  1. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    3
    Likes Received:
    0
    Hey,

    I am upgrading some hardware and have come to the conclusion that I need an ML training mule separate from my current workstation.

    Can anyone explain the advantages and disadvantages of going with either platform for this use case? I am considering running 2x 1080 Ti, as for now they are in my workstation.

    Thanks!

    Martin
     
    #1
  2. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,196
    Likes Received:
    4,148
    If I were doing 2x 1080 Ti I would look at Intel Xeon E5 (V4).

    The Intel Xeon E-2100 series does not have enough PCIe lanes for two PCIe 3.0 x16 slots. The Xeon D-2100 does. Intel Xeon E5 will allow you to use a single root complex, which is good for NVIDIA NCCL when you distribute training across more than one GPU. Xeon E5 pricing is good since you can pick up used gear. You can also have more RAM and still have PCIe lanes left over for NICs.
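
    For reference, a quick way to see how two GPUs are connected is `nvidia-smi topo -m`. The little parser below is only a sketch of reading its matrix for the GPU0/GPU1 pair (the helper name and the exact column layout are my assumptions; the PIX/PXB/PHB/NODE/SYS link types are from the `nvidia-smi` legend):

    ```python
    def shares_root_complex(topo_matrix):
        """Return True if GPU0 and GPU1 talk within one PCIe root complex.

        Expects the text printed by `nvidia-smi topo -m`. Link types like
        PIX/PXB/PHB (and NVLink) stay under one root complex; SYS/NODE mean
        the traffic crosses the socket interconnect (QPI/UPI).
        """
        for line in topo_matrix.splitlines():
            parts = line.split()
            # The GPU0 data row starts with its label, then the 'X' self-entry,
            # then the link type to GPU1.
            if len(parts) >= 3 and parts[0] == "GPU0" and parts[1] == "X":
                return parts[2] not in ("SYS", "NODE")
        raise ValueError("could not find the GPU0 row in the topology matrix")
    ```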
     
    #2
  3. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    3
    Likes Received:
    0
    Interestingly, I never considered PCIe lanes while looking at the different platforms.
    How big is the impact of 2x x8 vs. 2x x16? How much CPU do you usually recommend in your builds?
     
    #3
  4. Patrick

    Patrick Administrator
    Staff Member

    Joined:
    Dec 21, 2010
    Messages:
    11,196
    Likes Received:
    4,148
    It can be a decent-sized impact. Using the E5 V3/V4 would allow you to have a single PCIe root complex. That means you can use NCCL for your two GPUs, which would yield a ~20-30% performance speedup over two GPUs on a Xeon D platform.

    On the 2x x8 vs. 2x x16 question, what you will run into is having to pass data from one GPU to the other. I would recommend x16 if possible. Also, consider that you may want to add a network card and/or NVMe storage.
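
    To put rough numbers on the x8 vs. x16 gap: PCIe 3.0 is about 985 MB/s of usable bandwidth per lane after 128b/130b encoding, so shipping gradients between GPUs takes roughly twice as long over x8. A back-of-envelope sketch (the function name and the per-lane figure are my approximations, not a benchmark):

    ```python
    def allreduce_transfer_time_s(param_count, lanes, bytes_per_param=4):
        """Rough lower bound on the time to move one full set of fp32
        gradients between two GPUs over PCIe 3.0."""
        usable_bytes_per_s = 0.985e9 * lanes  # ~985 MB/s per PCIe 3.0 lane
        return (param_count * bytes_per_param) / usable_bytes_per_s

    # e.g. a 25M-parameter model (~100 MB of fp32 gradients):
    #   x8  -> ~12.7 ms per exchange
    #   x16 -> ~6.3 ms per exchange
    ```

    Whether that matters depends on how often the exchange happens relative to compute time per step, but it is pure overhead you pay every synchronization.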
     
    #4
  5. Styp

    Styp New Member

    Joined:
    Aug 1, 2018
    Messages:
    3
    Likes Received:
    0
    You might move this to the DL section...

    How can I check whether NCCL is used with TF? I am using an E5 V3 quad-GPU rig at work and often do multi-GPU training, but I never questioned the optimization potential. Of course, async augmentation and such, yes, but only as far as I can control it from the software engineering side.
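
    (One way I found to check: NCCL_DEBUG=INFO is a real NCCL environment variable, and any process that actually initializes NCCL prints "NCCL INFO ..." lines to stderr. If a multi-GPU run stays silent, NCCL was not used. A minimal helper, with the name being just illustrative:)

    ```python
    import os

    def nccl_debug_env(extra=None):
        """Build an environment that makes NCCL announce itself.

        With NCCL_DEBUG=INFO set, any process that initializes NCCL
        (e.g. TensorFlow doing an all-reduce across GPUs) logs
        'NCCL INFO ...' lines to stderr.
        """
        env = dict(os.environ)
        env["NCCL_DEBUG"] = "INFO"
        if extra:
            env.update(extra)
        return env

    # Usage with subprocess ("train.py" is a placeholder for your own script):
    # subprocess.run(["python", "train.py"], env=nccl_debug_env())
    ```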

    I am just asking because the 2x 1080 Ti could be replaced with one 2080 Ti down the road. I don't want the fastest rig at home; I just need a 'little' compute for personal research...
     
    #5