x9drg-qf and 4x2080ti

Discussion in 'Machine Learning, Deep Learning, and AI' started by iblik.94, Jul 26, 2019.

  1. iblik.94

    iblik.94 New Member

    Hello everyone,
    I need help solving a problem.

    I have a Supermicro X9DRG-QF board, to which I added 4x 2080 Ti.
    When all 4 cards are turned on, the system reboots.
    If only 3 cards are on, the system is OK.

    I suspect the power supply. There are 2 redundant 1400-watt supplies, but while running, one of them draws about 500 watts.

    I also tried a single 1600-watt power supply, but nothing changed.

    I have 3 such machines and all 3 behave the same way.
    The cards run at about 75 degrees.

    Do you have any ideas?
     
    #1
  2. T_Minus

    T_Minus Moderator

    #2
  3. larrysb

    larrysb New Member

    From first-hand experience, the RTX 2080 Ti can pull an enormous amount of power under deep-learning loads. The cards can easily peak over 320 watts each. The load is very intermittent, meaning it jumps up and down a lot, and many power supplies cannot keep up with such rapid peak loading. The Nvidia 2080 Ti Founders Edition is factory overclocked and has a power limit of 320 watts set in firmware. Aftermarket cards are often overclocked for the gaming market too, and they can draw a heck of a lot of power.
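
    To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. Only the 320 watts per card comes from the figures above. The CPU and "everything else" wattages are assumptions picked for illustration, not measurements from this machine, and the function names are hypothetical.

```python
# Rough peak power budget for a 4-GPU box like the one in this thread.
GPU_PEAK_W = 320   # per-card peak quoted in this post
NUM_GPUS = 4
CPU_W = 2 * 130    # ASSUMPTION: two CPUs at roughly 130 W each
OTHER_W = 100      # ASSUMPTION: board, RAM, fans, drives


def peak_budget_watts(num_gpus=NUM_GPUS, gpu_peak=GPU_PEAK_W,
                      cpu=CPU_W, other=OTHER_W):
    """Sum of worst-case component draw, ignoring PSU efficiency."""
    return num_gpus * gpu_peak + cpu + other


def headroom_watts(psu_watts, **kwargs):
    """Positive means spare capacity, negative means the budget exceeds the PSU."""
    return psu_watts - peak_budget_watts(**kwargs)


if __name__ == "__main__":
    for gpus in (3, 4):
        for psu in (1400, 1600):
            print(f"{gpus} GPUs on {psu} W: "
                  f"headroom {headroom_watts(psu, num_gpus=gpus)} W")
```

    Under these assumed numbers, four cards at peak exceed both a 1400-watt and a 1600-watt supply, while three cards fit. Real draw depends on the actual CPUs and workload, so treat this as a sanity check rather than a diagnosis.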

    The wiring to each 8-pin header needs to be as short as possible, and each header needs its own independent cable back to the power supply. Do not use dual cables or 4-pin drive-to-8-pin PCIe power adapters.

    FWIW, the EVGA 1600 T2 is the most stable power supply I've found for this application. I've had issues with some other very respectable brand-name power supplies: they handled steady loads easily, but couldn't keep up with the surging demand of DL loads.

    My Xeon E5-V3 workstation is an Asus X99 workstation board and I'm running 2x Titan RTX cards in it. It can draw enough power under DL loads to make the office lights blink quite noticeably. I've run that one with 4x Titan V and 4x 1080 Ti. I went to 2x RTX since there are no 4-way NVLink bridges available to the public. It's the same motherboard Nvidia uses in their DGX Workstation.

    You can use the 'nvidia-smi' command to reduce the clocks and impose lower power limits on the cards at run time. That will allow you to troubleshoot if it is a power problem. 4x cards may just be too many.
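
    As a sketch of what that could look like, the Python below builds the nvidia-smi invocations for capping power and locking clocks on each card. The flags used (`-pm`, `-i`, `-pl`, `-lgc`) are standard nvidia-smi options, but the 200-watt cap, the clock range, and the helper names are assumptions for illustration. Check the range your cards accept with `nvidia-smi -q -d POWER`, and note that changing limits normally requires root.

```python
import shutil
import subprocess


def power_limit_commands(gpu_ids, watts):
    """Build nvidia-smi commands that cap each GPU at `watts`."""
    # Persistence mode keeps the driver loaded so the limit stays in effect.
    cmds = [["nvidia-smi", "-pm", "1"]]
    for gpu in gpu_ids:
        cmds.append(["nvidia-smi", "-i", str(gpu), "-pl", str(watts)])
    return cmds


def clock_lock_command(gpu_id, min_mhz, max_mhz):
    """Build the command that locks GPU clocks to a min,max range in MHz."""
    return ["nvidia-smi", "-i", str(gpu_id), "-lgc", f"{min_mhz},{max_mhz}"]


def run_all(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run and shutil.which("nvidia-smi"):
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # ASSUMPTION: 200 W is an arbitrary test value, not a recommendation.
    run_all(power_limit_commands(range(4), 200))
    run_all([clock_lock_command(0, 300, 1350)])
```

    If the machine stays up with all four cards capped this way, that points toward power delivery rather than a board or driver fault.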

    I have two workstations with 2x Titan RTX cards each, and I have to run them on different electrical circuits because they can trip the office circuit breaker if they share an outlet.
     
    #3