I would like to build an AMD-based GPU server for deep learning with 8x GTX/RTX cards, but it seems the server I want doesn't exist.
If I understand correctly, for deep learning I would want all my GPUs and the 50G NIC on a single NUMA node, as on the ASUS ESC8000 G4. The AMD servers I can find have the PCIe lanes spread across NUMA nodes (e.g. the EPYCD8-2T motherboard and the G291-Z20 server).
Will the Rome architecture solve this problem, i.e. will all cores be able to talk to all PCIe devices, and PCIe devices to each other, at low latency and high bandwidth?
Is there no AMD server in existence that does 8 GPUs on a single root, e.g. by using 2x PEX8796 switches?
Other questions:
Does the NIC have to be connected to the same NUMA node as the GPUs (when running in a cluster)?
How well do GTX cards scale across multiple machines for machine learning (e.g. 4x 8-GPU servers)?
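On that last question, a back-of-the-envelope bound suggests why the NIC matters: at 50 Gbit/s the link, not the GPUs, can set the floor on gradient-sync time. A rough sketch, assuming FP32 gradients and a bandwidth-bound ring allreduce (the model size and link speed are just example numbers):

```python
def allreduce_seconds(params, workers, link_gbit_s):
    """Lower-bound time for one ring allreduce over the slowest link,
    assuming FP32 gradients and that bandwidth is the bottleneck."""
    grad_bytes = params * 4                      # FP32 = 4 bytes/param
    # Ring allreduce moves 2*(N-1)/N of the buffer over each link.
    wire_bytes = 2 * (workers - 1) / workers * grad_bytes
    return wire_bytes / (link_gbit_s * 1e9 / 8)  # Gbit/s -> bytes/s

# Example: ResNet-50 (~25.6M params) across 4 nodes on a 50 Gbit/s NIC:
t = allreduce_seconds(25.6e6, 4, 50)
print(f"{t * 1000:.1f} ms per allreduce")  # ≈ 24.6 ms
```

That per-iteration cost has to overlap with (or hide behind) backprop for near-linear scaling, which is why GTX cards without fast interconnects lean so heavily on the network and on gradient compression tricks.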