Microsoft HGX-1 at the AI Hardware Summit


Mam89

Member
Jan 14, 2016
SoCal
Is there any redundancy in this system, or is it a fill-the-rack-and-hope-nothing-fails system?

I'm curious how the PCIe fabric between chassis fares vs. the internal NVLink of the GPUs. Wouldn't it be slower moving data between the nodes, or does it even matter?

Did Microsoft say what workflows, specifically, this was targeted at?

Overall I'm really confused about how this works.
 

Patrick

Administrator
Staff member
Dec 21, 2010
Great questions. There are six power supplies, IIRC, for power redundancy. Fans are also paired up for redundancy. If the GPUs die, they die. Fair point.

NVLink is usually faster, but it turns out a lot of the deep learning guys are pushing data back to the CPUs because it is easier. PCIe is a known quantity, while NVLink is a bit more exotic. You are right that the DGX-2 / HGX-2 would be faster with its 3 kW+ NVLink switching fabric.
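To make the PCIe vs. NVLink point concrete, here is a minimal sketch of the kind of comparison involved (not anything from Microsoft or NVIDIA; it assumes a box with at least two CUDA GPUs, PyTorch installed, and an arbitrary ~256 MB tensor). It times a direct GPU-to-GPU copy, which rides NVLink peer-to-peer when the GPUs are linked, against staging the same data through pinned host memory, i.e. bouncing off the CPU over PCIe:

Code:
import time
import torch

# Sketch only: compares a direct device-to-device copy (NVLink P2P when the
# GPUs share a link, otherwise over PCIe) against staging through pinned host
# memory, which is the "push it back to the CPU" path discussed above.

def time_copy(fn, iters=20):
    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")
    return (time.perf_counter() - start) / iters

src = torch.randn(64 * 1024 * 1024, device="cuda:0")  # 64M fp32 values, ~256 MB

# Direct GPU0 -> GPU1 copy.
direct_s = time_copy(lambda: src.to("cuda:1", non_blocking=True))

# Staged GPU0 -> pinned host memory -> GPU1 copy.
host = torch.empty_like(src, device="cpu").pin_memory()

def staged():
    host.copy_(src)                        # device-to-host over PCIe
    host.to("cuda:1", non_blocking=True)   # host-to-device over PCIe

staged_s = time_copy(staged)

gb = src.numel() * 4 / 1e9
print(f"direct copy : {gb / direct_s:.1f} GB/s")
print(f"via host    : {gb / staged_s:.1f} GB/s")

On an NVLink-linked pair the direct number should come out well ahead; across a plain PCIe fabric between chassis the two paths converge, which is roughly the trade-off described above.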
 

Mam89

Member
Jan 14, 2016
SoCal
Thanks for the reply, Patrick. I think I just needed a sanity check on why this even exists.

I can see a software-driven, SAN-like GPGPU system with a separate fabric, redundant heads, switches, etc. being pretty cool.

In fact, something like that would be killer for all kinds of workloads, done right.

But this just looks like a bomb taking up rack space. Unless I'm seriously missing something...