Is anyone else doing inference on an A100 with PyTorch?
We run a number of vision machines with the following cards
RTX Titan, RTX 3090, SXM A100
The important metric for us is frame-to-class latency under bursts of frames: we batch up to N frames per batch and count the round-trip time as the latency for every frame in that batch.
All of our machines run the same code, models, batch sizes, cuda drivers, etc.
It seems like the fastest machine is the RTX 3090 one, the slowest is the SXM A100, and the RTX Titans are right in the middle?
I must be missing something with the A100-SXM4-40GB modules; we were only testing the 3090 on PyTorch to get an idea of how Ampere was going to work out.
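For reference, here is a minimal sketch of how we measure the per-batch round-trip latency described above (the function and model names are hypothetical stand-ins, not our actual code). One caveat worth noting when timing on GPU: CUDA kernel launches are asynchronous, so without a `torch.cuda.synchronize()` around the timed call you only measure launch overhead, not actual inference time.

```python
import time

def batch_latency_ms(model, frames, batch_size):
    """Charge every frame in a batch the batch's full round-trip time.

    On a real GPU run, call torch.cuda.synchronize() immediately before
    reading the clock on both sides of model(batch), otherwise the
    asynchronous launch makes the timing meaningless.
    """
    latencies = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size]
        start = time.perf_counter()
        model(batch)  # placeholder for the forward pass + result copy-back
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Hypothetical stand-in for the real classifier, just to make this runnable.
dummy_model = lambda batch: [0] * len(batch)
lat = batch_latency_ms(dummy_model, list(range(32)), batch_size=8)
```

With 32 frames and a batch size of 8, this yields one latency sample per batch (4 in total), and each frame in a batch is assigned that batch's round-trip time.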