A100 poor performance?



Jul 19, 2014
Is anyone else doing inference on the A100 with PyTorch?

We run a number of vision machines with the following cards:

RTX Titan, RTX 3090, SXM A100

The metric that matters to us is frame-to-class latency under bursts of frames. We batch up to N frames per batch and count the round-trip time of each batch as the latency for every frame in that batch.
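As a minimal sketch of that latency accounting (not our actual harness), the function below splits a burst into batches of up to N frames, times each round trip, and charges that time to every frame in the batch. The names `batched_frame_latencies` and `infer` are hypothetical placeholders. One assumption to flag loudly: PyTorch launches CUDA kernels asynchronously, so `infer` has to block until results are ready (for example by calling `torch.cuda.synchronize()` or copying outputs back with `.cpu()`), otherwise the timer only measures kernel launch.

```python
import time
from typing import Any, Callable, List, Sequence


def batched_frame_latencies(
    frames: Sequence[Any],
    batch_size: int,
    infer: Callable[[Sequence[Any]], Any],
) -> List[float]:
    """Return one latency (seconds) per frame.

    Frames are grouped into batches of up to `batch_size`; each batch's
    round-trip time is assigned to every frame in that batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    latencies: List[float] = []
    for start in range(0, len(frames), batch_size):
        batch = frames[start:start + batch_size]
        t0 = time.perf_counter()
        # `infer` is a hypothetical callable; it must not return until the
        # results are actually available (e.g. after a CUDA synchronize).
        infer(batch)
        elapsed = time.perf_counter() - t0
        # Every frame in the batch sees the full round-trip time.
        latencies.extend([elapsed] * len(batch))
    return latencies
```

For example, `batched_frame_latencies(list(range(10)), 4, infer)` times three batches of 4, 4 and 2 frames and returns ten values, so the last two frames carry the (shorter) time of the final partial batch.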

All of our machines run the same code, models, batch sizes, CUDA drivers, etc.

Strangely, the fastest machine is the RTX 3090 box, the slowest is the SXM A100, and the RTX Titans land right in the middle.

I must be missing something with the A100-SXM4-40GB modules. We were testing the 3090 on PyTorch just to get an idea of how Ampere was going to work out.


Active Member
Nov 7, 2018
I think that's probably expected, especially with a single context and relatively small images. I don't have any A100s myself, though.

The A100s would likely scale better on bigger, more parallel jobs, especially on the training side.
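One way to check that on your own cards is a batch-size sweep that records both median latency and throughput (frames per second) per card. This is a rough sketch under stated assumptions: `make_batch` and `infer` are hypothetical placeholders for your own preprocessing and model call, and `infer` must block until results are ready (for example via `torch.cuda.synchronize()`). If the larger card only pulls ahead at big batch sizes, that would fit the "scales better on bigger jobs" explanation.

```python
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def sweep_batch_sizes(
    make_batch: Callable[[int], Sequence[Any]],
    infer: Callable[[Sequence[Any]], Any],
    batch_sizes: Sequence[int],
    warmup: int = 3,
    iters: int = 20,
) -> Dict[int, Dict[str, float]]:
    """For each batch size, report median round-trip latency and throughput."""
    results: Dict[int, Dict[str, float]] = {}
    for n in batch_sizes:
        batch = make_batch(n)  # hypothetical: builds a batch of n frames
        # Warm-up runs keep one-time costs (e.g. CUDA context setup,
        # cuDNN autotuning, allocator growth) out of the measurements.
        for _ in range(warmup):
            infer(batch)
        times = []
        for _ in range(iters):
            t0 = time.perf_counter()
            infer(batch)  # hypothetical: must block until results are ready
            times.append(time.perf_counter() - t0)
        median_s = statistics.median(times)
        results[n] = {
            "median_latency_s": median_s,
            "frames_per_s": n / median_s,
        }
    return results
```

The median is used rather than the mean so that an occasional slow iteration does not dominate the comparison between cards.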