Is anyone else doing inference on an A100 with PyTorch?
We run a number of vision machines with the following cards
RTX Titan, RTX 3090, SXM A100
The important metric for us is frame-to-class latency under bursts of frames: we batch up to N frames per batch and count the round-trip time as the latency for every frame in that batch.
All of our machines run the same code, models, batch sizes, cuda drivers, etc.
It seems like the fastest machine is the RTX 3090 one, the slowest is the SXM A100, and the RTX Titans are right in the middle?
I must be missing something with the A100-SXM4-40GB modules; we were only testing the 3090 on PyTorch to get an idea of how Ampere was going to work out.
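For reference, here is a minimal sketch of how we measure the per-batch round-trip latency described above (the function and model names are hypothetical stand-ins, not our actual code). One caveat worth noting when timing on GPU: CUDA kernel launches are asynchronous, so without a `torch.cuda.synchronize()` around the timed call you only measure launch overhead, not actual inference time.

```python
import time

def batch_latency_ms(model, frames, batch_size):
    """Charge every frame in a batch the batch's full round-trip time.

    On a real GPU run, call torch.cuda.synchronize() immediately before
    reading the clock on both sides of model(batch), otherwise the
    asynchronous launch makes the timing meaningless.
    """
    latencies = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size]
        start = time.perf_counter()
        model(batch)  # placeholder for the forward pass + result copy-back
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Hypothetical stand-in for the real classifier, just to make this runnable.
dummy_model = lambda batch: [0] * len(batch)
lat = batch_latency_ms(dummy_model, list(range(32)), batch_size=8)
```

With 32 frames and a batch size of 8, this yields one latency sample per batch (4 in total), and each frame in a batch is assigned that batch's round-trip time.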