On paper you can expect around 150 TFLOPS in TF32 (a 19-bit format).
The A100 SXM4 40GB card is rated at roughly 155 TFLOPS of TF32 tensor throughput.
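If you want to sanity-check that rating on real hardware, here's a minimal PyTorch sketch (my own, nothing official) that estimates sustained TF32 matmul throughput; the matrix size and iteration count are arbitrary picks, and it reports what the card actually sustains rather than the peak spec:

```python
# Rough TF32 matmul throughput check (PyTorch, CUDA device assumed).
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # let matmuls use TF32 tensor cores

n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.float32)
b = torch.randn(n, n, device="cuda", dtype=torch.float32)

# warm-up so cuBLAS settles on a kernel before timing
for _ in range(3):
    torch.matmul(a, b)
torch.cuda.synchronize()

iters = 20
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1000.0          # elapsed_time is in ms
tflops = iters * 2 * n**3 / seconds / 1e12           # 2*n^3 FLOPs per n x n matmul
print(f"sustained TF32 throughput: {tflops:.1f} TFLOPS")
```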
The Drive card has more ROPs than the Tesla A100 models, which should improve its INT8 image-processing performance over a normal A100.
(Like ResNet-50, which typically runs at INT8 precision; for non-image workloads it might be a bit slower than the 40GB SXM4 card.)
This would be a good comparison for AI workloads (rough estimate sketched below).
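As a back-of-envelope check on the image-processing angle, this sketch converts rated INT8 TOPS into a ResNet-50 images/sec ceiling. The ~4 GOPs-per-image and 624 TOPS figures are approximate published numbers, and real pipelines land well below the ceiling; it's only meant to show why the INT8 rating is the one that matters here:

```python
# Back-of-envelope: upper bound on ResNet-50 INT8 images/sec from rated TOPS.
# All inputs are rough public figures / assumptions, not measurements.
RESNET50_GOPS_PER_IMAGE = 4.1   # ~4.1 GOPs per forward pass (counting 2 ops per MAC)
A100_INT8_TOPS = 624            # A100 dense INT8 tensor-core rating

def ceiling_images_per_sec(tops: float, gops_per_image: float) -> float:
    """Theoretical ceiling assuming 100% tensor-core utilization."""
    return tops * 1e12 / (gops_per_image * 1e9)

print(f"A100 INT8 ceiling: "
      f"{ceiling_images_per_sec(A100_INT8_TOPS, RESNET50_GOPS_PER_IMAGE):,.0f} img/s")
# Real inference runs reach only a fraction of this (memory traffic, launch
# overhead, non-GEMM layers), but relative ratings still track relative throughput.
```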
[Attachment 39905]
(You can potentially expect INT8, INT4, and binary TOPS to be some 15-30% faster on the Drive card; I don't see them mattering much over 12-bit precision.)
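To put that 15-30% guess in concrete numbers, a quick sketch applying it to the A100's published dense low-precision ratings; the Drive-card column is pure extrapolation from the guess above, not a spec:

```python
# Hypothetical 15-30% uplift applied to A100 dense tensor-core ratings (TOPS).
# A100 figures are the public spec-sheet values (no sparsity); the Drive-card
# range is only an extrapolation of the guess above, not a measurement.
a100_tops = {"INT8": 624, "INT4": 1248, "Binary": 4992}

for fmt, tops in a100_tops.items():
    low, high = tops * 1.15, tops * 1.30
    print(f"{fmt:>6}: A100 {tops:>5} TOPS -> Drive card (guess) {low:,.0f}-{high:,.0f} TOPS")
```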