Yeah. The P100 isn't quite half the V100, more like 2/3: 18.8 (4x P100) vs 28 (4x V100).
"Meh" tensor cores > zero tensor cores (the P100 has none).
My work doesn't "need" Volta. But the Volta implementation of MPS is better; MPS falls back to the older implementation on Pascal and earlier, so I greatly prefer Volta. See "Multi-Process Service :: GPU Deployment and Management Documentation".
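For context, enabling MPS is just a matter of starting the control daemon; the Volta-vs-pre-Volta distinction is handled automatically by the driver based on the GPU architecture. A minimal sketch (paths are the common defaults, adjust to taste; requires an NVIDIA GPU and driver, so this is illustrative, not runnable everywhere):

```shell
# Point MPS at its pipe/log directories (these paths are conventional defaults)
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log

# Optionally pin MPS to specific GPUs before starting the daemon
export CUDA_VISIBLE_DEVICES=0,1,2,3

# Start the MPS control daemon in the background
nvidia-cuda-mps-control -d

# ... run your CUDA jobs; they will transparently share each GPU via MPS ...

# Shut the daemon down when done
echo quit | nvidia-cuda-mps-control
```

On Volta and newer the daemon uses the hardware-backed MPS path (per-client address spaces, better isolation and scheduling); on Pascal and older you silently get the legacy software path, which is the reason stated above for preferring Volta-class cards.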
My workload "mostly" fits in ~12GB on the Titan Vs I have now, but some 5% of jobs fail by running out of memory. The 16GB V100 will solve that. The workload doesn't use the GPUs over NVLink; it treats them individually. I'm not better off with 2x 3090 because the 3090 sucks at FP64: the job takes ~5 min on a Titan V and 30-40 min on a 3090.