Yes.

Who is Ian? Are you referring to me? Because I have all those cards and my name is Ian. I am “Ian&Steve” on Einstein.
I presume that if you enable ECC and retest, you will get slower times, closer to the Titan V / V100 (V100: 2500/3 ≈ 833 s per task).
(Wouldn't a 3080 Ti be faster than a Titan V, though, if the CUDA app is mostly limited by memory bandwidth?)
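A first-order way to frame the 3080 Ti vs Titan V question: if a task is purely DRAM-bandwidth bound, its runtime scales roughly with the inverse of memory bandwidth. The Python sketch below computes theoretical peak bandwidth from bus width and per-pin data rate (Titan V: 3072-bit HBM2 at 1.7 Gbps; RTX 3080 Ti: 384-bit GDDR6X at 19 Gbps) and scales a per-task time accordingly. The helper names and the 1000 s input are hypothetical placeholders, and the model ignores caches, compute limits, occupancy and ECC overhead, so treat the output as a rough estimate rather than a prediction.

```python
# First-order model for a memory-bandwidth-bound GPU task (a sketch, not a benchmark).


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: bytes per transfer x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def scale_runtime(seconds_on_a: float, bw_a_gbs: float, bw_b_gbs: float) -> float:
    """Runtime on card B if the task were limited only by DRAM bandwidth."""
    return seconds_on_a * bw_a_gbs / bw_b_gbs


titan_v = peak_bandwidth_gbs(3072, 1.7)      # HBM2, 3072-bit bus
rtx_3080_ti = peak_bandwidth_gbs(384, 19.0)  # GDDR6X, 384-bit bus

# Hypothetical placeholder: measured seconds per task on the Titan V (use your own number).
titan_v_seconds = 1000.0
estimate_3080_ti = scale_runtime(titan_v_seconds, titan_v, rtx_3080_ti)

print(f"Titan V peak:     {titan_v:.1f} GB/s")
print(f"RTX 3080 Ti peak: {rtx_3080_ti:.1f} GB/s")
print(f"3080 Ti estimate: {estimate_3080_ti:.0f} s per task (bandwidth-only model)")
```

With these figures the bandwidth ratio is about 1.4x in the 3080 Ti's favor, which is the intuition behind the question; whether it shows up in practice depends on how close the app really sits to the bandwidth limit.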
(I would still recommend trying tensor cores. I presume they would give you a bigger performance uplift, provided the results still pass validation. As I recall, the Tensor Memory Accelerator (TMA) can also speed up memory transfers, though I'm not sure it works on anything but Hopper.)
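The validation concern with tensor cores is precision: FP16 tensor-core math rounds its inputs to half precision (10 explicit mantissa bits; TF32 on Ampere also keeps 10), even when the products are accumulated at higher precision. The sketch below emulates that with Python's `struct` half-precision format (`'e'`) and compares a dot product against a full-precision reference. It is only an illustration on made-up random inputs, and `to_fp16`/`dot` are hypothetical helpers; whether the resulting error passes the project's validator is something only a real test run can show.

```python
# Emulates FP16-rounded inputs with high-precision accumulation to show the size
# of the rounding error. Accumulation here is binary64, so the error comes from
# input rounding only.
import random
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 binary16 value and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot(xs, ys):
    """Dot product accumulated in Python floats (binary64)."""
    return sum(x * y for x, y in zip(xs, ys))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4096
a = [rng.uniform(0.5, 1.0) for _ in range(n)]  # hypothetical input data
b = [rng.uniform(0.5, 1.0) for _ in range(n)]

reference = dot(a, b)
reduced = dot([to_fp16(x) for x in a], [to_fp16(y) for y in b])
rel_err = abs(reduced - reference) / abs(reference)

print(f"relative error with FP16-rounded inputs: {rel_err:.2e}")
```

Only about three decimal digits survive per input, so a validator tuned for FP32 or FP64 results may or may not tolerate the difference.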
NVIDIA Hopper Tuning Guide (Hopper Tuning Guide 12.8 documentation): the programming guide for tuning CUDA applications for GPUs based on the Hopper GPU architecture.
And you can try it and see whether this one works on Ampere/Volta:
NVIDIA Hopper Tuning Guide (Hopper Tuning Guide 12.8 documentation)
This part of the code doesn't state that you need Hopper (it should work on Ampere), and I think this is what you want:

NVIDIA nvCOMP: a high-speed data compression and decompression library optimized for NVIDIA GPUs.
[A100] cuCompressibleMemory not work · Issue #68 · NVIDIA/cuda-samples: "When I run the sample(cudaCompressibleMemory, my env: cuda 11.2 ), I found the result: L2 Compression Ratio = 0%. This is abnormal, but I counld not find the reason. Are there some special configur..."

CUDALibrarySamples/nvCOMP at master · NVIDIA/CUDALibrarySamples


And a video about it.