Ok, I got it working in my Chinese adapter (with modified 5V, fed directly from the power supply). I had to remove the fan thermal sensor and cut its trace because, unlike on the V100 and P100, there is no hole where they put the sensor. I mounted an SXM2 heatsink. The difference is that the mounting holes are 4mm farther apart than on SXM2. I made a slot with a bit of filing, so now I can use the heatsink for both the V100 and the A100.
Performance-wise, using the Triton Matmul example, it is 7% slower than the real A100 SXM4 40GB for the biggest size in BF16. For real-world testing, it's 25% slower than the real A100 for small (200M param) Transformer training, but even so it is 2x as fast as a 3090. The testing might be a bit unfair because the A100 SXM4 is in a DGX, with a much more powerful and much newer CPU than my desktop with the Drive A100 and the 3090, although this should have minimal influence. Also, my desktop uses PCI-E gen 3, and the DGX uses gen 4.
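For anyone who wants to compare their own cards the same way, here is a rough sketch of how a "X% slower" figure can be derived from matmul timings. The function names and the matrix size and timing in the usage note are hypothetical and only illustrate the arithmetic; they are not my measurements, and the Triton example reports throughput in its own way.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of one (m x k) @ (k x n) matmul in TFLOPS.

    A dense matmul performs 2 * m * n * k floating-point operations
    (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e12


def relative_slowdown(baseline_tflops: float, candidate_tflops: float) -> float:
    """Fraction by which the candidate is slower than the baseline.

    0.07 means the candidate reaches 93% of the baseline throughput.
    """
    if baseline_tflops <= 0:
        raise ValueError("baseline_tflops must be positive")
    return 1.0 - candidate_tflops / baseline_tflops
```

Usage would look like `relative_slowdown(matmul_tflops(8192, 8192, 8192, t_dgx), matmul_tflops(8192, 8192, 8192, t_desktop))`, where `t_dgx` and `t_desktop` are placeholder names for the measured seconds per matmul on each machine.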
Cooling down the card silently is very challenging. Currently, I have a server fan alu-taped to the heatsink, and it sounds like a jet engine. The 8cm Noctua fans do not provide anywhere near enough cooling power to keep the card cool. I'd like to hear any suggestions on how to cool it down silently without water cooling.
The idle power consumption of the Drive A100 is 48W compared to 20W for the 3090.
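To put the idle numbers in perspective, here is a small sketch of the extra energy the higher idle draw costs over time. The 48W and 20W figures are the ones above; the helper name and the assumption that the card idles the whole time are mine.

```python
def extra_idle_energy_kwh(card_watts: float, reference_watts: float, hours: float) -> float:
    """Extra energy in kWh drawn by a card idling at card_watts
    compared with one idling at reference_watts, over the given hours."""
    return (card_watts - reference_watts) * hours / 1000.0


# Drive A100 (48W idle) vs 3090 (20W idle), assuming constant idle.
per_day = extra_idle_energy_kwh(48, 20, 24)         # one day
per_year = extra_idle_energy_kwh(48, 20, 24 * 365)  # one year
print(f"extra idle energy: {per_day:.3f} kWh/day, {per_year:.2f} kWh/year")
```

Multiply the result by your local electricity price to get a cost estimate.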