I use 1.5 mm thermal pads on the VRMs and no thermal pads on the coils. I tried putting pads on the coils as well, but no matter how long I tried, I couldn't get them right: either they don't make contact, or they are too thick and keep the core from making contact with the cooler (the thermal paste doesn't spread out). Do you happen to know the proper pad thickness?
I'm still using the modified adapter, but given the modifications, I'm pretty sure that it receives enough power. That said, I can't verify the thickness of the traces they used below the socket.
My motherboard and CPU only support PCIe Gen 3. I doubt that this is the issue, because the problem only occurs when the card goes above 450W.
I'm wondering if my card has some issue with its power limits: even the "real" SXM4 A100 has a power limit of 400W, and this card clearly goes above that. If I change the matmul size in the script above from 32768 to 16386, the power draw stays around 420W and the card is stable. That is still more than the official specification for any version of the A100. Could this be related to the card being the CS version and not the production one?
Can somebody try the script from above and check the maximum power usage of their card using nvidia-smi?
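In case it helps anyone who wants to check, here is a rough sketch of how the peak power could be recorded while the matmul script runs in another terminal. It polls nvidia-smi for the `power.draw` and `power.limit` fields and keeps the highest reading per GPU. The function names, the 60 second duration and the 0.1 second interval are just my assumptions, and since this is polling, very short spikes can be missed.

```python
import subprocess
import time

# nvidia-smi query: one CSV line per GPU, values in watts, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power(output):
    """Parse nvidia-smi CSV output into a list of (draw_w, limit_w), one per GPU."""
    readings = []
    for line in output.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        try:
            readings.append((float(fields[0]), float(fields[1])))
        except (ValueError, IndexError):
            # Some cards/drivers report a non-numeric value such as "[N/A]";
            # record None so GPU indices stay aligned.
            readings.append(None)
    return readings


def sample_power():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power(result.stdout)


def track_max_power(duration_s=60.0, interval_s=0.1, sampler=sample_power):
    """Poll the sampler for duration_s seconds; return {gpu_index: (peak_draw_w, limit_w)}."""
    peak = {}
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for idx, reading in enumerate(sampler()):
            if reading is None:
                continue
            draw, limit = reading
            if idx not in peak or draw > peak[idx][0]:
                peak[idx] = (draw, limit)
        time.sleep(interval_s)
    return peak


# Example use (start the matmul script first, then run this):
#   for idx, (draw, limit) in track_max_power(duration_s=60).items():
#       print(f"GPU {idx}: peak {draw:.1f} W, reported limit {limit:.1f} W")
```

Comparing the peak against the reported limit should show whether a card is really running past what the driver thinks its limit is.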