Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)


gsrcrxsi

Active Member
Dec 12, 2018
352
115
43
Yes, it can be used on the supermicro aom-sxmv board
Hmm, interesting. Did the AOM-SXMV board need any modification or configuration to get them recognized?

I wouldn't have expected a fringe GPU like the Drive A100 to work when even a P100 doesn't work on it.
 

Leiko

New Member
Aug 15, 2021
8
0
1
IIRC they work out of the box on the adapters (on Linux). I expect them to do the same on the AOM-SXMV. I will try.
 

Leiko

New Member
Aug 15, 2021
8
0
1
A few notes for anyone thinking of trying it:
- the heatsink mount is not standard SXM2 (it might be SXM4?)
- the cards are indeed missing a few SMs: 96 instead of the A100's 108
- the Linux driver works out of the box, but the Windows driver needs to be modded
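
As a rough sanity check on the SM count above, here is a minimal sketch of how much of a full A100 the Drive card keeps. It assumes, purely for illustration, that throughput scales linearly with SM count at equal clocks, which real workloads will not follow exactly. The function name is hypothetical.

```python
# SM counts as quoted in the post above.
A100_SMS = 108
DRIVE_A100_SMS = 96


def sm_deficit(full: int, cut: int) -> float:
    """Fraction of SMs missing relative to the full part."""
    return (full - cut) / full


deficit = sm_deficit(A100_SMS, DRIVE_A100_SMS)
# Assumption: linear scaling with SM count, so this is only a ballpark bound.
print(f"Drive A100 has {deficit:.1%} fewer SMs than the A100")
```

So on SM count alone, a gap of roughly a tenth versus a regular A100 would not be surprising.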
 

xdever

New Member
Jun 29, 2021
8
0
1
Oh, the heatsink sounds bad. I got a card from eBay, but I have not yet been able to test it. How did you solve the problem? Were you able to mount it somehow?
 

xdever

New Member
Jun 29, 2021
8
0
1
By overlaying pictures of this card and a V100, it looks like drilling a new set of holes in the heatsink should be easy and would solve the problem. I'm still curious about your solution.
 

Leiko

New Member
Aug 15, 2021
8
0
1
In China, custom brackets are sometimes sold with the card. One seller was offering them standalone, but they were expensive. I'm almost sure the width is SXM4, and I've seen pics of SXM4 coolers mounted on those cards.

Worst case, you can always just bolt down one side of the heatsink, as it doesn't rely on pressure.
 

xdever

New Member
Jun 29, 2021
8
0
1
Ok, I got it working in my Chinese adapter (with modified 5V directly from the power supply). The fan thermal sensor had to be removed and its trace cut because, unlike on the V100 and P100, the card has no hole where the sensor sits. I mounted an SXM2 heatsink. The difference is that the holes are 4mm farther apart than on SXM2. I made a slot with a bit of filing, so now I can use the heatsink for both the V100 and the A100.

Performance-wise, using the Triton Matmul example, it is 7% slower than the real A100 SXM4 40GB for the biggest size in BF16. For real-world testing, it's 25% slower than the real A100 for small (200M param) Transformer training, but even so it is 2x as fast as a 3090. The testing might be a bit unfair because the A100 SXM4 is in a DGX, with a much more powerful and much newer CPU than my desktop with the Drive A100 and the 3090, although this should have minimal influence. Also, my desktop uses PCI-E gen 3, and the DGX uses gen 4.
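
To put the training numbers on one scale, a quick back-of-the-envelope sketch. It assumes "25% slower" means the Drive A100 reaches 75% of the A100's throughput (the other reading, 1.25x the wall time, would give a slightly different result), and the function name is just for illustration.

```python
def implied_speedup(drive_vs_a100_slowdown: float, drive_vs_3090_speedup: float) -> float:
    """A100-over-3090 speedup implied by the two Drive A100 comparisons."""
    # Assumption: "X% slower" is read as (1 - X) of the A100's throughput.
    drive_rel_a100 = 1.0 - drive_vs_a100_slowdown
    return drive_vs_3090_speedup / drive_rel_a100


# Figures from the post: 25% slower than the A100, 2x as fast as a 3090.
a100_vs_3090 = implied_speedup(0.25, 2.0)
print(f"Implied A100 vs 3090 speedup: {a100_vs_3090:.2f}x")
```

Under that reading, the real A100 would come out at a bit over two and a half times the 3090 on this workload, with the CPU and PCI-E differences mentioned above left unaccounted for.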

Cooling down the card silently is very challenging. Currently, I have a server fan alu-taped to the heatsink, and it sounds like a jet engine. The 8cm Noctua fans do not provide anywhere near enough cooling power to keep the card cool. I'd like to hear any suggestions on how to cool it down silently without water cooling.

The idle power consumption of the Drive A100 is 48W compared to 20W for the 3090.
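
For a sense of what that idle gap adds up to, a small sketch, assuming the machine sits idle around the clock for a full year (an upper bound; the constant and function names are hypothetical).

```python
HOURS_PER_YEAR = 24 * 365  # assumes the box is powered on and idle all year

# Idle power figures from the post, in watts.
DRIVE_A100_IDLE_W = 48.0
RTX_3090_IDLE_W = 20.0


def extra_idle_kwh(card_w: float, baseline_w: float, hours: float = HOURS_PER_YEAR) -> float:
    """Extra energy in kWh drawn at idle by one card relative to another."""
    return (card_w - baseline_w) * hours / 1000.0


extra_kwh = extra_idle_kwh(DRIVE_A100_IDLE_W, RTX_3090_IDLE_W)
print(f"Extra idle energy per year: {extra_kwh:.1f} kWh")
```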
 

gsrcrxsi

Active Member
Dec 12, 2018
352
115
43
Can you post a pic of the heatsink modification?
 

xdever

New Member
Jun 29, 2021
8
0
1
This is the whole contraption for now: Drive A100.

Unfortunately, the power consumption regularly spikes to 400W, and it makes plenty of coil whine, maybe because the Chinese adapter doesn't have the NVLink populated, which has a bunch of grounds. I'm also worried about the power connectors, perhaps I should add one more.
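
On the connector worry, a rough sizing sketch. It assumes the adapter is fed through PCIe 8-pin connectors at their nominal 150 W rating and that the whole spike is carried on the 12 V rail; neither is confirmed for this particular adapter, and the names below are hypothetical.

```python
import math

PCIE_8PIN_WATTS = 150.0  # nominal PCIe 8-pin rating; assumes the adapter uses these
RAIL_VOLTS = 12.0        # assumes the full load is drawn from the 12 V rail


def connectors_needed(peak_watts: float, per_connector_watts: float = PCIE_8PIN_WATTS) -> int:
    """Minimum connector count that keeps each one within its nominal rating."""
    return math.ceil(peak_watts / per_connector_watts)


def rail_current(peak_watts: float, volts: float = RAIL_VOLTS) -> float:
    """Total current in amps at the given rail voltage."""
    return peak_watts / volts


peak = 400.0  # spike figure from the post
print(f"8-pin connectors needed: {connectors_needed(peak)}")
print(f"Total 12 V current at peak: {rail_current(peak):.1f} A")
```

If the adapter currently has two such inputs, these assumptions would support adding one more.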

This power consumption explains why I killed some of the servers.
 

Leiko

New Member
Aug 15, 2021
8
0
1
Looks like the old version; the latest one seems to have (weirdly enough) QS instead of CS engraved on the heatspreader. From the info I've seen on the Chinese internet, the QS performs better.

The heatsink is SXM4 mount, that's why you had to modify it.

The card also ISN'T a full GA100 implementation (neither is the A100, but it's closer). It's only 96 SMs and 4 HBM2 stacks.