Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)


CyklonDX

Well-Known Member
Nov 8, 2022
1,622
578
113
No, I'm fine. I just don't know what I'm reading anymore: I have Supermicro papers saying that number is with sparsity, others saying it's not, and vice versa...
 

Leiko

Member
Aug 15, 2021
38
7
8
No, I'm fine. I just don't know what I'm reading anymore: I have Supermicro papers saying that number is with sparsity, others saying it's not, and vice versa...
I benchmarked bf16 with fp32 accumulation on the DRIVE A100 at 250 TFLOPS with cuBLASLt. I'm betting the real A100 does better; it should reach 312 TFLOPS, since it has 108 SMs instead of the DRIVE version's 96.
Could you try benchmarking a 4090 if you have one?
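As a sanity check on those figures, the theoretical peak can be estimated from SM count and clock. A rough sketch in Python; the 2048 bf16 FLOPs/clock/SM figure and the 1410 MHz boost clock are the published full-A100 (GA100) specs, and assuming the DRIVE variant runs the same boost clock is an assumption that may not hold:

```python
def peak_bf16_tflops(num_sms, boost_clock_mhz, flops_per_clock_per_sm=2048):
    """Theoretical dense bf16 tensor-core peak throughput in TFLOPS.

    2048 bf16 FLOPs per clock per SM is the GA100 figure:
    4 tensor cores/SM x 256 FMA/clock x 2 FLOPs per FMA.
    """
    return num_sms * boost_clock_mhz * 1e6 * flops_per_clock_per_sm / 1e12

# Full A100, 108 SMs @ 1410 MHz boost: ~312 TFLOPS, matching the spec sheet.
a100 = peak_bf16_tflops(108, 1410)    # ~311.9

# DRIVE A100 with 96 SMs, assuming the same boost clock:
drive = peak_bf16_tflops(96, 1410)    # ~277.2

# The measured 250 TFLOPS would then be ~90% of the theoretical peak,
# which is plausible efficiency for a cuBLASLt GEMM benchmark.
efficiency = 250 / drive
```

By this scaling, 312 TFLOPS on the full die corresponds to ~277 TFLOPS theoretical on 96 SMs, so the measured 250 is already close to what linear SM scaling predicts.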
 

blackcat1402

New Member
Dec 10, 2024
18
3
3
I did a comparison test of the PG199 vs the V100 for LLM inference, using the same prompt and Mistral Small 3.1 24B Q4. This model fits entirely in either GPU's memory:
V100 16G SXM2 inference results: prompt: 1386 tokens, 29490.74 tokens/s; response: 773 tokens, 38.17 tokens/s

A100 32G SXM2 inference results: prompt: 1386 tokens, 33001.07 tokens/s; response: 813 tokens, 48.24 tokens/s

The A100 is 26.4% faster than the V100. This is in line with NVIDIA's historical pattern of a new generation roughly every two years with a ~30% performance improvement. It is said that the 5090 is also about 30% faster than the 4090 in LLM inference.
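The generation-speed comparison above works out as a simple ratio of the measured response tokens/s (a trivial check in Python):

```python
v100_tps = 38.17   # V100 16G measured response tokens/s
a100_tps = 48.24   # A100 32G (PG199) measured response tokens/s

# Relative speedup of the A100 over the V100, in percent.
speedup = (a100_tps / v100_tps - 1) * 100
print(f"A100 is {speedup:.1f}% faster")   # -> A100 is 26.4% faster
```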
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,622
578
113
(Just out of curiosity, I attempted to run roughly the same thing.)
Mistral Small 3.1 24B Instruct 2503 on Windows, LM Studio, AMD driver 25.3.1, ROCm 1.21, memory compression disabled.
7900 XTX (ROCm): prompt 1.4K tokens / response 832 tokens, 40.30 tok/s @ pure fp16
*i.e. asking it to generate a response of about 813 tokens
40.30 tok/sec
832 tokens
0.44s to first token
Stop reason: EOS Token Found

(A 2K-token prompt actually brings it down to 37 tok/s.)

Makes me wonder how much better the 9070 XT would be.


// Mistral Nemo 13B 2407 tuned for LM Studio (fully compatible):
prompt 2K tokens / response 1433 tokens, 58.93 tok/sec @ pure fp16
*very nice
 


petersikora

New Member
Jan 22, 2025
1
0
1
Hi,

Can you try running some benchmarks with vLLM and the V1 engine on this version of the A100?
That would be more meaningful, because the V100 can't use the V1 engine, which is optimized for the Ampere architecture and newer (FlashAttention, etc.).
 

kdawg

New Member
Mar 1, 2025
1
0
1
For those with the 3x 8-pin adapter card, are the correct cables 2x 8-pin PCIe connectors and 1x 8-pin EPS connector? I thought the EPS one was supposed to be compatible with the CPU cables, but it seems like it only fits the cable end that plugs into the power supply. Any products/adapters used? Thanks
 

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
For those with the 3x 8-pin adapter card, are the correct cables 2x 8-pin PCIe connectors and 1x 8-pin EPS connector? I thought the EPS one was supposed to be compatible with the CPU cables, but it seems like it only fits the cable end that plugs into the power supply. Any products/adapters used? Thanks
I only used the 2x8-pin VGA connectors and left the EPS one empty.
 

MOD1870

New Member
Apr 24, 2025
1
0
1
They seem to be pretty certain about the traces being missing. I actually asked whether it's possible to get it working somehow; their feedback was that they have had success by migrating the whole chip to a PCIe donor board. I assume they tested the circuits on the SXM2 board before going to that length, but that's all the info I have.
Did you try NVLink on the QS version? Does it work?
 

wheat_field

New Member
Nov 23, 2024
2
0
1
Anybody have any numbers on the minimum achievable max power draw? Like, if you were to lock the graphics clock to 1140 MHz and run model training or something else intensive, what's the power draw? Thanks in advance to anyone willing to share numbers; I'm considering buying one, but I'm a little concerned about the 400 W+ power draw.
 

MilkyWeight

New Member
Mar 15, 2024
16
2
3
Anybody have any numbers on the minimum achievable max power draw? Like, if you were to lock the graphics clock to 1140 MHz and run model training or something else intensive, what's the power draw? Thanks in advance to anyone willing to share numbers; I'm considering buying one, but I'm a little concerned about the 400 W+ power draw.
I think it was around 300 W, with short spikes to 320-350 W. I can run a test in a day or two if it's important to you. If so, reply here.
 

pingyuin

New Member
Oct 30, 2024
13
7
3
I don't know if it's really necessary to do that. With three fans and five heat pipes on an open rack, the PG199's temperature can be kept below 70 °C during inference. Will this affect resale?
When inferencing, this card is constrained by memory bandwidth, and power consumption rarely goes above 250 W. If you also do other compute on this card, the only option you have without delidding is to lower the core clocks. When hitting 450 W and more, the card tends to fall off the PCIe bus to protect itself. One possible reason for this is overheating, no matter how many heat pipes you put on the IHS, because the TIM between the IHS and the core/memory has insufficient thermal conductivity. Replacing it with an indium-gallium alloy could win you hundreds of MHz.
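The memory-bandwidth constraint can be sketched with a simple roofline estimate: during decode, each generated token streams roughly the full set of model weights from VRAM, so tokens/s ≈ bandwidth / model size. A rough sketch under stated assumptions; the ~14 GB figure for a 24B Q4 model is an estimate, not a measured value:

```python
def est_tokens_per_s(mem_bandwidth_gbs, model_size_gb):
    """Upper-bound decode speed if every token reads all weights once."""
    return mem_bandwidth_gbs / model_size_gb

# Working backwards from the measured 48.24 tok/s with a ~14 GB Q4 model,
# the card would be sustaining roughly this much effective read bandwidth:
effective_bw = 48.24 * 14   # ~675 GB/s
```

That effective figure sitting well below compute limits is consistent with the observation that inference rarely pushes the card past 250 W.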
 

blackcat1402

New Member
Dec 10, 2024
18
3
3
I just hadn't paid attention for a few months, and almost all the low-priced PG199s on eBay are sold out. Maybe more people are discovering the trick of using this compute card.