Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)


CyklonDX

Well-Known Member
Nov 8, 2022
1,531
511
113
No, I'm fine - I just don't know what I'm reading anymore. I have Supermicro papers saying that number is with sparsity, others saying it isn't, and vice versa...
 

Leiko

Member
Aug 15, 2021
38
7
8
CyklonDX said:
"No, I'm fine - I just don't know what I'm reading anymore. I have Supermicro papers saying that number is with sparsity, others saying it isn't, and vice versa..."
I benchmarked bf16 with fp32 accumulate on the DRIVE A100 at 250 TFLOPS using cublasLt. I'm betting the real A100 does better - it should reach 312 TFLOPS, since it has 108 SMs versus 96 on the DRIVE version.
It would be interesting to try benchmarking a 4090 if you have one.
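A quick back-of-the-envelope check on that SM-scaling argument. This assumes bf16 dense peak scales linearly with SM count at equal clocks (an assumption on my part - the DRIVE part may well clock differently):

```python
# Full A100 (SXM4): 312 TFLOPS bf16 dense peak with 108 SMs
full_peak_tflops = 312.0
full_sms = 108
drive_sms = 96  # DRIVE A100 / PG199 SM count per the post above

# Linear SM scaling predicts the DRIVE A100 peak (equal clocks assumed)
drive_peak = full_peak_tflops * drive_sms / full_sms
print(f"predicted DRIVE A100 bf16 peak: {drive_peak:.1f} TFLOPS")  # 277.3

# Measured 250 TFLOPS with cublasLt -> efficiency against that prediction
measured = 250.0
print(f"efficiency vs. predicted peak: {measured / drive_peak:.1%}")  # 90.1%
```

So the measured 250 TF is about 90% of what pure SM scaling would predict, which is a plausible real-world cublasLt efficiency.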
 

blackcat1402

New Member
Dec 10, 2024
13
3
3
I did a comparison test of the PG199 vs. the V100 for LLM inference, using the same prompt and Mistral Small 3.1 24B at Q4. The model fits entirely into either GPU's memory:
V100 16G SXM2 results: prompt: 1386 tokens, 29490.74 tokens/s / response: 773 tokens, 38.17 tokens/s

A100 32G SXM2 results: prompt: 1386 tokens, 33001.07 tokens/s / response: 813 tokens, 48.24 tokens/s

The A100 is 26.4% faster than the V100. That is in line with NVIDIA's historical pattern of a new GPU generation roughly every two years with ~30% improvement each time. The 5090 is also said to be about 30% faster than the 4090 at LLM inference.
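For reference, the 26.4% figure falls straight out of the decode (response-generation) numbers above:

```python
# Decode throughput from the two runs above
v100_tok_s = 38.17
a100_tok_s = 48.24

speedup = a100_tok_s / v100_tok_s - 1
print(f"A100 over V100 decode speedup: {speedup:.1%}")  # 26.4%
```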
 


CyklonDX

Well-Known Member
Nov 8, 2022
1,531
511
113
(Just as a curiosity - I attempted to run about the same thing.)
Mistral Small 3.1 24B Instruct 2503 on Windows, LM Studio, AMD driver 25.3.1, ROCm 1.21, memory compression disabled:
7900 XTX (ROCm): prompt 1.4K tokens / response 832 tokens, 40.30 tok/s @ pure fp16
*i.e. asking it to generate a response of about 813 tokens
40.30 tok/sec

832 tokens

0.44s to first token

Stop reason: EOS Token Found

(A 2K-token prompt actually brings it down to 37 tok/s.)

Makes me wonder how the 9070 XT is supposedly better.


// Mistral Nemo 13B 2407, trained for LM Studio (fully compatible):
prompt 2K tokens / response 1433 tokens, 58.93 tok/sec @ pure fp16
*very nice

petersikora

New Member
Jan 22, 2025
1
0
1
Hi,

Can you try running some benchmarks using vLLM with the V1 engine on this version of the A100?
That would be a more meaningful comparison, because the V100 can't use the V1 engine, which is optimized for Ampere and newer architectures (FlashAttention, etc.).
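A rough sketch of such a run, in case anyone wants to try. Flag names are from recent vLLM releases and may differ on older versions; also, the FP16 weights of a 24B model (~48 GB) won't fit in 32 GB, so a quantized variant would be needed - the AWQ repo name below is hypothetical, not a real upload:

```shell
# Force the V1 engine (default in recent vLLM releases; needed on older ones)
export VLLM_USE_V1=1

# Serve a quantized Mistral Small 3.1 24B (repo name is hypothetical)
vllm serve someuser/Mistral-Small-3.1-24B-Instruct-2503-AWQ \
    --quantization awq --max-model-len 8192

# In another terminal: vLLM's bundled serving benchmark (from the vllm repo),
# roughly matching the 1386-token prompt / ~800-token response runs above
python benchmarks/benchmark_serving.py \
    --model someuser/Mistral-Small-3.1-24B-Instruct-2503-AWQ \
    --dataset-name random \
    --random-input-len 1386 \
    --random-output-len 813
```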
 

kdawg

New Member
Mar 1, 2025
1
0
1
For those with the 3x 8-pin adapter card, are the correct cables two 8-pin PCIe connectors and one 8-pin EPS connector? I thought the EPS one was supposed to be compatible with the CPU cables, but it seems like it only fits on the end that plugs into the power supply. Any products/adapters you used? Thanks
 

gsrcrxsi

Active Member
Dec 12, 2018
423
143
43
kdawg said:
"For those with the 3x 8-pin adapter card, are the correct cables two 8-pin PCIe connectors and one 8-pin EPS connector? I thought the EPS one was supposed to be compatible with the CPU cables, but it seems like it only fits on the end that plugs into the power supply. Any products/adapters you used? Thanks"
I only used the 2x8-pin VGA connectors and left the EPS one empty.
 