Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)


CyklonDX

Well-Known Member
Nov 8, 2022
1,622
578
113
No, I'm fine. I just don't know what I'm reading anymore: I have Supermicro papers saying that number is with sparsity, others saying it's not, and vice versa...
 

Leiko

Member
Aug 15, 2021
38
7
8
No, I'm fine. I just don't know what I'm reading anymore: I have Supermicro papers saying that number is with sparsity, others saying it's not, and vice versa...
I benchmarked bf16 with fp32 accumulation on the DRIVE A100 at 250 TFLOPS with cuBLASLt. I'm betting the real A100 does better; it should reach 312 TFLOPS, since it has 108 SMs instead of the DRIVE version's 96.
Could you try benchmarking a 4090 if you have one?
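As a sanity check on those figures, the theoretical peak can be estimated from SM count and clock. A rough sketch in Python; the 2048 bf16 FLOPs/clock/SM figure and the 1410 MHz boost clock are the published full-A100 (GA100) specs, and assuming the DRIVE variant runs the same boost clock is an assumption that may not hold:

```python
def peak_bf16_tflops(num_sms, boost_clock_mhz, flops_per_clock_per_sm=2048):
    """Theoretical dense bf16 tensor-core peak throughput in TFLOPS.

    2048 bf16 FLOPs per clock per SM is the GA100 figure:
    4 tensor cores/SM x 256 FMA/clock x 2 FLOPs per FMA.
    """
    return num_sms * boost_clock_mhz * 1e6 * flops_per_clock_per_sm / 1e12

# Full A100, 108 SMs @ 1410 MHz boost: ~312 TFLOPS, matching the spec sheet.
a100 = peak_bf16_tflops(108, 1410)    # ~311.9

# DRIVE A100 with 96 SMs, assuming the same boost clock:
drive = peak_bf16_tflops(96, 1410)    # ~277.2

# The measured 250 TFLOPS would then be ~90% of the theoretical peak,
# which is plausible efficiency for a cuBLASLt GEMM benchmark.
efficiency = 250 / drive
```

By this scaling, 312 TFLOPS on the full die corresponds to ~277 TFLOPS theoretical on 96 SMs, so the measured 250 is already close to what linear SM scaling predicts.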
 

blackcat1402

New Member
Dec 10, 2024
18
3
3
I did a comparison test of the PG199 vs the V100 for LLM inference, using the same prompt and Mistral Small 3.1 24B Q4. This model fits entirely in either GPU's memory:
V100 16G SXM2 inference results: prompt: 1386 tokens, 29490.74 tokens/s; response: 773 tokens, 38.17 tokens/s

A100 32G SXM2 inference results: prompt: 1386 tokens, 33001.07 tokens/s; response: 813 tokens, 48.24 tokens/s

The A100 is 26.4% faster than the V100. This is in line with NVIDIA's historical pattern of a new generation roughly every two years with a ~30% performance improvement. It is said that the 5090 is also about 30% faster than the 4090 in LLM inference.
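The generation-speed comparison above works out as a simple ratio of the measured response tokens/s (a trivial check in Python):

```python
v100_tps = 38.17   # V100 16G measured response tokens/s
a100_tps = 48.24   # A100 32G (PG199) measured response tokens/s

# Relative speedup of the A100 over the V100, in percent.
speedup = (a100_tps / v100_tps - 1) * 100
print(f"A100 is {speedup:.1f}% faster")   # -> A100 is 26.4% faster
```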
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,622
578
113
(Just out of curiosity, I attempted to run roughly the same thing.)
Mistral Small 3.1 24B Instruct 2503 on Windows, LM Studio, AMD driver 25.3.1, ROCm 1.21, memory compression disabled.
7900 XTX (ROCm): prompt 1.4K tokens / response 832 tokens, 40.30 tok/s @ pure fp16
*i.e. asking it to generate a response of about 813 tokens
40.30 tok/sec
832 tokens
0.44s to first token
Stop reason: EOS Token Found

(A 2K-token prompt actually brings it down to 37 tok/s.)

Makes me wonder how much better the 9070 XT would be.


// Mistral Nemo 13B 2407 tuned for LM Studio (fully compatible):
prompt 2K tokens / response 1433 tokens, 58.93 tok/sec @ pure fp16
*very nice
 


petersikora

New Member
Jan 22, 2025
1
0
1
Hi,

Can you try running some benchmarks with vLLM and the V1 engine on this version of the A100?
That would be more meaningful, because the V100 can't use the V1 engine, which is optimized for the Ampere architecture and newer (FlashAttention, etc.).
 

kdawg

New Member
Mar 1, 2025
1
0
1
For those with the 3x 8-pin adapter card, are the correct cables 2x 8-pin PCIe connectors and 1x 8-pin EPS connector? I thought the EPS one was supposed to be compatible with the CPU cables, but it seems like it only fits the cable end that plugs into the power supply. Any products/adapters used? Thanks
 

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
For those with the 3x 8-pin adapter card, are the correct cables 2x 8-pin PCIe connectors and 1x 8-pin EPS connector? I thought the EPS one was supposed to be compatible with the CPU cables, but it seems like it only fits the cable end that plugs into the power supply. Any products/adapters used? Thanks
I only used the 2x8-pin VGA connectors and left the EPS one empty.
 

MOD1870

New Member
Apr 24, 2025
1
0
1
They seem to be pretty certain about the traces being missing. I actually asked whether it's possible to get it working somehow; their feedback was that they have had success by migrating the whole chip to a PCIe donor board. I assume they tested the circuits on the SXM2 board before going to that length, but that's all the info I have.
Did you try NVLink on the QS version? Does it work?
 

wheat_field

New Member
Nov 23, 2024
2
0
1
Anybody have any numbers on the minimum achievable max power draw? Like, if you were to lock the graphics clock to 1140 MHz and run model training or something else intensive, what's the power draw? Thanks in advance to anyone willing to share numbers; I'm considering buying one, but I'm a little concerned about the 400 W+ power draw.
 

MilkyWeight

New Member
Mar 15, 2024
16
2
3
Anybody have any numbers on the minimum achievable max power draw? Like, if you were to lock the graphics clock to 1140 MHz and run model training or something else intensive, what's the power draw? Thanks in advance to anyone willing to share numbers; I'm considering buying one, but I'm a little concerned about the 400 W+ power draw.
I think it was around 300 W, with short spikes to 320-350 W. I can run a test in a day or two if it's important to you. If so, reply here.
 

pingyuin

New Member
Oct 30, 2024
13
7
3
I don't know if it's really necessary to do that. With three fans and five heat pipes on an open rack, the PG199's temperature can be kept below 70 °C during inference. Will this affect resale?
When inferencing, this card is constrained by memory bandwidth, and power consumption rarely goes above 250 W. If you also do other compute on this card, the only option you have without delidding is to lower the core clocks. When hitting 450 W and more, the card tends to fall off the PCIe bus to protect itself. One possible reason for this is overheating, no matter how many heat pipes you put on the IHS, because the TIM between the IHS and the core/memory has insufficient thermal conductivity. Replacing it with an indium-gallium alloy could win you hundreds of MHz.
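The memory-bandwidth constraint can be sketched with a simple roofline estimate: during decode, each generated token streams roughly the full set of model weights from VRAM, so tokens/s ≈ bandwidth / model size. A rough sketch under stated assumptions; the ~14 GB figure for a 24B Q4 model is an estimate, not a measured value:

```python
def est_tokens_per_s(mem_bandwidth_gbs, model_size_gb):
    """Upper-bound decode speed if every token reads all weights once."""
    return mem_bandwidth_gbs / model_size_gb

# Working backwards from the measured 48.24 tok/s with a ~14 GB Q4 model,
# the card would be sustaining roughly this much effective read bandwidth:
effective_bw = 48.24 * 14   # ~675 GB/s
```

That effective figure sitting well below compute limits is consistent with the observation that inference rarely pushes the card past 250 W.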
 

blackcat1402

New Member
Dec 10, 2024
18
3
3
I just hadn't paid attention for a few months, and almost all the low-priced PG199s on eBay are sold out. Maybe more people are discovering the trick of using this compute card.