Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)


gsrcrxsi

Active Member
Dec 12, 2018
367
121
43
Maybe it doesn't. You can try a different driver, or maybe you need a specific driver to get it to work properly. The link you posted before also shows N/A for the power limit in the nvidia-smi output.
 

Leiko

New Member
Aug 15, 2021
16
0
1
I just know for a fact that it's normal for the max power to show N/A in nvidia-smi.
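If you want to double-check what the driver actually exposes (rather than parsing nvidia-smi output), here is a minimal untested sketch against NVML via the nvidia-ml-py bindings:

```python
# Minimal sketch (untested): query the power-management limit through NVML.
# nvidia-smi prints N/A exactly when this query is not supported by the card/driver.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    limit_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)  # milliwatts
    print(f"power limit: {limit_mw / 1000:.0f} W")
except pynvml.NVMLError as err:
    print(f"power management not exposed: {err}")
pynvml.nvmlShutdown()
```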
 

xdever

New Member
Jun 29, 2021
18
0
1
I added another power line for 5V and some capacitors, which didn't help. If I close the side of the box, the GPU runs at 83 °C instead of 81 °C and shuts down noticeably earlier, so it must be some thermal issue. My best guess is the VRM. Since no 5-6 mm thick thermal pads are available in my country, I used three layers of 2 mm ones; maybe that's the problem.
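In case it helps the diagnosis, something like this untested sketch could capture what the GPU reports in the seconds before it shuts down (note NVML only exposes the die temperature, not the VRM temperature, and the log filename is made up):

```python
# Untested sketch: append the GPU die temperature to a log once per second,
# so the last lines show the reading right before a thermal shutdown.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
with open("gpu_temp.log", "a") as log:
    while True:
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        log.write(f"{time.time():.0f} {temp}\n")
        log.flush()
        time.sleep(1)
```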
 

Leiko

New Member
Aug 15, 2021
16
0
1
You are running a 440W TDP card with a heatsink made for a 250W TDP card. It's impressive that the card works at all without a power limit :0
 

xdever

New Member
Jun 29, 2021
18
0
1
Given that I can find zero information on the card, I originally thought it was 250W, since it is SXM2 and runs on 12V instead of 48V like the full version (now I don't understand why they switched to 48V; maybe just for easier PCB routing). I noticed the incompatibility of the mounting points a day before I traveled home to where my server is, so I couldn't order an SXM4 heatsink in time. I figured that as long as the temps were under control, I would be fine, and 81 °C is fine compared to the other GPUs I've used. It looks like the heatsink might provide insufficient cooling at the VRMs. Maybe the SXM4 ones have a vapor chamber there as well, or just more fins, idk.
 

xdever

New Member
Jun 29, 2021
18
0
1
Can somebody tell me what heatsink this card uses? SXM4 doesn't have the pads for the VRMs like SXM2 does, and overlaying an image of an SXM4 heatsink on the card seems to show that the mounting points still don't match (although I only found a pretty bad image of the SXM4 from the bottom). Has anyone actually tried to fit an SXM4 heatsink on it?
 

Leiko

New Member
Aug 15, 2021
16
0
1
Can somebody tell me what heatsink this card uses? SXM4 doesn't have the pads for the VRMs like SXM2 does, and overlaying an image of an SXM4 heatsink on the card seems to show that the mounting points still don't match (although I only found a pretty bad image of the SXM4 from the bottom). Has anyone actually tried to fit an SXM4 heatsink on it?
I have seen multiple people on Xianyu using SXM4 heatsinks on them.
 

gsrcrxsi

Active Member
Dec 12, 2018
367
121
43
Can somebody tell me what heatsink this card uses? SXM4 doesn't have the pads for the VRMs like SXM2 does, and overlaying an image of an SXM4 heatsink on the card seems to show that the mounting points still don't match (although I only found a pretty bad image of the SXM4 from the bottom). Has anyone actually tried to fit an SXM4 heatsink on it?
Some SXM2 heatsinks don't have pads for the VRMs either (the copper 2U ones); I'm guessing those are intended to be cooled by massive chassis airflow instead. But I do like that the 3U SXM2 heatsinks cool the VRMs directly, so it's not much of a worry and you can run quieter fans.
 

pingyuin

New Member
Oct 30, 2024
3
0
1
Can somebody tell me what heatsink this card uses? SXM4 doesn't have the pads for the VRMs like SXM2 does, and overlaying an image of an SXM4 heatsink on the card seems to show that the mounting points still don't match (although I only found a pretty bad image of the SXM4 from the bottom). Has anyone actually tried to fit an SXM4 heatsink on it?
This Bykski N-N...-Taobao Malaysia looks like exactly what you need, doesn't it?

BTW, have you tested the MIG option (https://docs.nvidia.com/datacenter/tesla/mig-user-guide/) on the A100/32GB? Does it work at all on this card, or does it require an A100 equipped with 40/80GB?
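If anyone with the card wants to check, a minimal untested sketch against NVML (nvidia-ml-py bindings) should answer this directly:

```python
# Untested sketch: ask NVML whether the card supports MIG at all.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    current, pending = pynvml.nvmlDeviceGetMigMode(handle)
    print(f"MIG mode: current={current}, pending={pending}")
except pynvml.NVMLError_NotSupported:
    print("MIG is not supported on this GPU")
pynvml.nvmlShutdown()
```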
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,228
424
83
*(I would advise against getting those automotive A100s unless you need FP16 tensor performance; they aren't that much stronger than a V100.)
 

gsrcrxsi

Active Member
Dec 12, 2018
367
121
43
*(I would advise against getting those automotive A100s unless you need FP16 tensor performance; they aren't that much stronger than a V100.)
What about TF32?

32GB of faster memory is nice too.

The cheapest A100 Drive SXM2 on eBay is about $1200.
The cheapest V100 32GB SXM2 on eBay is about $500.
 
Last edited:

xdever

New Member
Jun 29, 2021
18
0
1
*(I would advise against getting those automotive A100s unless you need FP16 tensor performance; they aren't that much stronger than a V100.)
I can't run tests now because the GPU is sitting without a heatsink in a different country than I am, but what I can confirm is that when computing gradients through Llama 2 7B loaded in bfloat16 without any quantization, it's almost as fast as a real A100 and much faster than a 3090, and the 3090 is much faster than the V100. On top of that, the V100 supports only float16 but not bfloat16, which means you need gradient scaling to keep the range of the gradients within meaningful bounds and the training stable. Also, OpenAI is not very keen on maintaining Triton for the V100, which is the basis of torch.compile() (for more than half a year, they had a bug that made all matmuls on V100 tensor cores return 0s).

If I still have access to my old lab's compute cluster, which has V100s, I can run a direct benchmark against them in February, when I'll be near my GPU/desktop machine again. I've trained NNs in mixed precision on V100s, 3090s, and real A100s for years, and I'm pretty sure this card is way faster.
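To make the bfloat16 point concrete, this is roughly the difference in a PyTorch training step (untested sketch; the tiny model and data are placeholders):

```python
# Untested sketch: fp16 (V100) needs a GradScaler to keep gradients in range;
# bf16 (A100-class) has the same exponent range as fp32 and does not.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(32, 1024, device="cuda")

# fp16 path: loss scaling required for stable training
scaler = torch.cuda.amp.GradScaler()
with torch.autocast("cuda", dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
opt.zero_grad()

# bf16 path: plain backward, no scaler needed
with torch.autocast("cuda", dtype=torch.bfloat16):
    loss = model(x).square().mean()
loss.backward()
opt.step()
```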

This Bykski N-N...-Taobao Malaysia looks like exactly what you need, doesn't it?
It kind of looks good with a bit of filing (see the other SXM2 heatsink I used for my tests), but how do I get access to this without being in China?

BTW, have you tested the MIG option (NVIDIA Multi-Instance GPU User Guide r560) on the A100/32GB? Does it work at all on this card, or does it require an A100 equipped with 40/80GB?
I haven't; 32 GB of memory is not that much :).
 

pingyuin

New Member
Oct 30, 2024
3
0
1
It kind of looks good with a bit of filing (see the other SXM2 heatsink I used for my tests), but how do I get access to this without being in China?
Dunno, why not ask them whether they're willing to ship it to you? In that case you'll probably need to register on Taobao, but it's a rather simple task if you choose a so-called 'business account'. The same N-NVV100-NVLink-X item is also easily available on eBay, but sadly not for the same money.

There is another option: Superbuy - The Best Taobao Agent Help You Shop,Shipping From China
 
Last edited:

CyklonDX

Well-Known Member
Nov 8, 2022
1,228
424
83
What about TF32?
I can't run tests now because the GPU

A100 Drive 32GB SXM2 & A100 Tesla 40G SXM4
HBM2e 1.87TB/s (better latency, but still bad; I'm guessing it will likely top out at ~750GB/s-1.4TB/s in practice, since the A100 can't reach full throughput due to locality issues)
FP16 77.97 TFlops (the only good reason to buy this; good for AI)
FP32 19.49 TFlops
FP64 9.7 TFlops
TDP 400W

V100 Tesla 32GB SXM2
HBM2 898GB/s (high latency)
FP16 33.3 TFlops
FP32 15.6 TFlops
FP64 7.8 TFlops
TDP 250W


A 3090 24GB Turbo/Blower
GDDR6X 938GB/s (better than the V100, including better latency; con: lack of ECC, though note that ECC slows operations by some 20-25%)
FP16 35.5 TFlops
FP32 35.58 TFlops (almost twice the A100 Drive/Automotive model)
FP64 556 GFlops
TDP 350W
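If someone wants to sanity-check the bandwidth guess on an actual card, a rough untested PyTorch sketch like this measures effective device-to-device throughput:

```python
# Untested sketch: time large on-device copies to estimate effective HBM bandwidth.
import torch

n = 1 << 28  # 2^28 fp32 values, ~1 GiB
src = torch.randn(n, device="cuda")
dst = torch.empty_like(src)

dst.copy_(src)  # warm-up
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    dst.copy_(src)
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)
moved = 2 * src.numel() * src.element_size() * iters  # each copy reads + writes
print(f"~{moved / (ms / 1000) / 1e9:.0f} GB/s effective")
```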
 
Last edited:
  • Like
Reactions: piranha32

gsrcrxsi

Active Member
Dec 12, 2018
367
121
43
A100 Drive 32GB SXM2 & A100 Tesla 40G SXM4
HBM2e 1.87TB/s (better latency, but still bad; I'm guessing it will likely top out at ~750GB/s-1.4TB/s in practice, since the A100 can't reach full throughput due to locality issues)
FP16 77.97 TFlops (the only good reason to buy this; good for AI)
FP32 19.49 TFlops
FP64 9.7 TFlops
TDP 400W

V100 Tesla 32GB SXM2
HBM2 898GB/s (high latency)
FP16 33.3 TFlops
FP32 15.6 TFlops
FP64 7.8 TFlops
TDP 250W


A 3090 24GB Turbo/Blower
GDDR6X 938GB/s (better than the V100, including better latency; con: lack of ECC, though note that ECC slows operations by some 20-25%)
FP16 35.5 TFlops
FP32 35.58 TFlops (almost twice the A100 Drive/Automotive model)
FP64 556 GFlops
TDP 350W
I'm not sure this answers my question about TF32.

As I understand it, TF32 is a tensor-accelerated version of FP32, first available on the A100 (maybe all Ampere); it's not available on Volta at all. My suggestion was that if this card can do TF32, that might be another reason to get it, not only FP16.
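For reference, TF32 in PyTorch is just a pair of flags over ordinary FP32 code, so on Ampere it comes essentially for free (minimal sketch; on Volta these flags simply have no effect):

```python
# Minimal sketch: route plain fp32 matmuls/convolutions through TF32 tensor cores.
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # matmuls
torch.backends.cudnn.allow_tf32 = True        # convolutions

a = torch.randn(4096, 4096, device="cuda")  # ordinary fp32 tensors
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # runs as TF32 on the tensor cores on A100-class hardware
```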

 

xdever

New Member
Jun 29, 2021
18
0
1
FP32 35.58 TFlops (almost twice the A100 Drive/Automotive model)
This is true for the full A100 as well: for FP32, the gaming GPUs and the A6000 series are significantly faster.

FP16 77.97 TFlops (the only good reason to buy this; good for AI)
These are non-tensor-core TFlops; the tensor cores should be around 270-280 TFlops for the Drive A100. The 3090 should be around ~71 TFlops with BF16 I/O and FP32 accumulate (the combination typically used for NNs) and ~142 with BF16 I/O and BF16 accumulate.

Another significant difference between the gaming GPUs and the A100 series is that on the gaming cards the tensor-core flops are halved if you use FP32 accumulate, while on the A100 it runs at the same speed as BF16 accumulate.
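A rough way to see this on your own card (untested sketch): time big bf16 matmuls, which PyTorch accumulates in fp32 by default, so a 3090 should land near its halved figure while an A100-class part should not:

```python
# Untested sketch: estimate tensor-core TFLOPS for bf16 inputs with fp32 accumulate.
import torch

n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)

for _ in range(3):  # warm-up
    a @ b
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 50
start.record()
for _ in range(iters):
    a @ b
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)
tflops = 2 * n**3 * iters / (ms / 1000) / 1e12
print(f"~{tflops:.0f} TFLOPS (bf16 in, fp32 accumulate)")
```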

Again, I don't have access to the GPU right now, but when I did, I actively used it for two weeks alongside the real A100s in our cluster and compared the runtime of my models over the full training loop; the only difference I noticed is consistent with it having 96 SMs versus the real A100's 108.
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,228
424
83
I'm not sure this answers my question about TF32.
On paper you can expect around 150 TFlops in TF32 (a 19-bit format).
The A100 SXM4 40G card has 155 TFlops of TF32.


// The Drive card has more ROPs than the Tesla A100 models, which should increase its performance when processing images in int8 compared to normal A100 cards
(like ResNet-50, which typically runs at int8 precision -- for non-image workloads it might be a bit slower than the 40GB SXM4 card)

This would be a good comparison for AI workloads:
[attached image: AI workload comparison chart]
(you can potentially expect int8, int4, and binary TOPS to be some 15-30% faster on the Drive card -- I don't see them meaning much beyond 12-bit precision)
 
Last edited:
  • Like
Reactions: piranha32

Leiko

New Member
Aug 15, 2021
16
0
1
So I can now confirm: the screw holes are 3.6cm x 6.9cm apart, when they should be 3.2cm x 6.9cm apart for normal SXM2/SXM4 heatsinks.
And about the performance: this has a GA100, which can do FP16 tensor-core math with FP32 accumulate at the same speed as FP16 accumulate (which is not the case for consumer GPUs; even the 4090 is nerfed to half performance for this).
It should (if you get the Drive A100 version and not the ES/QS PG199) be around 10% slower than a real A100 when doing FP16 math.