Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)


Leiko

Member
Aug 15, 2021
38
7
8
On paper you can expect around 150 TFLOPS in TF32 (a 19-bit format)
The A100 SXM4 40GB card is rated at ~156 TFLOPS TF32 (with sparsity)


// The Drive card has more ROPs than the datacenter A100 models, which should boost its performance when processing images in int8 compared to normal A100 cards.
(e.g. ResNet-50, which typically runs at int8 precision -- outside image processing it might be a bit slower than the 40GB SXM4 card)

This would be a good comparison for AI workloads:
[attachment 39905: spec comparison]
(you can potentially expect int8, int4, and binary TOPS to be some 15-30% faster on the Drive card -- I don't see them mattering much over 12-bit precision)
Where did you find the correct ROPs count? The one in the GPU database is wrong (the number of SMs is also wrong there).
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,639
584
113
I must've read it wrong / it clearly shows 128 ROPs and 192 TMUs.
[attached screenshot]

must've mistaken it in my memory with this one here


// It's even weaker than I remembered. *(I got that GPU-Z screen from some Chinese site - I don't remember where anymore)
 
Last edited:

Leiko

Member
Aug 15, 2021
38
7
8
I must've read it wrong / it clearly shows 128 ROPs and 192 TMUs.
[attached screenshot]

must've mistaken it in my memory with this one here


// It's even weaker than I remembered. *(I got that GPU-Z screen from some Chinese site - I don't remember where anymore)
I'm pretty sure you were correct about it being better for int8 than a regular A100. [attached screenshots]
 

xdever

Member
Jun 29, 2021
34
4
8
How did you solve it? I bought the water block but I couldn't test it yet. If you use water cooling, what kind of radiator/pump/fans do you use to cool down all the cards together?
 

pingyuin

New Member
Oct 30, 2024
13
7
3
How did you solve it? I bought the water block but I couldn't test it yet. If you use water cooling, what kind of radiator/pump/fans do you use to cool down all the cards together?
There is one very effective way of watercooling in winter time: 'tap water with a flow restriction'. For example, with the flow restricted to no more than 15 litres per hour of waste water, the core temperature stays below 50 °C:
Code:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA DRIVE-PG199-PROD        Off |   00000000:02:00.0 Off |                    0 |
| N/A   48C    P0            399W /  N/A  |   14623MiB /  32768MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      7623      C   ./gst                                       14610MiB |
+-----------------------------------------------------------------------------------------+
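A quick sanity check on those figures (a rough sketch, assuming essentially all of the ~400 W shown in the log above ends up in the water): the steady-state temperature rise of the coolant follows from Q = ṁ · c · ΔT.

```python
# Rough steady-state temperature rise of the cooling water (Q = m_dot * c * dT).
# Assumption: all ~400 W of card power is carried away by the water.
power_w = 400.0                  # approximate card power draw (W), per the nvidia-smi log
flow_l_per_h = 15.0              # restricted tap-water flow (L/h)

m_dot = flow_l_per_h / 3600.0    # mass flow in kg/s (1 L of water ~ 1 kg)
c_water = 4186.0                 # specific heat of water, J/(kg*K)

delta_t = power_w / (m_dot * c_water)
print(f"water temperature rise: {delta_t:.1f} K")  # ~23 K
```

So cold tap water entering at ~10 °C would leave at roughly 33 °C, which leaves margin for the water-to-die thermal resistance before the core reaches 50 °C.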
 

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
are you able to power limit these cards?
There is one very effective way of watercooling in winter time: 'tap water with a flow restriction'. For example, with the flow restricted to no more than 15 litres per hour of waste water, the core temperature stays below 50 °C:
Code:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA DRIVE-PG199-PROD        Off |   00000000:02:00.0 Off |                    0 |
| N/A   48C    P0            399W /  N/A  |   14623MiB /  32768MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                       
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      7623      C   ./gst                                       14610MiB |
+-----------------------------------------------------------------------------------------+
what heatsink did you use?
 

pingyuin

New Member
Oct 30, 2024
13
7
3
are you able to power limit these cards?

what heatsink did you use?
No. Power limiting does not work for this card (at least through the nvidia-smi interface):
Code:
[pingyuin@ml ~]$ nvidia-smi -i 0 -pl 300
Changing power management limit is not supported in current scope for GPU: 00000000:02:00.0.
All done.

[pingyuin@ml ~]$ sudo nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE

==============NVSMI LOG==============

Timestamp                                 : Mon Dec 30 13:03:21 2024
Driver Version                            : 565.57.01
CUDA Version                              : 12.7

Attached GPUs                             : 1
GPU 00000000:02:00.0
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 462 MiB
        Used                              : 14627 MiB
        Free                              : 17681 MiB
    BAR1 Memory Usage
        Total                             : 16384 MiB
        Used                              : 5 MiB
        Free                              : 16379 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 100 %
        Memory                            : 100 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    GPU Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 100 %
        Min                               : 91 %
        Avg                               : 99 %
    Memory Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 100 %
        Min                               : 33 %
        Avg                               : 85 %
    ENC Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    DEC Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    JPG Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    OFA Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    GPU Power Readings
        Power Draw                        : 408.05 W
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Power Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    GPU Memory Power Readings
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 1260 MHz
        SM                                : 1260 MHz
        Memory                            : 1404 MHz
        Video                             : 1140 MHz
    Applications Clocks
        Graphics                          : 1260 MHz
        Memory                            : 1404 MHz
    Default Applications Clocks
        Graphics                          : 1260 MHz
        Memory                            : 1404 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1275 MHz
        SM                                : 1275 MHz
        Memory                            : 1404 MHz
        Video                             : 1155 MHz
    Max Customer Boost Clocks
        Graphics                          : 1260 MHz
    SM Clock Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    Memory Clock Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
I use this water block: Bykski N-N... (Taobao Malaysia listing)
 

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
That’s an SXM2 waterblock (I have some of those too). But earlier posts indicated that sxm2 heatsinks would need to be modified by filing out the screw holes since the hole spacing was 4mm wider on the A100 Drive. Did you have the same experience? Did you have to do that modification?

mind sharing the PCIe adapter board you used too?
 

efschu3

Active Member
Mar 11, 2019
195
80
28
Code:
# Requires pynvml (nvidia-ml-py) and typically root privileges.
from pynvml import *

nvmlInit()
device = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceSetGpuLockedClocks(device, 10, 135)      # lock core clock range (MHz) to curb idle power
nvmlDeviceSetMemClkVfOffset(device, 0)             # reset memory clock V/F offset
nvmlDeviceSetGpcClkVfOffset(device, 0)             # reset core (GPC) clock V/F offset
nvmlDeviceSetPowerManagementLimit(device, 300000)  # power cap in milliwatts (300 W)
nvmlShutdown()
I reduce the idle power usage of my V100s with the script above. It may work with the A100.
 
  • Like
Reactions: blackcat1402

pingyuin

New Member
Oct 30, 2024
13
7
3
That’s an SXM2 waterblock (I have some of those too). But earlier posts indicated that sxm2 heatsinks would need to be modified by filing out the screw holes since the hole spacing was 4mm wider on the A100 Drive. Did you have the same experience? Did you have to do that modification?

mind sharing the PCIe adapter board you used too?
No. It's straightforward. [photos: IMG_1135.JPG, IMG_1136.JPG]

I use this adapter card (Taobao product listing)
 
  • Like
Reactions: CyklonDX

xdever

Member
Jun 29, 2021
34
4
8
Interesting. I was referencing this older post earlier in the thread: https://forums.servethehome.com/ind...sd-nvidia-drive-a100.43196/page-2#post-439472, where he needed to file out the holes on his SXM2 heatsink (I also have this same heatsink) to mount it to his A100 Drive unit.

I wonder why you didn’t need to modify anything where he did? The heatsink frame doesn’t look any different between yours and his.
There are 2 variants of this heatsink: the normal SXM2 and what they call NVV100, which I can't find anything about. The NVV100 is the right size (36 mm instead of the 32 mm for SXM2). You can see the two options here.
 

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
Oh wow. That clears it up. I bought four of the NVV100 Bykski blocks to use on my AOM-SXMV setup with 4x V100s but I ended up sticking with the 3U air coolers instead since they’re more than adequate. I didn’t even realize they were a different size since I never attempted to mount them. I just assumed they’d fit since they had V100 in the name. I’m guessing the NVV100-NVLink-X might be for the SXM3 V100?

pic comparing the 3U SXM2 air cooler (left) to the NVV100 waterblock (right).


I guess I have the right heatsinks already if I ever want to use these lol
 
Last edited:

CyklonDX

Well-Known Member
Nov 8, 2022
1,639
584
113
Just curious, what cars have this?
I hear Mercedes-Benz has it.

*It lists a lot of Chinese brands and AI brands here (the best known would be Mercedes, Volvo, Hyundai).
// Tesla was using AMD once upon a time - no clue if that's still the case. *(There were hints of Tesla investing in MI300s a while ago for their self-driving systems, but who knows - it may be for SpaceX, where the FP64 pipe makes more sense. Elon likely has hands in all the cookie jars.)
 
Last edited:

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
finally got my SXM2 adapter board to test this thing out. I got a production unit, came sealed in the antistatic bag, in the original Nvidia box (matching serials). The card still shows the "QS" part number on the die though like pingyuin shows.

Can confirm that power limiting does not work, but that might be a driver issue; it seems like some kind of software lock. When you google the error, you get a lot of laptop users that had the same problem with newer driver versions. Unknown if any specific driver will allow power limiting, though. For the loads I'm running, the low locked max core clock pretty much limits power use anyway - similar to what I see on my Titan Vs (locked to 1335 MHz).

Interesting behavior with the clock speeds: it seems to be stuck in the P0 state all the time.
Max core clock 1260 MHz (down from the A100's 1410 MHz boost)
Mem clock 1404 MHz (up from the A100's 1215 MHz)
So slower core clock and faster mem clock. I wonder if they boosted the memory clock to try and compensate for losing an HBM stack (only 4 here, vs 5 on the A100).
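The compensation idea roughly checks out arithmetically. A sketch, assuming 1024-bit-wide HBM2 stacks (4 on the Drive card, 5 on the SXM4 A100 40GB) and double data rate:

```python
# Theoretical HBM2 bandwidth = memory clock * 2 (DDR) * bus width in bytes.
def hbm_bandwidth_gbs(mem_clock_mhz, stacks):
    bus_bytes = stacks * 1024 // 8           # each stack assumed 1024 bits wide
    return mem_clock_mhz * 1e6 * 2 * bus_bytes / 1e9

drive = hbm_bandwidth_gbs(1404, 4)   # ~1438 GB/s
a100 = hbm_bandwidth_gbs(1215, 5)    # ~1555 GB/s (matches the A100 40GB spec)
print(f"Drive A100: {drive:.0f} GB/s, A100 40GB: {a100:.0f} GB/s")
```

So the higher memory clock recovers most, but not all, of the lost bandwidth: about 92% of the A100 40GB figure.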

Temps are a little higher than I would have expected for water cooling: about 60 °C when running a ~250 W load. Maybe the heatsink doesn't have a solid mount, or it's just because of the heat spreader rather than direct die contact.

So far, for my uses, it's about 50% faster than a V100 and more than twice as fast as a Titan V (I'm not doing much AI/ML stuff, more memory-bound FP32/FP64 work). For some of my workloads it's basically as fast as a 4090, but with more VRAM.



 

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
Oh, and I don't think anyone else has mentioned it, but the A100 Drive does not come with captive screws on the SXM2 module itself like the V100/P100 does.

so you need:
4x M3-0.5 16mm long
4x M3-0.5 12mm long
 
  • Like
Reactions: blackcat1402

gsrcrxsi

Active Member
Dec 12, 2018
428
146
43
I don't run Windows, so no GPU-Z. The specs are basically as others have reported:
(96 SMs \ 6144 cores \ 32GB VRAM \ 32MB L2 cache)
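Those specs allow a quick peak-FP32 estimate (a back-of-envelope sketch: CUDA cores × 2 FLOPs per cycle for FMA × clock, using the 1260 MHz locked clock reported earlier in the thread):

```python
# Peak FP32 throughput = cores * 2 FLOPs/cycle (FMA) * clock.
def peak_fp32_tflops(cores, clock_mhz):
    return cores * 2 * clock_mhz * 1e6 / 1e12

drive_a100 = peak_fp32_tflops(6144, 1260)   # ~15.5 TFLOPS
full_a100 = peak_fp32_tflops(6912, 1410)    # ~19.5 TFLOPS (matches the A100 spec)
print(f"Drive A100: {drive_a100:.1f} TFLOPS, A100 SXM4: {full_a100:.1f} TFLOPS")
```

On paper that is barely ahead of a V100's 15.7 TFLOPS FP32, so the ~50% speedup observed over the V100 presumably comes from the Ampere SM improvements and memory bandwidth rather than raw FLOPS.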

This system:
Xeon E5-2697Av4 16c/32t, 64GB DDR4-2400
Asus X99-E WS 10G (PCIe gen 3.0x16)
Ubuntu 24.04, 6.12 mainline kernel, Nvidia 550.135 driver

I don't really run much AI stuff, but I did run this benchmark to actually stress the card: 240 TFLOPS in this highly specific test, which puts it about 33% better than an RTX 4090 in absolute best-case AI performance.



I think the most I saw was ~440 W. The card probably never throttles until it hits some temp limit. I was a little worried about not being able to power limit the card, but since the clocks are locked at just 1260 MHz anyway, it doesn't seem to matter for the things I'm actually doing.

BOINC performance:
GPUGRID ATMML (MPS@40%, 3x tasks) - 20M ppd - 155W
GPUGRID Quantum Chemistry - TBD
Einstein BRP7 (custom optimized app, MPS@40%, 3x tasks) - 3.0M ppd - 235W
Einstein O3AS (v1.15 app, MPS@40%, 3x tasks) - 2.8M ppd - 225W
Minecraft@home (no MPS, 1x task) - 1.0M ppd - 95W

Ran some Primegrid numbers, but honestly their OpenCL apps aren't well suited to this card. It's not "slow", just middle-of-the-road performance. Might be decent efficiency, though, given the low-ish power draw. Primegrid apps love lots of cores and lots of L2 cache; that's where the 40-series (and probably 50-series) shines.

Primegrid:
AP27 (no MPS, 1x) - 1.5M ppd - 175W
GFN17 (no MPS, 1x) - 230K ppd - 114W
GFN19 (no MPS, 1x) - 488K ppd - 180W