Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)

Leiko · Nov 26, 2024

CyklonDX said:
On paper you can expect around 150TFlops in TF32 (18bit)
A100 SXM4 40G card has TF32 155TFlops

// The drive card has more rops over tesla A100 models, which should increase its performance processing images in int8 over normal A100 cards.
(Like resnet50 which typically runs with int8 precision -- non-image processing wise it might be bit slower than 40Gig SXM4 card)

This would be good comparison for AI workloads
View attachment 39905
(you can potentially expect int8, int4 and binary tops to be some 15-30% faster on the drive card -- i don't see them meaning quite a lot over 12bit precision)

where did you find the correct ROPs count ? The one on gpu database is wrong (the number of SMs is also wrong there)

CyklonDX · Nov 26, 2024

I must've read it wrong / clearly shows 128 rops and 192 tmu's.

must've mistaken it in my memory with this one here

'Enhanced' Nvidia A100 GPUs appear in China's second-hand market — new cards surpass sanctioned counterparts with 7,936 CUDA cores and 96GB HBM2 memory

These A100 cards come with 15% more CUDA cores and 20% more HBM2 than the 'normal' A100 PCIe.

www.tomshardware.com

// its even weaker than i remembered. *(i've got that gpu-z screen from some chinese site - don't have remember where anymore)

Leiko · Nov 27, 2024

CyklonDX said:
I must've read it wrong / clearly shows 128 rops and 192 tmu's.
View attachment 40186

must've mistaken it in my memory with this one here

'Enhanced' Nvidia A100 GPUs appear in China's second-hand market — new cards surpass sanctioned counterparts with 7,936 CUDA cores and 96GB HBM2 memory

These A100 cards come with 15% more CUDA cores and 20% more HBM2 than the 'normal' A100 PCIe.

www.tomshardware.com

// its even weaker than i remembered. *(i've got that gpu-z screen from some chinese site - don't have remember where anymore)

I'm pretty sure you were correct about it being better for int8 than regular A100

limstash · Dec 24, 2024

We installed it through the PCIe adapter and it worked fine, but heat dissipation was a big problem

xdever · Dec 24, 2024

How did you solve it? I bought the water block but I couldn't test it yet. If you use water cooling, what kind of radiator/pump/fans do you use to cool down all the cards together?

pingyuin · Dec 29, 2024

xdever said:
How did you solve it? I bought the water block but I couldn't test it yet. If you use water cooling, what kind of radiator/pump/fans do you use to cool down all the cards together?

There is one very effective way of watercooling in winter time, that is 'tap water with restriction'. For example with restriction figures no more than 15 litres per hour of waste core temperature is less than 50℃

Code:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA DRIVE-PG199-PROD        Off |   00000000:02:00.0 Off |                    0 |
| N/A   48C    P0            399W /  N/A  |   14623MiB /  32768MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      7623      C   ./gst                                       14610MiB |
+-----------------------------------------------------------------------------------------+

gsrcrxsi · Dec 29, 2024

are you able to power limit these cards?

pingyuin said:

There is one very effective way of watercooling in winter time, that is 'tap water with restriction'. For example with restriction figures no more than 15 litres per hour of waste core temperature is less than 50℃

Code:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA DRIVE-PG199-PROD        Off |   00000000:02:00.0 Off |                    0 |
| N/A   48C    P0            399W /  N/A  |   14623MiB /  32768MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                       
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      7623      C   ./gst                                       14610MiB |
+-----------------------------------------------------------------------------------------+

what heatsink did you use?

pingyuin · Dec 30, 2024

gsrcrxsi said:
are you able to power limit these cards?

what heatsink did you use?

No. Power limiting is not working fo this card (at least through nvidia-smi interface)

Code:

[pingyuin@ml ~]$ nvidia-smi -i 0 -pl 300
Changing power management limit is not supported in current scope for GPU: 00000000:02:00.0.
All done.

[pingyuin@ml ~]$ sudo nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE

==============NVSMI LOG==============

Timestamp                                 : Mon Dec 30 13:03:21 2024
Driver Version                            : 565.57.01
CUDA Version                              : 12.7

Attached GPUs                             : 1
GPU 00000000:02:00.0
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 462 MiB
        Used                              : 14627 MiB
        Free                              : 17681 MiB
    BAR1 Memory Usage
        Total                             : 16384 MiB
        Used                              : 5 MiB
        Free                              : 16379 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 100 %
        Memory                            : 100 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    GPU Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 100 %
        Min                               : 91 %
        Avg                               : 99 %
    Memory Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 100 %
        Min                               : 33 %
        Avg                               : 85 %
    ENC Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    DEC Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    JPG Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    OFA Utilization Samples
        Duration                          : 14.02 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    GPU Power Readings
        Power Draw                        : 408.05 W
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Power Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    GPU Memory Power Readings
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 1260 MHz
        SM                                : 1260 MHz
        Memory                            : 1404 MHz
        Video                             : 1140 MHz
    Applications Clocks
        Graphics                          : 1260 MHz
        Memory                            : 1404 MHz
    Default Applications Clocks
        Graphics                          : 1260 MHz
        Memory                            : 1404 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1275 MHz
        SM                                : 1275 MHz
        Memory                            : 1404 MHz
        Video                             : 1155 MHz
    Max Customer Boost Clocks
        Graphics                          : 1260 MHz
    SM Clock Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    Memory Clock Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A

I use this water block Bykski N-N...-Taobao Malaysia

gsrcrxsi · Dec 30, 2024

That’s an SXM2 waterblock (I have some of those too). But earlier posts indicated that sxm2 heatsinks would need to be modified by filing out the screw holes since the hole spacing was 4mm wider on the A100 Drive. Did you have the same experience? Did you have to do that modification?

mind sharing the PCIe adapter board you used too?

efschu3 · Dec 30, 2024

Code:

from pynvml import *
nvmlInit()
device = nvmlDeviceGetHandleByIndex(0)
nvmlDeviceSetGpuLockedClocks(device,10,135)
nvmlDeviceSetMemClkVfOffset(device,0)
nvmlDeviceSetGpcClkVfOffset(device,0)
nvmlDeviceSetPowerManagementLimit(device,300000)

I reduce idle power usage of my V100s with above script. It may work with the A100

pingyuin · Dec 30, 2024

gsrcrxsi said:
That’s an SXM2 waterblock (I have some of those too). But earlier posts indicated that sxm2 heatsinks would need to be modified by filing out the screw holes since the hole spacing was 4mm wider on the A100 Drive. Did you have the same experience? Did you have to do that modification?

mind sharing the PCIe adapter board you used too?

No. It's straightforward

I use this adapter card 商品详情

gsrcrxsi · Dec 30, 2024

Interesting I was referencing this older post earlier in the thread. https://forums.servethehome.com/ind...sd-nvidia-drive-a100.43196/page-2#post-439472 where he needed to file out the holes on his SXM2 heatsink (I also have this same heatsink) to mount to his A100 Drive unit.

I wonder why you didn’t need to modify anything where he did? The heatsink frame doesn’t look any different between yours and his.

xdever · Dec 30, 2024

gsrcrxsi said:
Interesting I was referencing this older post earlier in the thread. https://forums.servethehome.com/ind...sd-nvidia-drive-a100.43196/page-2#post-439472 where he needed to file out the holes on his SXM2 heatsink (I also have this same heatsink) to mount to his A100 Drive unit.

I wonder why you didn’t need to modify anything where he did? The heatsink frame doesn’t look any different between yours and his.

There are 2 variants of this heatsink. The normal SXM2 and what they call NVV100, which I can't find anything about. The NVV100 is the right size (36mm instead of the 32mm for SMX2). You can see the two options here.

gsrcrxsi · Dec 30, 2024

Oh wow. That clears it up. I bought four of the NVV100 Bykski blocks to use on my AOM-SXMV setup with 4x V100s but I ended up sticking with the 3U air coolers instead since they’re more than adequate. I didn’t even realize they were a different size since I never attempted to mount them. I just assumed they’d fit since they had V100 in the name. I’m guessing the NVV100-NVLink-X might be for the SXM3 V100?

pic comparing the 3U SXM2 air cooler (left) to the NVV100 waterblock (right).

http://imgur.com/a/kA8dDWx

I guess I have the right heatsinks already if I ever want to use these lol

d450d112 · Jan 14, 2025

Just curious, what cars have this?

CyklonDX · Jan 14, 2025

d450d112 said:
Just curious, what cars have this?

I hear mercedes benz has it.

NVIDIA DRIVE Partner Ecosystem

Accelerating automotive breakthroughs

www.nvidia.com

*it lists here a lot of chinese brands, and ai brands (but most known would be mercedes, volvo, hyundai.)
// tesla was using amd once upon time - but no clue if its still the case. *(there were hints of tesla investing into mi300's a while ago for their self-driving systems but who knows really it may be for spacex as it makes more sense there with their fp64 pipe - elon likely has hands in all cookie jars.)

gsrcrxsi · Jan 17, 2025

finally got my SXM2 adapter board to test this thing out. I got a production unit, came sealed in the antistatic bag, in the original Nvidia box (matching serials). The card still shows the "QS" part number on the die though like pingyuin shows.

Can confirm that power limiting does not work. but that might be a driver issue. it seems like some kind of software lock. when you google the error, you get a lot of laptop users that had the same problem with newer driver versions. unknown if any specific driver will allow power limiting though. mostly for the loads I'm running, the locked max core clock being low pretty much limits power use, similar thing that I see on my titan Vs (locked to 1335)

interesting behavior with the clock speeds. seems to be stuck in P0 state all the time.
Max core clock 1260MHz (down from A100, 1410MHz boost)
Mem clock 1404MHz (up from A100, 1215MHz)
so slower core clock and faster mem clock. I wonder if they boosted the memory clock to try and compensate for losing an HBM stack (only 4 here, vs 5 on A100)

temps are a little higher than I would have expected being on water cooling, about 60C when running a ~250W load. maybe the heatsink doesnt have a solid mount or just because of the heat spreader and not direct die contact.

so far for my uses, it's about 50% faster than a V100, and like more than twice as fast as a Titan V (I'm not doing much AI/ML stuff, more memory bound FP32/FP64 stuff). for some of my work types, it's basically as fast as a 4090, but with more VRAM.

gsrcrxsi · Jan 17, 2025

Oh, and I dont think anyone else mentioned it. but the A100-Drive does not come with captive screws on the SXM2 unit itself like the V100/P100 does

so you need:
4x M3-0.5 16mm long
4x M3-0.5 12mm long

CyklonDX · Jan 17, 2025

Care to provide some gpu-z, and some performance outlook?

Thanks

gsrcrxsi · Jan 17, 2025

I don't run Windows, so no GPU-z. the specs are basically as others have reported.
(96 SMs \ 6144 cores \ 32GB VRAM \ 32MB L2 cache)

This system:
Xeon E5-2697Av4 16c/32t, 64GB DDR4-2400
Asus X99-E WS 10G (PCIe gen 3.0x16)
Ubuntu 24.04, 6.12 mainline kernel, Nvidia 550.135 driver

I don't really run much AI stuff. but I did run this benchmark to actually stress the card. 240 TFLops in this highly specific test. but it puts it about 33% better than a RTX4090 in absolute best case AI performance.

I think the most I saw was ~440W. the card probably never throttles until it hits some temp limit. I was a little worried about not being able to power limit the card, but since the clocks are locked at just 1260MHz anyway, it doesnt seem to matter on all of the things i'm actually doing.

BOINC performance:
GPUGRID ATMML (MPS@40%, 3x tasks) - 20M ppd - 155W
GPUGRID Quantum Chemistry - TBD
Einstein BRP7 (custom optimized app, MPS@40%, 3x tasks) - 3.0M ppd - 235W
Einstein O3AS (v1.15 app, MPS@40%, 3x tasks) - 2.8M ppd - 225W
Minecraft@home (no MPS, 1x task) - 1.0M ppd - 95W

ran some Primegrid numbers, but honestly their OpenCL apps aren't well suited for this card. it's not "slow" just kind of middle of the road performance. might be decent efficiency though with low-ish power draw. Primegrid apps love a lot of cores and a lot of L2 cache, that's where the 40-series (and probably 50-series) shines.

Primegrid:
AP27 (no MPS, 1x) - 1.5M ppd - 175W
GFN17 (no MPS, 1x) - 230K ppd - 114W
GFN19 (no MPS, 1x) - 488K ppd - 180W

Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)

Member

Well-Known Member

Member

New Member

Member

New Member

Active Member

New Member

Active Member

Active Member

New Member

Active Member

Member

Active Member

New Member

Well-Known Member

Active Member

Active Member

Well-Known Member

Active Member