Automotive A100 SXM2 for FSD? (NVIDIA DRIVE A100)

Leiko

Member
Aug 15, 2021
38
6
8
I am still just getting info from the community member who designs heatsinks and mods for the PG199.

Once I get my hands on the cards and heatsink in a few days I will post more updates.

I did ask them; they said my heatsink comes with the A100 golden hood.
The A100 SXM4 heatsink has the wrong screw spacing and nothing for the side components.
I was considering those until I found some huge SXM2 ones with the extruded edges needed to cool the side components.

Though depending on where you got your cards, the seller sometimes fits a custom bracket with standard SXM2 screw spacing (some Chinese sellers do).
 

gsrcrxsi

Active Member
Dec 12, 2018
420
141
43
I saw that one on XianYu. Too expensive though. And I think it might not be very good cooling.
 

xdever

Member
Jun 29, 2021
34
4
8
Has anyone here tested the AOM-SXMV with these cards? Do they work? I guess giving up half the bandwidth, because of the 2 instead of 4 PCIe connections, is worth it only if NVLink works. Does it?
 

pingyuin

New Member
Oct 30, 2024
11
5
3
New PG199 adapter in the Chinese community, but only in limited numbers for enthusiasts outside China; it is a pity they are out of stock now:
Sadly, it doesn't look like the right thing to use with the PG199:
- no vapor chamber
- no pads for VRMs
- two 8-pin power connectors are considered enough for only 360W (this card can draw nearly 0.5kW)

BUT one thing looks particularly promising: the headers for NVLink. Can you test whether they work at all?
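A quick way to sanity-check the connector concern is to compare the adapter's budget against the draws reported in this thread. The sketch below is only arithmetic on those figures: the 360W budget and the roughly 0.5kW peak come from the post above, the 220W and 400W figures come from other posts in the thread, and the even split across the two connectors is an assumption, as are the function names.

```python
# Rough power-headroom check for a two-connector adapter.
# Budget and draw figures are taken from posts in this thread;
# the even split across connectors is an assumption.

ADAPTER_BUDGET_W = 360.0  # two 8-pin connectors, per the post
NUM_CONNECTORS = 2


def headroom_w(budget_w: float, draw_w: float) -> float:
    """Positive = spare capacity, negative = connectors are over budget."""
    return budget_w - draw_w


def per_connector_w(draw_w: float, connectors: int = NUM_CONNECTORS) -> float:
    """Load on each connector if the draw splits evenly (an assumption)."""
    return draw_w / connectors


if __name__ == "__main__":
    for draw in (220.0, 400.0, 500.0):
        print(
            f"{draw:.0f} W draw: headroom {headroom_w(ADAPTER_BUDGET_W, draw):+.0f} W, "
            f"{per_connector_w(draw):.0f} W per connector"
        )
```

A negative headroom only says the stated budget is exceeded; whether a particular cable or PSU tolerates that is a separate question this sketch does not answer.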
 

gsrcrxsi

I thought you needed to use both mezzanine connectors for NVLink to work, but that board only uses one.

And is the heatsink even for an A100 DRIVE? The XianYu listing only says V100, so maybe the heatsink mounting hole spacing is not right for the A100.
 

xdever

You do need both. On the V100/P100, all the NVLink signals are on the other connector. But based on some of the comments in this thread, I'm unsure whether the card itself has NVLink routed to the connector.
 

Leiko

These cards are used in pairs in some cars, with NVLink in use (that's what a friend told me, and I trust him).
 

blackcat1402

New Member
Dec 10, 2024
13
2
3
gsrcrxsi said: "I saw that one on XianYu. Too expensive though. And I think it might not be very good cooling."
This version uses a 5-heat-pipe cooling solution and a larger copper heatsink. The seller says it is better than the traditional turbofan + 3-heat-pipe solution.
 

gsrcrxsi

But it makes no contact with the VRMs. That's a problem.

It might be better than the wildly bad 3-heat-pipe version (some of them even have 0 heat pipes, just the copper block). But I think it's still not enough to cool a 400W+ card.
 

blackcat1402

pingyuin said:
"Sadly, it doesn't look like the right thing to use with the PG199:
- no vapor chamber
- no pads for VRMs
- two 8-pin power connectors are considered enough for only 360W (this card can draw nearly 0.5kW)

BUT one thing looks particularly promising: the headers for NVLink. Can you test whether they work at all?"

It has 5 heat pipes; I understand they are vapor chambers, as you said.
There are indeed no pads for the VRMs. I purchased a large 8mm-thick piece of 18.5W/mK thermal pad and manually cut it to the proper size and thickness. I also use 19.3W/mK thermal paste. It alleviates the situation, but it does not solve the thermal issue for long runs.
On the third point, I fully agree with you: it does have a power limitation due to the 8-pin connectors.
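To get a feel for why pad thickness matters so much on the VRMs, here is a minimal sketch of steady one-dimensional conduction through a pad, ΔT = q·t / (k·A). Only the 18.5W/mK conductivity and the 8mm stock thickness come from the post; the 2mm cut thickness, the 10mm x 40mm contact area, and the 20W of VRM heat are made-up placeholders, and contact resistance at the pad surfaces is ignored.

```python
# 1-D conduction through a thermal pad: delta_T = q * t / (k * A).
# Conductivity and stock thickness are from the post; area, cut
# thickness and heat load are hypothetical placeholders.

PAD_CONDUCTIVITY_W_MK = 18.5      # from the post (vendor figure)
STOCK_THICKNESS_M = 0.008         # 8 mm sheet, from the post
CUT_THICKNESS_M = 0.002           # hypothetical thickness after cutting
CONTACT_AREA_M2 = 0.010 * 0.040   # hypothetical 10 mm x 40 mm VRM strip
VRM_HEAT_W = 20.0                 # hypothetical heat through the pad


def pad_resistance_k_per_w(thickness_m: float, k_w_mk: float, area_m2: float) -> float:
    """Thermal resistance of the pad slab, ignoring contact resistance."""
    return thickness_m / (k_w_mk * area_m2)


def temperature_drop_k(heat_w: float, thickness_m: float,
                       k_w_mk: float = PAD_CONDUCTIVITY_W_MK,
                       area_m2: float = CONTACT_AREA_M2) -> float:
    """Temperature difference across the pad for a given heat flow."""
    return heat_w * pad_resistance_k_per_w(thickness_m, k_w_mk, area_m2)


if __name__ == "__main__":
    for t in (CUT_THICKNESS_M, STOCK_THICKNESS_M):
        print(f"{t * 1000:.0f} mm pad: {temperature_drop_k(VRM_HEAT_W, t):.1f} K drop")
```

Because the drop scales linearly with thickness, a pad four times thicker costs four times the temperature difference for the same heat and area, which is one reason to cut the pad as thin as the gap allows.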
 

blackcat1402

gsrcrxsi said:
"I thought you needed to use both mezzanine connectors for NVLink to work, but that board only uses one.

And is the heatsink even for an A100 DRIVE? The XianYu listing only says V100, so maybe the heatsink mounting hole spacing is not right for the A100."

You are right. The dimensions of these heatsinks are the same, but the mounting screw hole spacing is 3.2cm for the V100 and 3.6cm for the PG199 A100, so they are not compatible. When I wanted to buy another PG199 heatsink, the seller told me that someone had bought the last three the night before. Because of the difference in supply, the only heatsinks left are those that fit the V100 hole spacing. I am planning to find a way to drill holes and convert a V100 heatsink to fit the PG199.
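Before drilling, it may be worth checking how far the new holes would sit from the existing ones. The sketch below assumes four holes on a square pattern, both patterns centered on the same point, and the quoted spacing measured along the side of the square; none of that is confirmed in the thread. The 3.0mm hole diameter is a hypothetical placeholder, so measure the real one.

```python
import math

# Hole-pattern geometry for re-drilling a heatsink.
# Spacings are from the post; the square, concentric layout and the
# hole diameter are assumptions.

V100_SPACING_MM = 32.0    # from the post (3.2 cm)
PG199_SPACING_MM = 36.0   # from the post (3.6 cm)
HOLE_DIAMETER_MM = 3.0    # hypothetical


def corner_shift_mm(old_spacing_mm: float, new_spacing_mm: float):
    """Return (per-axis shift, diagonal shift) of each corner hole."""
    per_axis = (new_spacing_mm - old_spacing_mm) / 2.0
    return per_axis, math.hypot(per_axis, per_axis)


def holes_overlap(center_distance_mm: float, hole_diameter_mm: float) -> bool:
    """Two equal round holes merge if their centers are closer than one diameter."""
    return center_distance_mm < hole_diameter_mm


if __name__ == "__main__":
    per_axis, diagonal = corner_shift_mm(V100_SPACING_MM, PG199_SPACING_MM)
    print(f"shift per axis: {per_axis:.1f} mm, diagonal: {diagonal:.2f} mm")
    print("new hole breaks into old hole:", holes_overlap(diagonal, HOLE_DIAMETER_MM))
```

If the new hole would break into the old one, as it does with these placeholder numbers, a slotted hole or an adapter bracket may be more practical than drilling a separate hole.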
 

pingyuin

I don't quite follow: why struggle with something that won't work properly anyway? You can purchase the right waterblock, Bykski N-N...-Taobao Malaysia, and it just works with no hassle. Sometimes it is discounted to around 500RMB.
Yes, you'd have to build a custom water loop, but that's completely doable, no worries.
The correct adapter: "Product details" listing (do not forget to purchase the backplate); sad to say, it has become quite pricey.

Found another option for the PG199 on Xianyu (闲鱼), with an XT60 connector (about 60 amps RMS), if you have a server PSU for it (most of them are quite noisy, truth be told).
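For the XT60 option, the relevant number is current rather than connector count. A minimal sketch, assuming the adapter is fed from a 12V rail (the post does not state the voltage) and taking the 60A figure from the post at face value; the function names are illustrative:

```python
# Current/power arithmetic for a single high-current connector (P = V * I).
# The 60 A figure is from the post; the 12 V rail is an assumption.

RAIL_VOLTAGE_V = 12.0       # assumed supply voltage
CONNECTOR_RATING_A = 60.0   # from the post


def current_a(power_w: float, voltage_v: float = RAIL_VOLTAGE_V) -> float:
    """Current needed to deliver a given power at the rail voltage."""
    return power_w / voltage_v


def max_power_w(rating_a: float = CONNECTOR_RATING_A,
                voltage_v: float = RAIL_VOLTAGE_V) -> float:
    """Power the connector can carry at its rated current."""
    return rating_a * voltage_v


if __name__ == "__main__":
    print(f"connector ceiling: {max_power_w():.0f} W")
    for draw in (400.0, 500.0):
        print(f"{draw:.0f} W needs {current_a(draw):.1f} A")
```

Under those assumptions a 500W draw sits well below the connector's rated current, which is the opposite of the two 8-pin case discussed earlier in the thread.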
 

pingyuin

blackcat1402 said:
"It has 5 heat pipes; I understand they are vapor chambers, as you said.
There are indeed no pads for the VRMs. I purchased a large 8mm-thick piece of 18.5W/mK thermal pad and manually cut it to the proper size and thickness. I also use 19.3W/mK thermal paste. It alleviates the situation, but it does not solve the thermal issue for long runs.
On the third point, I fully agree with you: it does have a power limitation due to the 8-pin connectors."

It really depends on what you are going to do with the PG199: e.g. for model training, the power draw does not look scary at all (about 220W median).
 

xdever

That's my observation as well, but the reason is a bit sad: there is not enough RAM to train a model big enough to saturate the tensor cores. Fine-tuning only some early layers, the embeddings, or a LoRA on a 7-8B model is still doable, and then the power draw stays above 400W constantly.
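The memory argument can be made concrete with a back-of-the-envelope estimate. The sketch below uses the common accounting for mixed-precision Adam training, 16 bytes per trainable parameter (fp16 weights and gradients, fp32 master weights, two fp32 Adam moments), and 2 bytes per frozen fp16 parameter. Activations and framework overhead are left out, the 20M adapter size is a hypothetical placeholder, and the card capacity is a placeholder too since the thread does not state it.

```python
# Rough training-memory estimate (weights, gradients, optimizer state only).
# Activations and framework overhead are NOT included.

BYTES_PER_TRAINED_PARAM = 16   # fp16 weight + fp16 grad + fp32 master + 2 fp32 Adam moments
BYTES_PER_FROZEN_PARAM = 2     # fp16 weight only
CARD_MEMORY_GB = 32.0          # placeholder; substitute your card's actual capacity


def full_finetune_gb(num_params: float) -> float:
    """Memory if every parameter is trained with mixed-precision Adam."""
    return num_params * BYTES_PER_TRAINED_PARAM / 1e9


def lora_gb(base_params: float, adapter_params: float) -> float:
    """Memory with a frozen fp16 base model and small trainable adapters."""
    return (base_params * BYTES_PER_FROZEN_PARAM
            + adapter_params * BYTES_PER_TRAINED_PARAM) / 1e9


if __name__ == "__main__":
    base = 7e9       # 7B-parameter model, as in the post
    adapters = 20e6  # hypothetical LoRA adapter size
    print(f"full fine-tune: {full_finetune_gb(base):.0f} GB")
    print(f"LoRA:           {lora_gb(base, adapters):.1f} GB")
    print(f"LoRA fits in {CARD_MEMORY_GB:.0f} GB:", lora_gb(base, adapters) < CARD_MEMORY_GB)
```

Even before activations are counted, full fine-tuning of a 7B model lands far above a single card's memory under this accounting, while the LoRA case leaves room to spare, which is consistent with the observation above.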
 

blackcat1402

pingyuin said:
"I don't quite follow: why struggle with something that won't work properly anyway? You can purchase the right waterblock, Bykski N-N...-Taobao Malaysia, and it just works with no hassle. Sometimes it is discounted to around 500RMB.
Yes, you'd have to build a custom water loop, but that's completely doable, no worries.
The correct adapter: "Product details" listing (do not forget to purchase the backplate); sad to say, it has become quite pricey.

Found another option for the PG199 on Xianyu (闲鱼), with an XT60 connector (about 60 amps RMS), if you have a server PSU for it (most of them are quite noisy, truth be told)."

I agree with what you said. But I have an inexplicable obsession: I insist on air cooling and on fitting all the graphics cards into an ATX case.
 

jenapper

New Member
May 21, 2023
10
7
3
WhatsApp Image 2025-02-23 at 23.48.39.jpeg

I have received the GPUs, the server to host them, and the heatsinks.

The heatsinks are adapted to the cards with a special bracket made by a seller on Xianyu.

I am running these GPUs in Inspur's NF5468M5, which supports up to 8 SXM2 GPUs.

I can confirm NVLink does not work, because these GPUs are missing the traces.

They are a bit too tall for the case, so I am going to have to substitute with acrylic or something with cutouts, but the chassis provides enough power and cooling for them. The cards do run hot, and the automatic fan tuning does not work well with these GPUs since they do not report their temperature back to the BMC.

The heatsinks are also super scary to install: the screws require a substantial amount of force to mount. I used 5 layers of 0.5mm pads (I know they don't work well stacked, but that was all I had) to cool the VRMs.

The cards draw upwards of 400W each, so be prepared to rip those fans. In my testing so far, 50% fan speed keeps them cool with intake air at 21-22 Celsius. Keep in mind I only have 4 of them currently, so that may change with more. The GPUs also fall off the bus when they reach just over 100C, so cooling is definitely something to take note of.

I have not had any success power limiting them via nvidia-smi.

My seller also tells me they are not recognized in Dell servers, so I went with this Inspur model, which seems to work alright aside from GPU temperature communication.
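Since the BMC does not see GPU temperature and the cards drop off the bus just over 100C, one workaround is to poll the temperature from the host and react before that point. The sketch below assumes nvidia-smi on the host does report temperature and power for these cards, which the post implies but does not state outright. The 90C warning threshold and the function names are illustrative choices, and what to do on a warning (raise fans, pause the job) is left to the reader.

```python
import subprocess

# Poll GPU temperature and power from the host with nvidia-smi.
# WARN_AT_C is a hypothetical safety margin below the ~100C drop-off
# reported in the thread.

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]
WARN_AT_C = 90


def parse_readings(csv_text: str):
    """Turn 'index, temp, power' lines into (int, int, float) tuples.

    Lines with missing or non-numeric fields (e.g. '[N/A]') are skipped.
    """
    readings = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue
        try:
            readings.append((int(fields[0]), int(fields[1]), float(fields[2])))
        except ValueError:
            continue
    return readings


def too_hot(readings, limit_c: int = WARN_AT_C):
    """Return the indices of GPUs at or above the warning threshold."""
    return [index for index, temp_c, _ in readings if temp_c >= limit_c]


def poll_once():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_readings(result.stdout)
```

Calling `too_hot(poll_once())` from a loop with a few seconds of sleep between polls would give an early warning; wiring that to the chassis fans depends on the server's own tooling and is not shown here.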
 

blackcat1402

Thanks for sharing your experience; it is quite helpful. I also ran into the issue of the GPUs falling off the bus when they reach just over 100C. It seems the 550F1 product version may have a lower power limit than the QS and CS versions. However, I need to do more experiments to confirm this.
 

jenapper

I have had SOME luck with running:
nvidia-smi -ac 1404,1260

I think the GPUs are running less hot, but I haven't verified exactly. I applied those settings and turned up the fans slightly at the same time, and they have been stable so far.
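For anyone scripting this, note that `-ac` takes the pair as memory clock first, then graphics clock, both in MHz. The small helper below only builds the command lines; the `-i` index option and the `-rac` reset are standard nvidia-smi flags, but whether 1404,1260 is an accepted pair on a given PG199 is not verified here (`nvidia-smi -q -d SUPPORTED_CLOCKS` lists what the driver accepts), and setting clocks normally needs root. The function names are illustrative.

```python
import subprocess

# Build nvidia-smi application-clock commands.
# Argument order for -ac is <memory MHz>,<graphics MHz>.


def app_clock_cmd(mem_mhz: int, gfx_mhz: int, gpu_index=None):
    """Command to pin application clocks, optionally for one GPU."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    cmd += ["-ac", f"{mem_mhz},{gfx_mhz}"]
    return cmd


def reset_clock_cmd(gpu_index=None):
    """Command to restore the default application clocks."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    cmd += ["-rac"]
    return cmd


def run(cmd) -> int:
    """Run a command and return its exit code (non-zero means it was rejected)."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode
```

For example, `run(app_clock_cmd(1404, 1260))` reproduces the command above for all GPUs, and `run(reset_clock_cmd())` undoes it.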
 