SXM2 over PCIe


Andrw

New Member
Sep 17, 2025
4
0
1
Just joined. I hope I'm OK to post here. I bought a 32GB SXM2 V100 and one of those 60 dollar adapter boards. It came in today. I've been pulling my hair out all night trying to get it to work, thinking: did I break it? Am I an idiot? I'm a bit of a tech hoarder, so I've had it in several machines. Finally it's showing up after swapping my x16 riser for an x8 riser. Any idea why? I know the x16 riser works.
How are you getting on? Have you tried it without a riser? What was the problem? I have one 16GB card that works fine at x16. I'm waiting for another 32GB card and another PCIe adapter with three fan outputs. What length of riser did you use?
 

jaytee

New Member
Jul 22, 2025
3
0
1
How are you getting on? Have you tried it without a riser? What was the problem? I have one 16GB card that works fine at x16. I'm waiting for another 32GB card and another PCIe adapter with three fan outputs. What length of riser did you use?
Why buy 32GB cards? Aren't they much more expensive?
 

jaytee

New Member
Jul 22, 2025
3
0
1
Yes. But some people want more than 16GB to run larger models or other things needing more VRAM.
I understand.
I also want more VRAM. I bought multiple 16GB V100s because they are cheaper.
I just want to understand why some people buy 32GB V100s instead of multiple 16GB V100s, since the 16GB ones are much cheaper - or aren't they?
I could understand that decision if someone needs only 32GB, because then you would have only one GPU and fewer problems.
But if you buy more than one, why spend more money?
I can understand spending more if you don't have many PCIe slots - but if you do, why spend more money than you have to?
 

gsrcrxsi

Active Member
Dec 12, 2018
437
147
43
Not all tasks or models can easily scale across multiple GPUs. And GPU-GPU bandwidth (with or without NVLink) can be limiting.

There are arguments for and against it. It just depends on what your actual use case is.
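If you want to see whether NVLink is actually in play on a given box, a small sketch along these lines (assuming the nvidia-ml-py package, imported as pynvml) reports the per-GPU link state; on the single-card PCIe adapter boards the links normally come up disabled, so any GPU-GPU traffic falls back to PCIe.

```python
# Hedged sketch: count active NVLink links per GPU via NVML.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        active = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                    active += 1
            except pynvml.NVMLError:
                break  # link index not present or NVLink not supported
        print(f"GPU {i} ({name}): {active} NVLink link(s) active")
finally:
    pynvml.nvmlShutdown()
```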
 

Andrw

New Member
Sep 17, 2025
4
0
1
I understand.
I also want more VRAM. I bought multiple 16GB V100s because they are cheaper.
I just want to understand why some people buy 32GB V100s instead of multiple 16GB V100s, since the 16GB ones are much cheaper - or aren't they?
I could understand that decision if someone needs only 32GB, because then you would have only one GPU and fewer problems.
But if you buy more than one, why spend more money?
I can understand spending more if you don't have many PCIe slots - but if you do, why spend more money than you have to?
Inference scenario:

Choose a single 32GB V100 if you work with large models or datasets that won't fit in 16GB: roughly 3-10x the tokens per second compared with splitting the same workload across 2x 16GB V100s.

Choose two 16GB V100s if your models fit in 16GB and your primary constraint is parallelism: roughly 2x the requests per second compared with 1x 32GB V100.
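As a rough illustration of those two shapes (just a sketch, assuming vLLM's offline API and a placeholder model name; V100s need float16 since they have no bfloat16 support):

```python
from vllm import LLM, SamplingParams

# Option A: the model does not fit in 16GB, so shard one engine across
# both cards with tensor parallelism (the shards exchange activations over
# PCIe on these adapter boards, which is where the speed penalty comes from).
llm = LLM(model="your-model-here", tensor_parallel_size=2, dtype="float16")

# Option B: the model fits in 16GB, so you would instead start one
# independent engine (or server process) per card, e.g. pinned with
# CUDA_VISIBLE_DEVICES, and load-balance requests for ~2x throughput.

print(llm.generate(["Hello"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```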
 

bill4

New Member
Jul 22, 2025
4
0
1
How are you getting on? Have you tried it without a riser? What was the problem? I have one 16GB card that works fine at x16. I'm waiting for another 32GB card and another PCIe adapter with three fan outputs. What length of riser did you use?
It's been working fine; I'm able to run fairly sizable models at reasonable speeds. Still on the 6-inch x8 riser. I tried several x16 risers; none work. Never figured out why. It's in a very old server, maybe that has something to do with it. It's a Dell R720. I chose that because I already had it, and it has 512GB of RAM for CPU offloading, for running even bigger models very slowly. I plan on adding a second 32GB V100 at some point.
 

Andrw

New Member
Sep 17, 2025
4
0
1
It's been working fine; I'm able to run fairly sizable models at reasonable speeds. Still on the 6-inch x8 riser. I tried several x16 risers; none work. Never figured out why. It's in a very old server, maybe that has something to do with it. It's a Dell R720. I chose that because I already had it, and it has 512GB of RAM for CPU offloading, for running even bigger models very slowly. I plan on adding a second 32GB V100 at some point.
Thanks for the reply, I'll be careful with the riser. Even without it, my board sometimes goes into x8 mode. About larger models (inference): as I understand it, it's better to run them on the CPU if there isn't enough room on the GPU. Otherwise PCIe (even Gen5) will be a bottleneck and it will be significantly slower. I haven't yet figured out why I need two cards for the inference scenario.
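For what it's worth, a quick way to see what the link actually trained to after a riser swap is something like this (assuming the nvidia-ml-py package; nvidia-smi -q reports the same fields under GPU Link Info):

```python
# Print negotiated vs. maximum PCIe link width and generation per GPU.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        cur_w = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
        max_w = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)
        cur_g = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
        max_g = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(handle)
        print(f"GPU {i}: running x{cur_w} of x{max_w}, Gen{cur_g} of Gen{max_g}")
finally:
    pynvml.nvmlShutdown()
```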
 

bill4

New Member
Jul 22, 2025
4
0
1
Thanks for the reply, I'll be careful with the riser. Even without it, my board sometimes goes into x8 mode. About larger models (inference): as I understand it, it's better to run them on the CPU if there isn't enough room on the GPU. Otherwise PCIe (even Gen5) will be a bottleneck and it will be significantly slower. I haven't yet figured out why I need two cards for the inference scenario.
In my experience speed is still better doing only a partial CPU offload and getting as many layers on the GPU as possible. But again, these are very old CPUs. I'm primarily running gemma-3-27b-it-qat-q4_0, as it fits entirely in the GPU unless I need to turn the context way up for a large prompt. I've downloaded some other models but haven't had time to mess with them in a few weeks; I've been trying to get my other AI servers up and running that do text-to-voice, voice-to-text, and image generation. Just the text-to-voice left.
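If anyone wants to reproduce the partial-offload setup, a minimal sketch with llama-cpp-python looks roughly like this (the GGUF path is a placeholder; n_gpu_layers is the knob that decides how many layers land on the V100, with the rest staying in system RAM):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-27b-it-qat-q4_0.gguf",  # placeholder path to the GGUF
    n_gpu_layers=-1,  # -1 = offload all layers; lower this if VRAM runs out
    n_ctx=8192,       # bigger contexts eat VRAM, which is what forces offload
)

out = llm("Explain what an SXM2-to-PCIe adapter board does.", max_tokens=128)
print(out["choices"][0]["text"])
```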
 

Andrw

New Member
Sep 17, 2025
4
0
1
In my experience speed is still better doing only a partial CPU offload and getting as many layers on the GPU as possible. But again, these are very old CPUs. I'm primarily running gemma-3-27b-it-qat-q4_0, as it fits entirely in the GPU unless I need to turn the context way up for a large prompt. I've downloaded some other models but haven't had time to mess with them in a few weeks; I've been trying to get my other AI servers up and running that do text-to-voice, voice-to-text, and image generation. Just the text-to-voice left.
Please share what you chose for the voice-to-text implementation.
 

bill4

New Member
Jul 22, 2025
4
0
1
Please share what you chose for the voice-to-text implementation.
That was the first one I got working; it's been a couple of years. It's a project called subsai. It's meant to generate subtitles for videos using Whisper, but once I got it working I found I'm not really using most of its features, just using batch files to control Whisper. I have not tried to connect it to my frontend yet; I'm waiting until everything is working. Overall the speed and accuracy are pretty good, even on the e-waste I'm running it on. It's a 13th-gen Celeron with a K80.
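If it helps, driving Whisper straight from Python instead of batch files is only a few lines with the openai-whisper package (which, as I understand it, is one of the engines subsai wraps); the audio filename and model size are just placeholders:

```python
import whisper

# Load a Whisper model onto the GPU and transcribe one file.
model = whisper.load_model("medium", device="cuda")
result = model.transcribe("recording.wav")
print(result["text"])
```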
 

gsrcrxsi

Active Member
Dec 12, 2018
437
147
43
Does anyone know about the -N variants of the V100? Where they come from? Their history? Any other information about their specs?

I have several normal SXM2 V100s that I’ve been running for a while. All have a default power limit of 300W, configurable from 150-300W, and a memory clock of 877MHz.

But a friend of mine bought one on eBay recently, and it identifies itself with the device name “Tesla V100-SXM2-16GB-N” rather than the normal name I see on mine, which is just “Tesla V100-SXM2-16GB”. His -N version was considerably slower on the same workloads, and on further inspection he found that it has a default power limit of just 160W, configurable from 150-240W, and a slower memory clock of 810MHz (roughly an 8% reduction).

So does anyone have any more information about this model and how to identify it before you buy it? Or what it was originally used in? Some custom SKU from a hyperscaler or something?
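For reference, once a card is in hand (or if a seller will run a quick check), something like this pynvml sketch (assuming the nvidia-ml-py package) surfaces the tells I described above; a normal SXM2 V100 should report a 300W default limit and an 877MHz max memory clock:

```python
# Report name, power-limit range, and max memory clock for each GPU (NVML).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)
        min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        mem_mhz = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_MEM)
        print(f"{name}: default {default_mw / 1000:.0f} W, "
              f"range {min_mw / 1000:.0f}-{max_mw / 1000:.0f} W, "
              f"max memory clock {mem_mhz} MHz")
finally:
    pynvml.nvmlShutdown()
```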
 

d450d112

New Member
Jun 30, 2024
9
0
1
Does anyone know about the -N variants of the V100? Where they come from? Their history? Any other information about their specs?

I have several normal SXM2 V100s that I’ve been running for a while. All have a default power limit of 300W, configurable from 150-300W, and a memory clock of 877MHz.

But a friend of mine bought one on eBay recently, and it identifies itself with the device name “Tesla V100-SXM2-16GB-N” rather than the normal name I see on mine, which is just “Tesla V100-SXM2-16GB”. His -N version was considerably slower on the same workloads, and on further inspection he found that it has a default power limit of just 160W, configurable from 150-240W, and a slower memory clock of 810MHz (roughly an 8% reduction).

So does anyone have any more information about this model and how to identify it before you buy it? Or what it was originally used in? Some custom SKU from a hyperscaler or something?
I wonder if it means 'neutered'