SXM2 over PCIe


CyklonDX

Well-Known Member
Nov 8, 2022
1,313
459
83
Just FYI in case someone is looking:
The Titan V has recently gone cheap.
It can be bought for around 400-500 USD on eBay.

For those interested, here's a list of the cards I had for my own research (non-SXM2, compute/FP16 oriented).
 
Last edited:
  • Like
Reactions: piranha32

gsrcrxsi

Active Member
Dec 12, 2018
371
121
43
Yeah they’ve been around that price for quite a while. I bought a bunch of them. 15 in service now.

Still trying to replace them with V100 SXM2 setups, though. The supply of SXMV boards seems to have dried up quickly.
 
  • Like
Reactions: CyklonDX

piranha32

Well-Known Member
Mar 4, 2023
311
261
63
Just FYI in case someone is looking:
The Titan V has recently gone cheap.
It can be bought for around 400-500 USD on eBay.

For those interested, here's a list of the cards I had for my own research (non-SXM2, compute/FP16 oriented).
Could you please open up access to the spreadsheet? Currently it's locked behind an access request.
 

Underscore

New Member
Oct 21, 2023
6
0
1
Just FYI in case someone is looking:
The Titan V has recently gone cheap.
It can be bought for around 400-500 USD on eBay.

For those interested, here's a list of the cards I had for my own research (non-SXM2, compute/FP16 oriented).
Are Titan Vs even remotely worth it anymore? 12GB of Volta is notably worse than 11GB of Turing (the 2080 Ti), even with HBM, and at that price point you could get the modded 22GB variant per @bayleyw's suggestion. Yeah, no FP64, but you get that sweet INT4 instead, and since you mentioned FP16, the 2080 Ti is about on par. Longer support and RT cores are a nice plus.

So the V100 seems to be the better option all in all.
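If anyone wants to check the FP16 parity claim on their own cards rather than take my word for it, a rough matmul throughput test is enough to see where a Titan V or 2080 Ti lands. Below is a minimal PyTorch sketch, assuming a CUDA build of PyTorch is installed; the matrix size and iteration count are arbitrary, so treat the number as a ballpark, not a proper benchmark.

```python
# Rough FP16 matmul throughput check (PyTorch). Ballpark only; tensor-core
# utilization depends on matrix sizes, clocks, and driver/toolkit versions.
import time
import torch

def fp16_tflops(n=8192, iters=50):
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    for _ in range(5):                       # warm-up
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    return 2 * n**3 * iters / elapsed / 1e12  # 2*n^3 FLOPs per n x n matmul

print(torch.cuda.get_device_name(0), f"{fp16_tflops():.1f} TFLOPS (fp16 matmul)")
```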
 
Last edited:

CyklonDX

Well-Known Member
Nov 8, 2022
1,313
459
83
The 22GB variant or an RTX 5000-8000 is definitely a better deal in terms of VRAM; the INT8 or tensor performance is a small price to pay for the capacity.
In terms of INT8 performance, NVIDIA cards have been rated at roughly 4x FP32 since Volta, if I recall correctly, though it doesn't scale that well in reality: the Titan V only produced about 3.9x FP32 with ECC disabled on the VRAM, and with ECC it was more like 3.4x, rendering it slower than the 2080.

While the Titan V is weak, the 2080/Ti is weaker out of the box in terms of run time, but the difference is a couple of seconds at most.
If one is looking for FP16 or INT8, the 3080 is a much better and cheaper option; the performance is so much greater that it makes both the Titan V and the 2080 outdated retro-ware - nice cards to hang on the wall.

(The same goes when you compare it to the 7900 XTX: it just blows things out of the water, as long as there's some support for AMD. I tested a few language models a few days ago in LM Studio - Discover and run local LLMs - and was quite surprised at the performance. I ran similar models on a 3080 Ti a year ago and it was taking about 1.2 seconds per response, so I was stunned that on the 7900 XTX it was instantaneous. If I ever get into it again I may write up my results for the 3080 Ti and the 7900 XTX, and for a 40-series or some other GPU if I get one - for now I have too many.)
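For what it's worth, the timing I mention is just wall-clock time per response. If I do write it up, it would be along the lines of the sketch below, assuming LM Studio's local server is enabled on its default port (1234) with a model already loaded; the prompt and max_tokens are arbitrary, and the endpoint is the OpenAI-compatible one LM Studio exposes.

```python
# Quick-and-dirty response timing against LM Studio's local OpenAI-compatible server.
# Assumes the server is enabled on the default port (1234) and a model is loaded.
import time
import requests

def time_response(prompt, url="http://localhost:1234/v1/chat/completions"):
    payload = {
        "model": "local-model",   # LM Studio serves whatever model is currently loaded
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    start = time.time()
    r = requests.post(url, json=payload, timeout=120)
    r.raise_for_status()
    elapsed = time.time() - start
    text = r.json()["choices"][0]["message"]["content"]
    return elapsed, text

elapsed, _ = time_response("Summarize NVLink in one sentence.")
print(f"{elapsed:.2f} s per response")
```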
 
Last edited:

bayleyw

Active Member
Jan 8, 2014
332
108
43
Titan V/Quadro GV100 are the last fp64-capable cards with display outputs, so they have some value for scientific simulations, especially if you're a researcher running commercial software that doesn't like living on the cloud. For language modeling, in rough order of viability:

  • 3090/3090 Ti 24GB: probably all you ever need; for bs=1 inference the only faster cards are the A100 and H100, which are orders of magnitude more expensive. Also supports NVLINK'ed pairs for an improved training experience - get 48GB for half the price of an A6000, and faster too (see the rough sizing sketch after this list).
  • A6000 48GB: for the rich among us (or small startups). 2x the VRAM for 4x the price. Actually slower than a 3090 because it uses GDDR6, not GDDR6X. Build NVLINK'ed pairs and get 96GB for $7,000 - save seven grand over an A100 80GB!
  • RTX 8000 48GB: the poor man's version of the A6000, but Turing is not as well supported by frameworks as Ampere
  • 4090: so fast it's a weapon, and also supports Transformer Engine. Thanks to our dear friend George, also supports peermem on Linux. Not worth the extra money for batch size 1 inference, but might be worth it for training because it supports fp8
  • 2080 Ti 22GB: slow as balls, but feature rich: int8 tensor cores, NVLINK, two-slot blowers available. Not worth it for anything less than four cards, but really convenient in 4x/8x configs since you don't need to jump through hoops with risers.
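For picking between the 24GB and 48GB tiers, the rough sizing math is simple enough to script. A back-of-the-envelope sketch; the 20% overhead factor is my own assumption, not a measured number, since real usage adds KV cache and activations on top of the weights:

```python
# Back-of-the-envelope check of whether a model's weights fit in VRAM for inference.
# Rule of thumb only: the overhead factor is an assumption, not a measurement.
def fits(params_billion, bytes_per_param, vram_gb, overhead=1.2):
    need_gb = params_billion * bytes_per_param * overhead  # 1e9 params * bytes ~ GB
    return need_gb, need_gb <= vram_gb

for params, dtype, bpp in [(13, "fp16", 2), (33, "int8", 1), (70, "int4", 0.5)]:
    need, ok = fits(params, bpp, vram_gb=24)
    print(f"{params}B @ {dtype}: ~{need:.0f} GB -> {'fits' if ok else 'does not fit'} in 24 GB")
```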
 
  • Like
Reactions: piranha32

CyklonDX

Well-Known Member
Nov 8, 2022
1,313
459
83
For FP64 it might be worth looking at AMD with ZLUDA; there's been plenty of development on the AMD/Windows side to run on DirectML (the MI100 and MI210 might see new life with those).
(Just last night I managed to run Stable Diffusion on a 7900 XTX on Windows 10 ~ I don't have a good comparison with NVIDIA cards at this time, as it's not a ComfyUI-like workflow.)
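In case anyone wants to try the DirectML path, the only unusual part is the device selection. A minimal smoke-test sketch, assuming the torch-directml package is installed (pip install torch-directml) and the 7900 XTX shows up as the default DML device; the tensor sizes are arbitrary:

```python
# Minimal DirectML smoke test on Windows (AMD GPUs show up as DML devices).
# Assumes the torch-directml package is installed alongside a compatible torch build.
import torch
import torch_directml

dml = torch_directml.device()            # default DirectML device (e.g. the 7900 XTX)
a = torch.randn(4096, 4096).to(dml)
b = torch.randn(4096, 4096).to(dml)
c = torch.matmul(a, b)                   # runs on the GPU via DirectML
print(c.shape, c.device)
```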
 
Last edited:
  • Like
Reactions: piranha32

gsrcrxsi

Active Member
Dec 12, 2018
371
121
43
Very nice. I bought a set of waterblocks with mine also, but haven't gotten around to installing them yet. The 3U air coolers keep things cool enough for me for now.

Where'd you get the board from? I've been negotiating with the Chinese sellers and pretty much no one is willing to sell just the board anymore, at least not for a reasonable price - only as a whole package deal with case/fans/PSU/board/GPUs/2U-heatsink/PCIe cables. Which is a fine setup, and honestly not terribly priced, but you could piece it together cheaper a la carte if you don't care about the case. I also feel less comfortable shipping something that big and heavy from China (much more expensive, higher risk of damage, no recourse for shipping damage claims).

I'm in the process of buying a second board from a private seller, however. But I'd still like to buy 2-3 more after that.
 
Last edited:

gsrcrxsi

Active Member
Dec 12, 2018
371
121
43
Early adopter then lol.

I like the setups. It would have been cooler if we could use the onboard oculink connectors rather than PCIe adapters, but I’m glad some folks finally made them.
 
  • Like
Reactions: MildHotSauce

MildHotSauce

New Member
Mar 7, 2023
9
3
3

gsrcrxsi

Active Member
Dec 12, 2018
371
121
43
The cool thing is that it only takes a total of 3 open PCIe slots to run this: two for the PCIe SC conversion cables and one for the OCuLink card.

I used this one

PCIe x16 Gen4 with ReDriver to OCulink 8i Dual Port Add in Card | eBay

Then you have 4 GPUs with NVLink
Looking at the specs, it's not clear to me that you need the oculink outputs at all unless you're trying to access the GPUs directly over the network. Are you doing this? Is there something you're using it for that doesn't work without the oculink connection? I'm not using them at all, only the PCIe slot connectors.
 

MildHotSauce

New Member
Mar 7, 2023
9
3
3
Looking at the specs, it's not clear to me that you need the oculink outputs at all unless you're trying to access the GPUs directly over the network. Are you doing this? Is there something you're using it for that doesn't work without the oculink connection? I'm not using them at all, only the PCIe slot connectors.
Weird, yeah it wasn't working unless I plugged two of the four oculink connectors in.
 

gsrcrxsi

Active Member
Dec 12, 2018
371
121
43
Weird, yeah it wasn't working unless I plugged two of the four oculink connectors in.
What's your primary use case? What application(s)?
Was it that the whole GPU array wasn't recognized without your oculink connections, or only that NVLink didn't work?

Me personally, I only use them as individual GPUs, so each of the 4 GPUs runs a separate instance of an application. But as best I can tell, NVLink is still working fine, judging from the few commands to check it, like the NVLink bandwidth tests and whatnot.
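For reference, the checks I mean are along these lines. A small sketch assuming four GPUs at indices 0-3 and a CUDA build of PyTorch; nvidia-smi's nvlink/topo subcommands show the link state, and the peer-access query confirms P2P is usable between each pair:

```python
# Quick NVLink / peer-access sanity check. Assumes four GPUs (indices 0-3) on one board.
import subprocess
import torch

# nvidia-smi reports per-link NVLink state and the topology matrix.
subprocess.run(["nvidia-smi", "nvlink", "--status"], check=True)
subprocess.run(["nvidia-smi", "topo", "-m"], check=True)

# PyTorch can confirm P2P access is actually usable between each pair.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'OK' if ok else 'unavailable'}")
```

The actual bandwidth numbers would come from something like the p2pBandwidthLatencyTest sample in cuda-samples, but the above is enough to confirm the links are up.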
 

MildHotSauce

New Member
Mar 7, 2023
9
3
3
What's your primary use case? What application(s)?
Was it that the whole GPU array wasn't recognized without your oculink connections, or only that NVLink didn't work?

Me personally, I only use them as individual GPUs, so each of the 4 GPUs runs a separate instance of an application. But as best I can tell, NVLink is still working fine, judging from the few commands to check it, like the NVLink bandwidth tests and whatnot.
It wasn't recognized at all. Now that I think of it, it might have been when I was messing around with a Supermicro riser on just one end that it wasn't recognized without one set of oculink cables plugged in; this was before I got the special cables for both ends.
 

SirSkitzo

New Member
Jul 2, 2024
1
0
1
What are the odds of getting a similar setup working with this guy? Would love some guidance on what to look for in these enterprise SXM2 standalone boards. Sounds like it's straight-to-CPU PCIe x16 with OCuLink adapter card(s)/cables and the SXM2 GPUs? How do I make sure it's not tied to the OEM like OP mentioned?

Thanks to anyone that can help me out and for making this thread in general!

Dell HRGXG
 
