SXM2 over PCIe


gsrcrxsi

Active Member
Dec 12, 2018
303
103
43
Yeah, they've been around that price for quite a while. I bought a bunch of them; 15 are in service now.

Still trying to replace them with V100 SXM2 setups, though. The supply of SXMV boards seems to have dried up quickly.
 
  • Like
Reactions: CyklonDX

piranha32

Active Member
Mar 4, 2023
246
178
43

Underscore

New Member
Oct 21, 2023
6
0
1
Just FYI in case someone is looking: the Titan V has recently gone cheap. It can be bought for around 400-500 USD on eBay.

For those interested, here's a list of the cards I had for my own research (non-SXM2, compute/FP16 oriented).
Are Titan V's even remotely worth it anymore? 12GB of Volta is notably worse than 11GB of Turing (the 2080 Ti), even with the HBM, and at that price point you could get the modded 22GB variant per @bayleyw's suggestion. Yes, you lose FP64, but you get INT4 support instead, and since you mentioned FP16, the 2080 Ti is about on par there. Longer driver support and RT cores are a nice plus.

So the V100 seems to be the better option, all in all.
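
If you want to check where a given card falls on these feature lines, here's a minimal sketch of mine (assumes PyTorch with CUDA installed) that maps each device's compute capability to the tensor-core dtypes discussed above. The feature strings are rough shorthand, not an exhaustive NVIDIA spec:

```python
# Map compute capability to the dtype features discussed in this thread.
# Summaries are rough shorthand, not an exhaustive spec.
import torch

FEATURES = {
    (7, 0): "Volta (Titan V/V100): FP16 tensor cores, 1:2-rate FP64",
    (7, 5): "Turing (2080 Ti/RTX 8000): FP16 + INT8/INT4 tensor cores",
    (8, 0): "Ampere (A100): FP16/BF16/TF32 tensor cores",
    (8, 6): "Ampere (3080/3090/A6000): FP16/BF16/TF32 tensor cores",
    (8, 9): "Ada (4090): adds FP8 (Transformer Engine)",
}

for i in range(torch.cuda.device_count()):
    cc = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    print(f"{name}: sm_{cc[0]}{cc[1]} - {FEATURES.get(cc, 'unknown')}")
```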
 
Last edited:

CyklonDX

Well-Known Member
Nov 8, 2022
857
282
63
The 22G variant or a Quadro RTX 5000-8000 is definitely the better deal in terms of VRAM; the INT8/tensor performance is a small price to pay for the capacity.
On NVIDIA cards, INT8 has almost always been 4x FP32 since Volta, if I recall correctly, though it doesn't scale that well in reality: the Titan V only produced 3.9x FP32 with ECC disabled on the VRAM; with ECC it was more like 3.4x, rendering it slower than a 2080.
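
If you want to sanity-check that scaling on your own card, here's a rough harness of mine (assumes PyTorch with CUDA; 2*n^3 is the standard matmul FLOP estimate). It measures FP32 vs FP16; an INT8 GEMM would slot into the same loop wherever your framework exposes one:

```python
# Rough matmul throughput harness: measures achieved TFLOP/s per dtype
# so you can check dtype scaling claims on your own card.
import time
import torch

def tflops(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device="cuda").to(dtype)
    b = torch.randn(n, n, device="cuda").to(dtype)
    for _ in range(5):                      # warm-up, excluded from timing
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    return 2 * n**3 * iters / dt / 1e12     # 2*n^3 FLOPs per matmul

for dt in (torch.float32, torch.float16):
    print(dt, f"{tflops(dt):.1f} TFLOP/s")
```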

While the Titan V is weak, the 2080/Ti is even weaker out of the box in terms of runtime, though the difference is a couple of seconds at most.
If one is looking for FP16/INT8, the 3080 is a much better and cheaper option; the performance gap is so large that it makes both the Titan V and the 2080 look like outdated retro-ware - nice cards to hang on the wall.

(The same goes when you compare them to the 7900 XTX: it blows things out of the water, as long as there's some AMD support for your workload. I tested a few language models in LM Studio a few days ago and was quite surprised at the performance. A year ago I ran similar models on a 3080 Ti and each response took about 1.2 s, so I was stunned that on the 7900 XTX they were practically instantaneous. If I ever get into it again I may write up my results for the 3080 Ti and the 7900 XTX, and for a 40-series or some other GPU if I pick one up - for now I have too many.)
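
For anyone wanting to reproduce that kind of comparison, here's a quick sketch of mine (not from the thread) that times a single response from LM Studio's local OpenAI-compatible server. It assumes the server is running on its default port with a model already loaded:

```python
# Time one chat completion against LM Studio's local server
# (default endpoint http://localhost:1234/v1).
import time
import requests

t0 = time.perf_counter()
r = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; LM Studio uses the loaded model
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=120,
)
dt = time.perf_counter() - t0
reply = r.json()["choices"][0]["message"]["content"]
print(f"{dt:.2f}s  {reply!r}")
```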
 
Last edited:

bayleyw

Active Member
Jan 8, 2014
305
102
43
The Titan V and Quadro GV100 are the last FP64-capable cards with display outputs, so they have some value for scientific simulations, especially if you're a researcher running commercial software that doesn't like living in the cloud. For language modeling, in rough order of viability:

  • 3090/3090 Ti 24GB: probably all you ever need; for bs=1 inference the only faster cards are the A100 and H100, which are orders of magnitude more expensive. Also supports NVLINK'ed pairs for an improved training experience (see the P2P check sketched after this list) - get 48GB for half the price of an A6000, and faster too.
  • A6000 48GB: for the rich among us (or small startups). 2x the VRAM for 4x the price. Actually slower than a 3090 because it uses GDDR6, not GDDR6X. Build NVLINK'ed pairs and get 96GB for $7,000 - save seven grand over an A100 80GB!
  • RTX 8000 48GB: the poor man's version of the A6000, but Turing is not as well supported by frameworks as Ampere
  • 4090: so fast it's a weapon, and also supports Transformer Engine. Thanks to our dear friend George, also supports peermem on Linux. Not worth the extra money for batch size 1 inference, but might be worth it for training because it supports fp8
  • 2080 Ti 22GB: slow as balls, but feature rich: int8 tensor cores, NVLINK, two-slot blowers available. Not worth it for anything less than four cards, but really convenient in 4x/8x configs since you don't need to jump through hoops with risers.
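
Before counting on any of the NVLINK'ed-pair setups above, it's worth confirming the cards can actually reach each other over P2P. A minimal check, assuming PyTorch with CUDA:

```python
# Verify peer-to-peer access between every GPU pair (NVLink, or
# peermem on patched 4090s) before relying on fast pooled transfers.
import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU{i} -> GPU{j}: {'P2P ok' if ok else 'no P2P'}")
```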
 
  • Like
Reactions: piranha32

CyklonDX

Well-Known Member
Nov 8, 2022
857
282
63
For FP64 it might be worth looking at AMD with ZLUDA; there's been plenty of development on the AMD/Windows side for running on DirectML (the MI100 and MI210 might see new life with those).
(Just last night I managed to run Stable Diffusion on a 7900 XTX under Windows 10. I don't have a good comparison against NVIDIA cards at this time, as the workflow isn't ComfyUI-like.)
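
For reference, getting a tensor onto the card via DirectML is nearly a one-liner with Microsoft's torch-directml package. A minimal sketch (assumes `pip install torch-directml` on Windows; API names per that package's docs):

```python
# Smoke test: confirm a DirectML adapter is visible and run a matmul on it.
import torch
import torch_directml

dml = torch_directml.device()             # first DirectML adapter
print(torch_directml.device_name(0))      # e.g. the 7900 XTX, if installed
x = torch.randn(1024, 1024, device=dml)   # allocate on the GPU
print((x @ x).abs().mean().item())        # matmul executes via DirectML
```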
 
Last edited:
  • Like
Reactions: piranha32