GPU recommendation for mass text processing, summarizing and data analysis, serving API requests etc


nutsnax

Active Member
Nov 6, 2014
I'm pretty new to AI and just started dabbling with LM Studio, Ollama, APIs, etc. I think I'm ready to start more complex things and am looking for GPU recommendations as well as model recommendations. I plan on doing large amounts of text processing as well as data analysis (large CSV and JSON files).

I'm looking for a GPU that can do this and is reasonably priced. If I don't have to spend $500 I'd rather not, but if I do I'd rather get something that will definitely do the job.

low-end cards I was considering:
-Tesla P40
-Radeon MI50 or MI25
-Radeon Pro V340

at the high end of my budget:
-Radeon MI60
-multiple of the low-end cards
 

bayleyw

Active Member
Jan 8, 2014
under $500: 2080 Ti 22G ($430)
best deal: used 3090 (about $750)

you want a relatively recent nvidia card with tensor cores - ideally ampere or newer, but for now turing also works. you also need enough vram to hold the model, so figure about 5 bits per parameter, plus the KV cache, which varies by framework but I've heard ~1MB/token for 7B models. if you are doing unstructured text processing you will have a lot of input tokens, so the problem will be compute bound and the KV cache will be fairly large.
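to put rough numbers on that sizing (the 5 bits/param and ~1MB/token figures above are ballpark estimates, and this helper is just an illustrative sketch, not a real tool):

```python
# back-of-the-envelope VRAM sizing: quantized weights plus KV cache.
# assumes ~5 bits/param for the weights and ~1 MB/token of KV cache,
# both rough figures that vary by quant and framework.
def vram_estimate_gb(params_billion, context_tokens,
                     bits_per_param=5.0, kv_mb_per_token=1.0):
    # 1e9 params * (bits/8) bytes each, expressed directly in GB
    weights_gb = params_billion * bits_per_param / 8
    kv_gb = context_tokens * kv_mb_per_token / 1000
    return weights_gb + kv_gb

# 7B model with a 4k context: ~4.4 GB weights + ~4.1 GB KV cache
print(round(vram_estimate_gb(7, 4096), 1))  # 8.5
```

so a 12GB card is comfortable for a 7B q4 model at moderate context, but a 70B model needs ~44GB for weights alone before the KV cache, which is where the 32GB and multi-card options come in.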

you can get multiple cards to hold larger models, but outside of MLC-LLM none of the consumer frameworks give you a speedup from more than one card. TGI, vLLM, and TensorRT-LLM do, but they are challenging to set up and (I think) don't support q4, only q8.
 

nutsnax

Active Member
Nov 6, 2014
Darn, I wish I'd seen this sooner - I played around with my laptop GPU (Radeon 6800M with 12GB of VRAM) and AMDGPU/ROCm seemed to work OK with a 7B Q4_K_S model. So I ordered a Radeon MI60, since it seems to be compatible, has 32GB of HBM2, and was right at the higher end of my budget for this.


I don't even see any 2080 Ti 22GB cards on ebay (maybe I'm missing something) - I imagine these things get snatched up as soon as they're available. I don't need (or want) Windows, so I'm fine running AMD hardware, and I'll be running this in a server anyway with a bunch of EPYC hardware and an ASRock ROME2D16.

I'm trying to get the initial text processing out of the way, and then eventually I want to start learning to train, maybe feeding it more recent data, etc. So much to learn here.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
They are on ebay - there's a seller who mods and sells them, and they have them listed currently.

Why not save up a bit longer, though, and just get a 3090 Ti? Unless there's been a drastic price increase that I missed.
 

nutsnax

Active Member
Nov 6, 2014
I'd rather not get a card that has been mined on, and it's likely that whatever inexpensive used 3090 I find will have been mined on... and still be expensive :(

New ones are super expensive it seems.

I'll see what I can do with this MI60. 32GB of HBM2 on a glorified Vega will probably do what I need it to (I think?)

If it ends up being garbage, I can try to sell it, bite the bullet, and buy the newer card.
 

bayleyw

Active Member
Jan 8, 2014
the 22G 2080 Ti cards are on aliexpress; they are hard to find on ebay (there is one vendor on ebay who claims to be in palo alto, but he is frequently out of stock).

I shouldn't be so hard on amd - after all, we need competition in the industry :rolleyes: and amd is getting better at supporting compute use cases on their hardware. the problem at this point is that amd will blatantly ignore bugs and refuse to acknowledge some of their gpus exist. for example, optimum-amd *only supports Aldebaran* (the MI200 series) - forget about consumer parts, it doesn't even support the MI100 or the W7900. that's kind of bad, because you don't know whether you'll get good performance on the MI100 or whether the W7900's tensor cores will be accelerated by the libraries.

mining is... probably less intensive than datacenter use, since mining only stresses the memory and the cards are usually underclocked to save power.
 

nutsnax

Active Member
Nov 6, 2014
Thanks, that was helpful. I didn't even think about aliexpress. If this MI60 doesn't work out I'll look at getting a 2080 Ti. The MI60 looks to still be GCN, so I hope it's still a viable card.

Totally agree on competition - it seems like NVIDIA absolutely owns the AI space, which is disappointing.