Searching some more, community support seems to finally be picking up and quickly outpacing AMD.
I know this was initially about the V620, but the current conclusion for _that_ hasn't changed.
All the AMD lobbying/nagging I do myself is for the Radeon Pro VII / gfx906.
I'd also include this Reddit thread. It's the same guy who wrote the P40 flash attention kernels back in the day. While it may not be as fast as vLLM, I personally prefer llama.cpp because of wider GGUF model availability, quant support, and ease of running/switching models. Coupled with llama-swap, it is a pretty powerful combo for loading or switching between multiple models.
A caveat with llama.cpp on gfx906 is that `-sm row` (row split mode) still doesn't work for most models, but I don't mind because I use the cards to load larger MoE models anyway.
My workhorse is a dual Xeon E5-2667 v4, which can theoretically power two MI50s, though I would stick with one. Price-wise, it's all in favor of the MI50, especially if the ecosystem now picks up and it's no longer such a brain cancer trigger to make it work with ROCm. (At least the Debian builds have it enabled in ROCm 7, though all the usual warnings apply; after all, AMD still isn't putting it back on "working but unsupported" status.)
If you have ~350W to spare on your PSU, you can install two of them comfortably. I limit mine to 170W and haven't noticed any degradation in performance. They idle at 16-22W each, whether a model is loaded or not.
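A quick power-budget check for those numbers, as a small Python sketch: with each card capped at 170 W, two cards draw at most 340 W, which fits in ~350 W of PSU headroom. The helper `cards_that_fit` is hypothetical, and it assumes the power cap is the card's worst-case draw (transient spikes are ignored).

```python
def cards_that_fit(psu_headroom_w: float, per_card_cap_w: float) -> int:
    """How many cards fit in the spare PSU capacity, assuming each card never
    draws more than its power cap (transient spikes are ignored)."""
    if per_card_cap_w <= 0:
        raise ValueError("per-card power cap must be positive")
    return int(psu_headroom_w // per_card_cap_w)


HEADROOM_W = 350    # spare PSU capacity mentioned in the post
CAP_W = 170         # per-card power limit used in the post
IDLE_W = (16, 22)   # per-card idle range reported in the post

n = cards_that_fit(HEADROOM_W, CAP_W)
print(f"{n} cards, {n * CAP_W} W capped load")    # 2 cards, 340 W capped load
print(f"idle: {n * IDLE_W[0]}-{n * IDLE_W[1]} W")  # idle: 32-44 W
```

In practice you'd want some margin on top of the capped figure, since the rest of the system shares the same PSU.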
Setting up ROCm was a 15-minute affair on my first system, and that was before all the tutorials. The issue with the Tensile files was solved with a Google search and opening the first result. The rest was just following AMD's own instructions.
I feel I should add a disclaimer, or warning: you should really think about whether you want to use something when the vendor does everything in their power to keep you from using it. I know they're improving, but so far their actions haven't made enough of a difference to make the warning obsolete. I know if I had gone with a much, much more expensive NVIDIA card I would have had more time to do something,
AMD's support has always been less than stellar. Look at the driver fiasco over the weekend with RDNA1/RDNA2. But in the case of the MI50, they work on Linux out of the box without any extra setup.
At the end of the day, it's a tradeoff. In the current market, if you want support, you need to spend several thousand per card because we're in an AI bubble. Heck, ECC DDR4 is becoming unobtainium now.
I personally think the MI50s are great pieces of kit for the price (I got mine for 135 apiece). They offer the same VRAM as the 5090 (32 GB) at roughly 4090-level memory bandwidth (~1 TB/s). Compute is better than the P40 at FP32, and much, much, much, much better than the P40 at FP16. There's nothing on the market that comes even close.
I built a 192GB VRAM rig for ~€1.6k.
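The 192 GB figure lines up with six 32 GB cards. Here is a small Python sketch of that arithmetic. It assumes (this isn't stated in the thread) that the rig uses only 32 GB MI50s at the 135-per-card price quoted earlier, with the remainder of the ~€1.6k going to the host system, which isn't itemized. `cards_for` is a hypothetical helper.

```python
CARD_VRAM_GB = 32   # assumption: 32 GB MI50 variant (same VRAM as an RTX 5090)
CARD_PRICE = 135    # per-card price quoted in the thread (currency not stated)


def cards_for(target_vram_gb: int, per_card_gb: int = CARD_VRAM_GB) -> int:
    """Smallest number of cards whose combined VRAM reaches the target."""
    return -(-target_vram_gb // per_card_gb)  # ceiling division on integers


n = cards_for(192)
print(n, "cards")                                      # 6 cards
print("GPUs alone:", n * CARD_PRICE)                   # GPUs alone: 810
print("per GB:", round(CARD_PRICE / CARD_VRAM_GB, 2))  # per GB: 4.22
```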
and if I went with some utterly unsupported Tenstorrent card I would at least learn everything from the ground up.
That's not a fair comparison. The MI50 supports Vulkan on Linux out of the box, AFAIK. So it was always functional, if not optimized for.
As much as I'm rooting for Jim Keller, Tenstorrent requires you to write your own inference code. If you can do that, I'm fairly sure you'll find a job at Tenstorrent themselves.