AMD v620 Shroud/Fans options


cesarinpillin

New Member
Sep 4, 2025
20
0
1
To summarize: I bought a V620 and am looking for cooling solutions.
Details: I'm in Vancouver, BC, Canada.

As per my series of comments here: https://forums.servethehome.com/index.php?threads/amd-radeon-pro-v620-32gb-gddr6-gpus-565.47945

I've been trying to find a proper fan/blower for it.
Most people seem to use the classic adapter plus snail blower at the end of the card.
Others recommend a custom 3D-printed shroud.


Problems: The AliExpress V620 snail blower coolers (which look almost identical to the MI50 ones) cost approximately 40% more than the MI50 versions, and the sellers charge shipping on top (the MI50 ones ship free).
No vendor can confirm compatibility (I assume they just print 3D models someone else created for the MI50, add the blower, and call it a day).
Note that I don't have a 3D printer, and the local print shops have not answered my quote requests.

So.. questions:

  1. Are the rear screws (at the end where the power connectors are) in similar positions on the MI50 and the V620? (Visually they seem very similar, but I don't own an MI50.)
  2. Would a custom full shroud be better than a snail blower?
  3. My budget is pretty much blown right now, so I want to keep costs low. Or should I just bite the bullet and buy the overpriced V620 versions through AliExpress?
 

CyklonDX

Well-Known Member
Nov 8, 2022
1,820
659
113
Most server GPUs have their screws in the same positions (AMD and NVIDIA), so all 3D-printed snail blowers will work on all datacenter GPUs, as they are built to the server/workstation standard. *The only difference is that most NVIDIA cards don't have the top screw; you only have the 3 on the bottom. Size-wise they are also the same.

The full shroud replacement you mentioned will likely need the 3D model adjusted.
Most of those snail shrouds and full shrouds do not include a fan, so keep that in mind. You will have to spend more.


An alternative option is to buy a high-pressure fan and mount it outside in a pull configuration, like Supermicro workstations do. (It's also what I use for passive cards.)
SuperMicro MCP-320-74702-0N-KIT fan, GPU kit for passive GPU support, with delivery across Russia. Wholesale price from the official IT distributor 3Logic Group



2. What is better?
A custom full shroud gives you more space in your case *depending, obviously, on which solution you choose.
All solutions will most likely be loud.
 

custom90gt

Active Member
Nov 17, 2016
358
135
43
41
Sorry, what about the $20 ones on eBay? I don't know if they work well or not, but $20 seems reasonable.

My local library offers access to a 3D printer; any chance yours does? If you had a design already made, I could also print you one and snail mail it for the price of shipping (I have my own printer).
 

cesarinpillin

I'm checking one of them, but so far all of them are 30+ USD plus 20 or more for shipping.


So far I've noticed 4 options:

  1. Full shroud replacement for a large fan.
  2. Small shroud replacement with a blower fan. <-- only seen compatible with MI50
  3. A blower fan at the end using an adapter (snail style).
  4. The idea Cyklon suggests: a high-pressure fan on the outside.
2 and 4 seem very loud. I'm looking for a #1 that is compatible with the V620, and #3 is very expensive for V620s.


Also, thanks for the offer. If I run out of options I might take you up on it.
 

cesarinpillin

Jesus, I finally got a response from one of the local 3D printing companies. They want an insane 100 CAD for a full shroud with no adjustments and 60 CAD for the snail fan style lol.
 

cesarinpillin

Hah that's insane, well actually genius because it's like $1 worth of material. I can't find any V620 fan shrouds to print, but I do see the Mi50 ones if you can use them? This one is sweet AMD Instinct MI50 shroud by Bit Matter | Download free STL model | Printables.com
Pretty sure the MI50 shroud is incompatible. The snail-style rear fan might be. I mean, the size is the same, but I believe the difference is the screws.
I assume it might be usable if you make the holes wider horizontally.

Anyway, I bit the bullet and ended up buying one from AliExpress, supposedly for the V620.
We'll see how it goes.
 

CyklonDX

If the holes are too large, just use washers; if the holes are too small, just screw through the plastic.
 

cesarinpillin

I ended up buying a 3D printer and printing it myself after so many hilarious issues.

For example, the piece I bought from AliExpress came defective, and printing locally cost a fortune.

Thanks everyone!

Now I wonder if there is a way to restore the 4-pin fan header to control the fan properly.
 

JMN57

New Member
Apr 26, 2026
2
0
1
Couple of questions.
Are you running a V620 with a fan shroud?
Does it keep it cool enough?
How loud is it?
 

cesarinpillin

Not a shroud; I ended up using a snail-style blower (laptop version, sawed off) and 3D printed the adapter myself.

And it is loud at anything above 60% fan speed.

But temps stay below 60 C at 100% utilization under ComfyUI-style workloads with 80% VRAM usage at a 25 C ambient temperature.
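
For anyone who gets the blower onto a controllable 4-pin header, a simple temperature-to-duty fan curve might look like the sketch below. It is only an illustration: the function name `fan_duty_percent`, the 40 C and 80 C breakpoints, and the 20% floor are assumptions, not values measured on the V620. The only figures taken from this thread are that the blower gets loud above roughly 60% and that the card stays under 60 C, so the defaults are chosen to reach 60% duty at 60 C.

```python
def fan_duty_percent(temp_c, t_low=40.0, t_high=80.0, duty_min=20.0, duty_max=100.0):
    """Linear fan curve (hypothetical breakpoints, not measured on a V620).

    Returns duty_min at or below t_low, duty_max at or above t_high,
    and interpolates linearly in between.
    """
    if temp_c <= t_low:
        return duty_min
    if temp_c >= t_high:
        return duty_max
    fraction = (temp_c - t_low) / (t_high - t_low)
    return duty_min + fraction * (duty_max - duty_min)


if __name__ == "__main__":
    # Print the curve at a few sample temperatures.
    for temp in (35, 50, 60, 85):
        print(f"{temp} C -> {fan_duty_percent(temp):.0f}% duty")
```

How the duty value is actually applied depends on where the fan is plugged in (motherboard header, external PWM controller, and so on), which this sketch deliberately leaves out.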
 


JMN57

Are you using it for LLMs? If you are, do you mind sharing how the V620 is performing for you? I'm debating getting one for an external GPU for my LLM server - thanks!
 

cesarinpillin

I'm using it for llama.cpp and comfyui mostly.

What tests are you looking for?
 

mike622

New Member
Dec 17, 2019
3
1
3
Not to answer for the OP, but I've had a Radeon Pro V620 for quite a while now and it mostly collects dust. I hook it up from time to time for testing purposes, but that is about it.

The first hurdle is cooling. It's a datacenter card made to be used in a chassis with lots of front-to-back airflow. I've used different cooling solutions until ultimately settling on what looks like the same 3D printed fan shroud used by @cesarinpillin. It's the most effective at cooling the card while being the least obnoxious, but it is still loud under full load. The blower-style fan makes a less annoying sound than any of the square fans I've used that had the CFM needed to cool the card (ranging from 40mm to 92mm). The plans I used are from AMD Instinct MI50 blower fan adapter V2 by MagentL and the fan is an EFH-08E12W-JP01 sourced from AliExpress. The form factor of this shroud is also better for leaving room to either side of the card, while only being a few millimeters longer than the fan shroud I used for a 92mm square fan.

Second hurdle is getting the amdgpu driver to actually recognize the card. Depending on the motherboard used I've had to enable a combination of driver options on the kernel command line in order for it to work correctly. I want to say I had at least one Supermicro motherboard that I never could get the GPU to work on, but that may have been prior to me uncovering the driver options needed.

Depending on what you want to do it may suffice. Token generation speeds are decent. Infill/Prompt processing speeds are nowhere near even an entry level RTX 50xx card. This makes a difference if you're using this for anything with large contexts (agentic coding, for example). I've included some simple inference benchmarks at the end of this post if you're interested. I've also found it to be much slower at image generation than CUDA cards using ComfyUI, but I don't have comparison numbers for that at the moment.

These are a (relatively) inexpensive way to get access to decent amounts of VRAM. I don't regret buying it from a tinkering perspective. From an actual daily use perspective it just isn't worth it to me. Plenty of people still use MI50/60s, so maybe I'm in the minority. The V620 will be faster at prompt processing than the MI50/60, but a little slower at token generation (faster compute, slower memory).
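
The "faster compute, slower memory" point can be made concrete with a back-of-the-envelope estimate: during token generation each new token has to read roughly the whole set of model weights, so memory bandwidth divided by model size gives a rough ceiling on tokens per second. The sketch below assumes nominal bandwidths of about 512 GB/s for the V620 and about 1024 GB/s for the MI50 (spec-sheet style figures, not taken from this thread), and it ignores KV-cache traffic and every other overhead, so treat the results as loose upper bounds only.

```python
GIB = 2**30  # bytes per GiB

# Assumed nominal memory bandwidths in GB/s (not from this thread).
ASSUMED_BANDWIDTH_GB_S = {"V620": 512.0, "MI50": 1024.0}

# Model size as reported by llama-bench for llama 7B Q4_0.
MODEL_SIZE_GIB = 3.56


def tg_ceiling_tps(bandwidth_gb_s, model_size_gib):
    """Rough tokens/s ceiling if every token reads all weights exactly once."""
    bytes_per_second = bandwidth_gb_s * 1e9
    bytes_per_token = model_size_gib * GIB
    return bytes_per_second / bytes_per_token


if __name__ == "__main__":
    for card, bandwidth in ASSUMED_BANDWIDTH_GB_S.items():
        ceiling = tg_ceiling_tps(bandwidth, MODEL_SIZE_GIB)
        print(f"{card}: at most about {ceiling:.0f} t/s for a {MODEL_SIZE_GIB} GiB model")
```

Under these assumptions the V620 ceiling for the 7B Q4_0 model comes out somewhat above the measured tg128 results later in this post (roughly 99 to 103 t/s on Vulkan), which is the expected direction for a bandwidth-bound workload.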

llama-bench benchmarks for a small dense model and a small MoE model:


Radeon Pro V620
Vulkan


Code:
llama-2-7b.Q4_0.gguf

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro V620 (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |           pp512 |       1958.75 ± 1.71 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |          pp2048 |       1462.38 ± 1.15 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |          pp8192 |        966.03 ± 1.97 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |         pp16384 |        636.17 ± 0.31 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |           tg128 |         98.83 ± 0.54 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |           pp512 |       1929.11 ± 0.58 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |          pp2048 |       1756.91 ± 0.70 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |          pp8192 |       1122.17 ± 1.64 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |         pp16384 |        688.10 ± 1.83 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |           tg128 |        103.08 ± 0.37 |

build: 9725a313b (8931)


Qwen3.6-35B-A3B-Q4_K_M.gguf

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro V620 (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |           pp512 |     1440.56 ± 110.32 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |          pp2048 |       1648.30 ± 8.34 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |          pp8192 |      1510.54 ± 11.60 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |           tg128 |        106.34 ± 0.03 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |           pp512 |      1660.98 ± 12.82 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |          pp2048 |       1617.02 ± 8.97 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |          pp8192 |       1349.95 ± 4.90 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |           tg128 |        105.53 ± 0.07 |

build: 9725a313b (8931)

ROCm

Code:
llama-2-7b.Q4_0.gguf

ggml_cuda_init: found 1 ROCm devices (Total VRAM: 32752 MiB):
  Device 0: AMD Radeon Pro V620, gfx1030 (0x1030), VMM: no, Wave Size: 32, VRAM: 32752 MiB
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | ROCm       | 999 |  0 |           pp512 |      1742.70 ± 38.29 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | ROCm       | 999 |  0 |          pp2048 |       1057.40 ± 2.04 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | ROCm       | 999 |  0 |          pp8192 |        504.65 ± 0.77 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | ROCm       | 999 |  0 |         pp16384 |        307.70 ± 0.38 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | ROCm       | 999 |  0 |           tg128 |         90.30 ± 1.10 |

build: 9725a313b (8931)

(At some point with ROCm 7.x, using flash attention with ROCm and llama.cpp stopped working.  I never had a need to troubleshoot any further.)

Qwen3.6-35B-A3B-Q4_K_M.gguf

llama.cpp crashed when testing this model using the ROCm backend, requiring a reboot. It crashed again after a reboot, despite the other tests working fine.


5060 Ti 16GB
Vulkan



Code:
llama-2-7b.Q4_0.gguf (Single GPU)

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5060 Ti (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |           pp512 |       3408.82 ± 4.05 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |          pp2048 |       2902.69 ± 1.49 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |          pp8192 |       1771.24 ± 0.33 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |         pp16384 |       1147.12 ± 0.07 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  0 |           tg128 |         94.15 ± 0.02 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |           pp512 |       3867.34 ± 2.78 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |          pp2048 |       3765.72 ± 0.60 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |          pp8192 |       3330.34 ± 0.50 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |         pp16384 |       2876.08 ± 1.03 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     | 999 |  1 |           tg128 |         98.35 ± 0.16 |

build: 9725a313b (8931)


Qwen3.6-35B-A3B-Q4_K_M.gguf (Two GPUs)

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5060 Ti (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = NVIDIA GeForce RTX 5060 Ti (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |           pp512 |      2640.57 ± 26.06 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |          pp2048 |      3514.45 ± 23.01 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |          pp8192 |      3200.99 ± 10.07 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  0 |           tg128 |        109.04 ± 0.16 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |           pp512 |      2619.15 ± 32.63 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |          pp2048 |      3615.66 ± 17.72 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |          pp8192 |       3483.38 ± 5.47 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | Vulkan     | 999 |  1 |           tg128 |        109.40 ± 0.51 |

build: 9725a313b (8931)

CUDA


Code:
llama-2-7b.Q4_0.gguf (Single GPU)

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15849 MiB):
  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  0 |           pp512 |     3843.67 ± 115.86 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  0 |          pp2048 |       3150.07 ± 4.28 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  0 |          pp8192 |       1987.37 ± 0.28 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  0 |         pp16384 |       1262.31 ± 0.24 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  0 |           tg128 |         96.82 ± 0.12 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  1 |           pp512 |      4504.33 ± 81.38 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  1 |          pp2048 |       4153.88 ± 1.21 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  1 |          pp8192 |       3028.83 ± 0.16 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  1 |         pp16384 |       2264.26 ± 0.37 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CUDA       | 999 |  1 |           tg128 |         99.74 ± 0.12 |

build: 9725a313b (8931)


Qwen3.6-35B-A3B-Q4_K_M.gguf (Two GPUs)

ggml_cuda_init: found 2 CUDA devices (Total VRAM: 31699 MiB):
  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB
  Device 1: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |           pp512 |      2520.92 ± 15.17 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |          pp2048 |      3357.68 ± 11.58 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |          pp8192 |       3381.27 ± 8.51 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |           tg128 |        107.09 ± 0.38 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |           pp512 |      2530.95 ± 10.20 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |          pp2048 |      3494.11 ± 15.07 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |          pp8192 |       3772.53 ± 2.35 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |           tg128 |        114.71 ± 0.19 |

build: 9725a313b (8931)
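
To put the prompt-processing gap in numbers, rows from the tables above can be compared directly. The sketch below parses llama-bench style table rows and computes the ratio between two cards. The helper names `parse_rows` and `speedup` are hypothetical, and the sample rows are copied (with whitespace trimmed) from the Vulkan llama 7B Q4_0 results above with flash attention enabled (fa = 1).

```python
def parse_rows(table_text):
    """Parse llama-bench markdown rows into {(fa, test): mean tokens/s}."""
    results = {}
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip anything that is not an 8-column data row (header, separator, blanks).
        if len(cells) != 8 or cells[0] == "model" or set(cells[0]) <= {"-", ":"}:
            continue
        fa, test, tps = int(cells[5]), cells[6], cells[7]
        results[(fa, test)] = float(tps.split("±")[0])
    return results


def speedup(baseline, other):
    """Ratio other/baseline for every test present in both result sets."""
    return {key: other[key] / baseline[key] for key in baseline if key in other}


V620_VULKAN = """
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 999 | 1 | pp512 | 1929.11 ± 0.58 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 999 | 1 | pp16384 | 688.10 ± 1.83 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 999 | 1 | tg128 | 103.08 ± 0.37 |
"""

RTX5060TI_VULKAN = """
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 999 | 1 | pp512 | 3867.34 ± 2.78 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 999 | 1 | pp16384 | 2876.08 ± 1.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 999 | 1 | tg128 | 98.35 ± 0.16 |
"""

if __name__ == "__main__":
    ratios = speedup(parse_rows(V620_VULKAN), parse_rows(RTX5060TI_VULKAN))
    for (fa, test), ratio in ratios.items():
        print(f"fa={fa} {test}: 5060 Ti is {ratio:.2f}x the V620")
```

On these rows the 5060 Ti comes out at roughly 2x the V620 for pp512 and roughly 4x for pp16384, while tg128 is about even, which matches the observation that the gap matters most with large contexts.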
 

cesarinpillin

| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |          pp8192 |       3381.27 ± 8.51 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |           tg128 |        107.09 ± 0.38 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |           pp512 |      2530.95 ± 10.20 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |          pp2048 |      3494.11 ± 15.07 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |          pp8192 |       3772.53 ± 2.35 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |           tg128 |        114.71 ± 0.19 |

build: 9725a313b (8931)
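
To put a number on the flash attention (fa) difference in the tables above, the rows can be parsed and compared. This is a minimal sketch in Python. The helper names `parse_bench_rows` and `fa_speedup` are hypothetical (they are not part of llama.cpp), and the sample rows are copied from the two-GPU CUDA run above.

```python
# Sample rows copied from the two-GPU CUDA llama-bench output above.
SAMPLE = """
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |          pp8192 |       3381.27 ± 8.51 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  0 |           tg128 |        107.09 ± 0.38 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |          pp8192 |       3772.53 ± 2.35 |
| qwen35moe 35B.A3B Q4_K - Medium |  19.91 GiB |    34.66 B | CUDA       | 999 |  1 |           tg128 |        114.71 ± 0.19 |
"""


def parse_bench_rows(text):
    """Return {(test, fa): mean tokens/s} for each data row of the table."""
    results = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip anything that is not an 8-column data row
        # (blank lines, the header row, the dashed separator row).
        if len(cells) != 8 or cells[0] == "model" or set(cells[0]) <= {"-"}:
            continue
        fa = int(cells[5])
        test = cells[6]
        mean_tps = float(cells[7].split("±")[0])  # drop the "± stddev" part
        results[(test, fa)] = mean_tps
    return results


def fa_speedup(results):
    """Return {test: t/s with fa=1 divided by t/s with fa=0}."""
    tests = {test for (test, _fa) in results}
    return {
        t: results[(t, 1)] / results[(t, 0)]
        for t in sorted(tests)
        if (t, 0) in results and (t, 1) in results
    }


if __name__ == "__main__":
    for test, ratio in fa_speedup(parse_bench_rows(SAMPLE)).items():
        print(f"{test}: {ratio:.2f}x with fa=1")
```

On those four rows, fa=1 comes out at roughly 1.12x for pp8192 and roughly 1.07x for tg128. Pasting in the other tables should work the same way, as long as the column layout is unchanged.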

Kinda strange, all the OSes I tried immediately picked up the V620, especially after installing the AMD drivers and ROCm.
All sensors came up too.

How long had you tested?
 

mike622

New Member
Dec 17, 2019
3
1
3
Running Fedora Server 42, 43, and Rawhide on a few different ASRock and Supermicro motherboards, I've just gotten in the habit of adding the following to the kernel boot options:

pci=realloc=off amdgpu.gpu_recovery=1 amdgpu.mcbp=0

It's possible other distributions set those options out of the box, or maybe they aren't needed for your hardware.

I'm not sure what you mean by "How long had you tested?" The length of testing would not have any bearing on the card being recognized. At least on a few of my motherboards, without the correct driver options, amdgpu wouldn't load correctly and the GPU wouldn't be recognized by tools such as rocm-smi, vulkaninfo, etc.
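
A quick way to confirm whether the running kernel actually picked up those boot options is to look at /proc/cmdline. Below is a minimal Python sketch, assuming Linux. The function name `missing_kernel_args` is hypothetical, and the option list is just the three options mentioned above, not a recommendation for every board.

```python
from pathlib import Path

# The three options from the post above; adjust for your own setup.
WANTED = ["pci=realloc=off", "amdgpu.gpu_recovery=1", "amdgpu.mcbp=0"]


def missing_kernel_args(cmdline, wanted=WANTED):
    """Return the wanted options that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in wanted if arg not in present]


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # only exists on Linux
    if proc_cmdline.exists():
        missing = missing_kernel_args(proc_cmdline.read_text())
        if missing:
            print("Not set on this boot:", " ".join(missing))
        else:
            print("All listed options are set.")
    else:
        print("/proc/cmdline not found; is this a Linux system?")
```

This only reports what the kernel was booted with. It does not say whether amdgpu loaded correctly, so rocm-smi or vulkaninfo are still the real check.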
 

cesarinpillin

New Member
Sep 4, 2025
20
0
1
> Running Fedora Server 42, 43, and Rawhide on a few different ASRock and Supermicro motherboards, I've just gotten in the habit of adding the following to the kernel boot options:
>
> pci=realloc=off amdgpu.gpu_recovery=1 amdgpu.mcbp=0

Yeah, I used similar flags, but I was on Ubuntu and Proxmox.

> It's possible other distributions set those options out of the box, or maybe they aren't needed for your hardware.

Most likely the cause?

> I'm not sure what you mean by "How long had you tested?" The length of testing would not have any bearing on the card being recognized. At least on a few of my motherboards, without the correct driver options, amdgpu wouldn't load correctly and the GPU wouldn't be recognized by tools such as rocm-smi, vulkaninfo, etc.

I meant how long it has been since you last tested your video card. Of course, "testing" in the sense of running specific tests is irrelevant here.
I meant testing as in owning the device and using it under various configurations.
 

mike622

New Member
Dec 17, 2019
3
1
3
I've had the V620 for just over a year. I used it regularly for a few months when I first got it, then I replaced it. Now it only comes out if I need some extra VRAM (rarely the case these days) or if I want to see how Vulkan performance is progressing.

A lot has changed from then to now. The Vulkan backend in llama.cpp performs considerably better now. Local models have made massive strides.

I'm not trying to crap on the choice of a Pro V620. In fact, I think it's an overlooked option for a lot of people. It just has limitations.
 

akapug

New Member
May 20, 2026
1
0
1
> The V620 will be faster at prompt processing than the MI50/60, but a little slower at token generation (faster compute, slower memory).

Has anyone tried one of each in the same system, pushing pp to the V620 and tg to the MI60? I have an EPYC Supermicro board sitting around, and it might be fun to try having Claude optimize the relationship between the two.
 

monala

New Member
Jun 3, 2026
1
0
1
For your V620 cooling situation, the easiest low-cost approach is usually sticking with a blower-style fan, since its bearings and airflow path are designed to handle the heat and static pressure without creating vibration or premature wear. The rear screws near the power connectors on the V620 do look similar to those on the MI50, but even slight differences in spacing or thread depth can affect how the blower mounts and whether the fan spins freely. A custom full shroud can work, but it often requires precise alignment to avoid stressing the blower bearings and the GPU heatsink, which can be tricky without a 3D printer or proper templates. If you want to minimize risk to the card and the blower, your safest bet, though more expensive, is buying the V620 blower listed on AliExpress, since it is sold as designed to mount correctly on that card.
 