Proxmox & LLM Anyone?


T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Anyone using Proxmox and passing through a 5090 or RTX 6000 to a VM for LLM work, while using the rest of the compute/RAM for other, normal VMs?

Did you notice any LLM performance hit?

Anyone on an EPYC or modern Intel server CPU running a CPU-based LLM in one VM and an NVIDIA-based LLM in another? How's that working for you?
 

ano

Well-Known Member
Nov 7, 2022
Me! Minimal performance hit. The biggest issue is not being able to pass through enough CPU cores relative to what's available, but that's not really an issue.

Running backups on VMs with GPU passthrough is giving some trouble, though.
2x 9b13 with RTX Pro 6000 (as-4125)

CPU-based LLM is on the to-do list.

Happy to share results/resources; I started on it only a few weeks ago.
 

foureight84

Well-Known Member
Jun 26, 2018
I use Proxmox with 2 RTX 3090s. I have tried both VM and LXC; both are performant, with no noticeable difference. I ended up sticking with LXC since it's quicker to boot up. I'm on Cascade Lake.
 

louie1961

Well-Known Member
May 15, 2023
I have an RTX 2000 Ada in my main Proxmox node. I pass it through to my Debian VM that is my docker host. I have installed the Nvidia toolkit for docker, and I can share the GPU across all my docker workloads that can use it: Ollama/OpenwebUI, Immich, PaperlessNGX, n8n, Whisper, etc.
 

epicurean

Active Member
Sep 29, 2014
Hi, are you able to give a guide on how you did this? I am hoping to share my A2000 card with Frigate and Home Assistant on a Proxmox node.
 

louie1961

Well-Known Member
May 15, 2023
Sure:

Proxmox Host Setup:
Enable IOMMU in BIOS (AMD: AMD-Vi, Intel: VT-d)
Edit /etc/default/grub:
  For AMD: Add amd_iommu=on iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT
  For Intel: Add intel_iommu=on iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT
Run update-grub and reboot
Load VFIO modules - add to /etc/modules:
  vfio
  vfio_iommu_type1
  vfio_pci
  vfio_virqfd
(Note: vfio_virqfd is only needed on kernels older than 6.2. On newer Proxmox kernels it is built into vfio, so that line can be left out.)
Then run "update-initramfs -u -k all" and reboot

Verify IOMMU: Run "dmesg | grep -e DMAR -e IOMMU" (should show enabled)
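
Before moving on to the VM, it can help to see how the host grouped its PCI devices, since passthrough tends to go most smoothly when the GPU (and its audio function) sit in their own IOMMU group. A minimal sketch, assuming the usual sysfs layout under /sys/kernel/iommu_groups; the function name list_iommu_groups is just a label made up for this example:

```shell
#!/bin/sh
# Print "<group><TAB><PCI address>" for every device the kernel placed in an IOMMU group.
# The optional argument (an alternate root directory) exists only so the function
# can be dry-run against a fake directory tree.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # If the glob matched nothing (IOMMU off or no such directory), skip quietly.
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix -> "<group>/devices/<addr>"
        group="${group%%/*}"      # keep only the group number
        printf '%s\t%s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups
```

If this prints nothing, the kernel did not create any IOMMU groups, which usually points back at the BIOS setting or the kernel command line.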

VM Configuration:
Shut down your VM
In Proxmox: Hardware → Add → PCI Device → Select your GPU → Check "All Functions"
Start VM
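
If you are not sure which entry in the PCI Device dialog is the GPU, the host can list candidates for you. A small sketch, assuming lspci (from pciutils) is available on the Proxmox host; it filters on NVIDIA's PCI vendor ID 10de, and find_nvidia_devices is a hypothetical helper name:

```shell
#!/bin/sh
# Keep only lines from `lspci -nn` whose [vendor:device] tag carries
# NVIDIA's PCI vendor ID (10de). Reads stdin, writes matching lines to stdout.
find_nvidia_devices() {
    grep -i '\[10de:[0-9a-f]\{4\}\]'
}

if command -v lspci >/dev/null 2>&1; then
    lspci -nn | find_nvidia_devices || echo "No NVIDIA device found" >&2
fi
```

A GPU normally shows up as two functions at the same address (the graphics function at .0 and an audio function at .1), which is what the "All Functions" checkbox covers. As far as I know the CLI equivalent of the dialog is along the lines of "qm set <vmid> --hostpci0 0000:01:00", where leaving off the function suffix passes all functions; the VM ID and address there are placeholders.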

Inside Debian 13 VM:
Edit /etc/apt/sources.list.d/debian.sources - ensure each Components line has:
Components: main contrib non-free-firmware non-free

Add Nvidia container toolkit repo:
run "curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg"

then run "echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /" | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list"

Install everything:
run "sudo apt update && sudo apt install -y dkms linux-headers-$(uname -r) nvidia-driver nvidia-container-toolkit"

Configure Docker and reboot:
run "sudo nvidia-ctk runtime configure --runtime=docker"
then run "sudo reboot"

Test:

run "nvidia-smi"
run "docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi"

Common Issues:

1. If nvidia-smi fails: Check /usr/sbin/dkms status. If status shows "added" not "installed", run "sudo /usr/sbin/dkms install nvidia-current/<version>"
2. Missing contrib repo causes dependency errors - it's required
3. DKMS must be installed before/with nvidia-driver or kernel module won't build. DKMS requires the kernel headers that match your current kernel. You may want to install "linux-headers-amd64" using apt, so that when the kernel updates, the headers will automatically update too.
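
The DKMS check from issue 1 can be scripted. This is a rough sketch that assumes "dkms status" prints one line per module ending in a state such as ": installed" or ": added"; dkms_not_installed is a hypothetical helper name, not part of DKMS:

```shell
#!/bin/sh
# Read `dkms status` output on stdin and print only the lines for modules
# that are NOT in the "installed" state (for example "added" or "built").
dkms_not_installed() {
    awk '!/: installed/'
}

if [ -x /usr/sbin/dkms ]; then
    pending=$(/usr/sbin/dkms status | dkms_not_installed)
    if [ -n "$pending" ]; then
        echo "Modules still waiting to be built/installed:"
        echo "$pending"
    else
        echo "No DKMS modules pending."
    fi
fi
```

Any line it prints is a candidate for the "dkms install" command from issue 1.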


In your Docker Compose files, each container that uses the GPU needs the deploy section (from "deploy:" through "capabilities: [gpu]"):

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - /mnt/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
 

epicurean

Active Member
Sep 29, 2014
This command
"docker run --rm --gpus all nvidia/cuda:13.2.0-base-ubuntu24.04 nvidia-smi"

gives me this error after it pulls down the image:
docker: Error response from daemon: failed to discover GPU vendor from CDI: no known GPU vendor found
 

louie1961

Well-Known Member
May 15, 2023
The Docker daemon probably didn't pick up the nvidia runtime config. Check /etc/docker/daemon.json. It should contain a section that looks like this:
JSON:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime"
        }
    }
}
If it's missing or looks wrong, run the following commands:

Bash:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
If those don't work, try these:

Bash:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
sudo systemctl restart docker
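
To see which of the two fixes you actually need, a quick check along these lines may help. It is only a sketch: it does a plain text match rather than a real JSON parse, and has_nvidia_runtime is a made-up helper name:

```shell
#!/bin/sh
# Succeeds if the given Docker daemon config contains a "path" entry that
# points at nvidia-container-runtime. Plain text match, not a JSON parse.
has_nvidia_runtime() {
    file="${1:-/etc/docker/daemon.json}"
    [ -f "$file" ] && grep -q '"path"[[:space:]]*:[[:space:]]*"[^"]*nvidia-container-runtime"' "$file"
}

if has_nvidia_runtime; then
    echo "daemon.json references nvidia-container-runtime"
else
    echo "nvidia runtime not found in daemon.json; re-run nvidia-ctk runtime configure"
fi

# The second set of commands writes a CDI spec to this path.
if [ -f /etc/cdi/nvidia.yaml ]; then
    echo "CDI spec present: /etc/cdi/nvidia.yaml"
fi
```

After restarting Docker, "docker info | grep -i runtime" should list nvidia among the runtimes if the config was picked up.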
 

epicurean

Active Member
Sep 29, 2014
Thank you again!

In the Docker Compose file, if I intend to use the GPU for LLM, Frigate and Home Assistant (which may not be on the same Proxmox host), is that possible? What would the Docker Compose file look like?