Vue d'ensemble

llama-swap is a lightweight proxy server that provides automatic model swapping for llama.cpp (llama-server). It hot-swaps models on demand based on API requests so you can serve many GGUF models from a single endpoint without restarting. Features: automatic model loading/unloading, macros, aliases, groups for multi-model concurrency, TTL auto-unload, streaming log viewer, and OpenAI-compatible API. SETUP: 1. Place your GGUF model files in the Models path below. 2. Create a config.yaml (see template https://github.com/PikkonMG/Unraid-docker-templates/blob/main/examples/llama-swap/example-llama-swap-config.yaml) and place it in the Config path. 3. In your config.yaml, reference models as /models/yourmodel.gguf 4. For NVIDIA GPU: install the Unraid Nvidia plugin, select the cuda tag, and set ExtraParams to: --gpus all 5. For AMD GPU: select the rocm tag and set ExtraParams to: --device /dev/kfd --device /dev/dri --group-add video --security-opt seccomp=unconfined 6. For Intel iGPU or other Vulkan-capable GPUs: select the vulkan tag and set ExtraParams to: --device /dev/dri 7. For CPU only: select the cpu tag and remove ExtraParams entirely

Arguments d'exécution

Interface utilisateur Web: http://[IP]:[PORT:8080]/ui
Réseau: bridge
Privilégié: false
Paramètres supplémentaires: --gpus all

Configuration du modèle

Web UI / API PortPorttcp

Port for the OpenAI-compatible API and log viewer web UI.

Cible: 8080
Défaut: 8080
Valeur: 8080

Models DirectoryPathro

Path to the folder containing your GGUF model files. Referenced in config.yaml as /models/filename.gguf

Cible: /models
Défaut: /mnt/user/appdata/llama-swap/models
Valeur: /mnt/user/appdata/llama-swap/models

Config DirectoryPathrw

Path to the folder containing your config.yaml file. Hot-reload should be enabled via -watch-config.

Cible: /config
Défaut: /mnt/user/appdata/llama-swap/config
Valeur: /mnt/user/appdata/llama-swap/config

NVIDIA_VISIBLE_DEVICESVariable

Which NVIDIA GPU(s) to expose. Use 'all' or a specific GPU UUID from the Nvidia plugin settings page. Only needed for NVIDIA tags.

Défaut: all
Valeur: all

NVIDIA_DRIVER_CAPABILITIESVariable

NVIDIA driver capabilities to expose. Use 'compute,utility' for LLM inference or 'all' for full access.

Défaut: all
Valeur: all

Catégories

AI Productivity Tools > Utilities Other

Liens

Modèle Soutien Registre Proprojet

Détails

Référentiel

ghcr.io/mostlygeek/llama-swap:cuda

Registre

https://github.com/mostlygeek/llama-swap

Dernière mise à jour2026-05-31

Première vue2026-04-06

Exécutez llama-swap sur Unraid.

llama-swap est listé dans Community Apps pour Unraid OS. Explorez Unraid pour créer un serveur domestique flexible, un NAS ou un laboratoire domestique.

Explorez Unraid OS