llama-swap
Docker 应用程序 from PikkonMG's Repository
概述
llama-swap is a lightweight proxy server that provides automatic model swapping for llama.cpp (llama-server).
It hot-swaps models on demand based on API requests so you can serve many GGUF models from a single endpoint without restarting.
Features: automatic model loading/unloading, macros, aliases, groups for multi-model concurrency, TTL auto-unload, streaming log viewer, and OpenAI-compatible API.
SETUP:
1. Place your GGUF model files in the Models path below.
2. Create a config.yaml (see template https://github.com/PikkonMG/Unraid-docker-templates/blob/main/examples/llama-swap/example-llama-swap-config.yaml) and place it in the Config path.
3. In your config.yaml, reference models as /models/yourmodel.gguf
4. For NVIDIA GPU: install the Unraid Nvidia plugin, select the cuda tag, and set ExtraParams to: --gpus all
5. For AMD GPU: select the rocm tag and set ExtraParams to: --device /dev/kfd --device /dev/dri --group-add video --security-opt seccomp=unconfined
6. For Intel iGPU or other Vulkan-capable GPUs: select the vulkan tag and set ExtraParams to: --device /dev/dri
7. For CPU only: select the cpu tag and remove ExtraParams entirely
运行时参数
- 网络用户界面
http://[IP]:[PORT:8080]/ui- 网络
bridge- 特权
- false
- 额外参数
--gpus all
模板配置
Web UI / API PortPorttcp
Port for the OpenAI-compatible API and log viewer web UI.
- 目标
- 8080
- 默认值
- 8080
- 价值
- 8080
Models DirectoryPathro
Path to the folder containing your GGUF model files. Referenced in config.yaml as /models/filename.gguf
- 目标
- /models
- 默认值
- /mnt/user/appdata/llama-swap/models
- 价值
- /mnt/user/appdata/llama-swap/models
Config DirectoryPathrw
Path to the folder containing your config.yaml file. Hot-reload should be enabled via -watch-config.
- 目标
- /config
- 默认值
- /mnt/user/appdata/llama-swap/config
- 价值
- /mnt/user/appdata/llama-swap/config
NVIDIA_VISIBLE_DEVICESVariable
Which NVIDIA GPU(s) to expose. Use 'all' or a specific GPU UUID from the Nvidia plugin settings page. Only needed for NVIDIA tags.
- 默认值
- all
- 价值
- all
NVIDIA_DRIVER_CAPABILITIESVariable
NVIDIA driver capabilities to expose. Use 'compute,utility' for LLM inference or 'all' for full access.
- 默认值
- all
- 价值
- all
详细信息
存储库
ghcr.io/mostlygeek/llama-swap:cuda最后更新2026-05-31
初见2026-04-06
在Unraid 上运行 llama-swap 。
llama-swap 已被列入Unraid OS 的社区应用程序。探索Unraid ,构建灵活的家庭服务器、NAS 或家庭实验室。