nvidia-nim-single
Docker 应用程序 from PikkonMG's Repository
概述
要求
运行时参数
- 网络用户界面
http://[IP]:[PORT:8000]/docs- 网络
bridge- 外壳
bash- 特权
- false
- 额外参数
--gpus all --shm-size=16gb --ulimit memlock=-1 --ulimit stack=67108864
模板配置
NIM listens on this port. WebUI docs at http://your-server-ip:8000/docs. API base URL at http://your-server-ip:8000/v1 -- use this when connecting clients like AnythingLLM, Open WebUI, LangChain, etc. Use any non-empty string as the API key in clients.
- 目标
- 8000
- 默认值
- 8000
- 价值
- 8000
Persistent storage for downloaded model weights. IMPORTANT: Run 'chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache && chmod -R 775 /mnt/user/appdata/nvidia-nim/cache' in the Unraid terminal before first start or the container will fail with a permission error. SSD storage preferred for faster load times.
- 目标
- /opt/nim/.cache
- 默认值
- /mnt/user/appdata/nvidia-nim/cache
- 价值
- /mnt/user/appdata/nvidia-nim/cache
Your NVIDIA Personal API key from https://build.nvidia.com. Generate a Personal API Key from your profile. NOTE: This is separate from the docker login nvcr.io command which allows Docker to pull the container image. This variable allows the container to authenticate with NGC to download model artifacts at runtime.
- 目标
- NGC_API_KEY
Must match the model used by the container image. Default is the 3B model recommended for 12GB GPUs. Browse models at https://build.nvidia.com/models
- 目标
- NIM_MODEL_NAME
- 默认值
- meta/llama-3.2-3b-instruct
- 价值
- meta/llama-3.2-3b-instruct
Maximum context window in tokens. The 3B model requests 131072 by default but a 12GB GPU can only fit ~30000 tokens of KV cache. Set to 16384 for 12GB cards. Reduce to 8192 if KV cache errors occur.
- 目标
- NIM_MAX_MODEL_LEN
- 默认值
- 16384
- 价值
- 16384
Internal container path for the model cache. Must match the container-side path of the Model Cache volume mapping above.
- 目标
- NIM_CACHE_PATH
- 默认值
- /opt/nim/.cache
- 价值
- /opt/nim/.cache
GPU index to use inside the container. Use 0 for the first GPU, 0,1 for multiple GPUs. Do NOT use 'all' -- it will crash vLLM.
- 目标
- CUDA_VISIBLE_DEVICES
- 默认值
- 0
- 价值
- 0
Allows NIM to relax strict GPU memory checks so models may start on GPUs with less VRAM than normally required.
- 目标
- NIM_RELAX_MEM_CONSTRAINTS
- 默认值
- 1
- 价值
- 1
Reduces GPU memory fragmentation. Helps avoid out-of-memory errors on consumer GPUs.
- 目标
- PYTORCH_CUDA_ALLOC_CONF
- 默认值
- expandable_segments:True
- 价值
- expandable_segments:True
Logging verbosity. Options: DEBUG, INFO, WARNING, ERROR.
- 目标
- NIM_LOG_LEVEL
- 默认值
- INFO
- 价值
- INFO
详细信息
nvcr.io/nim/meta/llama-3.2-3b-instruct:latest在Unraid 上运行 nvidia-nim-single 。
nvidia-nim-single 已被列入Unraid OS 的社区应用程序。探索Unraid ,构建灵活的家庭服务器、NAS 或家庭实验室。