
nvidia-nim-single

Overview

NVIDIA NIM AI inference server for running LLMs locally on NVIDIA GPUs with CUDA acceleration and an OpenAI-compatible API. For NIM-related support, see https://developer.nvidia.com/nim

DEFAULT MODEL: meta/llama-3.2-3b-instruct, recommended for GPUs with 12 GB VRAM or less (RTX 3060, 3070, etc.).

TO CHANGE MODELS: Update BOTH the Repository image tag AND the NIM_MODEL_NAME variable to matching values. Browse available models at https://build.nvidia.com/models

VRAM REQUIREMENTS (approximate):
- Llama 3.2 3B: ~6 GB, fits 8-12 GB cards
- Mistral 7B: ~14 GB, needs 16 GB+ (fp16 uses more than expected)
- Llama 3.1 8B: ~22 GB, needs 24 GB+
- Llama 3.1 70B: ~80 GB, multi-GPU only

REQUIRED STEPS BEFORE FIRST START (run in the Unraid terminal):
Before you can pull the image, you need an NVIDIA API key from https://build.nvidia.com. Generate a Personal API Key from your profile.

Step 1: Log in to the NGC registry (once per boot; the login does not persist across reboots):
docker login nvcr.io
Username: $oauthtoken
Password: YOUR_NGC_API_KEY

Step 2: Create the cache directory and fix its permissions:
mkdir -p /mnt/user/appdata/nvidia-nim/cache
chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache
chmod -R 775 /mnt/user/appdata/nvidia-nim/cache

REQUIRES: NVIDIA GPU (Turing/RTX 20 series or newer) | nvidia-driver Unraid plugin | NGC API key from build.nvidia.com

URLS (replace YOUR_SERVER_IP with your Unraid IP):
- WebUI / Swagger docs: http://YOUR_SERVER_IP:8000/docs
- API base URL: http://YOUR_SERVER_IP:8000/v1 (use this in AnythingLLM, Open WebUI, etc.)
- Models list: http://YOUR_SERVER_IP:8000/v1/models

Readme


NVIDIA NIM on Unraid

Run NVIDIA NIM inference microservices locally on Unraid using Docker. This guide covers setup, common errors, and connecting OpenAI-compatible clients.

Prerequisites

Unraid 6.12 or later
NVIDIA GPU (Turing architecture or newer — GTX 16xx, RTX 20xx+)
NVIDIA drivers installed in Unraid via the Nvidia-Driver plugin (Community Applications)
Free NGC account at build.nvidia.com
Personal API key generated from your profile at build.nvidia.com

Tested on: RTX 3060 12 GB · Unraid 6.12+ · NIM 1.10.1
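Before pulling a multi-gigabyte image, you can check that each GPU meets the Turing-or-newer requirement by comparing its CUDA compute capability with 7.5 (Turing). This is a minimal sketch, assuming a driver recent enough for `nvidia-smi` to support the `compute_cap` query field; the function names are hypothetical helpers, not part of NIM or Unraid.

```sh
#!/bin/sh
# Hypothetical helpers: check whether installed GPUs are Turing (compute capability 7.5) or newer.

# Succeeds if the compute capability given as $1 is >= 7.5.
is_turing_or_newer() {
  awk -v cc="$1" 'BEGIN { exit !(cc + 0 >= 7.5) }'
}

# Prints one verdict line per GPU.
# Assumes nvidia-smi supports --query-gpu=compute_cap (recent drivers only).
check_gpus() {
  nvidia-smi --query-gpu=index,name,compute_cap --format=csv,noheader |
    while IFS=, read -r idx name cap; do
      if is_turing_or_newer "$cap"; then
        echo "GPU $idx:$name (compute capability$cap): OK for NIM"
      else
        echo "GPU $idx:$name (compute capability$cap): older than Turing, likely no compatible NIM profile"
      fi
    done
}
```

Paste the functions into the Unraid terminal and run `check_gpus`. GTX 16xx and RTX 20xx cards report 7.5; RTX 30xx cards report 8.6.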

Model Selection

NIM uses pre-optimized engine profiles. Consumer GPUs require smaller models and reduced context windows. Some examples are listed below.

Model	VRAM Required	Fits 12 GB?
`meta/llama-3.2-3b-instruct`	~6 GB	✅ Recommended
`microsoft/phi-3-mini-4k-instruct`	~8 GB	✅ Yes
`nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1`	~10 GB	✅ Yes
`mistralai/mistral-7b-instruct-v0.3`	~14 GB fp16	❌ OOM
`meta/llama-3.1-8b-instruct`	~22 GB bf16	❌ OOM
`meta/llama-3.1-70b-instruct`	~80 GB	❌ Multi-GPU only

For 7B+ models on a 12 GB consumer GPU, consider Ollama instead — it uses quantized weights and fits comfortably.
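The VRAM column roughly tracks parameter count times bytes per parameter. fp16/bf16 weights take 2 bytes each, so a 3B model needs about 6 GB and a 7B model about 14 GB before any KV cache is allocated. The sketch below encodes that rule of thumb. The function names are hypothetical, and the 2 GB default headroom for KV cache and CUDA overhead is an assumption; real usage can be much higher (the 8B row above lists ~22 GB against 16 GB of bf16 weights).

```sh
#!/bin/sh
# Rule-of-thumb VRAM check (hypothetical helpers; counts weights only).

# Weight size in GB: billions of parameters x bytes per parameter (2 for fp16/bf16).
estimate_weight_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b }'
}

# Succeeds if weights plus a headroom allowance fit in the card's VRAM.
# Args: params_billion bytes_per_param vram_gb [headroom_gb, default 2 (assumed)]
fits_vram() {
  awk -v p="$1" -v b="$2" -v v="$3" -v h="${4:-2}" \
    'BEGIN { exit !(p * b + h <= v) }'
}
```

For example, `estimate_weight_gb 3 2` prints 6.0, and `fits_vram 7 2 12` fails, which matches the OOM row for Mistral 7B on a 12 GB card.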

NGC Registry Login

⚠️ This must be done before Unraid can pull NIM images. NIM images are hosted on NVIDIA's private registry (nvcr.io), not Docker Hub.

One-time login via Unraid terminal

docker login nvcr.io
# Username: $oauthtoken       ← type this literally
# Password: YOUR_NGC_API_KEY

Persist login across reboots

Add to /boot/config/go (runs at every boot):

docker login nvcr.io -u '$oauthtoken' -p 'YOUR_NGC_API_KEY'
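Passing the key with `-p` puts it on the command line, where it is visible in the process list, and Docker prints a warning recommending `--password-stdin`. Below is a sketch of the same go-file login that reads the key from a file instead. The path /boot/config/ngc_api_key and the function name are hypothetical, and the key is still stored in plain text on the flash drive.

```sh
#!/bin/sh
# Hypothetical go-file helper: log in to nvcr.io with the key read from a file.
ngc_login_from_file() {
  key_file="$1"
  # Refuse to run if the key file is missing or empty.
  if [ ! -s "$key_file" ]; then
    echo "NGC key file '$key_file' is missing or empty" >&2
    return 1
  fi
  # '$oauthtoken' is the literal username NGC expects (single quotes stop expansion).
  # --password-stdin keeps the key out of the process list.
  docker login nvcr.io --username '$oauthtoken' --password-stdin < "$key_file"
}
```

Add the function to /boot/config/go, then call it on the next line, e.g. `ngc_login_from_file /boot/config/ngc_api_key`.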

Note: docker login (image pull auth) and the NGC_API_KEY environment variable (runtime model weight download auth) are two separate authentications. Both are required.

Docker Template Setup

In the Unraid Docker GUI, click Add Container and fill in the following fields.

Basic Settings

Field	Value
Name	`nvidia-nim`
Repository	`nvcr.io/nim/meta/llama-3.2-3b-instruct:latest`
Network Type	`bridge`
Extra Parameters	`--gpus all --shm-size=16gb --ulimit memlock=-1 --ulimit stack=67108864`

Port Mapping

Container Port	Host Port	Protocol
`8000`	`8000`	`TCP`

Volume Mapping

Container Path	Host Path	Access Mode
`/opt/nim/.cache`	`/mnt/user/appdata/nvidia-nim/cache`	Read/Write

Before starting the container, create the cache directory with correct permissions:

mkdir -p /mnt/user/appdata/nvidia-nim/cache
chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache
chmod 775 /mnt/user/appdata/nvidia-nim/cache
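To confirm the permissions took effect before starting the container, you can compare the directory's numeric owner with the 1000:1000 set above and check that the group can write to it. This is a sketch using GNU `stat -c`, which Unraid's Linux userland provides. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: cache directory exists, has the expected owner, and is group-writable.
# Args: directory [expected uid:gid, default 1000:1000 as used in this guide]
check_cache_dir() {
  dir="$1"
  want="${2:-1000:1000}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir (run mkdir -p first)" >&2
    return 1
  fi
  owner="$(stat -c '%u:%g' "$dir")"
  if [ "$owner" != "$want" ]; then
    echo "owner is $owner, expected $want (run chown -R $want $dir)" >&2
    return 1
  fi
  # %A prints symbolic permissions such as drwxrwxr-x; the 6th character is the group write bit.
  mode="$(stat -c '%A' "$dir")"
  case "$mode" in
    ?????w*) echo "ok: $dir ($owner, $mode)" ;;
    *) echo "group cannot write to $dir ($mode); run chmod 775 $dir" >&2; return 1 ;;
  esac
}
```

Run `check_cache_dir /mnt/user/appdata/nvidia-nim/cache`. It should print an `ok:` line before you start the container.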

Environment Variables

Variable	Value	Notes
`NGC_API_KEY`	`your_ngc_api_key`	Required. Used at runtime to download model weights.
`NIM_MODEL_NAME`	`meta/llama-3.2-3b-instruct`	Must match the image tag.
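`NIM_MAX_MODEL_LEN`	`16384`	Required for consumer GPUs. See "KV cache size error" under Troubleshooting.
`NIM_CACHE_PATH`	`/opt/nim/.cache`	Points to the mounted cache volume.
`CUDA_VISIBLE_DEVICES`	`0`	Use `0` for a single GPU. See "CUDA_VISIBLE_DEVICES must be numeric" under Troubleshooting.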
`PYTORCH_CUDA_ALLOC_CONF`	`expandable_segments:True`	Reduces memory fragmentation.
`NIM_LOG_LEVEL`	`INFO`	Set to `DEBUG` for verbose output.
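Two of the startup errors in Troubleshooting come from malformed values in this table. Before clicking Apply, you can sanity-check the values by setting them in a shell and running a small validator. This is a minimal sketch; the function names are hypothetical, and it checks only the format of three variables, not whether the key is valid.

```sh
#!/bin/sh
# Hypothetical pre-flight check for the template's environment variables.

# Succeeds for a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds for comma-separated GPU indices such as "0" or "0,1".
# Rejects "all", empty values, and stray commas.
valid_cuda_devices() {
  case "$1" in
    ''|*[!0-9,]*|,*|*,|*,,*) return 1 ;;
    *) return 0 ;;
  esac
}

# Checks the current shell's values and prints one line per problem to stderr.
check_nim_env() {
  status=0
  if [ -z "$NGC_API_KEY" ]; then
    echo "NGC_API_KEY is empty" >&2; status=1
  fi
  if ! valid_cuda_devices "$CUDA_VISIBLE_DEVICES"; then
    echo "CUDA_VISIBLE_DEVICES must be numeric (e.g. 0 or 0,1), got '$CUDA_VISIBLE_DEVICES'" >&2; status=1
  fi
  if ! { is_uint "$NIM_MAX_MODEL_LEN" && [ "$NIM_MAX_MODEL_LEN" -gt 0 ]; }; then
    echo "NIM_MAX_MODEL_LEN must be a positive integer, got '$NIM_MAX_MODEL_LEN'" >&2; status=1
  fi
  return "$status"
}
```

For example, setting `NGC_API_KEY=...; CUDA_VISIBLE_DEVICES=all; NIM_MAX_MODEL_LEN=16384` and then running `check_nim_env` reports the CUDA_VISIBLE_DEVICES problem.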

First Run

On first start, NIM downloads model weights to the cache directory (~6 GB for the 3B model). This can take several minutes depending on your connection.

Watch the logs in the Unraid Docker UI or via terminal:

docker logs -f nvidia-nim

A successful startup looks like:

INFO:     Uvicorn running on http://0.0.0.0:8000

You can also verify the API is running:

curl http://localhost:8000/v1/models
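The first start can spend several minutes downloading weights, so a polling loop is more convenient than re-running curl by hand. Below is a sketch that retries the /v1/models endpoint shown above. The function name and the default attempt count and delay are assumptions.

```sh
#!/bin/sh
# Hypothetical readiness loop: poll the models endpoint until NIM answers.
# Args: [url] [attempts, default 60] [delay seconds, default 10]
wait_for_nim() {
  url="${1:-http://localhost:8000/v1/models}"
  attempts="${2:-60}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress, -f makes HTTP errors count as failures.
    if curl -sf --max-time 5 "$url" >/dev/null; then
      echo "NIM is ready at $url"
      return 0
    fi
    echo "attempt $i/$attempts: not ready yet" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "NIM did not respond after $attempts attempts" >&2
  return 1
}
```

With the defaults, `wait_for_nim` waits up to about ten minutes. Meanwhile, `docker logs -f nvidia-nim` shows download progress in another terminal.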

Connecting Clients

NIM exposes an OpenAI-compatible API. Use these settings in any compatible client:

Setting	Value
Docs	`http://[unraid-ip]:8000/docs`
Base URL	`http://[unraid-ip]:8000/v1`
API Key	Any non-empty string (e.g. `nim`) — not validated locally
Model	`meta/llama-3.2-3b-instruct`

Compatible clients

AnythingLLM
Open WebUI
LangChain
LlamaIndex
Cursor (custom OpenAI base URL)
Any app with a configurable OpenAI-compatible endpoint

Quick test

curl http://[unraid-ip]:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
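For scripted use, the same request can be wrapped in a function. This is a sketch, assuming a POSIX shell with curl. `max_tokens` is a standard OpenAI chat-completions field. The escaping handles only double quotes and backslashes (not newlines or control characters), so it suits short one-line prompts. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for the chat completions endpoint.

# Escapes backslashes and double quotes for embedding in a JSON string.
# Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Args: model prompt [max_tokens, default 256]
build_chat_payload() {
  model="$1"; prompt="$(json_escape "$2")"; max_tokens="${3:-256}"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %s}\n' \
    "$model" "$prompt" "$max_tokens"
}

# Args: host model prompt [max_tokens]; sends the payload on stdin via -d @-.
nim_chat() {
  host="$1"; shift
  build_chat_payload "$@" | curl -s "http://$host:8000/v1/chat/completions" \
    -H "Content-Type: application/json" -d @-
}
```

Example: `nim_chat YOUR_SERVER_IP meta/llama-3.2-3b-instruct "Hello!" 64`.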

Switching Models

NIM images for the template are currently model-specific — there is no in-app model browser. To switch:

Stop the existing container
Update the Repository field to the new model image (e.g. nvcr.io/nim/microsoft/phi-3-mini-4k-instruct:latest)
Update NIM_MODEL_NAME to match (e.g. microsoft/phi-3-mini-4k-instruct)
Start the container

To run multiple models simultaneously, create separate containers on different host ports (e.g. 8000, 8001). They can share the same cache folder — weights are not duplicated if the same model is used.
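Each image serves one model and NIM_MODEL_NAME must match it, so it can help to derive one from the other. Below is a dry-run sketch that prints (but does not execute) an equivalent `docker run` command for an extra container on another host port, mirroring the template's settings. The helper names are hypothetical. The derivation assumes that the image path after nvcr.io/nim/ equals the model name, as in this guide's two examples; verify it on build.nvidia.com. When two containers share a GPU, both models must also fit in VRAM at the same time.

```sh
#!/bin/sh
# Hypothetical dry-run helpers: print a docker run command for an extra NIM container.

# nvcr.io/nim/<org>/<model>:<tag>  ->  <org>/<model>
model_from_image() {
  ref="${1#nvcr.io/nim/}"
  printf '%s\n' "${ref%:*}"
}

# Args: container_name image host_port
# Mirrors the template: same GPU flags, shared cache volume, API on container port 8000.
# -e NGC_API_KEY (no value) passes the key through from the host shell's environment.
nim_run_cmd() {
  name="$1"; image="$2"; host_port="$3"
  model="$(model_from_image "$image")"
  cache=/mnt/user/appdata/nvidia-nim/cache
  printf '%s' "docker run -d --name $name --gpus all --shm-size=16gb"
  printf '%s' " --ulimit memlock=-1 --ulimit stack=67108864 -p $host_port:8000"
  printf '%s' " -v $cache:/opt/nim/.cache -e NGC_API_KEY -e NIM_MODEL_NAME=$model"
  printf '%s' " -e NIM_MAX_MODEL_LEN=16384 -e NIM_CACHE_PATH=/opt/nim/.cache -e CUDA_VISIBLE_DEVICES=0"
  printf '%s\n' " -e NIM_RELAX_MEM_CONSTRAINTS=1 -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True $image"
}
```

For example, `nim_run_cmd nvidia-nim-phi3 nvcr.io/nim/microsoft/phi-3-mini-4k-instruct:latest 8001` prints a command with `-p 8001:8000` and `NIM_MODEL_NAME=microsoft/phi-3-mini-4k-instruct`. Review it, export NGC_API_KEY, then paste it into the terminal.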

Troubleshooting

CUDA_VISIBLE_DEVICES must be numeric

Error:

ValueError: invalid literal for int() with base 10: 'all'

This usually means CUDA_VISIBLE_DEVICES was set to all.

Fix: Set CUDA_VISIBLE_DEVICES=0 (not all). The --gpus all flag in Extra Parameters handles Docker-level GPU exposure separately.

Cache directory permission denied

Error:

PermissionError: [Errno 13] Permission denied: '/opt/nim/.cache/local_cache'

This appears on first start if the host cache directory is not owned by UID/GID 1000 or is not writable.

Fix:

mkdir -p /mnt/user/appdata/nvidia-nim/cache
chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache
chmod -R 775 /mnt/user/appdata/nvidia-nim/cache

KV cache size error

Error:

ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (30320).

Fix: Set NIM_MAX_MODEL_LEN=16384. If the error persists, try 8192.

Consumer GPUs cannot accommodate the full context window that data center profiles request. This variable caps it to a size that fits in available VRAM.
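The recommended 16384 is the largest power of two below the 30320-token KV cache capacity in the example error. The sketch below reads that capacity from the log and rounds it down the same way. Rounding to a power of two is a convention that matches this guide's 16384/8192 values, not a NIM requirement. If startup still fails, halve the value as described above. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for choosing NIM_MAX_MODEL_LEN from the KV cache error.

# Reads log lines on stdin; prints N from "...stored in KV cache (N)".
kv_capacity_from_log() {
  sed -n 's/.*stored in KV cache (\([0-9][0-9]*\)).*/\1/p'
}

# Prints the largest power of two <= the given token capacity.
pick_max_model_len() {
  cap="$1"
  [ "$cap" -ge 1 ] 2>/dev/null || return 1
  len=1
  while [ $((len * 2)) -le "$cap" ]; do
    len=$((len * 2))
  done
  echo "$len"
}
```

Usage: `pick_max_model_len "$(docker logs nvidia-nim 2>&1 | kv_capacity_from_log | tail -n 1)"` prints 16384 for the error shown above.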

General error reference

Error	Cause	Fix
`401 Unauthorized` / image pull fails	Not logged in to `nvcr.io`	Run `docker login nvcr.io`
`ValueError: invalid literal 'all'`	`CUDA_VISIBLE_DEVICES=all`	Change to `0`
`PermissionError` on `.cache`	Wrong directory ownership or permissions	`chown -R 1000:1000` and `chmod 775` the cache path
`max seq len > KV cache`	Context window too large for GPU	Set `NIM_MAX_MODEL_LEN=16384`
`CUDA out of memory`	Model too large for GPU	Use a smaller model
`No compatible profiles`	GPU or driver too old	Requires Turing (GTX 16xx/RTX 20xx+) or newer and an up-to-date driver
`WARNING: nvfp4 unsupported`	Consumer GPU lacks nvfp4	Harmless — falls back to bf16
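The container-side rows of this table can be matched automatically against the logs. Below is a sketch that scans log text on stdin and prints the corresponding fix. The function name is hypothetical. The patterns are substrings of the messages quoted in this guide, so other wording may not match. Pull failures (401) occur before the container starts and are not covered.

```sh
#!/bin/sh
# Hypothetical log scanner: prints the fix for each known error found on stdin.
# Returns 1 if no known error was found.
diagnose_nim_log() {
  log="$(cat)"
  status=1
  case "$log" in *"invalid literal for int()"*"'all'"*)
    echo "CUDA_VISIBLE_DEVICES=all: change it to 0"; status=0 ;; esac
  case "$log" in *"Permission denied"*".cache"*)
    echo "Cache permissions: chown -R 1000:1000 and chmod -R 775 the cache path"; status=0 ;; esac
  case "$log" in *"larger than the maximum number"*"KV cache"*)
    echo "Context too large: set NIM_MAX_MODEL_LEN=16384 (or 8192)"; status=0 ;; esac
  case "$log" in *"CUDA out of memory"*)
    echo "Model too large for this GPU: use a smaller model"; status=0 ;; esac
  case "$log" in *"No compatible profiles"*)
    echo "GPU or driver too old: NIM needs Turing or newer"; status=0 ;; esac
  return "$status"
}
```

Usage: `docker logs nvidia-nim 2>&1 | diagnose_nim_log`.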

XML Template

An Unraid Community Applications-compatible XML template is included in this repo as nvidia-nim.xml. You can place it in /boot/config/plugins/dockerMan/templates-user/ on your Unraid server to have it appear in the Docker template list.

Resources

Install nvidia-nim-single on Unraid in a few clicks.

Find nvidia-nim-single in Community Apps on your Unraid server, review the template, and click Install. Unraid handles the Docker app or plugin setup from the published template.

Open the Apps tab on your Unraid server
Search Community Apps for nvidia-nim-single
Review the template variables and paths
Click Install


Requirements

NVIDIA GPU (Turing or newer) | nvidia-driver plugin (Community Applications) | NGC API Key (build.nvidia.com)

Related apps: AI Utilities

Links

Project: build.nvidia.com
Support: forums.unraid.net
Docker Hub: registry.hub.docker.com
Template: raw.githubusercontent.com

Details

Repository

nvcr.io/nim/meta/llama-3.2-3b-instruct:latest

Registry

https://registry.hub.docker.com/r/nvcr.io/nim/meta/llama-3.2-3b-instruct

Last Updated: 2026-07-18

First Seen: 2026-04-06

Runtime arguments

Web UI: http://[IP]:[PORT:8000]/docs
Network: bridge
Shell: bash
Privileged: false
Extra Params: --gpus all --shm-size=16gb --ulimit memlock=-1 --ulimit stack=67108864

Template configuration

API Port (Port, tcp)

NIM listens on this port. WebUI docs at http://your-server-ip:8000/docs. API base URL at http://your-server-ip:8000/v1 -- use this when connecting clients like AnythingLLM, Open WebUI, LangChain, etc. Use any non-empty string as the API key in clients.

Target: 8000
Default: 8000
Value: 8000

Model Cache (Path, rw)

Persistent storage for downloaded model weights. IMPORTANT: Run 'chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache && chmod -R 775 /mnt/user/appdata/nvidia-nim/cache' in the Unraid terminal before first start or the container will fail with a permission error. SSD storage preferred for faster load times.

Target: /opt/nim/.cache
Default: /mnt/user/appdata/nvidia-nim/cache
Value: /mnt/user/appdata/nvidia-nim/cache

NGC API Key (Variable)

Your NVIDIA Personal API key from https://build.nvidia.com. Generate a Personal API Key from your profile. NOTE: This is separate from the docker login nvcr.io command which allows Docker to pull the container image. This variable allows the container to authenticate with NGC to download model artifacts at runtime.

Target: NGC_API_KEY

NIM Model Name (Variable)

Must match the model used by the container image. Default is the 3B model recommended for 12GB GPUs. Browse models at https://build.nvidia.com/models

Target: NIM_MODEL_NAME
Default: meta/llama-3.2-3b-instruct
Value: meta/llama-3.2-3b-instruct

Max Model Length (Variable)

Maximum context window in tokens. The 3B model requests 131072 by default but a 12GB GPU can only fit ~30000 tokens of KV cache. Set to 16384 for 12GB cards. Reduce to 8192 if KV cache errors occur.

Target: NIM_MAX_MODEL_LEN
Default: 16384
Value: 16384

NIM Cache Path (Variable)

Internal container path for the model cache. Must match the container-side path of the Model Cache volume mapping above.

Target: NIM_CACHE_PATH
Default: /opt/nim/.cache
Value: /opt/nim/.cache

CUDA Visible Devices (Variable)

GPU index to use inside the container. Use 0 for the first GPU, 0,1 for multiple GPUs. Do NOT use 'all' -- it will crash vLLM.

Target: CUDA_VISIBLE_DEVICES
Default: 0
Value: 0

Relax Memory Constraints (Variable)

Allows NIM to relax strict GPU memory checks so models may start on GPUs with less VRAM than normally required.

Target: NIM_RELAX_MEM_CONSTRAINTS
Default: 1
Value: 1

PyTorch Memory Allocator (Variable)

Reduces GPU memory fragmentation. Helps avoid out-of-memory errors on consumer GPUs.

Target: PYTORCH_CUDA_ALLOC_CONF
Default: expandable_segments:True
Value: expandable_segments:True

NIM Log Level (Variable)

Logging verbosity. Options: DEBUG, INFO, WARNING, ERROR.

Target: NIM_LOG_LEVEL
Default: INFO
Value: INFO

Library: Curated
