
ollama

Official

Docker application from joly0's repository

Overview

Official Ollama images for Nvidia or AMD GPUs. Intel Arc GPUs could potentially work with Vulkan enabled.

NOTE:

Extra Parameters needed for Nvidia: --gpus=all
Additionally, you can add a Variable called CUDA_VISIBLE_DEVICES to use only specific Nvidia GPU(s) by ID. Example: 0 or 0,1

Extra Parameters needed for AMD: --device='/dev/kfd' --device='/dev/dri'
Additionally, you can add a Variable called ROCR_VISIBLE_DEVICES to use only specific AMD GPU(s) by UUID. Example: GPU-67246603a11f0a21 or GPU-XXXXXXX,GPU-YYYYYYY

Docs: https://docs.ollama.com/
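The vendor-specific parameters above can be assembled programmatically. The following is a minimal sketch, not the template's actual logic: the helper names `docker_gpu_args` and `docker_run_command` and the container name `ollama` are hypothetical, and the image is a parameter because an AMD setup may need a different image tag than the default shown here. The flags, variable names, paths and port are taken from this listing.

```python
def docker_gpu_args(vendor, device_ids=None):
    """Return (extra docker parameters, environment variables) for a GPU vendor.

    Hypothetical helper: it only mirrors the parameters named in the overview.
    """
    vendor = vendor.lower()
    env = {}
    if vendor == "nvidia":
        extra = ["--gpus=all"]
        if device_ids:
            # Nvidia GPUs are selected by numeric ID, e.g. 0 or 0,1
            env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_ids)
    elif vendor == "amd":
        extra = ["--device=/dev/kfd", "--device=/dev/dri"]
        if device_ids:
            # AMD GPUs are selected by UUID, e.g. GPU-67246603a11f0a21
            env["ROCR_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_ids)
    else:
        raise ValueError(f"unsupported GPU vendor: {vendor!r}")
    return extra, env


def docker_run_command(vendor, device_ids=None, image="ollama/ollama"):
    """Build a `docker run` argument list using the template's default path and port."""
    extra, env = docker_gpu_args(vendor, device_ids)
    cmd = ["docker", "run", "-d", "--name", "ollama"]
    cmd += extra
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd += ["-v", "/mnt/user/appdata/ollama:/root/.ollama"]
    cmd += ["-p", "11434:11434", image]
    return cmd
```

Calling `" ".join(docker_run_command("nvidia", [0, 1]))` yields a command line that can be compared against what the Unraid template generates.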

Requirements

Nvidia-Driver plugin (Nvidia support)

Radeon-TOP plugin (AMD support)

Runtime arguments

Web UI: http://[IP]:[PORT:11434]/
Network: bridge
Shell: bash
Privileged: false

Template configuration

Data (Path, rw)

Target: /root/.ollama
Default: /mnt/user/appdata/ollama
Value: /mnt/user/appdata/ollama

API Interface Port (Port, tcp)

Port number on which Ollama listens.

Target: 11434
Default: 11434
Value: 11434

OLLAMA_HOST (Variable)

IP address and port the server binds to. Set to 127.0.0.1:11434 for internal-only access.

Default: 0.0.0.0:11434
Value: 0.0.0.0:11434
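To check that the server is reachable on the configured address and port, a client can query the API. This is a minimal sketch using only the Python standard library; it assumes the server exposes the `/api/tags` endpoint (which lists locally available models) over plain HTTP, and the IP address in the usage note is a placeholder for your Unraid host.

```python
import json
import urllib.request


def base_url(host, port=11434):
    """Build the base URL for the Ollama API from a host and the mapped port."""
    return f"http://{host}:{port}"


def list_models(host, port=11434, timeout=5.0):
    """Return the names of the models the server reports via GET /api/tags."""
    url = f"{base_url(host, port)}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]
```

For example, `list_models("192.168.1.10")` would query a server at that (hypothetical) address. If OLLAMA_HOST is set to 127.0.0.1:11434, the call is expected to succeed only from inside the container.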

OLLAMA_ORIGINS (Variable)

Comma-separated list of allowed CORS origins.

Default: *
Value: *

OLLAMA_KEEP_ALIVE (Variable)

How long a model stays loaded in VRAM after a request, e.g. 60m or 24h (set to -1 to keep it loaded indefinitely, 0 to unload it immediately).

Default: 5m
Value: 5m
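The accepted values can be illustrated with a small interpreter for the setting. This is a simplified sketch, not Ollama's own parser: the function name `keep_alive_seconds` is hypothetical, only the single-unit forms `s`, `m` and `h` are handled, and treating a bare number as seconds is an assumption.

```python
import re

# Seconds per supported unit suffix (simplified: single unit only).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def keep_alive_seconds(value):
    """Interpret a keep-alive setting.

    Returns the duration in seconds, 0 for "unload immediately",
    or None when the value is negative ("keep loaded indefinitely").
    """
    text = str(value).strip()
    if re.fullmatch(r"-?\d+", text):
        # Assumption: a bare integer is a number of seconds.
        seconds = int(text)
    else:
        match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([smh])", text)
        if match is None:
            raise ValueError(f"unsupported keep-alive value: {value!r}")
        seconds = float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return None if seconds < 0 else seconds
```

With this sketch, the default `5m` maps to 300 seconds, `24h` to 86400 seconds, and `-1` to `None`.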

OLLAMA_LOAD_TIMEOUT (Variable)

Timeout for stall detection during model loads.

Default: 5m
Value: 5m

OLLAMA_NUM_PARALLEL (Variable)

Maximum number of parallel requests a single model can handle.

Default: 1
Value: 1

OLLAMA_CONTEXT_LENGTH (Variable)

Default context window (tokens) if not specified by the model.

Default: 4096
Value: 4096

OLLAMA_KV_CACHE_TYPE (Variable)

Quantization type for the K/V cache, e.g. f16, q8_0, q4_0.

Default: f16
Value: f16
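Context length and K/V cache type together determine how much memory the cache needs. The sketch below is a rough estimate, not a figure reported by Ollama: it assumes a standard transformer cache (one key and one value vector per layer, per KV head, per token) and the GGML block layouts for q8_0 (34 bytes per 32 values) and q4_0 (18 bytes per 32 values). The model dimensions in the usage note are hypothetical.

```python
# Approximate storage cost per cached value for each cache type.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values plus one f16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values plus one f16 scale per block
}


def kv_cache_bytes(context_length, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Estimate the K/V cache size in bytes for one sequence."""
    # Factor 2 accounts for storing both keys and values.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_length
    return int(elements * BYTES_PER_ELEMENT[cache_type])
```

For a hypothetical model with 32 layers, 8 KV heads and a head dimension of 128, `kv_cache_bytes(4096, 32, 8, 128)` gives 536870912 bytes (512 MiB) at f16, and `kv_cache_bytes(4096, 32, 8, 128, "q8_0")` gives 285212672 bytes, roughly half.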

OLLAMA_MODELS (Variable)

The path where model weights and blobs are stored.

Default: /root/.ollama/models
Value: /root/.ollama/models

OLLAMA_MAX_LOADED_MODELS (Variable)

Maximum number of models loaded per GPU at once (set to 0 for infinite).

Default: 0
Value: 0

OLLAMA_MAX_QUEUE (Variable)

Maximum number of requests that can wait in the queue while the server is busy.

Default: 512
Value: 512
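When the queue is full, additional requests are rejected, so clients may want to retry. The following is a minimal sketch of retrying with exponential backoff; it assumes the server signals overload with HTTP status 503, and `call_with_retry` and the `send` callable are hypothetical names. `send` stands for any function that performs one request and returns a `(status, body)` pair.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 503 or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns the last (status, body) pair received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.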

OLLAMA_DEBUG (Variable)

Log detail level: 0 for INFO, 1 for DEBUG, 2 for TRACE.

Default: 0|1|2

OLLAMA_GPU_OVERHEAD (Variable)

Amount of VRAM (in bytes) to keep free on each GPU.

Default: 0
Value: 0

OLLAMA_FLASH_ATTENTION (Variable)

Enables experimental Flash Attention optimizations.

Default: false|true

OLLAMA_SCHED_SPREAD (Variable)

If true, always spreads model layers across all visible GPUs.

Default: false|true

OLLAMA_MULTIUSER_CACHE (Variable)

Optimizes prompt caching when multiple users share a model.

Default: false|true

OLLAMA_NOPRUNE (Variable)

If true, does not delete unused model blobs on startup.

Default: false|true

OLLAMA_NOHISTORY (Variable)

Disables the readline history in the interactive CLI.

Default: false|true

OLLAMA_NEW_ENGINE (Variable)

Enables the experimental new Ollama engine.

Default: false|true

OLLAMA_VULKAN (Variable)

Enables experimental Vulkan hardware acceleration.

Default: false|true

HTTP_PROXY (Variable)

Proxy for downloading models over HTTP.

HTTPS_PROXY (Variable)

Proxy for downloading models over HTTPS.

NO_PROXY (Variable)

Comma-separated list of hosts/IPs that bypass the proxy.

Links

Template, Support, Docker Hub

Details

Repository

ollama/ollama

Registry

https://hub.docker.com/r/ollama/ollama/

Last updated: 2026-05-27

First seen: 2023-11-08

Run Ollama on Unraid.

Ollama is listed in Community Apps for Unraid OS. Explore Unraid to build a flexible home server, NAS, or home lab.