ollama

Official

Overview

Official Ollama images for Nvidia or AMD GPUs. Intel Arc GPUs could potentially work with Vulkan enabled.

NOTE: Extra Parameters needed for Nvidia: --gpus=all
Additionally, you can add a Variable called CUDA_VISIBLE_DEVICES to use only specific Nvidia GPU(s) by ID. Example: 0 or 0,1

Extra Parameters needed for AMD: --device='/dev/kfd' --device='/dev/dri'
Additionally, you can add a Variable called ROCR_VISIBLE_DEVICES to use only specific AMD GPU(s) by UUID. Example: GPU-67246603a11f0a21 or GPU-XXXXXXX,GPU-YYYYYYY

Docs: https://docs.ollama.com/
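As a rough illustration of the note above, the following sketch assembles the extra container arguments for each GPU vendor. The function name gpu_run_args and its parameters are hypothetical; only the flags and variable names come from the description. On Unraid the flags would normally go into the Extra Parameters field and the variable would be added as a template Variable rather than passed on a command line.

```python
from typing import List, Optional


def gpu_run_args(vendor: str, visible_devices: Optional[str] = None) -> List[str]:
    """Return the extra container arguments described above for one GPU vendor.

    gpu_run_args is a hypothetical helper, not part of Ollama or Unraid.
    """
    if vendor == "nvidia":
        args = ["--gpus=all"]
        env_name = "CUDA_VISIBLE_DEVICES"  # GPU IDs, e.g. "0" or "0,1"
    elif vendor == "amd":
        args = ["--device=/dev/kfd", "--device=/dev/dri"]
        env_name = "ROCR_VISIBLE_DEVICES"  # GPU UUIDs, e.g. "GPU-XXXXXXX"
    else:
        raise ValueError(f"unsupported vendor: {vendor!r}")

    # Restrict the container to specific GPUs only when a selection is given.
    if visible_devices:
        args += ["-e", f"{env_name}={visible_devices}"]
    return args


if __name__ == "__main__":
    print(" ".join(gpu_run_args("nvidia", "0,1")))
    print(" ".join(gpu_run_args("amd")))
```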

Requirements

Nvidia-Driver plugin (Nvidia Support)

Radeon-TOP plugin (AMD Support)

Runtime parameters

Web UI: http://[IP]:[PORT:11434]/
Network: bridge
Shell: bash
Privileged: false

Template configuration

Data (Path, rw)

Target: /root/.ollama
Default: /mnt/user/appdata/ollama
Value: /mnt/user/appdata/ollama

API Interface Port (Port, tcp)

Port number that Ollama listens on.

Target: 11434
Default: 11434
Value: 11434
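A minimal sketch of reaching the API on this port from another machine, using only the Python standard library. The helper names base_url and list_models and the example IP address are assumptions for illustration; the request targets Ollama's GET /api/tags endpoint, which lists locally available models.

```python
import json
import urllib.request
from typing import List


def base_url(ip: str, port: int = 11434) -> str:
    """Build the base URL from the host IP and the mapped API port."""
    return f"http://{ip}:{port}/"


def list_models(ip: str, port: int = 11434, timeout: float = 5.0) -> List[str]:
    """Call GET /api/tags and return the names of the local models."""
    with urllib.request.urlopen(base_url(ip, port) + "api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder; replace it with the server's address.
    print(base_url("192.168.1.10"))
```

Calling list_models requires a running container, so the example only prints the URL by default.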

OLLAMA_HOST (Variable)

IP and port the server binds to. Set to 127.0.0.1:11434 for internal-only access.

Default: 0.0.0.0:11434
Value: 0.0.0.0:11434
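To make the difference between the two bind addresses concrete, this sketch splits an OLLAMA_HOST value into host and port and checks whether the host is a loopback address. It assumes the plain IPv4 host:port form shown above; parse_bind and is_loopback_only are hypothetical helper names.

```python
import ipaddress
from typing import Tuple


def parse_bind(value: str = "0.0.0.0:11434") -> Tuple[str, int]:
    """Split an IPv4 'host:port' string into its two parts."""
    host, _, port = value.rpartition(":")
    return host, int(port)


def is_loopback_only(value: str) -> bool:
    """True when the server would accept connections only from its own host."""
    host, _ = parse_bind(value)
    return ipaddress.ip_address(host).is_loopback


if __name__ == "__main__":
    for setting in ("0.0.0.0:11434", "127.0.0.1:11434"):
        print(setting, "loopback only:", is_loopback_only(setting))
```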

OLLAMA_ORIGINS (Variable)

Comma-separated list of allowed CORS origins.

Default: *
Value: *
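The sketch below shows one way to reason about a comma-separated origin list with a * wildcard. It is a simplified illustration built on Python's fnmatch, not Ollama's actual matching logic, and the function names and example origins are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_origins(value: str) -> List[str]:
    """Split the comma-separated setting into trimmed, non-empty patterns."""
    return [item.strip() for item in value.split(",") if item.strip()]


def origin_allowed(origin: str, setting: str) -> bool:
    """Check an Origin header against the patterns ('*' matches anything)."""
    return any(fnmatchcase(origin, pattern) for pattern in parse_origins(setting))


if __name__ == "__main__":
    restricted = "http://localhost:3000, http://192.168.1.10:8080"
    print(origin_allowed("http://example.test", "*"))
    print(origin_allowed("http://example.test", restricted))
```

Replacing the default * with an explicit list narrows which browser-based front ends may call the API.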

OLLAMA_KEEP_ALIVE (Variable)

How long a model stays in VRAM, e.g. 60m or 24h (set to -1 for infinite, 0 for none).

Default: 5m
Value: 5m
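As a sketch of how the values above can be read, this helper converts a keep-alive string into seconds. It handles only the forms mentioned in the description (a number followed by s, m, or h, plus -1 and 0); keep_alive_seconds is a hypothetical name and the parser is deliberately narrower than whatever Ollama itself accepts.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def keep_alive_seconds(value: str) -> Optional[int]:
    """Return seconds to keep a model loaded; None means keep it indefinitely."""
    value = value.strip()
    if value == "-1":
        return None  # infinite: never unload
    if value == "0":
        return 0  # unload right after the request
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported keep-alive value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    for setting in ("5m", "60m", "24h", "0", "-1"):
        print(setting, "->", keep_alive_seconds(setting))
```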

OLLAMA_LOAD_TIMEOUT (Variable)

Timeout for stall detection during model loads.

Default: 5m
Value: 5m

OLLAMA_NUM_PARALLEL (Variable)

Maximum number of parallel requests a single model can handle.

Default: 1
Value: 1

OLLAMA_CONTEXT_LENGTH (Variable)

Default context window (tokens) if not specified by the model.

Default: 4096
Value: 4096

OLLAMA_KV_CACHE_TYPE (Variable)

Quantization type for the K/V cache, e.g. f16, q8_0, q4_0.

Default: f16
Value: f16
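To show how the context length and the K/V cache type interact, here is a back-of-the-envelope estimate of K/V cache size. The model dimensions (32 layers, 8 K/V heads, head size 128) are hypothetical, and the formula is the textbook one (keys plus values for every layer, head, and token); actual memory use depends on the model architecture and runtime. The byte counts per 32-element block follow the GGML formats: f16 uses 64 bytes, q8_0 uses 34, q4_0 uses 18.

```python
# Bytes used by one block of 32 cache elements in each format.
BYTES_PER_32_ELEMENTS = {"f16": 64, "q8_0": 34, "q4_0": 18}


def kv_cache_bytes(context_length: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, cache_type: str = "f16") -> int:
    """Estimate K/V cache size in bytes for one sequence."""
    # Factor 2 accounts for storing both keys and values.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_length
    return elements * BYTES_PER_32_ELEMENTS[cache_type] // 32


if __name__ == "__main__":
    # Hypothetical model shape, with the default 4096-token context.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(4096, n_layers=32, n_kv_heads=8, head_dim=128,
                              cache_type=cache_type)
        print(f"{cache_type}: {size / 2**20:.0f} MiB")
```

Under these assumptions f16 comes to 512 MiB, and the quantized types reduce that roughly in proportion to their bytes per element.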

OLLAMA_MODELS (Variable)

The path where model weights and blobs are stored.

Default: /root/.ollama/models
Value: /root/.ollama/models

OLLAMA_MAX_LOADED_MODELS (Variable)

Maximum number of models loaded per GPU at once (set to 0 for infinite).

Default: 0
Value: 0

OLLAMA_MAX_QUEUE (Variable)

Maximum number of requests that can wait in line when the server is busy.

Default: 512
Value: 512

OLLAMA_DEBUG (Variable)

Log detail level: 0 for INFO, 1 for DEBUG, 2 for TRACE.

Default: 0|1|2

OLLAMA_GPU_OVERHEAD (Variable)

Reserved VRAM (in bytes) to leave empty on each GPU.

Default: 0
Value: 0
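Because this value is given in bytes, a small conversion helps avoid off-by-a-factor mistakes. The helper name gib_to_bytes is hypothetical, and the 1.5 GiB figure is only an example.

```python
def gib_to_bytes(gib: float) -> int:
    """Convert gibibytes to the byte count expected by the setting."""
    return int(gib * 1024 ** 3)


if __name__ == "__main__":
    # Reserve 1.5 GiB per GPU (example figure only).
    print(gib_to_bytes(1.5))
```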

OLLAMA_FLASH_ATTENTION (Variable)

Enables experimental Flash Attention optimizations.

Default: false|true

OLLAMA_SCHED_SPREAD (Variable)

If true, always spreads model layers across all visible GPUs.

Default: false|true

OLLAMA_MULTIUSER_CACHE (Variable)

Optimizes prompt caching when multiple users share a model.

Default: false|true

OLLAMA_NOPRUNE (Variable)

If true, does not delete unused model blobs on startup.

Default: false|true

OLLAMA_NOHISTORY (Variable)

Disables the readline history in the interactive CLI.

Default: false|true

OLLAMA_NEW_ENGINE (Variable)

Enables the experimental new Ollama engine.

Default: false|true

OLLAMA_VULKAN (Variable)

Enables experimental Vulkan hardware acceleration.

Default: false|true

HTTP_PROXY (Variable)

Proxy for downloading models over HTTP.

HTTPS_PROXY (Variable)

Proxy for downloading models over HTTPS.

NO_PROXY (Variable)

Comma-separated list of hosts/IPs that bypass the proxy.

Category

AI, Other

Download statistics

Total downloads: 138,822,335
This month: 8,835,773
Monthly average: 15,452,047


Links

Template, Support, Docker Hub

Details

Repository: ollama/ollama
Registry: https://hub.docker.com/r/ollama/ollama/
Last updated: 2026-05-27
First seen: 2023-11-08

Run Ollama on Unraid.

Ollama is listed in Community Apps for Unraid OS. Explore Unraid to build a flexible home server, NAS, or homelab.
