OpenChat-Cuda

Overview

A self-hosted, offline, ChatGPT-like chatbot with open source LLM support. 100% private, with no data leaving your device. Please note that this version requires an NVIDIA GPU with the Unraid NVIDIA-DRIVER plugin.

Readme

A self-hosted, offline, ChatGPT-like chatbot with different LLM support. 100% private, with no data leaving your device.

How to install

Install OpenLLM anywhere else with Docker

You can run OpenLLM on any x86 system. Make sure you have Docker installed.

Then, clone this repo and cd into it:

git clone https://github.com/edgar971/open-chat.git
cd open-chat

You can now run OpenLLM with any of the following models depending upon your hardware:

Model size	Model used	Minimum RAM required	How to start OpenLLM
7B	Nous Hermes Llama 2 7B (GGML q4_0)	8GB	`docker compose up -d`
13B	Nous Hermes Llama 2 13B (GGML q4_0)	16GB	`docker compose -f docker-compose-13b.yml up -d`
70B	Meta Llama 2 70B Chat (GGML q4_0)	48GB	`docker compose -f docker-compose-70b.yml up -d`
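The table above can be turned into a small selection helper. This is a minimal sketch, not part of the project: `choose_compose_cmd` is a hypothetical function name, and choosing the largest model whose minimum RAM fits is an assumption about what you want (the table only lists minimums).

```shell
# Hypothetical helper: print the start command for the largest model
# whose minimum RAM requirement (from the table above) is satisfied.
# Argument: available RAM in whole gigabytes.
choose_compose_cmd() {
  local ram_gb="$1"
  if [ "$ram_gb" -ge 48 ]; then
    echo "docker compose -f docker-compose-70b.yml up -d"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "docker compose -f docker-compose-13b.yml up -d"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "docker compose up -d"
  else
    echo "not enough RAM for any listed model (need at least 8GB)" >&2
    return 1
  fi
}
```

For example, `choose_compose_cmd 32` prints the 13B command. On Linux, `free -g` reports total memory in whole units and rounds down, so a machine sold as 16GB may report 15; check the real figure before relying on the threshold.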

You can access OpenLLM at http://localhost:3000.
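The UI may not answer immediately after the containers start, for instance while a model is still being loaded. The following is a rough sketch of a readiness check; `wait_for_ui` is a hypothetical helper, and the retry count and two-second delay are arbitrary assumptions.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Arguments: URL (default http://localhost:3000), attempts (default 30).
wait_for_ui() {
  local url="${1:-http://localhost:3000}"
  local tries="${2:-30}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl return non-zero on HTTP errors (4xx/5xx).
    if curl --silent --fail --output /dev/null "$url"; then
      echo "UI is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "UI did not respond at $url after $tries attempts" >&2
  return 1
}
```

Calling `wait_for_ui` with no arguments polls the default address for about a minute before giving up.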

To stop OpenLLM, run:

docker compose down

API Configuration

Additional settings can be found here and added either as environment variables or as arguments to the run.sh script (for example, --n_ctx 12).

Example:

version: '3'
services:
  api:
    image: ghcr.io/edgar971/open-chat-cuda:latest
    environment:
      - MODEL=/path/to/your/model
      - N_CTX=4096
    ports:
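The text above pairs the N_CTX environment variable with the --n_ctx argument, which suggests that a setting's flag is its variable name in lowercase. The sketch below encodes that guess; `env_to_flag` is a hypothetical helper, and the naming pattern is an assumption that the source does not confirm for other settings.

```shell
# Hypothetical helper: derive a run.sh-style flag from an environment
# variable name, assuming the flag is simply the lowercased name.
env_to_flag() {
  local lower
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "--$lower"
}
```

With this, `env_to_flag N_CTX` prints `--n_ctx`. Verify any other setting against the project's own documentation before relying on the derived flag.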

Acknowledgements

A massive thank you to the following developers and teams for making OpenLLM possible:

Mckay Wrigley for building Chatbot UI.
Georgi Gerganov for implementing llama.cpp.
Andrei for building the Python bindings for llama.cpp.
NousResearch for fine-tuning the Llama 2 7B and 13B models.
Tom Jobbins for quantizing the Llama 2 models.
Meta for releasing Llama 2 under a permissive license.

Install OpenChat-Cuda on Unraid in a few clicks.

Find OpenChat-Cuda in Community Apps on your Unraid server, review the template, and click Install. Unraid handles the Docker app or plugin setup from the published template.

Open the Apps tab on your Unraid server.
Search Community Apps for OpenChat-Cuda.
Review the template variables and paths.
Click Install.

Links

Project: github.com
Support: github.com
GHCR: registry.hub.docker.com
Template: raw.githubusercontent.com

Details

Repository

ghcr.io/edgar971/open-chat-cuda:v1.0.6

Registry

https://registry.hub.docker.com/r/ghcr.io/edgar971/open-chat-cuda

Last Updated: 2026-07-15

First Seen: 2023-09-06

Runtime arguments

Web UI: http://[IP]:[PORT:3000]/
Network: bridge
Shell: sh
Privileged: false
Extra Params: --gpus all

Template configuration

Local Model Path (Variable)

The local model path

Target: MODEL
Default: /models/llama-2-7b-chat.bin

Model Download URL (Variable)

URL of the GGML model binary to download.

Target: MODEL_DOWNLOAD_URL
Default: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin

Model Directory (Path, rw)

The local model directory to use as a cache

Target: /models
Default: /mnt/user/appdata/models
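The three settings above fit together: with the defaults, the container path /models/llama-2-7b-chat.bin corresponds to /mnt/user/appdata/models/llama-2-7b-chat.bin on the host. If you would rather place the model file in the cache directory yourself, something like the following could work. It is only a sketch: `ensure_model` is a hypothetical helper, and whether the container skips its own download when the file already exists is an assumption.

```shell
# Hypothetical helper: download a model to a host path unless a non-empty
# file is already there. Arguments: destination path, download URL.
ensure_model() {
  local dest="$1"
  local url="$2"
  if [ -s "$dest" ]; then
    echo "cached: $dest"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  # Download under a temporary name so an interrupted transfer is never
  # mistaken for a complete model file.
  curl --fail --location --output "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

A call with the template defaults would be `ensure_model /mnt/user/appdata/models/llama-2-7b-chat.bin "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin"`.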

Web UI (Port, tcp)

Chat UI Port

Target: 3000
Default: 3000

API Port (Port, tcp)

HTTP API Port

Target: 8000
Default: 8000

Number Of GPU Layers (Variable)

Number of layers to offload to the GPU. Adjust this number if the server fails to load.

Target: N_GPU_LAYERS
Value: 12
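Outside Unraid, the runtime arguments and template fields listed above map roughly onto a single docker run command. The sketch below prints that command rather than executing it, so it can be reviewed first. `print_run_cmd` and the container name open-chat-cuda are hypothetical; the image tag, ports, paths, and variables are the defaults from this page, and the mapping of template fields to flags is an interpretation, not something the template states.

```shell
# Hypothetical helper: print a docker run command built from the template
# defaults. Arguments: host model directory, number of GPU layers.
print_run_cmd() {
  local models_dir="${1:-/mnt/user/appdata/models}"
  local gpu_layers="${2:-12}"
  # "\\" inside an unquoted here-document emits a single backslash, so the
  # printed command keeps its line continuations.
  cat <<EOF
docker run -d --name open-chat-cuda \\
  --gpus all --network bridge \\
  -p 3000:3000 -p 8000:8000 \\
  -v $models_dir:/models \\
  -e MODEL=/models/llama-2-7b-chat.bin \\
  -e MODEL_DOWNLOAD_URL=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin \\
  -e N_GPU_LAYERS=$gpu_layers \\
  ghcr.io/edgar971/open-chat-cuda:v1.0.6
EOF
}
```

Running `print_run_cmd /srv/models 8` prints the command with a different host directory and fewer GPU layers. The `--gpus all` option only works on hosts where Docker has NVIDIA GPU support configured.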

Library

Curated
