Ollama with Open WebUI
Run LLMs locally with Ollama and chat with them through Open WebUI.
Overview
Ollama makes it practical to run large language models such as Llama 2, Mistral, and Code Llama directly on a local machine. Created to address the privacy concerns and API costs of cloud-based AI services, it provides a simple command-line interface for downloading, managing, and serving LLMs behind an OpenAI-compatible API. Efficient model quantization and GPU acceleration make it possible to run capable models on consumer hardware.
This stack combines Ollama's local LLM serving capabilities with Open WebUI's intuitive chat interface, creating a complete self-hosted AI chat solution. Open WebUI transforms Ollama's command-line functionality into a familiar ChatGPT-like web interface, complete with conversation history, model switching, and multi-user support. The integration allows users to interact with their locally-hosted models through a polished web interface while maintaining complete data privacy and eliminating external API dependencies.
This combination suits privacy-conscious individuals, developers building AI applications, organizations with sensitive data requirements, and anyone wanting to experiment with large language models without recurring costs. Because all processing stays local, it works well for confidential document analysis, code generation in secure environments, and educational AI exploration without an internet dependency.
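The OpenAI-compatible API mentioned above means existing client tooling can talk to the stack directly. As a hedged illustration (assuming the stack is already running and a model such as llama2:7b has been pulled; the model name and prompts are placeholders), a client could call Ollama like this:
terminal
# Ollama's native generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Summarize what Docker Compose does in one sentence.",
  "stream": false
}'

# OpenAI-compatible chat completions endpoint, usable from existing OpenAI client libraries
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [{"role": "user", "content": "Hello from a local model."}]
  }'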
Key Features
- Local LLM serving with OpenAI-compatible API for complete data privacy
- ChatGPT-like web interface with conversation history and user management
- Simple model management through Ollama's pull and run commands
- GPU acceleration support with NVIDIA CUDA for improved inference speed
- Multi-model support including Llama 2, Mistral, Code Llama, and custom Modelfiles (see the sketch after this list)
- RAG document support for context-aware conversations with uploaded files
- Prompt templates and custom personas for specialized AI interactions
- Model quantization options to optimize memory usage and performance
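Custom Modelfiles are how Ollama layers personas and parameter presets on top of a base model. The sketch below is illustrative only: the name docs-helper, the temperature value, and the system prompt are assumptions rather than part of this recipe, and it presumes llama2:7b has already been pulled.
terminal
# Write an example Modelfile inside the Ollama container (path and contents are illustrative)
docker exec -i ollama tee /tmp/Modelfile > /dev/null << 'EOF'
FROM llama2:7b
PARAMETER temperature 0.2
SYSTEM """
You are a concise assistant that answers questions about internal documentation.
"""
EOF

# Build and try the custom model
docker exec ollama ollama create docs-helper -f /tmp/Modelfile
docker exec ollama ollama run docs-helper "Introduce yourself in one sentence."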
Common Use Cases
- Private AI chat for sensitive business communications and confidential document analysis
- Development teams building AI-powered applications without external API dependencies
- Educational institutions teaching AI concepts with hands-on local model experimentation
- Healthcare organizations requiring HIPAA-compliant AI assistance for medical documentation
- Legal firms using AI for document review while maintaining attorney-client privilege
- Home lab enthusiasts exploring large language models without subscription costs
- Offline AI applications in environments with limited or restricted internet access
Prerequisites
- Minimum 8GB RAM (16GB+ recommended) for running 7B parameter models effectively
- NVIDIA GPU with 6GB+ VRAM for optimal performance, or remove GPU configuration for CPU-only mode
- Docker and Docker Compose installed, with nvidia-container-toolkit for GPU support (verification commands shown after this list)
- Ports 3000 and 11434 available for Open WebUI and Ollama API respectively
- Sufficient disk space for model storage (models range from 4GB to 70GB+)
- Basic understanding of LLM concepts and model parameter sizing
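A few host-side checks can confirm these prerequisites before starting the stack. These are general-purpose commands rather than part of the recipe, and the GPU checks only apply if you keep the NVIDIA configuration:
terminal
free -h                                   # available RAM
df -h .                                   # free disk space for model storage
docker --version && docker compose version
# GPU path only: driver and nvidia-container-toolkit working end to end
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi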
For development and testing use. Review security settings, change default credentials, and test thoroughly before deploying to production.
docker-compose.yml
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
.env Template
.env
# Ollama + Open WebUI
OLLAMA_PORT=11434
WEBUI_PORT=3000
Usage Notes
- Docs: https://docs.openwebui.com/, https://ollama.ai/docs
- Open WebUI at http://localhost:3000 - create an admin account on first visit
- Pull models: docker exec ollama ollama pull llama2:7b (or mistral, codellama); more model-management examples after this list
- GPU support requires nvidia-container-toolkit - remove the deploy section for CPU-only mode
- Chat history, multiple users, and custom personas are supported
- Connect additional OpenAI-compatible APIs in Settings > Connections
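For reference, a few common model-management commands run through the Ollama container (the model names are examples only; substitute any model from the Ollama library):
terminal
docker exec ollama ollama pull mistral        # download another model
docker exec ollama ollama list                # show models already on disk
docker exec -it ollama ollama run llama2:7b   # interactive chat from the terminal
docker exec ollama ollama rm codellama        # remove a model to free disk space
curl http://localhost:11434/api/tags          # list installed models via the HTTP API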
Individual Services (2 services)
Copy individual services to mix and match with your existing compose files; a docker run equivalent is sketched at the end of this section.
ollama
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  restart: unless-stopped
  ports:
    - "${OLLAMA_PORT:-11434}:11434"
  volumes:
    - ollama_data:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - gpu
open-webui
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  ports:
    - "${WEBUI_PORT:-3000}:8080"
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
  volumes:
    - open_webui_data:/app/backend/data
  depends_on:
    - ollama
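If you are not using Compose at all, a rough docker run equivalent of the two services can be sketched as follows. This is an illustration rather than part of the published recipe, and it assumes a user-defined network so the containers can resolve each other by name:
terminal
docker network create ollama-net

docker run -d --name ollama --restart unless-stopped \
  --network ollama-net \
  --gpus all \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:latest

docker run -d --name open-webui --restart unless-stopped \
  --network ollama-net \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v open_webui_data:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

# Omit --gpus all to run Ollama in CPU-only mode.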
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Ollama + Open WebUI
OLLAMA_PORT=11434
WEBUI_PORT=3000
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/ollama-webui-stack/run | bash
Troubleshooting
- Open WebUI shows 'Ollama not found' error: Verify the Ollama container is running and reachable at http://ollama:11434 from inside the Docker network (see the diagnostic commands after this list)
- Models fail to load with out of memory errors: Reduce model size or switch to CPU-only mode by removing the GPU deployment configuration
- GPU not detected in Ollama container: Install nvidia-container-toolkit on host system and restart Docker daemon
- Slow inference performance: Enable GPU acceleration or try smaller quantized models like 7B instead of 13B parameters
- Cannot access Open WebUI interface: Check if port 3000 is available and not blocked by firewall
- Model download fails or times out: Use docker exec ollama ollama pull command manually and check internet connectivity
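Generic diagnostic commands that help narrow down the failures above (standard Docker and Ollama checks rather than anything specific to this recipe):
terminal
docker compose ps                         # are both containers running?
docker compose logs ollama                # model loading and GPU detection messages
docker compose logs open-webui            # connection errors toward Ollama
curl http://localhost:11434/api/version   # is the Ollama API reachable from the host?
docker exec ollama nvidia-smi             # GPU visible inside the container (GPU setups only)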
Download Recipe Kit
Get all files in a ready-to-deploy package
Includes docker-compose.yml, .env template, README, and license