Ollama with Open WebUI
Run LLMs locally with Ollama and chat with them through Open WebUI.
Overview
Ollama makes it practical to run large language models such as Llama 2, Mistral, and Code Llama directly on a local machine. Created to address the privacy concerns and API costs of cloud-based AI services, it provides a simple command-line interface for downloading, managing, and serving LLMs behind an OpenAI-compatible API. Efficient model quantization and GPU acceleration make it possible to run capable models on consumer hardware.
This stack combines Ollama's local LLM serving capabilities with Open WebUI's intuitive chat interface, creating a complete self-hosted AI chat solution. Open WebUI transforms Ollama's command-line functionality into a familiar ChatGPT-like web interface, complete with conversation history, model switching, and multi-user support. The integration allows users to interact with their locally-hosted models through a polished web interface while maintaining complete data privacy and eliminating external API dependencies.
This combination suits privacy-conscious individuals, developers building AI applications, organizations with sensitive data requirements, and anyone wanting to experiment with large language models without recurring costs. Because all processing stays local, it works well for confidential document analysis, code generation in secure environments, and educational AI exploration without an internet dependency.
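The OpenAI-compatible API mentioned above means existing client tooling can talk to the stack directly. As a hedged illustration (assuming the stack is already running and a model such as llama2:7b has been pulled; the model name and prompts are placeholders), a client could call Ollama like this:
terminal
# Ollama's native generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Summarize what Docker Compose does in one sentence.",
  "stream": false
}'

# OpenAI-compatible chat completions endpoint, usable from existing OpenAI client libraries
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2:7b",
    "messages": [{"role": "user", "content": "Hello from a local model."}]
  }'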
Key Features
- Local LLM serving with OpenAI-compatible API for complete data privacy
- ChatGPT-like web interface with conversation history and user management
- Simple model management through Ollama's pull and run commands
- GPU acceleration support with NVIDIA CUDA for improved inference speed
- Multi-model support including Llama 2, Mistral, Code Llama, and custom Modelfiles (see the sketch after this list)
- RAG document support for context-aware conversations with uploaded files
- Prompt templates and custom personas for specialized AI interactions
- Model quantization options to optimize memory usage and performance
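Custom Modelfiles are how Ollama layers personas and parameter presets on top of a base model. The sketch below is illustrative only: the name docs-helper, the temperature value, and the system prompt are assumptions rather than part of this recipe, and it presumes llama2:7b has already been pulled.
terminal
# Write an example Modelfile inside the Ollama container (path and contents are illustrative)
docker exec -i ollama tee /tmp/Modelfile > /dev/null << 'EOF'
FROM llama2:7b
PARAMETER temperature 0.2
SYSTEM """
You are a concise assistant that answers questions about internal documentation.
"""
EOF

# Build and try the custom model
docker exec ollama ollama create docs-helper -f /tmp/Modelfile
docker exec ollama ollama run docs-helper "Introduce yourself in one sentence."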
Common Use Cases
- Private AI chat for sensitive business communications and confidential document analysis
- Development teams building AI-powered applications without external API dependencies
- Educational institutions teaching AI concepts with hands-on local model experimentation
- Healthcare organizations requiring HIPAA-compliant AI assistance for medical documentation
- Legal firms using AI for document review while maintaining attorney-client privilege
- Home lab enthusiasts exploring large language models without subscription costs
- Offline AI applications in environments with limited or restricted internet access
Prerequisites
- Minimum 8GB RAM (16GB+ recommended) for running 7B parameter models effectively
- NVIDIA GPU with 6GB+ VRAM for optimal performance, or remove GPU configuration for CPU-only mode
- Docker and Docker Compose installed, with nvidia-container-toolkit for GPU support (verification commands shown after this list)
- Ports 3000 and 11434 available for Open WebUI and Ollama API respectively
- Sufficient disk space for model storage (models range from 4GB to 70GB+)
- Basic understanding of LLM concepts and model parameter sizing
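A few host-side checks can confirm these prerequisites before starting the stack. These are general-purpose commands rather than part of the recipe, and the GPU checks only apply if you keep the NVIDIA configuration:
terminal
free -h                                   # available RAM
df -h .                                   # free disk space for model storage
docker --version && docker compose version
# GPU path only: driver and nvidia-container-toolkit working end to end
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi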
For development and testing use. Review security settings, change default credentials, and test thoroughly before deploying to production.
docker-compose.yml
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
.env Template
.env
# Ollama + Open WebUI
OLLAMA_PORT=11434
WEBUI_PORT=3000
Usage Notes
- Docs: https://docs.openwebui.com/, https://ollama.ai/docs
- Open WebUI at http://localhost:3000 - create an admin account on first visit
- Pull models: docker exec ollama ollama pull llama2:7b (or mistral, codellama); more model-management examples after this list
- GPU support requires nvidia-container-toolkit - remove the deploy section for CPU-only mode
- Chat history, multiple users, and custom personas are supported
- Connect additional OpenAI-compatible APIs in Settings > Connections
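For reference, a few common model-management commands run through the Ollama container (the model names are examples only; substitute any model from the Ollama library):
terminal
docker exec ollama ollama pull mistral        # download another model
docker exec ollama ollama list                # show models already on disk
docker exec -it ollama ollama run llama2:7b   # interactive chat from the terminal
docker exec ollama ollama rm codellama        # remove a model to free disk space
curl http://localhost:11434/api/tags          # list installed models via the HTTP API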
Individual Services (2 services)
Copy individual services to mix and match with your existing compose files; a docker run equivalent is sketched at the end of this section.
ollama
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  restart: unless-stopped
  ports:
    - "${OLLAMA_PORT:-11434}:11434"
  volumes:
    - ollama_data:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - gpu
open-webui
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  ports:
    - "${WEBUI_PORT:-3000}:8080"
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
  volumes:
    - open_webui_data:/app/backend/data
  depends_on:
    - ollama
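If you are not using Compose at all, a rough docker run equivalent of the two services can be sketched as follows. This is an illustration rather than part of the published recipe, and it assumes a user-defined network so the containers can resolve each other by name:
terminal
docker network create ollama-net

docker run -d --name ollama --restart unless-stopped \
  --network ollama-net \
  --gpus all \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:latest

docker run -d --name open-webui --restart unless-stopped \
  --network ollama-net \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v open_webui_data:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

# Omit --gpus all to run Ollama in CPU-only mode.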
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Ollama + Open WebUI
OLLAMA_PORT=11434
WEBUI_PORT=3000
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/ollama-webui-stack/run | bash
Troubleshooting
- Open WebUI shows 'Ollama not found' error: Verify the Ollama container is running and reachable at http://ollama:11434 from inside the Docker network (see the diagnostic commands after this list)
- Models fail to load with out of memory errors: Reduce model size or switch to CPU-only mode by removing the GPU deployment configuration
- GPU not detected in Ollama container: Install nvidia-container-toolkit on host system and restart Docker daemon
- Slow inference performance: Enable GPU acceleration or try smaller quantized models like 7B instead of 13B parameters
- Cannot access Open WebUI interface: Check if port 3000 is available and not blocked by firewall
- Model download fails or times out: Use docker exec ollama ollama pull command manually and check internet connectivity
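Generic diagnostic commands that help narrow down the failures above (standard Docker and Ollama checks rather than anything specific to this recipe):
terminal
docker compose ps                         # are both containers running?
docker compose logs ollama                # model loading and GPU detection messages
docker compose logs open-webui            # connection errors toward Ollama
curl http://localhost:11434/api/version   # is the Ollama API reachable from the host?
docker exec ollama nvidia-smi             # GPU visible inside the container (GPU setups only)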
Download Recipe Kit
Get all files in a ready-to-deploy package
Includes docker-compose.yml, .env template, README, and license