docker.recipes

LLM Development Stack

intermediate

Local LLM development with Ollama, Open WebUI, ChromaDB vector store, and LangServe

Overview

Ollama transforms how developers work with large language models by providing a simple way to run models like Llama 2, Mistral, and Code Llama locally without complex setup or cloud dependencies. Rather than managing CUDA drivers, model weights, and inference servers separately, Ollama packages everything into a streamlined interface that handles model downloading, quantization, and serving through an OpenAI-compatible API. This eliminates the traditional barriers of LLM deployment while maintaining full control over data privacy and inference costs.

This development stack combines Ollama's local model serving with Open WebUI's ChatGPT-like interface, ChromaDB's vector storage, and LangServe's API framework to create a complete RAG-enabled AI development environment. The integration lets developers experiment with retrieval-augmented generation, build custom LLM applications, and prototype AI features without sending data to external services or managing complex model infrastructure. ChromaDB provides persistent vector embeddings for document retrieval, while LangServe exposes LangChain applications as production-ready APIs.

AI startups, privacy-conscious enterprises, and developers building LLM-powered applications will find this stack well suited to rapid prototyping and production deployment. It delivers a ChatGPT-like experience locally while providing the vector storage and API infrastructure needed for sophisticated RAG applications, making it a good fit for teams that need to iterate quickly on AI features without cloud dependencies or data privacy concerns.

Key Features

  • Local LLM serving with Ollama's simple model management and GPU acceleration
  • ChatGPT-like web interface through Open WebUI with chat history and model switching
  • Persistent vector storage via ChromaDB for document embeddings and similarity search
  • LangChain application serving through LangServe's FastAPI-based endpoints
  • OpenAI-compatible API endpoints enabling drop-in replacement for existing applications (see the example after this list)
  • Multi-model support allowing switching between Llama 2, Mistral, Code Llama and other models
  • RAG document processing with automatic embedding generation and retrieval
  • Multi-user support with individual chat histories and prompt template sharing
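
The OpenAI compatibility noted above is what makes the drop-in replacement work: code written against the official OpenAI SDK only needs a different base URL. A minimal sketch, assuming the openai Python package is installed locally and a model such as llama2 has already been pulled into Ollama:

python
# chat_via_openai_sdk.py — hypothetical example of calling Ollama through its
# OpenAI-compatible API exposed by this stack on port 11434.
from openai import OpenAI

# The SDK requires an API key argument, but Ollama ignores its value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama2",  # any model previously pulled with `ollama pull`
    messages=[{"role": "user", "content": "In one sentence, what is retrieval-augmented generation?"}],
)
print(response.choices[0].message.content)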

Common Use Cases

  • AI startups prototyping RAG applications with proprietary documents and local model inference
  • Enterprise teams building internal AI assistants without sending sensitive data to external APIs
  • Developers creating LLM-powered applications that need offline functionality and data privacy
  • Research teams experimenting with different models and RAG configurations for academic projects
  • Companies migrating from the OpenAI API to self-hosted solutions for cost control and compliance
  • Development teams building chatbots that combine company knowledge bases with conversational AI
  • Organizations creating internal documentation assistants using ChromaDB for vector search

Prerequisites

  • Minimum 16GB RAM for running 7B parameter models, 32GB+ recommended for 13B models
  • NVIDIA GPU with CUDA support for optimal inference performance (CPU fallback available)
  • Available ports 3000, 8000, 8080, and 11434 for Open WebUI, ChromaDB, LangServe, and Ollama
  • Docker Compose with GPU support configured for NVIDIA container runtime
  • Basic understanding of LangChain concepts for customizing RAG pipelines
  • Familiarity with vector databases and embedding models for ChromaDB configuration
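
The port requirement above can be verified before the first start of the stack. A small sketch, assuming a local Python 3 interpreter and the default ports from the .env template below:

python
# check_ports.py — hypothetical helper that reports which of the stack's default
# ports are already in use on localhost before the containers are started.
import socket

PORTS = {"Open WebUI": 3000, "ChromaDB": 8000, "LangServe": 8080, "Ollama": 11434}

for name, port in PORTS.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        in_use = sock.connect_ex(("127.0.0.1", port)) == 0
    print(f"{name:10s} port {port}: {'already in use' if in_use else 'free'}")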

For development & testing. Review security settings, change default credentials, and test thoroughly before production use. See Terms

docker-compose.yml

docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${SECRET_KEY}
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    restart: unless-stopped
    ports:
      - "${CHROMA_PORT:-8000}:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE

  langserve:
    build:
      context: ./langserve
      dockerfile: Dockerfile
    container_name: langserve
    restart: unless-stopped
    ports:
      - "${LANGSERVE_PORT:-8080}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
    depends_on:
      - ollama
      - chromadb
    profiles:
      - dev

volumes:
  ollama_data:
  open_webui_data:
  chroma_data:
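
The langserve service builds from a ./langserve directory that is not included in this recipe, so you supply its Dockerfile (installing the Python dependencies, copying the app, and running uvicorn on port 8080) and application code yourself. Note that the service is assigned to the dev profile, so it only starts with docker compose --profile dev up -d. Below is a hedged, hypothetical sketch of a server.py for that directory; the packages (langserve, langchain, langchain-community, chromadb, fastapi, uvicorn), the llama2 model name, and the "docs" collection are assumptions to adapt to your setup:

python
# ./langserve/server.py — hypothetical minimal RAG app for the langserve service.
import os

import chromadb
from fastapi import FastAPI
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langserve import add_routes

OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://ollama:11434")
CHROMA_HOST = os.environ.get("CHROMA_HOST", "chromadb")
CHROMA_PORT = int(os.environ.get("CHROMA_PORT", "8000"))

# Vector store backed by the chromadb service, embedded via an Ollama model.
chroma_client = chromadb.HttpClient(host=CHROMA_HOST, port=CHROMA_PORT)
embeddings = OllamaEmbeddings(model="llama2", base_url=OLLAMA_BASE_URL)
vectorstore = Chroma(
    client=chroma_client,
    collection_name="docs",
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

# Simple retrieval-augmented chain: fetch context, then answer with the LLM.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOllama(model="llama2", base_url=OLLAMA_BASE_URL)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Expose the chain at /rag (adds /rag/invoke, /rag/batch, /rag/stream, /rag/playground).
app = FastAPI(title="LLM Development Stack - LangServe")
add_routes(app, chain, path="/rag")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)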

.env Template

.env
# LLM Development Stack
OLLAMA_PORT=11434
WEBUI_PORT=3000
CHROMA_PORT=8000
LANGSERVE_PORT=8080

# Security
SECRET_KEY=your-secret-key
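
Replace the SECRET_KEY placeholder with a long random value before first launch; one way to generate one, assuming a local Python 3 interpreter:

python
# generate_secret.py — hypothetical helper; any long random hex string works.
import secrets

print(secrets.token_hex(32))  # paste the output into SECRET_KEY in .env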

Usage Notes

  1. Open WebUI at http://localhost:3000 (create account)
  2. Ollama API at http://localhost:11434
  3. ChromaDB at http://localhost:8000
  4. Pull models: docker exec ollama ollama pull llama2
  5. ChromaDB provides vector storage for RAG (see the sketch after this list)
  6. LangServe exposes LangChain apps as APIs
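
To try the vector store mentioned in note 5, you can load a few documents into ChromaDB and run a similarity query from the host. A sketch, assuming the chromadb Python client is installed locally and CHROMA_PORT is the default 8000; the "docs" collection name is an example:

python
# seed_chroma.py — hypothetical example of seeding and querying the stack's ChromaDB.
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("docs")

# Documents are embedded client-side with the collection's embedding function
# (Chroma's default unless the collection was created with a custom one).
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Ollama serves local LLMs over an OpenAI-compatible API.",
        "ChromaDB stores embeddings for retrieval-augmented generation.",
    ],
    metadatas=[{"source": "notes"}, {"source": "notes"}],
)

# Retrieve the closest document for a question.
results = collection.query(query_texts=["Where are embeddings stored?"], n_results=1)
print(results["documents"][0][0])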

Individual Services (4 services)

Copy individual services to mix and match with your existing compose files.

ollama
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  restart: unless-stopped
  ports:
    - ${OLLAMA_PORT:-11434}:11434
  volumes:
    - ollama_data:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - gpu
open-webui
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  ports:
    - ${WEBUI_PORT:-3000}:8080
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    - WEBUI_SECRET_KEY=${SECRET_KEY}
  volumes:
    - open_webui_data:/app/backend/data
  depends_on:
    - ollama
chromadb
chromadb:
  image: chromadb/chroma:latest
  container_name: chromadb
  restart: unless-stopped
  ports:
    - ${CHROMA_PORT:-8000}:8000
  volumes:
    - chroma_data:/chroma/chroma
  environment:
    - IS_PERSISTENT=TRUE
    - ANONYMIZED_TELEMETRY=FALSE
langserve
langserve:
  build:
    context: ./langserve
    dockerfile: Dockerfile
  container_name: langserve
  restart: unless-stopped
  ports:
    - ${LANGSERVE_PORT:-8080}:8080
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    - CHROMA_HOST=chromadb
    - CHROMA_PORT=8000
  depends_on:
    - ollama
    - chromadb
  profiles:
    - dev

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${SECRET_KEY}
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    restart: unless-stopped
    ports:
      - "${CHROMA_PORT:-8000}:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE

  langserve:
    build:
      context: ./langserve
      dockerfile: Dockerfile
    container_name: langserve
    restart: unless-stopped
    ports:
      - "${LANGSERVE_PORT:-8080}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
    depends_on:
      - ollama
      - chromadb
    profiles:
      - dev

volumes:
  ollama_data:
  open_webui_data:
  chroma_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# LLM Development Stack
OLLAMA_PORT=11434
WEBUI_PORT=3000
CHROMA_PORT=8000
LANGSERVE_PORT=8080

# Security
SECRET_KEY=your-secret-key
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
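
After the services are up, Ollama and ChromaDB can be probed from the host before opening the web UI. A small sketch, assuming the requests and chromadb packages are available locally:

python
# healthcheck.py — hypothetical post-launch check for Ollama and ChromaDB.
import requests
import chromadb

# Ollama: listing installed models confirms the API is reachable.
tags = requests.get("http://localhost:11434/api/tags", timeout=5)
print("Ollama models:", [m["name"] for m in tags.json().get("models", [])])

# ChromaDB: the client's heartbeat returns a server timestamp when healthy.
chroma = chromadb.HttpClient(host="localhost", port=8000)
print("Chroma heartbeat:", chroma.heartbeat())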

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/llm-development-stack/run | bash

Troubleshooting

  • Ollama fails to start with GPU errors: Ensure NVIDIA container toolkit is installed and docker-compose includes GPU runtime configuration
  • Open WebUI shows 'Cannot connect to Ollama': Verify Ollama container is running and accessible at http://ollama:11434 within Docker network
  • ChromaDB connection refused: Check if port 8000 is available and container started successfully with persistent volume mounted
  • Models fail to load with OOM errors: Reduce model size or increase Docker memory limits, consider using quantized versions
  • LangServe API returns embedding errors: Ensure ChromaDB is running before starting LangServe container and environment variables are correctly set
  • Chat responses extremely slow: Enable GPU acceleration in Ollama or switch to smaller quantized models for better performance
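
When diagnosing LangServe errors, it helps to exercise the endpoint directly so chain failures can be separated from embedding or connectivity problems. A sketch that assumes the service was started with the dev profile and exposes the /rag route from the hypothetical server.py shown earlier:

python
# invoke_rag.py — hypothetical call to the LangServe invoke endpoint.
import requests

resp = requests.post(
    "http://localhost:8080/rag/invoke",
    json={"input": "What does ChromaDB store?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["output"])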


Download Recipe Kit

Get all files in a ready-to-deploy package

Includes docker-compose.yml, .env template, README, and license

Components

ollama, open-webui, chromadb, langserve

Tags

#llm #ollama #rag #chromadb #langchain #ai

Category

AI & Machine Learning