LLM Development Stack
Local LLM development with Ollama, Open WebUI, ChromaDB vector store, and LangServe
Overview
Ollama transforms how developers work with large language models by providing a simple way to run models like Llama 2, Mistral, and Code Llama locally, without complex setup or cloud dependencies. Rather than managing CUDA drivers, model weights, and inference servers separately, Ollama packages everything into a streamlined interface that handles model downloading, quantization, and serving through an OpenAI-compatible API. This removes the traditional barriers to LLM deployment while keeping full control over data privacy and inference costs.

This development stack combines Ollama's local model serving with Open WebUI's ChatGPT-like interface, ChromaDB's vector storage, and LangServe's API framework to create a complete RAG-enabled AI development environment. The integration lets developers experiment with retrieval-augmented generation, build custom LLM applications, and prototype AI features without sending data to external services or managing complex model infrastructure. ChromaDB provides persistent vector embeddings for document retrieval, while LangServe exposes LangChain applications as production-ready APIs.

AI startups, privacy-conscious enterprises, and developers building LLM-powered applications will find this stack valuable for rapid prototyping and production deployment. It delivers a ChatGPT-like experience locally while providing the vector storage and API infrastructure needed for sophisticated RAG applications, making it ideal for teams that need to iterate quickly on AI features without cloud dependencies or data privacy concerns.
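To illustrate how the pieces fit together, here is a minimal, hedged sketch of a retrieval-augmented answer against this stack: documents already stored in the ChromaDB container are retrieved and passed as context to a model served by Ollama. It assumes the langchain-community and chromadb Python packages, a model already pulled into Ollama (llama2 here), and the default ports from this recipe; exact import paths vary between LangChain releases, and the collection name is illustrative.
rag_example.py (hypothetical filename)
# Minimal RAG sketch; assumes `pip install langchain-community chromadb`,
# the default ports from this recipe, and a "docs" collection that already
# holds documents. Import paths vary by LangChain version.
import chromadb
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Connect to the services started by docker compose.
chroma_client = chromadb.HttpClient(host="localhost", port=8000)
embeddings = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")
llm = ChatOllama(base_url="http://localhost:11434", model="llama2")

# Vector store backed by the ChromaDB container.
store = Chroma(client=chroma_client, collection_name="docs", embedding_function=embeddings)

question = "What does our deployment runbook say about rollbacks?"
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=3))

answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)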
Key Features
- Local LLM serving with Ollama's simple model management and GPU acceleration
- ChatGPT-like web interface through Open WebUI with chat history and model switching
- Persistent vector storage via ChromaDB for document embeddings and similarity search
- LangChain application serving through LangServe's FastAPI-based endpoints
- OpenAI-compatible API endpoints enabling drop-in replacement for existing applications (see the example after this list)
- Multi-model support allowing switching between Llama 2, Mistral, Code Llama and other models
- RAG document processing with automatic embedding generation and retrieval
- Multi-user support with individual chat histories and prompt template sharing
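Because Ollama exposes an OpenAI-compatible endpoint under /v1, existing OpenAI SDK code can usually be pointed at the local server just by changing the base URL. A hedged sketch, assuming the official openai Python package and the default port from this recipe; the API key can be any placeholder string, since Ollama does not validate it:
openai_compat.py (hypothetical filename)
# Assumes `pip install openai` and a model already pulled into Ollama.
from openai import OpenAI

# Point the OpenAI client at the local Ollama server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama2",  # any model previously pulled with `ollama pull`
    messages=[{"role": "user", "content": "Summarize what a vector database does."}],
)
print(response.choices[0].message.content)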
Common Use Cases
- AI startups prototyping RAG applications with proprietary documents and local model inference
- Enterprise teams building internal AI assistants without sending sensitive data to external APIs
- Developers creating LLM-powered applications that need offline functionality and data privacy
- Research teams experimenting with different models and RAG configurations for academic projects
- Companies migrating from the OpenAI API to self-hosted solutions for cost control and compliance
- Development teams building chatbots that combine company knowledge bases with conversational AI
- Organizations creating internal documentation assistants using ChromaDB for vector search
Prerequisites
- Minimum 16GB RAM for running 7B parameter models, 32GB+ recommended for 13B models
- NVIDIA GPU with CUDA support for optimal inference performance (CPU fallback available)
- Available ports 3000, 8000, 8080, and 11434 for Open WebUI, ChromaDB, LangServe, and Ollama
- Docker Compose with GPU support configured for NVIDIA container runtime
- Basic understanding of LangChain concepts for customizing RAG pipelines
- Familiarity with vector databases and embedding models for ChromaDB configuration (a short client example follows this list)
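As a quick orientation for that last prerequisite, the sketch below talks to the ChromaDB container directly with the chromadb Python client: it adds a couple of documents (Chroma embeds them with its built-in default embedding model when no embeddings are supplied) and runs a similarity query. The collection name and document texts are illustrative.
chroma_example.py (hypothetical filename)
# Assumes `pip install chromadb` and the ChromaDB container on its default port.
import chromadb

# Connect to the ChromaDB container exposed on the host.
client = chromadb.HttpClient(host="localhost", port=8000)

# Create (or reuse) a collection and add a few example documents.
collection = client.get_or_create_collection("getting-started")
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Ollama serves local language models over an HTTP API.",
        "ChromaDB stores embeddings and answers similarity queries.",
    ],
)

# Ask for the document most similar to a natural-language query.
results = collection.query(query_texts=["Which component stores vectors?"], n_results=1)
print(results["documents"][0][0])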
This stack is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${SECRET_KEY}
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    restart: unless-stopped
    ports:
      - "${CHROMA_PORT:-8000}:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE

  langserve:
    build:
      context: ./langserve
      dockerfile: Dockerfile
    container_name: langserve
    restart: unless-stopped
    ports:
      - "${LANGSERVE_PORT:-8080}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
    depends_on:
      - ollama
      - chromadb
    profiles:
      - dev

volumes:
  ollama_data:
  open_webui_data:
  chroma_data:
.env Template
.env
# LLM Development Stack
OLLAMA_PORT=11434
WEBUI_PORT=3000
CHROMA_PORT=8000
LANGSERVE_PORT=8080

# Security
SECRET_KEY=your-secret-key
Usage Notes
- Open WebUI at http://localhost:3000 (create an account on first launch)
- Ollama API at http://localhost:11434
- ChromaDB at http://localhost:8000
- Pull models: docker exec ollama ollama pull llama2
- ChromaDB provides vector storage for RAG
- LangServe exposes LangChain apps as APIs (a sample client call follows this list)
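Once the langserve container is running, its chains are plain HTTP endpoints. The hedged sketch below calls one with langserve's RemoteRunnable client; the /chain path is an assumption and must match whatever path the app registers with add_routes (see the server-side sketch under Individual Services).
langserve_client.py (hypothetical filename)
# Assumes `pip install langserve` and the langserve container on its default port.
from langserve import RemoteRunnable

# The "/chain" path is an example; it must match the path used in add_routes().
chain = RemoteRunnable("http://localhost:8080/chain")

# LangServe also exposes the same chain at POST /chain/invoke for plain HTTP clients.
print(chain.invoke("What does this stack use for vector storage?"))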
Individual Services (4 services)
Copy individual services to mix and match with your existing compose files.
ollama
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  restart: unless-stopped
  ports:
    - "${OLLAMA_PORT:-11434}:11434"
  volumes:
    - ollama_data:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - gpu
open-webui
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  ports:
    - "${WEBUI_PORT:-3000}:8080"
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    - WEBUI_SECRET_KEY=${SECRET_KEY}
  volumes:
    - open_webui_data:/app/backend/data
  depends_on:
    - ollama
chromadb
chromadb:
  image: chromadb/chroma:latest
  container_name: chromadb
  restart: unless-stopped
  ports:
    - "${CHROMA_PORT:-8000}:8000"
  volumes:
    - chroma_data:/chroma/chroma
  environment:
    - IS_PERSISTENT=TRUE
    - ANONYMIZED_TELEMETRY=FALSE
langserve
langserve:
  build:
    context: ./langserve
    dockerfile: Dockerfile
  container_name: langserve
  restart: unless-stopped
  ports:
    - "${LANGSERVE_PORT:-8080}:8080"
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    - CHROMA_HOST=chromadb
    - CHROMA_PORT=8000
  depends_on:
    - ollama
    - chromadb
  profiles:
    - dev
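Two notes on this service. First, it sits behind the dev profile, so a plain docker compose up -d will not start it; activate it with docker compose --profile dev up -d. Second, the recipe references a ./langserve build context without shipping its contents, so the app below is a hedged, hypothetical minimal example of what that directory could contain: a FastAPI app that reads the environment variables set above and exposes a simple Ollama-backed chain via add_routes.
langserve/serve.py (hypothetical file; the recipe does not include the ./langserve contents)
# Assumes `pip install fastapi langserve langchain-community uvicorn sse-starlette`
# inside the image built from ./langserve.
import os

from fastapi import FastAPI
from langchain_community.chat_models import ChatOllama
from langserve import add_routes
import uvicorn

# Read the connection details injected by docker-compose.
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

# A deliberately simple chain: a chat model served by Ollama.
# A real RAG app would also build a retriever from CHROMA_HOST / CHROMA_PORT.
llm = ChatOllama(base_url=OLLAMA_BASE_URL, model="llama2")

app = FastAPI(title="LangServe example")
add_routes(app, llm, path="/chain")  # must match the path used by clients

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)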
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${SECRET_KEY}
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    restart: unless-stopped
    ports:
      - "${CHROMA_PORT:-8000}:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE

  langserve:
    build:
      context: ./langserve
      dockerfile: Dockerfile
    container_name: langserve
    restart: unless-stopped
    ports:
      - "${LANGSERVE_PORT:-8080}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
    depends_on:
      - ollama
      - chromadb
    profiles:
      - dev

volumes:
  ollama_data:
  open_webui_data:
  chroma_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# LLM Development Stack
OLLAMA_PORT=11434
WEBUI_PORT=3000
CHROMA_PORT=8000
LANGSERVE_PORT=8080

# Security
SECRET_KEY=your-secret-key
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/llm-development-stack/run | bash
Troubleshooting
- Ollama fails to start with GPU errors: Ensure NVIDIA container toolkit is installed and docker-compose includes GPU runtime configuration
- Open WebUI shows 'Cannot connect to Ollama': Verify the Ollama container is running and accessible at http://ollama:11434 within the Docker network (a quick reachability check from the host is sketched after this list)
- ChromaDB connection refused: Check if port 8000 is available and container started successfully with persistent volume mounted
- Models fail to load with OOM errors: Reduce model size or increase Docker memory limits, consider using quantized versions
- LangServe API returns embedding errors: Ensure ChromaDB is running before starting LangServe container and environment variables are correctly set
- Chat responses extremely slow: Enable GPU acceleration in Ollama or switch to smaller quantized models for better performance
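As a first diagnostic step for most of the issues above, the hedged script below probes each service from the host using the default ports from the .env template. The Ollama path (/api/tags) and FastAPI's /docs are standard; ChromaDB's heartbeat path has moved between releases, so both candidates are tried; Open WebUI is simply fetched at its root URL. Remember that the LangServe service only exists when the stack was started with --profile dev.
healthcheck.py (hypothetical filename)
# Simple reachability probe for the stack; uses only the Python standard library.
import urllib.request


def probe(name: str, urls: list[str]) -> None:
    """Report the first URL that answers successfully, or mark the service down."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                print(f"{name}: OK ({url} -> HTTP {resp.status})")
                return
        except Exception:
            continue
    print(f"{name}: UNREACHABLE ({', '.join(urls)})")


probe("Ollama", ["http://localhost:11434/api/tags"])
# ChromaDB moved its heartbeat endpoint between releases, so try both paths.
probe("ChromaDB", ["http://localhost:8000/api/v2/heartbeat",
                   "http://localhost:8000/api/v1/heartbeat"])
probe("Open WebUI", ["http://localhost:3000/"])
# LangServe only runs when the stack is started with `--profile dev`.
probe("LangServe", ["http://localhost:8080/docs"])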
Components
ollama, open-webui, chromadb, langserve
Tags
#llm #ollama #rag #chromadb #langchain #ai
Category
AI & Machine Learning