LLM Development Stack
Local LLM development with Ollama, Open WebUI, ChromaDB vector store, and LangServe
Overview
Ollama transforms how developers work with large language models by providing a simple way to run models like Llama 2, Mistral, and Code Llama locally, without complex setup or cloud dependencies. Rather than managing CUDA drivers, model weights, and inference servers separately, Ollama packages everything into a streamlined interface that handles model downloading, quantization, and serving through an OpenAI-compatible API. This removes the traditional barriers to LLM deployment while keeping full control over data privacy and inference costs.

This development stack combines Ollama's local model serving with Open WebUI's ChatGPT-like interface, ChromaDB's vector storage, and LangServe's API framework to create a complete RAG-enabled AI development environment. The integration lets developers experiment with retrieval-augmented generation, build custom LLM applications, and prototype AI features without sending data to external services or managing complex model infrastructure. ChromaDB provides persistent vector embeddings for document retrieval, while LangServe exposes LangChain applications as production-ready APIs.

AI startups, privacy-conscious enterprises, and developers building LLM-powered applications will find this stack valuable for rapid prototyping and production deployment. It delivers a ChatGPT-like experience locally while providing the vector storage and API infrastructure needed for sophisticated RAG applications, making it ideal for teams that need to iterate quickly on AI features without cloud dependencies or data privacy concerns.
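To illustrate how the pieces fit together, here is a minimal, hedged sketch of a retrieval-augmented answer against this stack: documents already stored in the ChromaDB container are retrieved and passed as context to a model served by Ollama. It assumes the langchain-community and chromadb Python packages, a model already pulled into Ollama (llama2 here), and the default ports from this recipe; exact import paths vary between LangChain releases, and the collection name is illustrative.
rag_example.py (hypothetical filename)
# Minimal RAG sketch; assumes `pip install langchain-community chromadb`,
# the default ports from this recipe, and a "docs" collection that already
# holds documents. Import paths vary by LangChain version.
import chromadb
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Connect to the services started by docker compose.
chroma_client = chromadb.HttpClient(host="localhost", port=8000)
embeddings = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")
llm = ChatOllama(base_url="http://localhost:11434", model="llama2")

# Vector store backed by the ChromaDB container.
store = Chroma(client=chroma_client, collection_name="docs", embedding_function=embeddings)

question = "What does our deployment runbook say about rollbacks?"
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=3))

answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)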
Key Features
- Local LLM serving with Ollama's simple model management and GPU acceleration
- ChatGPT-like web interface through Open WebUI with chat history and model switching
- Persistent vector storage via ChromaDB for document embeddings and similarity search
- LangChain application serving through LangServe's FastAPI-based endpoints
- OpenAI-compatible API endpoints enabling drop-in replacement for existing applications (see the example after this list)
- Multi-model support allowing switching between Llama 2, Mistral, Code Llama and other models
- RAG document processing with automatic embedding generation and retrieval
- Multi-user support with individual chat histories and prompt template sharing
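Because Ollama exposes an OpenAI-compatible endpoint under /v1, existing OpenAI SDK code can usually be pointed at the local server just by changing the base URL. A hedged sketch, assuming the official openai Python package and the default port from this recipe; the API key can be any placeholder string, since Ollama does not validate it:
openai_compat.py (hypothetical filename)
# Assumes `pip install openai` and a model already pulled into Ollama.
from openai import OpenAI

# Point the OpenAI client at the local Ollama server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama2",  # any model previously pulled with `ollama pull`
    messages=[{"role": "user", "content": "Summarize what a vector database does."}],
)
print(response.choices[0].message.content)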
Common Use Cases
- AI startups prototyping RAG applications with proprietary documents and local model inference
- Enterprise teams building internal AI assistants without sending sensitive data to external APIs
- Developers creating LLM-powered applications that need offline functionality and data privacy
- Research teams experimenting with different models and RAG configurations for academic projects
- Companies migrating from the OpenAI API to self-hosted solutions for cost control and compliance
- Development teams building chatbots that combine company knowledge bases with conversational AI
- Organizations creating internal documentation assistants using ChromaDB for vector search
Prerequisites
- Minimum 16GB RAM for running 7B parameter models, 32GB+ recommended for 13B models
- NVIDIA GPU with CUDA support for optimal inference performance (CPU fallback available)
- Available ports 3000, 8000, 8080, and 11434 for Open WebUI, ChromaDB, LangServe, and Ollama
- Docker Compose with GPU support configured for NVIDIA container runtime
- Basic understanding of LangChain concepts for customizing RAG pipelines
- Familiarity with vector databases and embedding models for ChromaDB configuration (a short client example follows this list)
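As a quick orientation for that last prerequisite, the sketch below talks to the ChromaDB container directly with the chromadb Python client: it adds a couple of documents (Chroma embeds them with its built-in default embedding model when no embeddings are supplied) and runs a similarity query. The collection name and document texts are illustrative.
chroma_example.py (hypothetical filename)
# Assumes `pip install chromadb` and the ChromaDB container on its default port.
import chromadb

# Connect to the ChromaDB container exposed on the host.
client = chromadb.HttpClient(host="localhost", port=8000)

# Create (or reuse) a collection and add a few example documents.
collection = client.get_or_create_collection("getting-started")
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Ollama serves local language models over an HTTP API.",
        "ChromaDB stores embeddings and answers similarity queries.",
    ],
)

# Ask for the document most similar to a natural-language query.
results = collection.query(query_texts=["Which component stores vectors?"], n_results=1)
print(results["documents"][0][0])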
This stack is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${SECRET_KEY}
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    restart: unless-stopped
    ports:
      - "${CHROMA_PORT:-8000}:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE

  langserve:
    build:
      context: ./langserve
      dockerfile: Dockerfile
    container_name: langserve
    restart: unless-stopped
    ports:
      - "${LANGSERVE_PORT:-8080}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
    depends_on:
      - ollama
      - chromadb
    profiles:
      - dev

volumes:
  ollama_data:
  open_webui_data:
  chroma_data:
.env Template
.env
# LLM Development Stack
OLLAMA_PORT=11434
WEBUI_PORT=3000
CHROMA_PORT=8000
LANGSERVE_PORT=8080

# Security
SECRET_KEY=your-secret-key
Usage Notes
- Open WebUI at http://localhost:3000 (create an account on first launch)
- Ollama API at http://localhost:11434
- ChromaDB at http://localhost:8000
- Pull models: docker exec ollama ollama pull llama2
- ChromaDB provides vector storage for RAG
- LangServe exposes LangChain apps as APIs (a sample client call follows this list)
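Once the langserve container is running, its chains are plain HTTP endpoints. The hedged sketch below calls one with langserve's RemoteRunnable client; the /chain path is an assumption and must match whatever path the app registers with add_routes (see the server-side sketch under Individual Services).
langserve_client.py (hypothetical filename)
# Assumes `pip install langserve` and the langserve container on its default port.
from langserve import RemoteRunnable

# The "/chain" path is an example; it must match the path used in add_routes().
chain = RemoteRunnable("http://localhost:8080/chain")

# LangServe also exposes the same chain at POST /chain/invoke for plain HTTP clients.
print(chain.invoke("What does this stack use for vector storage?"))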
Individual Services (4 services)
Copy individual services to mix and match with your existing compose files.
ollama
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  restart: unless-stopped
  ports:
    - "${OLLAMA_PORT:-11434}:11434"
  volumes:
    - ollama_data:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - gpu
open-webui
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  ports:
    - "${WEBUI_PORT:-3000}:8080"
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    - WEBUI_SECRET_KEY=${SECRET_KEY}
  volumes:
    - open_webui_data:/app/backend/data
  depends_on:
    - ollama
chromadb
chromadb:
  image: chromadb/chroma:latest
  container_name: chromadb
  restart: unless-stopped
  ports:
    - "${CHROMA_PORT:-8000}:8000"
  volumes:
    - chroma_data:/chroma/chroma
  environment:
    - IS_PERSISTENT=TRUE
    - ANONYMIZED_TELEMETRY=FALSE
langserve
langserve:
  build:
    context: ./langserve
    dockerfile: Dockerfile
  container_name: langserve
  restart: unless-stopped
  ports:
    - "${LANGSERVE_PORT:-8080}:8080"
  environment:
    - OLLAMA_BASE_URL=http://ollama:11434
    - CHROMA_HOST=chromadb
    - CHROMA_PORT=8000
  depends_on:
    - ollama
    - chromadb
  profiles:
    - dev
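Two notes on this service. First, it sits behind the dev profile, so a plain docker compose up -d will not start it; activate it with docker compose --profile dev up -d. Second, the recipe references a ./langserve build context without shipping its contents, so the app below is a hedged, hypothetical minimal example of what that directory could contain: a FastAPI app that reads the environment variables set above and exposes a simple Ollama-backed chain via add_routes.
langserve/serve.py (hypothetical file; the recipe does not include the ./langserve contents)
# Assumes `pip install fastapi langserve langchain-community uvicorn sse-starlette`
# inside the image built from ./langserve.
import os

from fastapi import FastAPI
from langchain_community.chat_models import ChatOllama
from langserve import add_routes
import uvicorn

# Read the connection details injected by docker-compose.
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

# A deliberately simple chain: a chat model served by Ollama.
# A real RAG app would also build a retriever from CHROMA_HOST / CHROMA_PORT.
llm = ChatOllama(base_url=OLLAMA_BASE_URL, model="llama2")

app = FastAPI(title="LangServe example")
add_routes(app, llm, path="/chain")  # must match the path used by clients

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)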
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-3000}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${SECRET_KEY}
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    restart: unless-stopped
    ports:
      - "${CHROMA_PORT:-8000}:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE

  langserve:
    build:
      context: ./langserve
      dockerfile: Dockerfile
    container_name: langserve
    restart: unless-stopped
    ports:
      - "${LANGSERVE_PORT:-8080}:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
    depends_on:
      - ollama
      - chromadb
    profiles:
      - dev

volumes:
  ollama_data:
  open_webui_data:
  chroma_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# LLM Development Stack
OLLAMA_PORT=11434
WEBUI_PORT=3000
CHROMA_PORT=8000
LANGSERVE_PORT=8080

# Security
SECRET_KEY=your-secret-key
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/llm-development-stack/run | bash
Troubleshooting
- Ollama fails to start with GPU errors: Ensure NVIDIA container toolkit is installed and docker-compose includes GPU runtime configuration
- Open WebUI shows 'Cannot connect to Ollama': Verify the Ollama container is running and accessible at http://ollama:11434 within the Docker network (a quick reachability check from the host is sketched after this list)
- ChromaDB connection refused: Check if port 8000 is available and container started successfully with persistent volume mounted
- Models fail to load with OOM errors: Reduce model size or increase Docker memory limits, consider using quantized versions
- LangServe API returns embedding errors: Ensure ChromaDB is running before starting LangServe container and environment variables are correctly set
- Chat responses extremely slow: Enable GPU acceleration in Ollama or switch to smaller quantized models for better performance
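As a first diagnostic step for most of the issues above, the hedged script below probes each service from the host using the default ports from the .env template. The Ollama path (/api/tags) and FastAPI's /docs are standard; ChromaDB's heartbeat path has moved between releases, so both candidates are tried; Open WebUI is simply fetched at its root URL. Remember that the LangServe service only exists when the stack was started with --profile dev.
healthcheck.py (hypothetical filename)
# Simple reachability probe for the stack; uses only the Python standard library.
import urllib.request


def probe(name: str, urls: list[str]) -> None:
    """Report the first URL that answers successfully, or mark the service down."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                print(f"{name}: OK ({url} -> HTTP {resp.status})")
                return
        except Exception:
            continue
    print(f"{name}: UNREACHABLE ({', '.join(urls)})")


probe("Ollama", ["http://localhost:11434/api/tags"])
# ChromaDB moved its heartbeat endpoint between releases, so try both paths.
probe("ChromaDB", ["http://localhost:8000/api/v2/heartbeat",
                   "http://localhost:8000/api/v1/heartbeat"])
probe("Open WebUI", ["http://localhost:3000/"])
# LangServe only runs when the stack is started with `--profile dev`.
probe("LangServe", ["http://localhost:8080/docs"])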
Components
ollama, open-webui, chromadb, langserve
Tags
#llm #ollama #rag #chromadb #langchain #ai
Category
AI & Machine Learning