Ollama
Run large language models locally.
Overview
Ollama is a tool that democratizes access to large language models by enabling local deployment and execution without relying on cloud APIs. Developed to address the privacy concerns and API costs associated with commercial LLM services, Ollama provides a streamlined interface for downloading, managing, and running models like Llama 2, Mistral, and CodeLlama directly on your own hardware. The platform handles model quantization and optimization automatically, making advanced AI accessible to developers and organizations regardless of their machine learning expertise.
This Docker configuration creates a self-contained LLM server that exposes an OpenAI-compatible API endpoint, allowing existing applications to switch from cloud-based models to local inference without code changes. Ollama manages model storage, context windows, and concurrent request handling while leveraging GPU acceleration when available. The containerized deployment ensures consistent performance across different environments while isolating model data and dependencies.
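For example, once the stack from this recipe is running and a model has been pulled (llama2 is assumed here), the OpenAI-compatible chat endpoint can be exercised with plain curl:
terminal
# Assumes the server is up on the default port and llama2 has been pulled
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'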
This setup is ideal for organizations prioritizing data privacy, developers building AI-powered applications, researchers experimenting with different models, and teams seeking to reduce ongoing API costs. Privacy-sensitive industries like healthcare and finance particularly benefit from keeping sensitive data on-premises, while developers appreciate the ability to work offline and iterate quickly without API rate limits or usage costs.
Key Features
- Simple model management with pull and run commands
- OpenAI-compatible API for drop-in replacement of cloud services
- Automatic GPU acceleration with CUDA support
- Model quantization for optimized memory usage
- Concurrent request handling for multi-user scenarios
- Context window management for long conversations
- Library of pre-built models including Llama 2, Mistral, and CodeLlama
- Modelfile support for custom model configurations (see the example after this list)
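As a minimal sketch of the Modelfile feature, the following layers a system prompt and a parameter on top of a base model; the model name my-assistant, the temperature, and the prompt text are illustrative assumptions, not part of this recipe:
Modelfile
# Build a custom model on top of llama2 (values are illustrative)
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant for internal documentation."
terminal
# Copy the Modelfile into the container, build, and test the custom model
docker cp Modelfile ollama:/tmp/Modelfile
docker exec ollama ollama create my-assistant -f /tmp/Modelfile
docker exec ollama ollama run my-assistant "Summarize what you do."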
Common Use Cases
- Healthcare applications requiring HIPAA-compliant local AI processing
- Financial services needing on-premises data analysis and document processing
- Software development teams using AI for code completion and review
- Research institutions experimenting with different language models
- Edge computing deployments in remote locations without internet access
- Prototype development for AI applications before scaling to cloud
- Educational environments teaching AI concepts with hands-on model interaction
Prerequisites
- Minimum 8GB RAM (16GB+ recommended for larger models)
- NVIDIA GPU with CUDA support for optimal performance (optional; CPU-only inference works but is slower)
- NVIDIA Container Toolkit installed on the host system (a verification command follows this list)
- Docker and Docker Compose installed
- Basic understanding of REST APIs and model concepts
- Sufficient disk space for model storage (models range from 2GB to 40GB+)
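To confirm the NVIDIA Container Toolkit is working before starting Ollama, a common smoke test is to run nvidia-smi inside a CUDA base image (the exact image tag below is an assumption; any CUDA base image will do):
terminal
# Should print the host's GPU table; a "could not select device driver" error means the toolkit is missing
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi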
For development & testing: review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
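The deploy block above reserves an NVIDIA GPU, and Compose will refuse to start the container on a host without the NVIDIA runtime. A CPU-only variant (an adaptation of this recipe, not part of it) simply omits that block:
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    # deploy block omitted — Ollama falls back to CPU inference

volumes:
  ollama_data: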
.env Template
.env
# GPU support requires NVIDIA Container Toolkit
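The template is intentionally empty apart from that comment. If you want to tune server behavior, Ollama documents several OLLAMA_* environment variables; the values below are illustrative, and for them to take effect the compose file would also need an environment: or env_file: entry, which this recipe does not include by default:
.env
# Optional tuning (illustrative values — requires wiring into the compose file)
OLLAMA_KEEP_ALIVE=24h      # how long a model stays loaded after its last request
OLLAMA_NUM_PARALLEL=2      # concurrent requests served per loaded model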
Usage Notes
- Docs: https://ollama.ai/docs
- API at http://localhost:11434 - OpenAI-compatible format (a native-API example follows this list)
- Pull models: docker exec ollama ollama pull llama2:7b
- Popular models: llama2, mistral, codellama, phi, gemma
- List models: docker exec ollama ollama list
- GPU acceleration requires the NVIDIA Container Toolkit installed on the host
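Beyond the OpenAI-compatible route, Ollama's native generate endpoint can be called directly; llama2 is assumed to be pulled already:
terminal
# stream:false returns a single JSON object instead of a token stream
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'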
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# GPU support requires NVIDIA Container Toolkit
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
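Once the container is up, a quick way to confirm the API is reachable is to list installed models (the list is empty until the first pull):
terminal
# Returns {"models":[]} on a fresh install
curl http://localhost:11434/api/tags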
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/ollama/run | bash
Troubleshooting
- CUDA out of memory errors: use a smaller model such as phi:2.7b, or a more heavily quantized variant of your current model
- Models not downloading: check internet connectivity and run 'docker exec ollama ollama pull model-name'
- API returning empty responses: verify the model is loaded with 'docker exec ollama ollama list'
- Slow inference on CPU: install the NVIDIA Container Toolkit and ensure GPU passthrough is working
- Port 11434 already in use: change the host port mapping in docker-compose.yml (see the snippet after this list)
- Container restart loops: check available system memory and reduce concurrent model loading
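For the port conflict above, only the host side of the mapping needs to change; 11435 is an arbitrary free port chosen for illustration:
docker-compose.yml
ports:
  - "11435:11434"   # host port changed; the container port stays 11434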