
Ollama

beginner

Run large language models locally.

Overview

Ollama makes large language models practical to run locally by handling deployment and execution without relying on cloud APIs. Originally developed to address the privacy concerns and API costs associated with commercial LLM services, it provides a streamlined interface for downloading, managing, and running models like Llama 2, Mistral, and CodeLlama directly on your hardware. The platform handles model quantization and optimization automatically, making local inference accessible to developers and organizations regardless of their machine learning expertise.

This Docker configuration creates a self-contained LLM server that exposes an OpenAI-compatible API endpoint, allowing existing applications to switch from cloud-based models to local inference without code changes. Ollama manages model storage, context windows, and concurrent request handling, and uses GPU acceleration when available. The containerized deployment keeps behavior consistent across environments while isolating model data and dependencies.

This setup suits organizations prioritizing data privacy, developers building AI-powered applications, researchers experimenting with different models, and teams looking to reduce ongoing API costs. Privacy-sensitive industries like healthcare and finance benefit from keeping data on-premises, while developers gain the ability to work offline and iterate quickly without API rate limits or usage fees.
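Because the server speaks an OpenAI-compatible dialect, an existing client can usually be repointed at the local endpoint. A minimal sketch, assuming the container is running and the llama2 model has already been pulled (check the Ollama docs for the exact compatibility paths your version supports):

terminal
# Point an OpenAI-style request at the local server instead of a cloud API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{"role": "user", "content": "Summarize what Ollama does in one sentence."}]
  }'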

Key Features

  • Simple model management with pull and run commands
  • OpenAI-compatible API for drop-in replacement of cloud services
  • Automatic GPU acceleration with CUDA support
  • Model quantization for optimized memory usage
  • Concurrent request handling for multi-user scenarios
  • Context window management for long conversations
  • Library of pre-built models including Llama 2, Mistral, and CodeLlama
  • Modelfile support for custom model configurations (see the sketch after this list)
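A Modelfile bakes a base model, sampling parameters, and a system prompt into a reusable custom model. A minimal sketch, assuming the compose stack below is running; the model name, temperature, and system prompt are illustrative:

terminal
# Define a custom model (values are examples, not recommendations)
cat > Modelfile << 'EOF'
FROM llama2
PARAMETER temperature 0.3
SYSTEM """
You are a concise assistant that answers in plain English.
"""
EOF

# Copy it into the container, register it, and try a prompt
docker cp Modelfile ollama:/tmp/Modelfile
docker exec ollama ollama create my-assistant -f /tmp/Modelfile
docker exec ollama ollama run my-assistant "Explain Docker volumes briefly."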

Common Use Cases

  • Healthcare applications requiring HIPAA-compliant local AI processing
  • Financial services needing on-premises data analysis and document processing
  • Software development teams using AI for code completion and review
  • Research institutions experimenting with different language models
  • Edge computing deployments in remote locations without internet
  • Prototype development for AI applications before scaling to cloud
  • Educational environments teaching AI concepts with hands-on model interaction

Prerequisites

  • Minimum 8GB RAM (16GB+ recommended for larger models)
  • NVIDIA GPU with CUDA support for optimal performance
  • NVIDIA Container Toolkit installed on host system (verification sketch after this list)
  • Docker and Docker Compose installed
  • Basic understanding of REST APIs and model concepts
  • Sufficient disk space for model storage (models range from 2GB to 40GB+)
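To confirm the GPU prerequisites before starting the stack, two quick checks from the host; the ubuntu image is only an example, since the NVIDIA Container Toolkit injects the driver tools into the container:

terminal
# Driver visible on the host
nvidia-smi

# GPU visible from inside a container (requires NVIDIA Container Toolkit)
docker run --rm --gpus all ubuntu nvidia-smi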

For development & testing: review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
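If the host has no NVIDIA GPU, Ollama falls back to CPU inference, and the same file works with the GPU reservation removed. A minimal CPU-only variant (expect noticeably slower responses):

terminal
# CPU-only variant: identical to the file above minus the deploy block
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama_data:
EOF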

.env Template

.env
# GPU support requires NVIDIA Container Toolkit

Usage Notes

  1. Docs: https://ollama.ai/docs
  2. API at http://localhost:11434 - OpenAI-compatible format
  3. Pull models: docker exec ollama ollama pull llama2:7b (first-prompt example after this list)
  4. Popular models: llama2, mistral, codellama, phi, gemma
  5. List models: docker exec ollama ollama list
  6. GPU acceleration requires the NVIDIA Container Toolkit installed on the host
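Putting notes 2 and 3 together, a first end-to-end test can pull a model and send a prompt to the native generate endpoint; llama2:7b and the prompt are just examples:

terminal
# Pull a model, then request a single non-streaming completion
docker exec ollama ollama pull llama2:7b
curl http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'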

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# GPU support requires NVIDIA Container Toolkit
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
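Once the container is up, a quick reachability check; /api/tags lists models that have been pulled locally, so it will be empty until the first pull:

terminal
# Confirm the API answers and list locally available models
curl http://localhost:11434/api/tags
docker exec ollama ollama list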

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/ollama/run | bash

Troubleshooting

  • CUDA out of memory errors: Use smaller models like phi:2.7b or enable model quantization
  • Models not downloading: Check internet connectivity and run 'docker exec ollama ollama pull model-name'
  • API returning empty responses: Verify model is loaded with 'docker exec ollama ollama list'
  • Slow inference on CPU: Install NVIDIA Container Toolkit and ensure GPU passthrough is working (see the check after this list)
  • Port 11434 already in use: Change the host port mapping in docker-compose.yml
  • Container restart loops: Check available system memory and reduce concurrent model loading
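For the GPU passthrough and memory items above, two quick checks against the running container, using standard Docker and NVIDIA tooling:

terminal
# GPU should be listed from inside the container if passthrough works
docker exec ollama nvidia-smi

# One-shot CPU/memory snapshot for the container
docker stats ollama --no-stream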


Download Recipe Kit

Get all files in a ready-to-deploy package

Includes docker-compose.yml, .env template, README, and license
