Ollama
Run large language models locally.
Overview
Ollama is a tool that democratizes access to large language models by enabling local deployment and execution without relying on cloud APIs. Developed to address the privacy concerns and API costs associated with commercial LLM services, Ollama provides a streamlined interface for downloading, managing, and running models like Llama 2, Mistral, and CodeLlama directly on your own hardware. The platform handles model quantization and optimization automatically, making advanced AI accessible to developers and organizations regardless of their machine learning expertise.
This Docker configuration creates a self-contained LLM server that exposes an OpenAI-compatible API endpoint, allowing existing applications to switch from cloud-based models to local inference without code changes. Ollama manages model storage, context windows, and concurrent request handling while leveraging GPU acceleration when available. The containerized deployment ensures consistent performance across different environments while isolating model data and dependencies.
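For example, once the stack from this recipe is running and a model has been pulled (llama2 is assumed here), the OpenAI-compatible chat endpoint can be exercised with plain curl:
terminal
# Assumes the server is up on the default port and llama2 has been pulled
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'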
This setup is ideal for organizations prioritizing data privacy, developers building AI-powered applications, researchers experimenting with different models, and teams seeking to reduce ongoing API costs. Privacy-sensitive industries like healthcare and finance particularly benefit from keeping sensitive data on-premises, while developers appreciate the ability to work offline and iterate quickly without API rate limits or usage costs.
Key Features
- Simple model management with pull and run commands
- OpenAI-compatible API for drop-in replacement of cloud services
- Automatic GPU acceleration with CUDA support
- Model quantization for optimized memory usage
- Concurrent request handling for multi-user scenarios
- Context window management for long conversations
- Library of pre-built models including Llama 2, Mistral, and CodeLlama
- Modelfile support for custom model configurations (see the example after this list)
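As a minimal sketch of the Modelfile feature, the following layers a system prompt and a parameter on top of a base model; the model name my-assistant, the temperature, and the prompt text are illustrative assumptions, not part of this recipe:
Modelfile
# Build a custom model on top of llama2 (values are illustrative)
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant for internal documentation."
terminal
# Copy the Modelfile into the container, build, and test the custom model
docker cp Modelfile ollama:/tmp/Modelfile
docker exec ollama ollama create my-assistant -f /tmp/Modelfile
docker exec ollama ollama run my-assistant "Summarize what you do."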
Common Use Cases
- Healthcare applications requiring HIPAA-compliant local AI processing
- Financial services needing on-premises data analysis and document processing
- Software development teams using AI for code completion and review
- Research institutions experimenting with different language models
- Edge computing deployments in remote locations without internet access
- Prototype development for AI applications before scaling to cloud
- Educational environments teaching AI concepts with hands-on model interaction
Prerequisites
- Minimum 8GB RAM (16GB+ recommended for larger models)
- NVIDIA GPU with CUDA support for optimal performance (optional; CPU-only inference works but is slower)
- NVIDIA Container Toolkit installed on the host system (a verification command follows this list)
- Docker and Docker Compose installed
- Basic understanding of REST APIs and model concepts
- Sufficient disk space for model storage (models range from 2GB to 40GB+)
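To confirm the NVIDIA Container Toolkit is working before starting Ollama, a common smoke test is to run nvidia-smi inside a CUDA base image (the exact image tag below is an assumption; any CUDA base image will do):
terminal
# Should print the host's GPU table; a "could not select device driver" error means the toolkit is missing
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi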
For development & testing: review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
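The deploy block above reserves an NVIDIA GPU, and Compose will refuse to start the container on a host without the NVIDIA runtime. A CPU-only variant (an adaptation of this recipe, not part of it) simply omits that block:
docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    # deploy block omitted — Ollama falls back to CPU inference

volumes:
  ollama_data: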
.env Template
.env
# GPU support requires NVIDIA Container Toolkit
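The template is intentionally empty apart from that comment. If you want to tune server behavior, Ollama documents several OLLAMA_* environment variables; the values below are illustrative, and for them to take effect the compose file would also need an environment: or env_file: entry, which this recipe does not include by default:
.env
# Optional tuning (illustrative values — requires wiring into the compose file)
OLLAMA_KEEP_ALIVE=24h      # how long a model stays loaded after its last request
OLLAMA_NUM_PARALLEL=2      # concurrent requests served per loaded model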
Usage Notes
- Docs: https://ollama.ai/docs
- API at http://localhost:11434 - OpenAI-compatible format (a native-API example follows this list)
- Pull models: docker exec ollama ollama pull llama2:7b
- Popular models: llama2, mistral, codellama, phi, gemma
- List models: docker exec ollama ollama list
- GPU acceleration requires the NVIDIA Container Toolkit installed on the host
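Beyond the OpenAI-compatible route, Ollama's native generate endpoint can be called directly; llama2 is assumed to be pulled already:
terminal
# stream:false returns a single JSON object instead of a token stream
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'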
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# GPU support requires NVIDIA Container Toolkit
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
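Once the container is up, a quick way to confirm the API is reachable is to list installed models (the list is empty until the first pull):
terminal
# Returns {"models":[]} on a fresh install
curl http://localhost:11434/api/tags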
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/ollama/run | bash
Troubleshooting
- CUDA out of memory errors: use a smaller model such as phi:2.7b, or a more heavily quantized variant of your current model
- Models not downloading: check internet connectivity and run 'docker exec ollama ollama pull model-name'
- API returning empty responses: verify the model is loaded with 'docker exec ollama ollama list'
- Slow inference on CPU: install the NVIDIA Container Toolkit and ensure GPU passthrough is working
- Port 11434 already in use: change the host port mapping in docker-compose.yml (see the snippet after this list)
- Container restart loops: check available system memory and reduce concurrent model loading
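For the port conflict above, only the host side of the mapping needs to change; 11435 is an arbitrary free port chosen for illustration:
docker-compose.yml
ports:
  - "11435:11434"   # host port changed; the container port stays 11434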