docker.recipes

Chroma

beginner

Embedding database for AI applications.

Overview

Chroma is an open-source embedding database designed specifically for AI applications, particularly those involving large language models and retrieval-augmented generation (RAG) systems. Originally developed as a Python library, Chroma has evolved into a full-featured vector database that simplifies the storage, indexing, and retrieval of high-dimensional embeddings. It provides built-in support for popular embedding models from OpenAI, HuggingFace, and Cohere, making it particularly valuable for developers building AI-powered applications without the complexity of traditional vector databases.

Chroma operates as a lightweight, persistent vector store that maintains embeddings alongside their associated metadata and documents. The database automatically handles the conversion of text into embeddings using your choice of embedding functions, stores them efficiently for similarity search, and provides the fast retrieval essential for RAG workflows. Unlike heavyweight enterprise vector databases, Chroma focuses on simplicity and developer experience while maintaining production-ready performance.

This deployment is ideal for AI engineers building chatbots, semantic search systems, or document analysis tools who need a reliable vector database without operational complexity. Startups developing AI products will appreciate Chroma's straightforward API and built-in integrations with LangChain and LlamaIndex, while research teams can leverage its flexibility for experimenting with different embedding models and retrieval strategies.

Key Features

  • Built-in embedding functions for OpenAI, HuggingFace, Sentence Transformers, and Cohere models
  • Automatic persistence with configurable storage directory for long-term data retention
  • RESTful HTTP API for language-agnostic integration with any development stack
  • Native Python client library with simple add(), query(), and get() methods
  • Metadata filtering and hybrid search combining vector similarity with traditional filters
  • Collection-based organization for managing multiple embedding datasets
  • Automatic embedding generation from raw text using specified embedding models
  • Built-in support for document chunking and text preprocessing workflows

Common Use Cases

  • Building retrieval-augmented generation (RAG) systems for chatbots and AI assistants
  • Creating semantic search engines for documentation, knowledge bases, or content libraries
  • Developing recommendation systems based on content similarity and user preferences
  • Implementing document analysis and similarity detection for legal or research applications
  • Building AI-powered customer support systems with context-aware responses
  • Creating content discovery platforms that surface related articles, papers, or media
  • Developing code search and similarity tools for software development teams

Prerequisites

  • Docker and Docker Compose installed with at least 2GB available memory
  • Port 8000 available for Chroma's HTTP API endpoint
  • Basic understanding of embeddings and vector similarity concepts
  • Python environment with chromadb client library if using Python integration
  • API keys for embedding providers (OpenAI, Cohere, HuggingFace) if using external models
  • Sufficient disk space for persistent vector storage based on expected dataset size

For development & testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    environment:
      IS_PERSISTENT: "true"
      PERSIST_DIRECTORY: /chroma/chroma
      ANONYMIZED_TELEMETRY: "false"
    volumes:
      - chroma_data:/chroma/chroma
    ports:
      - "8000:8000"

volumes:
  chroma_data:

.env Template

.env
# No additional config needed

Usage Notes

  1. Docs: https://docs.trychroma.com/
  2. API at http://localhost:8000 - REST interface
  3. Python: pip install chromadb, then chromadb.HttpClient(host='localhost')
  4. Simple API: collection.add(), collection.query(), collection.get()
  5. Built-in embedding functions for OpenAI, HuggingFace, Cohere
  6. Perfect for RAG - integrates with LangChain and LlamaIndex

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    environment:
      IS_PERSISTENT: "true"
      PERSIST_DIRECTORY: /chroma/chroma
      ANONYMIZED_TELEMETRY: "false"
    volumes:
      - chroma_data:/chroma/chroma
    ports:
      - "8000:8000"

volumes:
  chroma_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# No additional config needed
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/chroma/run | bash

Troubleshooting

  • Connection refused on port 8000: Ensure the container is fully started and check logs with docker logs chroma
  • Out of memory errors during embedding: Reduce batch size when adding documents or increase Docker memory allocation
  • Embedding function errors: Verify API keys are correctly set and embedding model names match supported providers
  • Slow query performance: Check if persistence is properly configured and consider creating indexes for large collections
  • Collection not found errors: Ensure collections are created before attempting to add or query documents
  • Permission denied on /chroma/chroma directory: Verify Docker volume permissions and ensure the chroma_data volume is properly mounted
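The out-of-memory bullet above usually comes down to pushing too many documents into a single add() call. A generic batching helper, in plain Python and independent of the chromadb client, is one way to cap batch size (the `collection`, `ids`, and `docs` names in the commented usage are hypothetical):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage with a chromadb collection:
# for id_chunk, doc_chunk in zip(batched(ids, 100), batched(docs, 100)):
#     collection.add(ids=list(id_chunk), documents=list(doc_chunk))
```

Tuning the batch size down (100 is a common starting point) trades ingestion speed for a smaller memory footprint during embedding.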
