Chroma
Embedding database for AI applications.
Overview
Chroma is an open-source embedding database designed for AI applications, particularly those built around large language models and retrieval-augmented generation (RAG). Originally a Python library, it has evolved into a full-featured vector database that simplifies the storage, indexing, and retrieval of high-dimensional embeddings, with built-in support for popular embedding models from OpenAI, HuggingFace, and Cohere. This makes it particularly valuable for developers building AI-powered applications without the complexity of traditional vector databases.

Chroma operates as a lightweight, persistent vector store that keeps embeddings alongside their associated metadata and documents. It handles the conversion of text into embeddings using your choice of embedding function, stores them efficiently for similarity search, and provides the fast retrieval that RAG workflows depend on. Unlike heavyweight enterprise vector databases, Chroma focuses on simplicity and developer experience while maintaining production-ready performance.

This deployment suits AI engineers building chatbots, semantic search systems, or document analysis tools who need a reliable vector database without operational complexity. Startups developing AI products will appreciate Chroma's straightforward API and built-in integrations with LangChain and LlamaIndex, while research teams can leverage its flexibility for experimenting with different embedding models and retrieval strategies.
Key Features
- Built-in embedding functions for OpenAI, HuggingFace, Sentence Transformers, and Cohere models
- Automatic persistence with configurable storage directory for long-term data retention
- RESTful HTTP API for language-agnostic integration with any development stack
- Native Python client library with simple add(), query(), and get() methods
- Metadata filtering and hybrid search combining vector similarity with traditional filters
- Collection-based organization for managing multiple embedding datasets
- Automatic embedding generation from raw text using specified embedding models
- Built-in support for document chunking and text preprocessing workflows
Common Use Cases
- Building retrieval-augmented generation (RAG) systems for chatbots and AI assistants
- Creating semantic search engines for documentation, knowledge bases, or content libraries
- Developing recommendation systems based on content similarity and user preferences
- Implementing document analysis and similarity detection for legal or research applications
- Building AI-powered customer support systems with context-aware responses
- Creating content discovery platforms that surface related articles, papers, or media
- Developing code search and similarity tools for software development teams
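The RAG use case above boils down to three steps: embed the question, retrieve the nearest chunks from Chroma, and stuff them into the prompt sent to the language model. A minimal, model-agnostic sketch of the prompt-assembly step (`build_rag_prompt` is a hypothetical helper for illustration, not part of chromadb):

```python
def build_rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble a context-stuffed prompt from chunks retrieved out of Chroma."""
    # Number each chunk so the model can cite which passage it drew from
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What port does Chroma use?",
    ["Chroma serves its API on port 8000."],
)
```

In a full pipeline the `retrieved_docs` list would come straight from `collection.query(...)["documents"][0]`.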
Prerequisites
- Docker and Docker Compose installed with at least 2GB available memory
- Port 8000 available for Chroma's HTTP API endpoint
- Basic understanding of embeddings and vector similarity concepts
- Python environment with chromadb client library if using Python integration
- API keys for embedding providers (OpenAI, Cohere, HuggingFace) if using external models
- Sufficient disk space for persistent vector storage based on expected dataset size
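The disk-space prerequisite can be rough-sized with quick arithmetic: raw embedding storage is vectors × dimensions × 4 bytes for float32, before index and metadata overhead, so actual on-disk usage will be noticeably higher. A sketch, assuming float32 embeddings:

```python
def embedding_storage_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw float32 embedding storage, excluding index and metadata overhead."""
    return n_vectors * dim * bytes_per_value

# e.g. 1 million vectors at 1,536 dimensions (a common embedding size)
raw = embedding_storage_bytes(1_000_000, 1536)
print(f"{raw / 2**30:.1f} GiB raw")  # ~5.7 GiB before overhead
```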
For development & testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
```yaml
services:
  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    environment:
      IS_PERSISTENT: "true"
      PERSIST_DIRECTORY: /chroma/chroma
      ANONYMIZED_TELEMETRY: "false"
    volumes:
      - chroma_data:/chroma/chroma
    ports:
      - "8000:8000"

volumes:
  chroma_data:
```
.env Template
.env
```
# No additional config needed
```
Usage Notes
- Docs: https://docs.trychroma.com/
- API at http://localhost:8000 (REST interface)
- Python: pip install chromadb, then chromadb.HttpClient(host='localhost')
- Simple API: collection.add(), collection.query(), collection.get()
- Built-in embedding functions for OpenAI, HuggingFace, Cohere
- Well suited for RAG; integrates with LangChain and LlamaIndex
Quick Start
terminal
```bash
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    environment:
      IS_PERSISTENT: "true"
      PERSIST_DIRECTORY: /chroma/chroma
      ANONYMIZED_TELEMETRY: "false"
    volumes:
      - chroma_data:/chroma/chroma
    ports:
      - "8000:8000"

volumes:
  chroma_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# No additional config needed
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
```
One-Liner
Run this command to download and set up the recipe in one step:
terminal
```bash
curl -fsSL https://docker.recipes/api/recipes/chroma/run | bash
```
Troubleshooting
- Connection refused on port 8000: Ensure the container is fully started and check logs with docker logs chroma
- Out of memory errors during embedding: Reduce batch size when adding documents or increase Docker memory allocation
- Embedding function errors: Verify API keys are correctly set and embedding model names match supported providers
- Slow query performance: Check if persistence is properly configured and consider creating indexes for large collections
- Collection not found errors: Ensure collections are created before attempting to add or query documents
- Permission denied on /chroma/chroma directory: Verify Docker volume permissions and ensure the chroma_data volume is properly mounted
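For the first troubleshooting item, a quick reachability probe against the server's heartbeat endpoint helps distinguish "container still starting" from "port not mapped". A stdlib-only sketch — it uses the /api/v1/heartbeat path, which newer Chroma releases replace with /api/v2/heartbeat:

```python
import urllib.request
import urllib.error

def chroma_reachable(host: str = "localhost", port: int = 8000,
                     timeout: float = 3.0) -> bool:
    """Return True if the Chroma heartbeat endpoint answers with HTTP 200."""
    url = f"http://{host}:{port}/api/v1/heartbeat"  # /api/v2/heartbeat on newer releases
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if not chroma_reachable():
    print("Chroma not reachable; check `docker logs chroma`")
```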