docker.recipes

Weaviate

intermediate

Vector database with built-in ML models.

Overview

Weaviate is an open-source vector database developed by Weaviate (formerly SeMI Technologies). It stores objects alongside their vector representations, which enables semantic search and other AI-powered applications. Instead of relying on exact keyword matches, it compares vector embeddings, so a query can find related concepts even when the wording differs.

This Docker configuration deploys a standalone Weaviate instance with persistence enabled and anonymous access turned on for development. It exposes the REST and GraphQL APIs on port 8080 and a gRPC interface on port 50051 for high-throughput operations. Because the default vectorizer module is set to 'none', you supply the vectors yourself, whether they come from an external service such as OpenAI's API or from a self-hosted model.

The stack suits developers building AI applications, researchers working on semantic search, and organizations implementing knowledge bases. It is a simple starting point for prototyping; production workloads additionally need authentication and resource tuning. Data scientists and ML engineers will appreciate that Weaviate combines vector similarity search with traditional filtering and supports multi-modal data.
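To make the idea of matching by meaning concrete, here is a minimal sketch of similarity search over embeddings. The three-dimensional vectors and item names are made up for illustration, and the brute-force ranking shown here is not how Weaviate searches internally (it uses an approximate HNSW index); it only shows what "closest vector" means.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, items):
    """Return item names ordered from most to least similar to the query."""
    return sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )


# Toy "embeddings" (hypothetical values); real models emit hundreds of dimensions.
items = {
    "puppy": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of a query like "canine"
ranked = nearest(query, items)
print(ranked)
```

The animal-related items rank ahead of "invoice" even though no keyword is shared with the query, which is the behaviour the overview describes.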

Key Features

  • Optional ML modules including text2vec-openai, text2vec-huggingface, and img2vec-neural for automatic vectorization
  • Hybrid search capabilities combining vector similarity with keyword (BM25) search
  • Hierarchical Navigable Small World (HNSW) indexing for fast approximate nearest neighbor search
  • GraphQL API with automatic schema generation and complex query support
  • Multi-tenancy support allowing data isolation within a single Weaviate instance
  • Real-time CRUD operations with immediate vector index updates
  • Support for multi-modal data including text, images, and custom object types
  • Modular architecture with pluggable vectorizer and reader modules
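As a rough sketch of how the HNSW index and the 'none' vectorizer come together, the snippet below builds a class definition of the kind accepted by the REST schema endpoint (POST /v1/schema). The class name "Article", the property names, and the index parameter values are illustrative assumptions, not recommendations.

```python
import json


def build_class_definition(name, properties, ef_construction=128, max_connections=32):
    """Build a schema payload for a class whose vectors are supplied by the client.

    `properties` maps property names to Weaviate data types, e.g. {"title": "text"}.
    The HNSW values are placeholders; tune them for your own data.
    """
    return {
        "class": name,
        "vectorizer": "none",  # matches DEFAULT_VECTORIZER_MODULE in this recipe
        "vectorIndexType": "hnsw",
        "vectorIndexConfig": {
            "efConstruction": ef_construction,
            "maxConnections": max_connections,
        },
        "properties": [
            {"name": prop, "dataType": [data_type]}
            for prop, data_type in properties.items()
        ],
    }


# Hypothetical example class; send the JSON to http://localhost:8080/v1/schema.
article_class = build_class_definition("Article", {"title": "text", "body": "text"})
print(json.dumps(article_class, indent=2))
```

Keeping the payload in a plain dictionary makes it easy to review the index settings before creating the class, since some of them cannot be changed afterwards without re-importing data.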

Common Use Cases

  • Semantic search engines that understand context and meaning rather than just keywords
  • Recommendation systems for e-commerce, content platforms, or product catalogs
  • Knowledge bases and FAQ systems with intelligent question-answer matching
  • Document similarity detection and duplicate content identification
  • Multi-modal search applications combining text and image data
  • AI-powered chatbots and virtual assistants with contextual understanding
  • Research and academic projects requiring semantic analysis of large text corpora

Prerequisites

  • Minimum 4GB RAM recommended for production use, 1GB sufficient for development
  • Docker and Docker Compose installed with sufficient disk space for vector indices
  • Port 8080 available for REST/GraphQL API access and port 50051 for gRPC
  • Understanding of vector embeddings and semantic search concepts
  • API keys for external vectorization services if using text2vec-openai or similar modules
  • Basic knowledge of GraphQL or REST API interaction for data operations
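Before starting the stack, it can help to confirm that the two required ports are not already taken. The following is a small sketch using only the Python standard library; the helper name port_is_free is hypothetical, and it checks the loopback interface only.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


# Ports published by this recipe: REST/GraphQL and gRPC.
for port in (8080, 50051):
    state = "free" if port_is_free(port) else "in use"
    print(f"port {port}: {state}")
```

If a port is reported as in use, either stop the conflicting service or change the host side of the mapping in the compose file.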

For development and testing. Review security settings, disable anonymous access, and test thoroughly before production use.

docker-compose.yml

services:
  weaviate:
    image: semitechnologies/weaviate:latest
    container_name: weaviate
    restart: unless-stopped
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"

volumes:
  weaviate_data:

.env Template

.env
# Enable modules as needed

Usage Notes

  1. Docs: https://weaviate.io/developers/weaviate
  2. REST API at http://localhost:8080/v1, GraphQL at http://localhost:8080/v1/graphql
  3. gRPC at localhost:50051 for high-performance operations
  4. Python client: pip install weaviate-client
  5. Built-in vectorizers: text2vec-openai, text2vec-huggingface, etc. None are enabled in this recipe; set the ENABLE_MODULES environment variable to turn them on
  6. Supports hybrid search (vector + keyword) and multi-tenancy
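The endpoints listed above can be exercised without any client library. The sketch below, which uses only the Python standard library, builds an object payload with a client-supplied vector plus two GraphQL queries (nearVector and hybrid). The class name "Article", the field names, and the vectors are assumptions for illustration; the helper functions are hypothetical and not part of any Weaviate client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches the port mapping in this recipe


def post_json(path, payload, base_url=BASE_URL):
    """POST a JSON payload to the REST API and return the decoded reply."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def build_object(class_name, properties, vector):
    """Payload for POST /v1/objects when the vectorizer is 'none'."""
    return {"class": class_name, "properties": properties, "vector": vector}


def build_near_vector_query(class_name, fields, vector, limit=5):
    """GraphQL body for a pure vector similarity search."""
    query = (
        f"{{ Get {{ {class_name}(nearVector: {{vector: {json.dumps(vector)}}}, "
        f"limit: {limit}) {{ {' '.join(fields)} _additional {{ distance }} }} }} }}"
    )
    return {"query": query}


def build_hybrid_query(class_name, fields, text, vector, alpha=0.5, limit=5):
    """GraphQL body for hybrid search; alpha weights vector vs. keyword results."""
    query = (
        f"{{ Get {{ {class_name}(hybrid: {{query: {json.dumps(text)}, "
        f"vector: {json.dumps(vector)}, alpha: {alpha}}}, limit: {limit}) "
        f"{{ {' '.join(fields)} _additional {{ score }} }} }} }}"
    )
    return {"query": query}


# With the container running, the payloads could be sent like this:
# post_json("/v1/objects", build_object("Article", {"title": "Hello"}, [0.1, 0.2, 0.3]))
# post_json("/v1/graphql", build_near_vector_query("Article", ["title"], [0.1, 0.2, 0.3]))
print(build_hybrid_query("Article", ["title"], "vector database", [0.1, 0.2, 0.3])["query"])
```

Because no vectorizer module is enabled, the hybrid query passes its own vector alongside the keyword text. For heavier workloads, the official Python client and the gRPC port are likely a better fit than hand-built requests.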

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    container_name: weaviate
    restart: unless-stopped
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"

volumes:
  weaviate_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Enable modules as needed
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
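Once the container is up, a script may want to wait until Weaviate actually accepts requests before importing data. A minimal sketch, assuming the readiness endpoint at /v1/.well-known/ready and the default port from this recipe; the function names and retry values are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(
            f"{base_url}/v1/.well-known/ready", timeout=timeout
        ) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status while starting up.
        return False


def wait_until_ready(base_url="http://localhost:8080", attempts=30, delay=1.0):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False


# Example: block for up to roughly 30 seconds after `docker compose up -d`.
# if not wait_until_ready():
#     raise SystemExit("Weaviate did not become ready in time")
```

Polling like this is usually more reliable than a fixed sleep, because startup time varies with the amount of persisted data.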

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/weaviate/run | bash

Troubleshooting

  • Out of memory errors during large imports: Increase Docker memory allocation and consider batch processing data uploads
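  • Slow query performance with large datasets: Check the HNSW index parameters. Lowering ef speeds up queries at some cost in recall, while higher efConstruction and maxConnections improve recall but increase build time and memory use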
  • Module loading failures: Verify vectorizer module configuration and ensure required API keys are properly set
  • Connection refused on gRPC port: Ensure port 50051 is properly exposed and not blocked by firewall rules
  • Schema creation errors: Validate property types and ensure vectorizer module compatibility with your data structure
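  • High memory usage during indexing: Monitor HNSW index size. Memory grows with the number of vectors, their dimensionality, and maxConnections, so consider a lower maxConnections or vector compression such as product quantization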
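For the out-of-memory case during large imports, the batching advice above might look like the following sketch. It assumes the REST batch endpoint POST /v1/batch/objects and objects that already carry their own vectors; the batch size of 100 is an arbitrary starting point, and the helper names are hypothetical.

```python
import json
import urllib.request


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_json(url, payload):
    """POST a JSON payload and return the decoded reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def import_in_batches(objects, base_url="http://localhost:8080",
                      batch_size=100, send=post_json):
    """Upload objects in fixed-size batches and return how many were sent.

    `send` is injectable so the batching logic can be tried without a server.
    """
    sent = 0
    for batch in chunked(objects, batch_size):
        send(f"{base_url}/v1/batch/objects", {"objects": batch})
        sent += len(batch)
    return sent


# Example with hypothetical data (requires the container to be running):
# docs = [{"class": "Article", "properties": {"title": f"Doc {i}"},
#          "vector": [0.1, 0.2, 0.3]} for i in range(1000)]
# import_in_batches(docs, batch_size=100)
```

Smaller batches keep both the request size and the server-side indexing burst bounded, at the cost of more round trips. The batch endpoint reports per-object errors in its reply, so a real importer should inspect what send returns.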
