Weaviate
Vector database with built-in ML models.
Overview
Weaviate is an open-source vector database that combines the power of traditional databases with modern machine learning capabilities. Developed by Weaviate B.V. (formerly SeMI Technologies), Weaviate stores both objects and their vector representations, enabling semantic search and AI-powered applications. Unlike traditional databases that rely on exact matches, Weaviate understands the meaning and context of data through vector embeddings, making it possible to find similar concepts even when exact keywords don't match.
This Docker configuration deploys a standalone Weaviate instance with persistence enabled and anonymous access configured for development purposes. The setup exposes both REST and GraphQL APIs on port 8080, along with a high-performance gRPC interface on port 50051 for intensive operations. With the default vectorizer module set to 'none', you have complete control over how your data gets vectorized, whether through external services like OpenAI's API or self-hosted models.
This stack is ideal for developers building AI applications, researchers working with semantic search, and organizations looking to implement intelligent knowledge bases. The configuration balances simplicity with functionality, providing a solid Weaviate deployment that can handle everything from prototype development to production workloads. Data scientists and ML engineers will particularly appreciate Weaviate's ability to combine vector similarity search with traditional filtering and its support for multi-modal data types.
Key Features
- Built-in ML modules including text2vec-openai, text2vec-huggingface, and img2vec for automatic vectorization
- Hybrid search capabilities combining vector similarity with traditional keyword-based filtering
- Hierarchical Navigable Small World (HNSW) indexing for fast approximate nearest neighbor search
- GraphQL API with automatic schema generation and complex query support
- Multi-tenancy support allowing data isolation within a single Weaviate instance
- Real-time CRUD operations with immediate vector index updates
- Support for multi-modal data including text, images, and custom object types
- Modular architecture with pluggable vectorizer and reader modules
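The hybrid search feature above can be exercised directly against the GraphQL endpoint once the stack is running. A hedged sketch: the `Article` class and `title` property are hypothetical placeholders for your own schema, and the request degrades gracefully if no instance is listening on port 8080.

```shell
# Hybrid search sketch: blends vector similarity with BM25 keyword scoring.
# "Article" and "title" are hypothetical names -- substitute your own schema.
QUERY='{ Get { Article(hybrid: {query: "vector databases", alpha: 0.5}, limit: 5) { title } } }'

# Wrap the GraphQL query in the JSON envelope the /v1/graphql endpoint expects
BODY=$(python3 -c 'import json, sys; print(json.dumps({"query": sys.argv[1]}))' "$QUERY")

# alpha steers the blend: 0 = pure keyword (BM25), 1 = pure vector search
curl -s -X POST http://localhost:8080/v1/graphql \
  -H 'Content-Type: application/json' \
  -d "$BODY" || echo "Weaviate not reachable on :8080"
```

`alpha: 0.5` weights both signals equally; tune it per use case.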
Common Use Cases
- Semantic search engines that understand context and meaning rather than just keywords
- Recommendation systems for e-commerce, content platforms, or product catalogs
- Knowledge bases and FAQ systems with intelligent question-answer matching
- Document similarity detection and duplicate content identification
- Multi-modal search applications combining text and image data
- AI-powered chatbots and virtual assistants with contextual understanding
- Research and academic projects requiring semantic analysis of large text corpora
Prerequisites
- At least 4GB RAM recommended for production use; 1GB is sufficient for development
- Docker and Docker Compose installed with sufficient disk space for vector indices
- Port 8080 available for REST/GraphQL API access and port 50051 for gRPC
- Understanding of vector embeddings and semantic search concepts
- API keys for external vectorization services if using text2vec-openai or similar modules
- Basic knowledge of GraphQL or REST API interaction for data operations
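The prerequisites above can be checked with a short script before bringing the stack up. A minimal sketch, assuming a Linux host with `ss` available for the port check:

```shell
# Pre-flight checks for the prerequisites listed above
command -v docker >/dev/null 2>&1 && DOCKER_OK=yes || DOCKER_OK=no
docker compose version >/dev/null 2>&1 && COMPOSE_OK=yes || COMPOSE_OK=no

# Ports 8080 (REST/GraphQL) and 50051 (gRPC) must be free
PORTS_IN_USE=""
for p in 8080 50051; do
  if ss -ltn 2>/dev/null | grep -q ":$p "; then
    PORTS_IN_USE="$PORTS_IN_USE $p"
  fi
done

echo "docker: $DOCKER_OK, compose plugin: $COMPOSE_OK, busy ports:${PORTS_IN_USE:- none}"
```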
For development & testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
```yaml
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    container_name: weaviate
    restart: unless-stopped
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"

volumes:
  weaviate_data:
```

.env Template
.env
```shell
# Enable modules as needed
```

Usage Notes
- Docs: https://weaviate.io/developers/weaviate
- REST API at http://localhost:8080/v1, GraphQL at /v1/graphql
- gRPC at localhost:50051 for high-performance operations
- Python client: pip install weaviate-client
- Built-in vectorizers: text2vec-openai, text2vec-huggingface, etc.
- Supports hybrid search (vector + keyword) and multi-tenancy
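Because this recipe sets DEFAULT_VECTORIZER_MODULE to none, objects must be imported with pre-computed vectors. A hedged sketch against the REST API: the `Article` class and the four-dimensional vector are illustrative placeholders (real embeddings typically have hundreds or thousands of dimensions), and the requests degrade gracefully if the instance is not running.

```shell
# With DEFAULT_VECTORIZER_MODULE=none, vectors are supplied by the client.
# "Article" and the 4-dim vector are illustrative placeholders.
CLASS_DEF='{"class": "Article", "vectorizer": "none", "properties": [{"name": "title", "dataType": ["text"]}]}'
OBJECT='{"class": "Article", "properties": {"title": "Hello Weaviate"}, "vector": [0.1, 0.2, 0.3, 0.4]}'

# Validate both payloads locally before sending anything
echo "$CLASS_DEF" | python3 -m json.tool >/dev/null
echo "$OBJECT"    | python3 -m json.tool >/dev/null

# Create the class, then insert one object with its vector
curl -s -X POST http://localhost:8080/v1/schema \
  -H 'Content-Type: application/json' -d "$CLASS_DEF" || echo "Weaviate not reachable"
curl -s -X POST http://localhost:8080/v1/objects \
  -H 'Content-Type: application/json' -d "$OBJECT" || echo "Weaviate not reachable"
```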
Quick Start
terminal
```shell
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    container_name: weaviate
    restart: unless-stopped
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"

volumes:
  weaviate_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Enable modules as needed
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
```

One-Liner
Run this command to download and set up the recipe in one step:
terminal
```shell
curl -fsSL https://docker.recipes/api/recipes/weaviate/run | bash
```

Troubleshooting
- Out of memory errors during large imports: Increase Docker memory allocation and consider batch processing data uploads
- Slow query performance with large datasets: Check HNSW index parameters and consider increasing efConstruction and maxConnections
- Module loading failures: Verify vectorizer module configuration and ensure required API keys are properly set
- Connection refused on gRPC port: Ensure port 50051 is properly exposed and not blocked by firewall rules
- Schema creation errors: Validate property types and ensure vectorizer module compatibility with your data structure
- High memory usage during indexing: Monitor HNSW index size and consider using ef parameter tuning for memory optimization
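Most of the issues above can be triaged with Weaviate's readiness and liveness endpoints plus the container logs. A minimal sketch, assuming the compose stack from this recipe:

```shell
# Readiness: the instance can serve traffic; liveness: the process is up
curl -fsS http://localhost:8080/v1/.well-known/ready >/dev/null 2>&1 \
  && READY=yes || READY=no
curl -fsS http://localhost:8080/v1/.well-known/live >/dev/null 2>&1 \
  && LIVE=yes || LIVE=no
echo "ready: $READY, live: $LIVE"

# Recent container logs often show OOM kills or module loading errors
docker compose logs --tail 50 weaviate 2>/dev/null || echo "compose stack not found"
```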