Weaviate
Vector database with built-in ML models.
Overview
Weaviate is an open-source vector database that combines the power of traditional databases with modern machine learning capabilities. Developed by Weaviate B.V. (formerly SeMI Technologies), Weaviate stores both objects and their vector representations, enabling semantic search and AI-powered applications. Unlike traditional databases that rely on exact matches, Weaviate understands the meaning and context of data through vector embeddings, making it possible to find similar concepts even when exact keywords don't match.
This Docker configuration deploys a standalone Weaviate instance with persistence enabled and anonymous access configured for development purposes. The setup exposes both REST and GraphQL APIs on port 8080, along with a high-performance gRPC interface on port 50051 for intensive operations. With the default vectorizer module set to 'none', you have complete control over how your data gets vectorized, whether through external services like OpenAI's API or self-hosted models.
This stack is ideal for developers building AI applications, researchers working with semantic search, and organizations looking to implement intelligent knowledge bases. The configuration balances simplicity with functionality, providing a solid Weaviate deployment that can handle everything from prototype development to production workloads. Data scientists and ML engineers will particularly appreciate Weaviate's ability to combine vector similarity search with traditional filtering and its support for multi-modal data types.
Key Features
- Built-in ML modules including text2vec-openai, text2vec-huggingface, and img2vec for automatic vectorization
- Hybrid search capabilities combining vector similarity with traditional keyword-based filtering
- Hierarchical Navigable Small World (HNSW) indexing for fast approximate nearest neighbor search
- GraphQL API with automatic schema generation and complex query support
- Multi-tenancy support allowing data isolation within a single Weaviate instance
- Real-time CRUD operations with immediate vector index updates
- Support for multi-modal data including text, images, and custom object types
- Modular architecture with pluggable vectorizer and reader modules
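The hybrid search feature above can be exercised directly against the GraphQL endpoint once the stack is running. A hedged sketch: the `Article` class and `title` property are hypothetical placeholders for your own schema, and the request degrades gracefully if no instance is listening on port 8080.

```shell
# Hybrid search sketch: blends vector similarity with BM25 keyword scoring.
# "Article" and "title" are hypothetical names -- substitute your own schema.
QUERY='{ Get { Article(hybrid: {query: "vector databases", alpha: 0.5}, limit: 5) { title } } }'

# Wrap the GraphQL query in the JSON envelope the /v1/graphql endpoint expects
BODY=$(python3 -c 'import json, sys; print(json.dumps({"query": sys.argv[1]}))' "$QUERY")

# alpha steers the blend: 0 = pure keyword (BM25), 1 = pure vector search
curl -s -X POST http://localhost:8080/v1/graphql \
  -H 'Content-Type: application/json' \
  -d "$BODY" || echo "Weaviate not reachable on :8080"
```

`alpha: 0.5` weights both signals equally; tune it per use case.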
Common Use Cases
- Semantic search engines that understand context and meaning rather than just keywords
- Recommendation systems for e-commerce, content platforms, or product catalogs
- Knowledge bases and FAQ systems with intelligent question-answer matching
- Document similarity detection and duplicate content identification
- Multi-modal search applications combining text and image data
- AI-powered chatbots and virtual assistants with contextual understanding
- Research and academic projects requiring semantic analysis of large text corpora
Prerequisites
- At least 4GB RAM recommended for production use; 1GB is sufficient for development
- Docker and Docker Compose installed with sufficient disk space for vector indices
- Port 8080 available for REST/GraphQL API access and port 50051 for gRPC
- Understanding of vector embeddings and semantic search concepts
- API keys for external vectorization services if using text2vec-openai or similar modules
- Basic knowledge of GraphQL or REST API interaction for data operations
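The prerequisites above can be checked with a short script before bringing the stack up. A minimal sketch, assuming a Linux host with `ss` available for the port check:

```shell
# Pre-flight checks for the prerequisites listed above
command -v docker >/dev/null 2>&1 && DOCKER_OK=yes || DOCKER_OK=no
docker compose version >/dev/null 2>&1 && COMPOSE_OK=yes || COMPOSE_OK=no

# Ports 8080 (REST/GraphQL) and 50051 (gRPC) must be free
PORTS_IN_USE=""
for p in 8080 50051; do
  if ss -ltn 2>/dev/null | grep -q ":$p "; then
    PORTS_IN_USE="$PORTS_IN_USE $p"
  fi
done

echo "docker: $DOCKER_OK, compose plugin: $COMPOSE_OK, busy ports:${PORTS_IN_USE:- none}"
```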
For development & testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
```yaml
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    container_name: weaviate
    restart: unless-stopped
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"

volumes:
  weaviate_data:
```

.env Template
.env
```shell
# Enable modules as needed
```

Usage Notes
- Docs: https://weaviate.io/developers/weaviate
- REST API at http://localhost:8080/v1, GraphQL at /v1/graphql
- gRPC at localhost:50051 for high-performance operations
- Python client: pip install weaviate-client
- Built-in vectorizers: text2vec-openai, text2vec-huggingface, etc.
- Supports hybrid search (vector + keyword) and multi-tenancy
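Because this recipe sets DEFAULT_VECTORIZER_MODULE to none, objects must be imported with pre-computed vectors. A hedged sketch against the REST API: the `Article` class and the four-dimensional vector are illustrative placeholders (real embeddings typically have hundreds or thousands of dimensions), and the requests degrade gracefully if the instance is not running.

```shell
# With DEFAULT_VECTORIZER_MODULE=none, vectors are supplied by the client.
# "Article" and the 4-dim vector are illustrative placeholders.
CLASS_DEF='{"class": "Article", "vectorizer": "none", "properties": [{"name": "title", "dataType": ["text"]}]}'
OBJECT='{"class": "Article", "properties": {"title": "Hello Weaviate"}, "vector": [0.1, 0.2, 0.3, 0.4]}'

# Validate both payloads locally before sending anything
echo "$CLASS_DEF" | python3 -m json.tool >/dev/null
echo "$OBJECT"    | python3 -m json.tool >/dev/null

# Create the class, then insert one object with its vector
curl -s -X POST http://localhost:8080/v1/schema \
  -H 'Content-Type: application/json' -d "$CLASS_DEF" || echo "Weaviate not reachable"
curl -s -X POST http://localhost:8080/v1/objects \
  -H 'Content-Type: application/json' -d "$OBJECT" || echo "Weaviate not reachable"
```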
Quick Start
terminal
```shell
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    container_name: weaviate
    restart: unless-stopped
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: node1
    volumes:
      - weaviate_data:/var/lib/weaviate
    ports:
      - "8080:8080"
      - "50051:50051"

volumes:
  weaviate_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Enable modules as needed
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
```

One-Liner
Run this command to download and set up the recipe in one step:
terminal
```shell
curl -fsSL https://docker.recipes/api/recipes/weaviate/run | bash
```

Troubleshooting
- Out of memory errors during large imports: Increase Docker memory allocation and consider batch processing data uploads
- Slow query performance with large datasets: Check HNSW index parameters and consider increasing efConstruction and maxConnections
- Module loading failures: Verify vectorizer module configuration and ensure required API keys are properly set
- Connection refused on gRPC port: Ensure port 50051 is properly exposed and not blocked by firewall rules
- Schema creation errors: Validate property types and ensure vectorizer module compatibility with your data structure
- High memory usage during indexing: Monitor HNSW index size and consider using ef parameter tuning for memory optimization
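Most of the issues above can be triaged with Weaviate's readiness and liveness endpoints plus the container logs. A minimal sketch, assuming the compose stack from this recipe:

```shell
# Readiness: the instance can serve traffic; liveness: the process is up
curl -fsS http://localhost:8080/v1/.well-known/ready >/dev/null 2>&1 \
  && READY=yes || READY=no
curl -fsS http://localhost:8080/v1/.well-known/live >/dev/null 2>&1 \
  && LIVE=yes || LIVE=no
echo "ready: $READY, live: $LIVE"

# Recent container logs often show OOM kills or module loading errors
docker compose logs --tail 50 weaviate 2>/dev/null || echo "compose stack not found"
```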