Trino Data Lake Query Engine
Distributed SQL queries with Trino over MinIO object storage.
Overview
Trino is a distributed SQL query engine originally developed at Facebook (as Presto) and designed to run interactive analytical queries against data sources of all sizes. It excels at federating queries across multiple data sources, from gigabytes to petabytes, enabling organizations to query data where it lives without moving it. Trino's architecture separates compute from storage, making it ideal for cloud-native data lake architectures where data is stored in object storage systems.
This stack combines Trino with MinIO object storage and a PostgreSQL-backed Hive Metastore, creating a complete data lake query engine. MinIO provides S3-compatible object storage for your data files (Parquet, ORC, JSON), while the Hive Metastore running on PostgreSQL maintains schema information and table definitions. Trino acts as the query coordinator, enabling SQL queries across data stored in MinIO while leveraging metadata from the Hive Metastore for schema-on-read operations.
Data engineers building analytics platforms, organizations migrating from traditional data warehouses, and companies implementing lakehouse architectures will find this stack particularly valuable. The combination provides enterprise-grade SQL capabilities over object storage at a fraction of the cost of cloud data warehouses, while maintaining compatibility with popular data formats and BI tools that support standard JDBC/ODBC connections.
Key Features
- Distributed SQL query engine supporting ANSI SQL with complex joins, window functions, and aggregations across petabyte-scale datasets
- S3-compatible object storage with MinIO providing high-performance data lake foundation with erasure coding and encryption
- Hive Metastore integration enabling schema-on-read for Parquet, ORC, Avro, and JSON file formats
- Query federation capabilities allowing joins between data in MinIO and external sources like MySQL, PostgreSQL, or Elasticsearch
- Connector architecture supporting 50+ data sources including Kafka, MongoDB, Cassandra, and cloud storage systems
- Cost-based query optimizer with predicate pushdown and partition pruning for optimal performance
- JDBC/ODBC connectivity enabling integration with popular BI tools like Tableau, Power BI, and Superset
- Fault-tolerant execution mode with automatic query and task retries for long-running workloads
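Trino discovers data sources through catalog properties files mounted into `/etc/trino/catalog`. The sketch below writes a minimal Hive catalog pointed at MinIO; the `hive-metastore:9083` thrift endpoint is an assumption (this compose file ships only the metastore's PostgreSQL database, not the metastore service itself), and the `hive.s3.*` keys follow the Hive connector's legacy S3 support — newer Trino releases use `fs.native-s3.enabled` and `s3.*` keys instead.

```shell
# Hypothetical Hive catalog for Trino backed by MinIO.
# Assumptions: a Hive Metastore reachable at hive-metastore:9083
# (not defined in this stack) and the default .env credentials.
mkdir -p trino/catalog
cat > trino/catalog/minio.properties << 'EOF'
connector.name=hive
hive.metastore.uri=thrift://hive-metastore:9083
# Point the connector at MinIO's S3 API on the compose network
hive.s3.endpoint=http://trino-minio:9000
hive.s3.aws-access-key=minioadmin
hive.s3.aws-secret-key=minioadmin
# MinIO serves buckets at path-style URLs, not virtual-host style
hive.s3.path-style-access=true
EOF
```

After restarting the trino container, the catalog appears as `minio` in `SHOW CATALOGS`.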
Common Use Cases
- Building self-hosted data lakes for analytics teams requiring SQL access to object storage data
- Migrating from expensive cloud data warehouses while maintaining SQL compatibility and BI tool integration
- Creating development and testing environments for big data analytics without cloud storage costs
- Implementing lakehouse architectures for machine learning feature stores and model training datasets
- Running ad-hoc analytical queries across large datasets stored in cost-effective object storage
- Federating queries between operational databases and analytical data stored in data lake formats
- Prototyping data mesh architectures with decentralized data ownership and SQL query capabilities
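Query federation from the list above looks like an ordinary SQL join whose sides live in different catalogs. The sketch below writes such a query to a file; the `minio` and `pg` catalogs and the `sales.orders` / `public.customers` tables are illustrative names, not part of this stack.

```shell
# Hedged sketch of a federated query: a fact table stored as Parquet in
# MinIO (hypothetical "minio" hive catalog) joined against a dimension
# table in a live PostgreSQL database (hypothetical "pg" catalog).
cat > federated-query.sql << 'EOF'
SELECT o.order_id, o.amount, c.customer_name
FROM minio.sales.orders AS o      -- Parquet files in a MinIO bucket
JOIN pg.public.customers AS c     -- operational PostgreSQL table
  ON o.customer_id = c.id
WHERE o.order_date >= DATE '2024-01-01';
EOF
# Once the stack and catalogs exist, run it via the Trino CLI:
# docker exec -i trino trino -f /dev/stdin < federated-query.sql
```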
Prerequisites
- Minimum 4GB RAM available (Trino coordinator requires 2GB+, MinIO needs 512MB+, PostgreSQL needs 256MB+)
- Docker and Docker Compose with at least 10GB free disk space for data and metadata storage
- Understanding of SQL and data lake concepts including Parquet/ORC file formats and partitioning strategies
- Network access to ports 8080 (Trino), 9000 (MinIO API), and 9001 (MinIO Console)
- Basic knowledge of Hive Metastore concepts and table schemas for data catalog management
- Familiarity with S3 API concepts for bucket and object management in MinIO
For development & testing only. Review security settings, change the default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  trino:
    image: trinodb/trino:latest
    container_name: trino
    restart: unless-stopped
    ports:
      - "${TRINO_PORT:-8080}:8080"
    volumes:
      - ./trino/catalog:/etc/trino/catalog:ro

  metastore-db:
    image: postgres:15-alpine
    container_name: metastore-db
    restart: unless-stopped
    environment:
      - POSTGRES_USER=hive
      - POSTGRES_PASSWORD=${METASTORE_DB_PASSWORD}
      - POSTGRES_DB=metastore
    volumes:
      - metastore_db_data:/var/lib/postgresql/data

  minio:
    image: minio/minio:latest
    container_name: trino-minio
    restart: unless-stopped
    ports:
      - "${MINIO_PORT:-9000}:9000"
      - "${MINIO_CONSOLE:-9001}:9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"

volumes:
  metastore_db_data:
  minio_data:
.env Template
.env
# Trino Data Lake
TRINO_PORT=8080
METASTORE_DB_PASSWORD=hive_password
MINIO_PORT=9000
MINIO_CONSOLE=9001
MINIO_USER=minioadmin
MINIO_PASSWORD=minioadmin
Usage Notes
- Trino UI at http://localhost:8080
- MinIO Console at http://localhost:9001
- Configure catalogs in /etc/trino/catalog
- Create buckets for data lake storage
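Buckets can be created from the MinIO Console, or scripted with the MinIO client (`mc`). The sketch below writes a small bootstrap script; the alias name `lake` and bucket name `warehouse` are illustrative, and it assumes the stack is running with the default `.env` credentials.

```shell
# Sketch of bucket bootstrap via mc, run from a throwaway container.
cat > create-buckets.sh << 'EOF'
#!/bin/sh
set -e
docker run --rm --network host minio/mc sh -c '
  mc alias set lake http://localhost:9000 minioadmin minioadmin &&
  mc mb --ignore-existing lake/warehouse &&
  mc ls lake
'
EOF
chmod +x create-buckets.sh
```

Using `--network host` sidesteps guessing the compose network name (which depends on your project directory); on the default bridge network you would target the `trino-minio` hostname instead.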
Individual Services (3 services)
Copy individual services to mix and match with your existing compose files.
trino
trino:
image: trinodb/trino:latest
container_name: trino
restart: unless-stopped
ports:
- "${TRINO_PORT:-8080}:8080"
volumes:
- ./trino/catalog:/etc/trino/catalog:ro
metastore-db
metastore-db:
image: postgres:15-alpine
container_name: metastore-db
restart: unless-stopped
environment:
- POSTGRES_USER=hive
- POSTGRES_PASSWORD=${METASTORE_DB_PASSWORD}
- POSTGRES_DB=metastore
volumes:
- metastore_db_data:/var/lib/postgresql/data
minio
minio:
image: minio/minio:latest
container_name: trino-minio
restart: unless-stopped
ports:
- "${MINIO_PORT:-9000}:9000"
- "${MINIO_CONSOLE:-9001}:9001"
environment:
- MINIO_ROOT_USER=${MINIO_USER}
- MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
volumes:
- minio_data:/data
command: server /data --console-address ":9001"
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  trino:
    image: trinodb/trino:latest
    container_name: trino
    restart: unless-stopped
    ports:
      - "${TRINO_PORT:-8080}:8080"
    volumes:
      - ./trino/catalog:/etc/trino/catalog:ro

  metastore-db:
    image: postgres:15-alpine
    container_name: metastore-db
    restart: unless-stopped
    environment:
      - POSTGRES_USER=hive
      - POSTGRES_PASSWORD=${METASTORE_DB_PASSWORD}
      - POSTGRES_DB=metastore
    volumes:
      - metastore_db_data:/var/lib/postgresql/data

  minio:
    image: minio/minio:latest
    container_name: trino-minio
    restart: unless-stopped
    ports:
      - "${MINIO_PORT:-9000}:9000"
      - "${MINIO_CONSOLE:-9001}:9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"

volumes:
  metastore_db_data:
  minio_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Trino Data Lake
TRINO_PORT=8080
METASTORE_DB_PASSWORD=hive_password
MINIO_PORT=9000
MINIO_CONSOLE=9001
MINIO_USER=minioadmin
MINIO_PASSWORD=minioadmin
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/trino-data-lake/run | bash
Troubleshooting
- Trino query fails with 'Catalog not found': Verify catalog configuration files are mounted in /etc/trino/catalog and contain correct MinIO/Hive connection details
- MinIO connection refused errors: Check MINIO_ROOT_USER and MINIO_ROOT_PASSWORD environment variables match Trino catalog configuration credentials
- Hive Metastore connection timeouts: Ensure PostgreSQL metastore-db container is healthy and METASTORE_DB_PASSWORD is set correctly
- Out of memory errors during large queries: Increase Docker memory limits and configure Trino JVM heap size in jvm.config
- Table not found errors despite data in MinIO: Verify Hive Metastore contains table definitions and partition information matches actual file structure in MinIO buckets
- Permission denied accessing MinIO buckets: Check bucket policies and ensure Trino's MinIO credentials have read/write access to required buckets
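Most of the failures above can be narrowed down with a few health probes. The sketch below writes a triage script assuming the default ports and container names from this compose file; `/v1/info` is Trino's coordinator status endpoint and `/minio/health/live` is MinIO's liveness probe.

```shell
# Quick triage script for the issues listed above.
cat > check-stack.sh << 'EOF'
#!/bin/sh
# Are all three containers up?
docker compose ps
# Is the Trino coordinator answering?
curl -fsS http://localhost:8080/v1/info && echo
# Is MinIO alive?
curl -fsS http://localhost:9000/minio/health/live && echo "minio: ok"
# Does the metastore DB accept connections with the configured user?
docker exec metastore-db pg_isready -U hive -d metastore
EOF
chmod +x check-stack.sh
```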