docker.recipes

Trino Data Lake Query Engine

advanced

Trino distributed SQL with MinIO object storage.

Overview

Trino is a distributed SQL query engine originally developed at Facebook (as Presto) and designed to run interactive analytical queries against data sources of all sizes. It excels at federating queries across multiple data sources, from gigabytes to petabytes, enabling organizations to query data where it lives without moving it. Trino's architecture separates compute from storage, making it a natural fit for cloud-native data lake architectures where data resides in object storage.

This stack combines Trino with MinIO object storage and a PostgreSQL-backed Hive Metastore to create a complete data lake query engine. MinIO provides S3-compatible object storage for your data files (Parquet, ORC, JSON), while the Hive Metastore, backed by PostgreSQL, maintains schema information and table definitions. Trino acts as the query coordinator, running SQL over data stored in MinIO and using metadata from the Hive Metastore for schema-on-read operations.

Data engineers building analytics platforms, organizations migrating from traditional data warehouses, and companies implementing lakehouse architectures will find this stack particularly valuable. The combination provides enterprise-grade SQL capabilities over object storage at a fraction of the cost of cloud data warehouses, while maintaining compatibility with popular data formats and BI tools that support standard JDBC/ODBC connections.

Key Features

  • Distributed SQL query engine supporting ANSI SQL with complex joins, window functions, and aggregations across petabyte-scale datasets
  • S3-compatible object storage with MinIO providing high-performance data lake foundation with erasure coding and encryption
  • Hive Metastore integration enabling schema-on-read for Parquet, ORC, Avro, and JSON file formats
  • Query federation capabilities allowing joins between data in MinIO and external sources like MySQL, PostgreSQL, or Elasticsearch
  • Connector architecture supporting 50+ data sources including Kafka, MongoDB, Cassandra, and cloud storage systems
  • Cost-based query optimizer with predicate pushdown and partition pruning for optimal performance
  • JDBC/ODBC connectivity enabling integration with popular BI tools like Tableau, Power BI, and Superset
  • Fault-tolerant query execution with automatic retry and cluster coordinator failover capabilities
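The query-federation feature above is easiest to see as a single SQL statement spanning two catalogs. The sketch below writes such a query to a file; the catalog, schema, table, and column names are hypothetical, and it assumes you have configured `hive` and `postgresql` catalogs under `./trino/catalog/`:

```shell
# Hypothetical federated query: join lake data (hive catalog) with an
# operational database (postgresql catalog). All names are placeholders.
cat > federated.sql << 'EOF'
SELECT c.name, count(*) AS events
FROM hive.analytics.events AS e
JOIN postgresql.public.customers AS c ON e.customer_id = c.id
GROUP BY c.name
ORDER BY events DESC
LIMIT 10;
EOF
# Run it with the CLI bundled in the Trino container:
#   docker exec trino trino --execute "$(cat federated.sql)"
```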

Common Use Cases

  • Building self-hosted data lakes for analytics teams requiring SQL access to object storage data
  • Migrating from expensive cloud data warehouses while maintaining SQL compatibility and BI tool integration
  • Creating development and testing environments for big data analytics without cloud storage costs
  • Implementing lakehouse architectures for machine learning feature stores and model training datasets
  • Running ad-hoc analytical queries across large datasets stored in cost-effective object storage
  • Federating queries between operational databases and analytical data stored in data lake formats
  • Prototyping data mesh architectures with decentralized data ownership and SQL query capabilities

Prerequisites

  • Minimum 4GB RAM available (Trino coordinator requires 2GB+, MinIO needs 512MB+, PostgreSQL needs 256MB+)
  • Docker and Docker Compose with at least 10GB free disk space for data and metadata storage
  • Understanding of SQL and data lake concepts including Parquet/ORC file formats and partitioning strategies
  • Network access to ports 8080 (Trino), 9000 (MinIO API), and 9001 (MinIO Console)
  • Basic knowledge of Hive Metastore concepts and table schemas for data catalog management
  • Familiarity with S3 API concepts for bucket and object management in MinIO

For development and testing only. Review security settings, change the default credentials, and test thoroughly before production use.
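A quick way to replace the default credentials in the .env template with random secrets, assuming `openssl` is available:

```shell
# Generate random secrets to substitute for the placeholder credentials
# (hive_password / minioadmin) before first start. Assumes openssl.
METASTORE_DB_PASSWORD="$(openssl rand -hex 16)"
MINIO_PASSWORD="$(openssl rand -hex 16)"
printf 'METASTORE_DB_PASSWORD=%s\nMINIO_PASSWORD=%s\n' \
  "$METASTORE_DB_PASSWORD" "$MINIO_PASSWORD"
```

Paste the printed values into your .env file in place of the defaults.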

docker-compose.yml
services:
  trino:
    image: trinodb/trino:latest
    container_name: trino
    restart: unless-stopped
    ports:
      - "${TRINO_PORT:-8080}:8080"
    volumes:
      - ./trino/catalog:/etc/trino/catalog:ro

  metastore-db:
    image: postgres:15-alpine
    container_name: metastore-db
    restart: unless-stopped
    environment:
      - POSTGRES_USER=hive
      - POSTGRES_PASSWORD=${METASTORE_DB_PASSWORD}
      - POSTGRES_DB=metastore
    volumes:
      - metastore_db_data:/var/lib/postgresql/data

  minio:
    image: minio/minio:latest
    container_name: trino-minio
    restart: unless-stopped
    ports:
      - "${MINIO_PORT:-9000}:9000"
      - "${MINIO_CONSOLE:-9001}:9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"

volumes:
  metastore_db_data:
  minio_data:

.env Template

.env
# Trino Data Lake
TRINO_PORT=8080
METASTORE_DB_PASSWORD=hive_password
MINIO_PORT=9000
MINIO_CONSOLE=9001
MINIO_USER=minioadmin
MINIO_PASSWORD=minioadmin

Usage Notes

  1. Trino UI at http://localhost:8080
  2. MinIO Console at http://localhost:9001
  3. Configure catalogs in /etc/trino/catalog
  4. Create buckets for data lake storage
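Usage note 3 is the key manual step: Trino discovers data sources through `*.properties` files in the mounted `./trino/catalog/` directory. Below is a minimal sketch of a Hive catalog pointing at this stack's MinIO. It assumes a Hive Metastore service reachable at `thrift://hive-metastore:9083` (this compose file ships only the Metastore's PostgreSQL backend, so you would add that service yourself), and a recent Trino with the native S3 filesystem; exact property names vary across Trino versions:

```shell
# Sketch: a Hive catalog backed by this stack's MinIO. Assumptions: a
# separate Hive Metastore at hive-metastore:9083, and credentials that
# match the .env defaults.
mkdir -p trino/catalog
cat > trino/catalog/lakehouse.properties << 'EOF'
connector.name=hive
hive.metastore.uri=thrift://hive-metastore:9083
fs.native-s3.enabled=true
s3.endpoint=http://trino-minio:9000
s3.region=us-east-1
s3.path-style-access=true
s3.aws-access-key=minioadmin
s3.aws-secret-key=minioadmin
EOF
# Restart Trino so it picks up the new catalog:
#   docker compose restart trino
```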

Individual Services (3 services)

Copy individual services to mix and match with your existing compose files.

trino
trino:
  image: trinodb/trino:latest
  container_name: trino
  restart: unless-stopped
  ports:
    - ${TRINO_PORT:-8080}:8080
  volumes:
    - ./trino/catalog:/etc/trino/catalog:ro
metastore-db
metastore-db:
  image: postgres:15-alpine
  container_name: metastore-db
  restart: unless-stopped
  environment:
    - POSTGRES_USER=hive
    - POSTGRES_PASSWORD=${METASTORE_DB_PASSWORD}
    - POSTGRES_DB=metastore
  volumes:
    - metastore_db_data:/var/lib/postgresql/data
minio
minio:
  image: minio/minio:latest
  container_name: trino-minio
  restart: unless-stopped
  ports:
    - ${MINIO_PORT:-9000}:9000
    - ${MINIO_CONSOLE:-9001}:9001
  environment:
    - MINIO_ROOT_USER=${MINIO_USER}
    - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
  volumes:
    - minio_data:/data
  command: server /data --console-address ":9001"

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  trino:
    image: trinodb/trino:latest
    container_name: trino
    restart: unless-stopped
    ports:
      - "${TRINO_PORT:-8080}:8080"
    volumes:
      - ./trino/catalog:/etc/trino/catalog:ro

  metastore-db:
    image: postgres:15-alpine
    container_name: metastore-db
    restart: unless-stopped
    environment:
      - POSTGRES_USER=hive
      - POSTGRES_PASSWORD=${METASTORE_DB_PASSWORD}
      - POSTGRES_DB=metastore
    volumes:
      - metastore_db_data:/var/lib/postgresql/data

  minio:
    image: minio/minio:latest
    container_name: trino-minio
    restart: unless-stopped
    ports:
      - "${MINIO_PORT:-9000}:9000"
      - "${MINIO_CONSOLE:-9001}:9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"

volumes:
  metastore_db_data:
  minio_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Trino Data Lake
TRINO_PORT=8080
METASTORE_DB_PASSWORD=hive_password
MINIO_PORT=9000
MINIO_CONSOLE=9001
MINIO_USER=minioadmin
MINIO_PASSWORD=minioadmin
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
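After starting the services, give the coordinator a moment before sending queries; the Trino image takes a while to initialize. A small readiness poll, assuming `curl` is available (`/v1/info` is Trino's server-info REST endpoint):

```shell
# Poll Trino's HTTP port until the coordinator responds (or time out).
wait_for_trino() {
  url="http://localhost:${TRINO_PORT:-8080}/v1/info"
  for _ in $(seq 1 30); do
    if curl -fsS "$url" > /dev/null 2>&1; then
      echo "Trino is up"
      return 0
    fi
    sleep 2
  done
  echo "Trino did not become ready in time" >&2
  return 1
}
# Example: wait_for_trino && docker exec trino trino --execute "SELECT 1"
```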

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/trino-data-lake/run | bash

Troubleshooting

  • Trino query fails with 'Catalog not found': Verify catalog configuration files are mounted in /etc/trino/catalog and contain correct MinIO/Hive connection details
  • MinIO connection refused errors: Check MINIO_ROOT_USER and MINIO_ROOT_PASSWORD environment variables match Trino catalog configuration credentials
  • Hive Metastore connection timeouts: Ensure PostgreSQL metastore-db container is healthy and METASTORE_DB_PASSWORD is set correctly
  • Out of memory errors during large queries: Increase Docker memory limits and configure Trino JVM heap size in jvm.config
  • Table not found errors despite data in MinIO: Verify Hive Metastore contains table definitions and partition information matches actual file structure in MinIO buckets
  • Permission denied accessing MinIO buckets: Check bucket policies and ensure Trino's MinIO credentials have read/write access to required buckets
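For the out-of-memory bullet above: the container reads its JVM flags from /etc/trino/jvm.config, which you can override with a bind mount. A sketch follows; the heap value is an example, and in practice you would first copy the image's defaults out (e.g. `docker compose cp trino:/etc/trino/jvm.config trino/jvm.config`) so required flags are preserved:

```shell
# Raise the heap in a local copy of jvm.config (example value: 4G).
# A stand-in -Xmx line is seeded here only so the edit is visible if
# the file was not copied out of the image first.
mkdir -p trino
[ -f trino/jvm.config ] || echo '-Xmx1G' > trino/jvm.config
sed -i 's/^-Xmx.*/-Xmx4G/' trino/jvm.config
# Mount it by adding under the trino service's volumes:
#   - ./trino/jvm.config:/etc/trino/jvm.config:ro
# then recreate the container: docker compose up -d --force-recreate trino
```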

