docker.recipes

Paperless-ngx + PostgreSQL + Redis + Gotenberg

intermediate

Document management system with OCR and full-text search.

Overview

Paperless-ngx is a document management system that turns physical documents into a searchable digital archive through OCR (Optical Character Recognition) and automatic classification. It is the community-maintained successor to paperless-ng, itself a fork of the original paperless project, and adds improved OCR handling, automatic document tagging, and a responsive web interface for managing everything from receipts to legal documents.

This stack combines paperless-ngx with PostgreSQL for document metadata storage, Redis as the broker for background tasks, Gotenberg for converting documents to PDF, and Apache Tika for parsing and text extraction from Office-style formats. Together, these components form a document processing pipeline that handles a range of file formats, performs OCR on scanned images, extracts metadata, and provides full-text search across the archive.

The configuration suits individuals and organizations that want to digitize a paper workflow or build a searchable document archive without the complexity of enterprise systems such as SharePoint or Alfresco.

Key Features

  • OCR processing with Tesseract engine for converting scanned documents to searchable text
  • Advanced document parsing through Apache Tika supporting 1000+ file formats including PDFs, Office documents, and images
  • PDF generation and manipulation via Gotenberg for consistent document rendering and conversion
  • Full-text search across OCR output and extracted text, with document metadata stored in PostgreSQL
  • Automatic document classification using machine learning to suggest tags and correspondents
  • Redis-powered background task processing for OCR, parsing, and indexing operations
  • Email consumption for automatically importing documents sent to designated email addresses
  • Mobile document scanning integration with smartphone apps for instant digitization

Common Use Cases

  • Home office paperless transformation for receipts, bills, tax documents, and personal records
  • Small business document archival with automatic invoice processing and vendor correspondence tracking
  • Legal practice case file digitization with full-text search across contracts and legal documents
  • Healthcare clinic patient record management (HIPAA compliance depends on access controls, encryption, and audit measures that this recipe does not configure)
  • Educational institution student record keeping and administrative document management
  • Non-profit organization grant documentation and compliance record maintenance
  • Real estate office property document management for contracts, inspections, and client files

Prerequisites

  • Docker and Docker Compose installed with at least 2GB available RAM for OCR processing
  • 4GB+ free disk space for document storage, OCR processing temporary files, and database growth
  • Port 8000 available for paperless-ngx web interface access
  • Basic understanding of environment variables for configuring database credentials and OCR languages
  • Familiarity with document scanning workflows and file organization principles
  • SSL certificate setup knowledge if exposing the interface over HTTPS to external networks
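
The prerequisites above can be checked from a shell before starting. The following is a minimal sketch, assuming a POSIX shell with `df` and `awk`; the helper names `have_cmd`, `free_kb`, and `check_prereqs` are hypothetical, and the 4 GB threshold is taken from the list above.

```shell
# Return 0 if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print the free space, in KiB, on the filesystem holding the given directory.
free_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Check for Docker and for 4 GB of free space in the given directory.
# Returns non-zero if any check fails.
check_prereqs() {
  dir=${1:-.}
  status=0
  have_cmd docker || { echo "docker not found on PATH" >&2; status=1; }
  # 4 GB expressed in KiB (4 * 1024 * 1024).
  if [ "$(free_kb "$dir")" -lt 4194304 ]; then
    echo "less than 4 GB free under $dir" >&2
    status=1
  fi
  return "$status"
}
```

Named volumes live in Docker's data directory rather than the current directory, so it may make sense to pass that path (often /var/lib/docker on Linux, which is an assumption about the host) as the argument. Running `docker compose version` confirms that the Compose plugin is installed.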

For development and testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    environment:
      - PAPERLESS_REDIS=redis://redis:6379
      - PAPERLESS_DBHOST=postgres
      - PAPERLESS_DBNAME=paperless
      - PAPERLESS_DBUSER=${POSTGRES_USER}
      - PAPERLESS_DBPASS=${POSTGRES_PASSWORD}
      - PAPERLESS_TIKA_ENABLED=1
      - PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
      - PAPERLESS_TIKA_ENDPOINT=http://tika:9998
      - PAPERLESS_SECRET_KEY=${SECRET_KEY}
      - PAPERLESS_OCR_LANGUAGE=eng
      - PAPERLESS_TIME_ZONE=UTC
      - PAPERLESS_ADMIN_USER=${ADMIN_USER}
      - PAPERLESS_ADMIN_PASSWORD=${ADMIN_PASSWORD}
    volumes:
      - paperless-data:/usr/src/paperless/data
      - paperless-media:/usr/src/paperless/media
      - paperless-export:/usr/src/paperless/export
      - paperless-consume:/usr/src/paperless/consume
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - gotenberg
      - tika
    networks:
      - paperless-network
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=paperless
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - paperless-network
    restart: unless-stopped

  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data
    networks:
      - paperless-network
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-network
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-network
    restart: unless-stopped

volumes:
  paperless-data:
  paperless-media:
  paperless-export:
  paperless-consume:
  postgres-data:
  redis-data:

networks:
  paperless-network:
    driver: bridge

.env Template

.env
# Paperless-ngx
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
SECRET_KEY=your-very-long-secret-key
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
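
The placeholder values in the template are meant to be replaced. One way to fill them with random values is sketched below, assuming a system that provides /dev/urandom, `od`, and `head -c`; the helper names `gen_secret` and `render_env` are hypothetical, and the chosen lengths are illustrative rather than required by paperless-ngx.

```shell
# Print a random lowercase hex string of the requested length (default 64).
gen_secret() {
  len=${1:-64}
  # Each random byte becomes two hex digits, so len/2 + 1 bytes are enough.
  head -c "$((len / 2 + 1))" /dev/urandom | od -An -tx1 | tr -d ' \n' | cut -c "1-$len"
}

# Print a .env body using the variable names from the template above,
# with freshly generated passwords and secret key.
render_env() {
  printf 'POSTGRES_USER=paperless\n'
  printf 'POSTGRES_PASSWORD=%s\n' "$(gen_secret 32)"
  printf 'SECRET_KEY=%s\n' "$(gen_secret 64)"
  printf 'ADMIN_USER=admin\n'
  printf 'ADMIN_PASSWORD=%s\n' "$(gen_secret 24)"
}
```

Running `render_env > .env` followed by `chmod 600 .env` would write the file and restrict it to the current user. Hex output avoids characters that need quoting in an env file.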

Usage Notes

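  1. Open the web UI at http://localhost:8000 and sign in with the ADMIN_USER and ADMIN_PASSWORD values from .env
  2. Drop files into the consume folder (the paperless-consume volume, mounted at /usr/src/paperless/consume) to import them
  3. Paperless-ngx runs OCR and text extraction on each imported file
  4. Use full-text search in the web UI to find documents by their content
  5. Connect a mobile scanner app to send scans from a phone into the archive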
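
Because the consume folder in this recipe is a named volume rather than a host directory, files have to be copied into the container. A minimal sketch using `docker compose cp`, assuming the stack is running and the command is issued from the directory that holds docker-compose.yml; the function name `consume_file` is hypothetical.

```shell
# Copy one local file into the consume directory of the paperless service.
consume_file() {
  src=$1
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  # Service name and target path come from the compose file above.
  docker compose cp "$src" paperless:/usr/src/paperless/consume/
}
```

For example, `consume_file ./scan.pdf` would queue a file named scan.pdf for import. Replacing the paperless-consume volume with a bind mount to a host directory is an alternative when files arrive regularly.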

Individual Services (5 services)

Copy individual services to mix and match with your existing compose files.

paperless
paperless:
  image: ghcr.io/paperless-ngx/paperless-ngx:latest
  environment:
    - PAPERLESS_REDIS=redis://redis:6379
    - PAPERLESS_DBHOST=postgres
    - PAPERLESS_DBNAME=paperless
    - PAPERLESS_DBUSER=${POSTGRES_USER}
    - PAPERLESS_DBPASS=${POSTGRES_PASSWORD}
    - PAPERLESS_TIKA_ENABLED=1
    - PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
    - PAPERLESS_TIKA_ENDPOINT=http://tika:9998
    - PAPERLESS_SECRET_KEY=${SECRET_KEY}
    - PAPERLESS_OCR_LANGUAGE=eng
    - PAPERLESS_TIME_ZONE=UTC
    - PAPERLESS_ADMIN_USER=${ADMIN_USER}
    - PAPERLESS_ADMIN_PASSWORD=${ADMIN_PASSWORD}
  volumes:
    - paperless-data:/usr/src/paperless/data
    - paperless-media:/usr/src/paperless/media
    - paperless-export:/usr/src/paperless/export
    - paperless-consume:/usr/src/paperless/consume
  ports:
    - "8000:8000"
  depends_on:
    - postgres
    - redis
    - gotenberg
    - tika
  networks:
    - paperless-network
  restart: unless-stopped
postgres
postgres:
  image: postgres:15
  environment:
    - POSTGRES_USER=${POSTGRES_USER}
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=paperless
  volumes:
    - postgres-data:/var/lib/postgresql/data
  networks:
    - paperless-network
  restart: unless-stopped
redis
redis:
  image: redis:alpine
  volumes:
    - redis-data:/data
  networks:
    - paperless-network
  restart: unless-stopped
gotenberg
gotenberg:
  image: gotenberg/gotenberg:latest
  command:
    - gotenberg
    - "--chromium-disable-javascript=true"
    - "--chromium-allow-list=file:///tmp/.*"
  networks:
    - paperless-network
  restart: unless-stopped
tika
tika:
  image: apache/tika:latest
  networks:
    - paperless-network
  restart: unless-stopped

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    environment:
      - PAPERLESS_REDIS=redis://redis:6379
      - PAPERLESS_DBHOST=postgres
      - PAPERLESS_DBNAME=paperless
      - PAPERLESS_DBUSER=${POSTGRES_USER}
      - PAPERLESS_DBPASS=${POSTGRES_PASSWORD}
      - PAPERLESS_TIKA_ENABLED=1
      - PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
      - PAPERLESS_TIKA_ENDPOINT=http://tika:9998
      - PAPERLESS_SECRET_KEY=${SECRET_KEY}
      - PAPERLESS_OCR_LANGUAGE=eng
      - PAPERLESS_TIME_ZONE=UTC
      - PAPERLESS_ADMIN_USER=${ADMIN_USER}
      - PAPERLESS_ADMIN_PASSWORD=${ADMIN_PASSWORD}
    volumes:
      - paperless-data:/usr/src/paperless/data
      - paperless-media:/usr/src/paperless/media
      - paperless-export:/usr/src/paperless/export
      - paperless-consume:/usr/src/paperless/consume
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - gotenberg
      - tika
    networks:
      - paperless-network
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=paperless
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - paperless-network
    restart: unless-stopped

  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data
    networks:
      - paperless-network
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-network
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-network
    restart: unless-stopped

volumes:
  paperless-data:
  paperless-media:
  paperless-export:
  paperless-consume:
  postgres-data:
  redis-data:

networks:
  paperless-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Paperless-ngx
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
SECRET_KEY=your-very-long-secret-key
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
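
After `docker compose up -d` returns, the web interface can take a while to come up while the database initializes. A small retry helper is one way to wait for it; this is a sketch with a hypothetical function name `retry`, not part of the recipe itself.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Note: uses plain (global) shell variables to stay POSIX-compatible.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry 30 2 curl -fsS -o /dev/null http://localhost:8000` would poll the published port every two seconds for up to a minute, assuming `curl` is installed on the host.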

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/paperless-ngx-complete/run | bash
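
Piping a download straight into bash runs it without any chance to read it first. A more cautious variant saves the script to a file so it can be inspected before running; the sketch below assumes `curl` is installed, and the function name `fetch_for_review` and the default filename recipe-setup.sh are hypothetical.

```shell
# Download a setup script to a local file instead of piping it to bash.
# Usage: fetch_for_review URL [OUTPUT_FILE]
fetch_for_review() {
  url=$1
  out=${2:-recipe-setup.sh}
  # -f makes curl fail on HTTP errors so a bad response is not saved silently.
  curl -fsSL "$url" -o "$out" || return 1
  echo "saved to $out"
}
```

After `fetch_for_review https://docker.recipes/api/recipes/paperless-ngx-complete/run`, the file can be read with `less recipe-setup.sh` and then run with `bash recipe-setup.sh`.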

Troubleshooting

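  • OCR processing fails with 'Tesseract not found' error: The official paperless-ngx image bundles Tesseract, so confirm the service uses ghcr.io/paperless-ngx/paperless-ngx rather than a custom or third-party image; for OCR languages beyond those preinstalled, set PAPERLESS_OCR_LANGUAGES so the container installs the extra language packs at startup
  • Documents stuck in 'Processing' status indefinitely: Check the paperless logs for task worker errors and verify that Redis is reachable at the PAPERLESS_REDIS URL; restarting the redis and paperless containers clears stuck background tasks
  • Gotenberg service returns 503 errors during PDF conversion: Give the Gotenberg container more memory; JavaScript is already disabled in this recipe through --chromium-disable-javascript=true
  • PostgreSQL connection refused during startup: depends_on only orders container startup and does not wait for PostgreSQL to accept connections, so allow a few seconds for a retry; if the error persists, verify that POSTGRES_USER and POSTGRES_PASSWORD match between the paperless and postgres services
  • Tika service crashes with OutOfMemoryError: Make more memory available to the Tika container; if a limit is set through deploy.resources.limits.memory in the compose file, raise it
  • Paperless web interface shows 'Secret key not configured' error: Generate a random value of at least 50 characters and set it as SECRET_KEY in .env, which the compose file passes to PAPERLESS_SECRET_KEY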
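
Several of the problems above trace back to missing or weak values in .env. The check below is a minimal sketch that looks for the five variables used by this recipe and applies the 50-character guideline to SECRET_KEY; the function names `env_value` and `check_env_file` are hypothetical, and the parsing assumes simple unquoted KEY=value lines.

```shell
# Print the value assigned to a key in an env file (last assignment wins).
# Usage: env_value FILE KEY
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Verify that every variable the compose file expects is set and that
# SECRET_KEY has at least 50 characters. Returns non-zero on any problem.
check_env_file() {
  file=${1:-.env}
  status=0
  for key in POSTGRES_USER POSTGRES_PASSWORD SECRET_KEY ADMIN_USER ADMIN_PASSWORD; do
    if [ -z "$(env_value "$file" "$key")" ]; then
      echo "$key is missing or empty in $file" >&2
      status=1
    fi
  done
  secret=$(env_value "$file" SECRET_KEY)
  if [ "${#secret}" -lt 50 ]; then
    echo "SECRET_KEY is shorter than 50 characters" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_env_file .env` before `docker compose up -d` catches these cases early. For the remaining items, `docker compose ps` and `docker compose logs paperless` show container state and errors, and `docker compose exec redis redis-cli ping` should answer PONG when Redis is healthy.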
