docker.recipes

Paperless-ngx Document Management

Difficulty: intermediate

Document management system with OCR, full-text search, and automatic tagging.

Overview

Paperless-ngx is a document management system that turns physical documents into a searchable digital archive. It is the community-maintained successor to paperless-ng, itself a fork of the original Paperless project. It runs optical character recognition (OCR) on scanned documents, extracts metadata, and suggests tags, correspondents, and document types. Documents can be ingested through web uploads, email, and a watched consume folder.

This stack combines Paperless-ngx with PostgreSQL for document metadata, Redis as the broker for background tasks, and Gotenberg and Apache Tika for office documents. Tika extracts text and metadata from formats such as Word and LibreOffice files, and Gotenberg converts them to PDF for archiving. OCR itself runs inside the Paperless-ngx container.

The result is a processing pipeline that handles anything from scanned receipts to multi-page contracts. It suits individuals and small businesses that want to eliminate paper clutter while keeping their archive organized and fully searchable.

Key Features

  • OCR processing with Tesseract engine supporting 100+ languages
  • Full-text search across document contents with highlighting and ranking
  • Automatic document classification and tagging based on content patterns
  • Email consumption for automatic document import from designated accounts
  • Correspondent and tag matching using machine learning algorithms
  • Office document and email conversion to PDF via Gotenberg
  • Text and metadata extraction from Office documents through Apache Tika integration
  • Archive format preservation maintaining original file integrity alongside processed versions

Common Use Cases

  • Personal paperless office setup for managing household documents, bills, and receipts
  • Small business document archiving with automatic invoice and contract processing
  • Home office tax document organization with automatic categorization by year and type
  • Legal practice case file management with OCR-searchable document discovery
  • Medical office patient record digitization on self-hosted storage that can be secured to meet HIPAA requirements
  • Real estate document management for property records and transaction files
  • Academic research paper organization with full-text search across literature collections

Prerequisites

  • Docker host with minimum 2GB RAM (4GB+ recommended for OCR processing)
  • Port 8000 available on the host for the Paperless-ngx web interface
  • 50GB+ storage space for document archive growth and PostgreSQL data
  • Basic understanding of document scanning workflows and OCR limitations
  • Knowledge of email server configuration if using email consumption features
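
The RAM and storage figures above can be checked before deploying. The following is a minimal pre-flight sketch, assuming a Linux host that exposes /proc/meminfo; the function names are illustrative and not part of the recipe.

```shell
# Pre-flight sketch for a Linux Docker host. Thresholds mirror the
# prerequisites list; all function names here are illustrative.
MIN_RAM_MB=2048   # 2GB minimum RAM
MIN_DISK_GB=50    # 50GB for the document archive and PostgreSQL data

# Succeeds when the measured value ($1) is at least the required value ($2).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Total RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Free space in whole GB on the filesystem that holds directory $1.
free_disk_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Print one line per check; return non-zero if any check fails.
preflight() {
  dir=${1:-.}
  status=0
  if meets_minimum "$(total_ram_mb)" "$MIN_RAM_MB"; then
    echo "RAM: ok"
  else
    echo "RAM: below ${MIN_RAM_MB}MB"
    status=1
  fi
  if meets_minimum "$(free_disk_gb "$dir")" "$MIN_DISK_GB"; then
    echo "Disk: ok"
  else
    echo "Disk: below ${MIN_DISK_GB}GB free in $dir"
    status=1
  fi
  return "$status"
}
```

Call `preflight /var/lib/docker` (or whichever directory holds your Docker volumes). MemTotal is slightly lower than the installed RAM, so a host with exactly 2GB will be reported as below the threshold; lower MIN_RAM_MB if that is too strict for your setup.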

For development & testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: postgres
      PAPERLESS_DBUSER: ${POSTGRES_USER}
      PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
      PAPERLESS_DBNAME: ${POSTGRES_DB}
      PAPERLESS_SECRET_KEY: ${SECRET_KEY}
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_ADMIN_USER: ${ADMIN_USER}
      PAPERLESS_ADMIN_PASSWORD: ${ADMIN_PASSWORD}
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - paperless_export:/usr/src/paperless/export
      - paperless_consume:/usr/src/paperless/consume
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      gotenberg:
        condition: service_started
      tika:
        condition: service_started
    networks:
      - paperless-net
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - paperless-net
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - paperless-net
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-net
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-net
    restart: unless-stopped

volumes:
  paperless_data:
  paperless_media:
  paperless_export:
  paperless_consume:
  postgres_data:
  redis_data:

networks:
  paperless-net:
    driver: bridge

.env Template

.env
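# PostgreSQL
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
POSTGRES_DB=paperless

# Paperless-ngx
# Compose does not run shell commands inside .env. Paste the output of
# `openssl rand -hex 32` here as a literal value.
SECRET_KEY=replace_with_output_of_openssl_rand_hex_32
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password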
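
Compose reads .env values literally and does not execute shell command substitution, so the secret key has to be generated beforehand and written into the file as plain text. The following is a minimal sketch of one way to do that; the function names are illustrative, and the passwords are the template's placeholders, which should still be changed by hand.

```shell
# Sketch: write a .env file with a freshly generated secret key.
# Function names are illustrative; passwords are the template's placeholders.

# Print 64 hex characters (32 random bytes).
random_hex_32() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback when openssl is not installed.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Write the template to $1 (default: .env), readable by the owner only.
write_env_file() {
  target=${1:-.env}
  (
    umask 077
    cat > "$target" <<EOF
# PostgreSQL
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
POSTGRES_DB=paperless

# Paperless-ngx
SECRET_KEY=$(random_hex_32)
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
EOF
  )
}
```

Running `write_env_file .env` next to docker-compose.yml produces a file whose SECRET_KEY line holds the generated value. Because the heredoc delimiter is unquoted, the shell expands `$(random_hex_32)` at write time.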

Usage Notes

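  1. Open Paperless-ngx at http://localhost:8000 and log in with the ADMIN_USER and ADMIN_PASSWORD values from .env
  2. Drop files into the consume folder (the paperless_consume volume, mounted at /usr/src/paperless/consume) for automatic import
  3. OCR runs automatically on each consumed document
  4. Configure mail accounts and mail rules in the web interface to import documents from email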
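
Because the consume folder in this recipe is a named volume rather than a host directory, files have to be copied into the running container. The following is a minimal sketch using `docker compose cp`; the DOCKER variable is a hypothetical override hook added here for dry runs, and file ownership inside the container may need adjusting depending on your setup.

```shell
# Sketch: copy a local file into the consume volume of the running stack.
# DOCKER is a hypothetical override hook so the command can be dry-run
# (for example DOCKER=echo); it is not something Docker itself reads.
DOCKER=${DOCKER:-docker}
CONSUME_DIR=/usr/src/paperless/consume   # container path from the compose file

import_document() {
  src=$1
  if [ ! -f "$src" ]; then
    echo "not a regular file: $src" >&2
    return 1
  fi
  # "paperless" is the service name used in docker-compose.yml.
  "$DOCKER" compose cp "$src" "paperless:$CONSUME_DIR/"
}
```

For example, `import_document ~/scans/invoice.pdf` run from the directory containing docker-compose.yml. An alternative is to replace the named volume with a bind mount such as `./consume:/usr/src/paperless/consume`, which lets you drop files into a host directory directly.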

Individual Services (5 services)

Copy individual services to mix and match with your existing compose files.

paperless
paperless:
  image: ghcr.io/paperless-ngx/paperless-ngx:latest
  ports:
    - "8000:8000"
  environment:
    PAPERLESS_REDIS: redis://redis:6379
    PAPERLESS_DBHOST: postgres
    PAPERLESS_DBUSER: ${POSTGRES_USER}
    PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
    PAPERLESS_DBNAME: ${POSTGRES_DB}
    PAPERLESS_SECRET_KEY: ${SECRET_KEY}
    PAPERLESS_TIKA_ENABLED: 1
    PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
    PAPERLESS_TIKA_ENDPOINT: http://tika:9998
    PAPERLESS_OCR_LANGUAGE: eng
    PAPERLESS_ADMIN_USER: ${ADMIN_USER}
    PAPERLESS_ADMIN_PASSWORD: ${ADMIN_PASSWORD}
  volumes:
    - paperless_data:/usr/src/paperless/data
    - paperless_media:/usr/src/paperless/media
    - paperless_export:/usr/src/paperless/export
    - paperless_consume:/usr/src/paperless/consume
  depends_on:
    postgres:
      condition: service_healthy
    redis:
      condition: service_started
    gotenberg:
      condition: service_started
    tika:
      condition: service_started
  networks:
    - paperless-net
  restart: unless-stopped
postgres
postgres:
  image: postgres:16-alpine
  environment:
    POSTGRES_USER: ${POSTGRES_USER}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRES_DB: ${POSTGRES_DB}
  volumes:
    - postgres_data:/var/lib/postgresql/data
  healthcheck:
    test:
      - CMD-SHELL
      - pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}
    interval: 10s
    timeout: 5s
    retries: 5
  networks:
    - paperless-net
  restart: unless-stopped
redis
redis:
  image: redis:7-alpine
  volumes:
    - redis_data:/data
  networks:
    - paperless-net
  restart: unless-stopped
gotenberg
gotenberg:
  image: gotenberg/gotenberg:latest
  command:
    - gotenberg
    - "--chromium-disable-javascript=true"
    - "--chromium-allow-list=file:///tmp/.*"
  networks:
    - paperless-net
  restart: unless-stopped
tika
tika:
  image: apache/tika:latest
  networks:
    - paperless-net
  restart: unless-stopped

Quick Start

terminal
# 1. Create the compose file (quoted 'EOF' keeps ${...} literal for Compose)
cat > docker-compose.yml << 'EOF'
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: postgres
      PAPERLESS_DBUSER: ${POSTGRES_USER}
      PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
      PAPERLESS_DBNAME: ${POSTGRES_DB}
      PAPERLESS_SECRET_KEY: ${SECRET_KEY}
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_ADMIN_USER: ${ADMIN_USER}
      PAPERLESS_ADMIN_PASSWORD: ${ADMIN_PASSWORD}
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - paperless_export:/usr/src/paperless/export
      - paperless_consume:/usr/src/paperless/consume
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      gotenberg:
        condition: service_started
      tika:
        condition: service_started
    networks:
      - paperless-net
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - paperless-net
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - paperless-net
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-net
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-net
    restart: unless-stopped

volumes:
  paperless_data:
  paperless_media:
  paperless_export:
  paperless_consume:
  postgres_data:
  redis_data:

networks:
  paperless-net:
    driver: bridge
EOF

# 2. Create the .env file
# The delimiter is unquoted here so the shell expands $(openssl ...) and
# writes the generated key as a literal value. Change both passwords.
cat > .env << EOF
# PostgreSQL
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
POSTGRES_DB=paperless

# Paperless-ngx
SECRET_KEY=$(openssl rand -hex 32)
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
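
After `docker compose up -d` returns, the web interface can take a while to become reachable while migrations and startup tasks run. The following is a minimal sketch of a readiness poll, assuming curl is installed on the host; the CURL variable is a hypothetical override hook added here for testing, and the default attempt count and delay are arbitrary.

```shell
# Sketch: poll the web interface until it answers or the attempts run out.
# CURL is a hypothetical override hook for testing; by default the real
# curl binary is used.
CURL=${CURL:-curl}

# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body.
    if "$CURL" -fs -o /dev/null --max-time 5 "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up after $attempts attempt(s): $url" >&2
  return 1
}
```

For example, `wait_for_http http://localhost:8000` polls the address from the usage notes every 5 seconds, up to 30 times.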

One-Liner

Run this command to download and set up the recipe in one step:

terminal
1curl -fsSL https://docker.recipes/api/recipes/paperless-ngx-stack/run | bash

Troubleshooting

  • OCR processing fails with 'Tesseract not found' error: Verify that PAPERLESS_OCR_LANGUAGE names a language pack installed in the container (eng in this recipe); additional packs can be installed at startup with PAPERLESS_OCR_LANGUAGES
  • Documents stuck in 'Processing' status indefinitely: Check Redis container health and restart paperless container to clear task queue
  • Gotenberg PDF conversion timeouts: Increase container memory limits and verify --chromium-allow-list permissions for temporary files
  • Tika text extraction returns empty results: Ensure document file permissions allow container read access and file format is supported
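  • PostgreSQL connection refused during startup: The paperless container waits for the postgres healthcheck to pass; if the error persists, check the postgres health status with docker compose ps and confirm the database credentials in .env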
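  • Web interface shows 'CSRF token missing' errors: Verify PAPERLESS_SECRET_KEY is set and stays the same across container restarts; when the site is reached through a reverse proxy or a hostname other than localhost, also set PAPERLESS_URL to the public address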
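
Most of the checks above start from the same few commands. The following is a minimal diagnostic sketch, assuming it is run from the directory containing docker-compose.yml; the DOCKER variable is a hypothetical override hook added here so the commands can be previewed with DOCKER=echo.

```shell
# Sketch: collect the basic facts the troubleshooting list asks for.
# DOCKER is a hypothetical override hook (e.g. DOCKER=echo for a dry run).
DOCKER=${DOCKER:-docker}

diagnose() {
  # Container state, including the postgres health status.
  "$DOCKER" compose ps
  # Redis should answer PONG.
  "$DOCKER" compose exec -T redis redis-cli ping
  # PostgreSQL readiness, using the credentials set inside the container.
  "$DOCKER" compose exec -T postgres sh -c 'pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"'
  # Recent log output from the services named in the list above.
  for service in paperless gotenberg tika; do
    "$DOCKER" compose logs --tail 50 "$service"
  done
}
```

Run `diagnose` to print the results in order, or `DOCKER=echo diagnose` to see the commands without executing them. The `-T` flag disables pseudo-TTY allocation so the calls also work from scripts.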
