Paperless-ngx + PostgreSQL + Redis + Gotenberg
Document management system with OCR and full-text search.
Overview
Paperless-ngx is a document management system that turns physical documents into a searchable digital archive through OCR (Optical Character Recognition) and automatic classification. It is the community-maintained successor to paperless-ng, itself a fork of the original paperless project, and adds automatic document tagging and a mobile-friendly interface for managing everything from receipts to legal documents.
This stack combines paperless-ngx with PostgreSQL for document metadata storage, Redis as the broker for background task queuing, Gotenberg for converting documents to PDF, and Apache Tika for parsing and extracting text from Office documents and similar formats. Together, these components form a document processing pipeline that handles a range of file formats, performs OCR on scanned images, extracts metadata, and provides full-text search across the whole archive.
This configuration suits individuals and organizations that want to digitize a paper workflow, build a searchable document archive, or run a lightweight document management system without the complexity of enterprise products such as SharePoint or Alfresco.
Key Features
- OCR processing with Tesseract engine for converting scanned documents to searchable text
- Advanced document parsing through Apache Tika supporting 1000+ file formats including PDFs, Office documents, and images
- PDF generation and manipulation via Gotenberg for consistent document rendering and conversion
- Full-text search over extracted document content with relevance ranking, alongside document metadata stored in PostgreSQL
- Automatic document classification using machine learning to suggest tags and correspondents
- Redis-powered background task processing for OCR, parsing, and indexing operations
- Email consumption for automatically importing documents sent to designated email addresses
- Mobile document scanning integration with smartphone apps for instant digitization
Common Use Cases
- Home office paperless transformation for receipts, bills, tax documents, and personal records
- Small business document archival with automatic invoice processing and vendor correspondence tracking
- Legal practice case file digitization with full-text search across contracts and legal documents
- Healthcare clinic patient record management, with HIPAA compliance depending on how the deployment is secured and operated
- Educational institution student record keeping and administrative document management
- Non-profit organization grant documentation and compliance record maintenance
- Real estate office property document management for contracts, inspections, and client files
Prerequisites
- Docker and Docker Compose installed with at least 2GB available RAM for OCR processing
- 4GB+ free disk space for document storage, OCR processing temporary files, and database growth
- Port 8000 available for paperless-ngx web interface access
- Basic understanding of environment variables for configuring database credentials and OCR languages
- Familiarity with document scanning workflows and file organization principles
- SSL certificate setup knowledge if exposing the interface over HTTPS to external networks
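Some of the requirements above can be checked with a short script before starting the stack. This is a minimal sketch rather than part of the recipe: the helper names have and check_prereqs are hypothetical, the 4GB threshold mirrors the list above, and it only looks at the Docker CLI, the Compose plugin, and free disk space in the current directory.

```shell
#!/bin/sh
# Return success if the named command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Print a warning for each unmet prerequisite; return non-zero if any check failed.
check_prereqs() {
  status=0

  if ! have docker; then
    echo "docker is not installed or not on PATH" >&2
    status=1
  elif ! docker compose version >/dev/null 2>&1; then
    echo "the 'docker compose' plugin is not available" >&2
    status=1
  fi

  # Free space (in KiB) on the filesystem that holds the current directory.
  free_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
  required_kb=$((4 * 1024 * 1024))   # 4GB, per the prerequisites list
  if [ "$free_kb" -lt "$required_kb" ]; then
    echo "less than 4GB of free disk space in $(pwd)" >&2
    status=1
  fi

  return "$status"
}
```

Running check_prereqs && echo "ready" from the directory that will hold docker-compose.yml prints a warning for each failed check and "ready" only when all of them pass.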
Intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    environment:
      - PAPERLESS_REDIS=redis://redis:6379
      - PAPERLESS_DBHOST=postgres
      - PAPERLESS_DBNAME=paperless
      - PAPERLESS_DBUSER=${POSTGRES_USER}
      - PAPERLESS_DBPASS=${POSTGRES_PASSWORD}
      - PAPERLESS_TIKA_ENABLED=1
      - PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
      - PAPERLESS_TIKA_ENDPOINT=http://tika:9998
      - PAPERLESS_SECRET_KEY=${SECRET_KEY}
      - PAPERLESS_OCR_LANGUAGE=eng
      - PAPERLESS_TIME_ZONE=UTC
      - PAPERLESS_ADMIN_USER=${ADMIN_USER}
      - PAPERLESS_ADMIN_PASSWORD=${ADMIN_PASSWORD}
    volumes:
      - paperless-data:/usr/src/paperless/data
      - paperless-media:/usr/src/paperless/media
      - paperless-export:/usr/src/paperless/export
      - paperless-consume:/usr/src/paperless/consume
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - gotenberg
      - tika
    networks:
      - paperless-network
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=paperless
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - paperless-network
    restart: unless-stopped

  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data
    networks:
      - paperless-network
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-network
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-network
    restart: unless-stopped

volumes:
  paperless-data:
  paperless-media:
  paperless-export:
  paperless-consume:
  postgres-data:
  redis-data:

networks:
  paperless-network:
    driver: bridge
.env Template
.env
# Paperless-ngx
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
SECRET_KEY=your-very-long-secret-key
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
Usage Notes
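- Web UI at http://localhost:8000; log in with the ADMIN_USER and ADMIN_PASSWORD values from .env
- Drop files into the consume folder (the paperless-consume volume, mounted at /usr/src/paperless/consume) to import them
- OCR and text extraction run automatically on imported documents
- Full-text search across the archive
- Mobile scanner integration for capturing documents from a phone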
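In this recipe the consume folder is the named volume paperless-consume, mounted at /usr/src/paperless/consume inside the container, so there is no host directory to drop files into directly. One way to hand a file over is to copy it into the running container. The sketch below assumes the stack is running from the current directory and that the service is named paperless, as in the compose file; the helper name consume_file is hypothetical, and file ownership inside the container may need adjusting depending on your setup.

```shell
#!/bin/sh
# Copy a local file into the consume directory of the running paperless service.
# Assumes the service name "paperless" and the consume path used in this recipe.
consume_file() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "consume_file: no such file: $file" >&2
    return 1
  fi
  docker compose cp "$file" paperless:/usr/src/paperless/consume/
}
```

For example, consume_file ./scan.pdf copies the file in, after which docker compose logs -f paperless should show whether the document was picked up.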
Individual Services (5 services)
Copy individual services to mix and match with your existing compose files.
paperless
paperless:
  image: ghcr.io/paperless-ngx/paperless-ngx:latest
  environment:
    - PAPERLESS_REDIS=redis://redis:6379
    - PAPERLESS_DBHOST=postgres
    - PAPERLESS_DBNAME=paperless
    - PAPERLESS_DBUSER=${POSTGRES_USER}
    - PAPERLESS_DBPASS=${POSTGRES_PASSWORD}
    - PAPERLESS_TIKA_ENABLED=1
    - PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
    - PAPERLESS_TIKA_ENDPOINT=http://tika:9998
    - PAPERLESS_SECRET_KEY=${SECRET_KEY}
    - PAPERLESS_OCR_LANGUAGE=eng
    - PAPERLESS_TIME_ZONE=UTC
    - PAPERLESS_ADMIN_USER=${ADMIN_USER}
    - PAPERLESS_ADMIN_PASSWORD=${ADMIN_PASSWORD}
  volumes:
    - paperless-data:/usr/src/paperless/data
    - paperless-media:/usr/src/paperless/media
    - paperless-export:/usr/src/paperless/export
    - paperless-consume:/usr/src/paperless/consume
  ports:
    - "8000:8000"
  depends_on:
    - postgres
    - redis
    - gotenberg
    - tika
  networks:
    - paperless-network
  restart: unless-stopped
postgres
postgres:
  image: postgres:15
  environment:
    - POSTGRES_USER=${POSTGRES_USER}
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=paperless
  volumes:
    - postgres-data:/var/lib/postgresql/data
  networks:
    - paperless-network
  restart: unless-stopped
redis
redis:
  image: redis:alpine
  volumes:
    - redis-data:/data
  networks:
    - paperless-network
  restart: unless-stopped
gotenberg
gotenberg:
  image: gotenberg/gotenberg:latest
  command:
    - gotenberg
    - "--chromium-disable-javascript=true"
    - "--chromium-allow-list=file:///tmp/.*"
  networks:
    - paperless-network
  restart: unless-stopped
tika
tika:
  image: apache/tika:latest
  networks:
    - paperless-network
  restart: unless-stopped
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    environment:
      - PAPERLESS_REDIS=redis://redis:6379
      - PAPERLESS_DBHOST=postgres
      - PAPERLESS_DBNAME=paperless
      - PAPERLESS_DBUSER=${POSTGRES_USER}
      - PAPERLESS_DBPASS=${POSTGRES_PASSWORD}
      - PAPERLESS_TIKA_ENABLED=1
      - PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
      - PAPERLESS_TIKA_ENDPOINT=http://tika:9998
      - PAPERLESS_SECRET_KEY=${SECRET_KEY}
      - PAPERLESS_OCR_LANGUAGE=eng
      - PAPERLESS_TIME_ZONE=UTC
      - PAPERLESS_ADMIN_USER=${ADMIN_USER}
      - PAPERLESS_ADMIN_PASSWORD=${ADMIN_PASSWORD}
    volumes:
      - paperless-data:/usr/src/paperless/data
      - paperless-media:/usr/src/paperless/media
      - paperless-export:/usr/src/paperless/export
      - paperless-consume:/usr/src/paperless/consume
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - redis
      - gotenberg
      - tika
    networks:
      - paperless-network
    restart: unless-stopped

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=paperless
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - paperless-network
    restart: unless-stopped

  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data
    networks:
      - paperless-network
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-network
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-network
    restart: unless-stopped

volumes:
  paperless-data:
  paperless-media:
  paperless-export:
  paperless-consume:
  postgres-data:
  redis-data:

networks:
  paperless-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Paperless-ngx
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
SECRET_KEY=your-very-long-secret-key
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/paperless-ngx-complete/run | bash
Troubleshooting
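- OCR processing fails with 'Tesseract not found' error: The official ghcr.io/paperless-ngx/paperless-ngx image bundles Tesseract, so confirm the paperless service is running that image rather than a custom or stripped-down build; if only a language is missing, list it in PAPERLESS_OCR_LANGUAGES so its data is installed at startup
- Documents stuck in 'Processing' status indefinitely: Check that the paperless container can reach Redis at the PAPERLESS_REDIS URL, review the worker output with docker compose logs paperless, and restart the redis and paperless containers if tasks remain stuck
- Gotenberg service returns 503 errors during PDF conversion: Give the Gotenberg container more memory; JavaScript is already disabled in this recipe via --chromium-disable-javascript=true, so keep that flag for stability
- PostgreSQL connection refused during startup: The database may still be initializing, so check docker compose logs postgres and wait until it accepts connections; if the error is an authentication failure instead, verify that POSTGRES_USER and POSTGRES_PASSWORD match between the paperless and postgres services (the values are fixed when the postgres-data volume is first created)
- Tika service crashes with OutOfMemoryError: Raise the memory available to the Tika container, for example by setting a higher deploy.resources.limits.memory for the tika service in the compose file
- Paperless web interface shows 'Secret key not configured' error: Generate and set a secure PAPERLESS_SECRET_KEY environment variable with at least 50 random characters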
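The last item above calls for a secret key of at least 50 random characters. A minimal sketch of generating one, assuming a system that provides /dev/urandom along with the head, od, and tr utilities; the helper name generate_secret_key and the 64-character hexadecimal format are illustrative choices, not requirements of paperless-ngx.

```shell
#!/bin/sh
# Read 32 random bytes from /dev/urandom and print them as 64 hex characters.
generate_secret_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# 64 characters exceeds the 50-character minimum suggested above.
SECRET_KEY="$(generate_secret_key)"
echo "SECRET_KEY=${SECRET_KEY}"
```

The printed line can replace the placeholder SECRET_KEY entry in .env. Hexadecimal output avoids characters such as $ or # that could be misread in an env file.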
Components
paperless-ngx, postgresql, redis, gotenberg, tika
Tags
#paperless #documents #ocr #archive #scanning
Category
Productivity & Collaboration