Paperless-ngx Document Management
Document management system with OCR, full-text search, and automatic tagging.
Overview
Paperless-ngx is a document management system that turns physical documents into a searchable digital archive. It is the community-maintained successor of Paperless-ng, which was itself a fork of the original Paperless project; neither of those projects is developed anymore. Paperless-ngx runs optical character recognition (OCR) on scanned documents, extracts metadata, and assigns tags, correspondents, and document types automatically. Documents can arrive through several channels: web uploads, e-mail accounts, and a watched consume folder.
This stack pairs Paperless-ngx with four supporting services. PostgreSQL stores document metadata, Redis is the message broker for the background task queue, Gotenberg converts office documents and e-mails to PDF, and Apache Tika extracts text and metadata from formats Tesseract cannot read directly, such as Word, Excel, and LibreOffice files. Scanned PDFs and images are OCR-processed by Tesseract inside the Paperless-ngx container, while Tika and Gotenberg extend the pipeline to office documents and e-mails. As a result, one archive can hold anything from a scanned receipt to a multi-page contract written in a word processor. The stack suits personal document management as well as small businesses that want to reduce paper while keeping every document searchable.
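The division of work described above can be pictured as routing by file type. The following is a simplified, hypothetical bash sketch, not Paperless-ngx's own parser selection (which works on MIME types and supports more formats than listed here). The function name `route_for` and the extension list are illustrative assumptions.

```shell
# Hypothetical helper: print which part of this stack does the main work for
# a file, judged only by its extension. Simplified and non-exhaustive.
route_for() {
  local ext
  # Take the text after the last dot and lower-case it portably.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf|png|jpg|jpeg|tif|tiff)
      # Scans and images: OCR with Tesseract inside the paperless container
      echo "tesseract" ;;
    txt)
      # Plain text needs neither OCR nor conversion
      echo "text" ;;
    doc|docx|odt|rtf|xls|xlsx|ods|ppt|pptx|odp|eml)
      # Office files and e-mails: Tika extracts text, Gotenberg renders a PDF
      echo "tika+gotenberg" ;;
    *)
      echo "unsupported" ;;
  esac
}
```

For example, `route_for contract.docx` prints `tika+gotenberg`. This shows why the Tika and Gotenberg containers matter only when office files or e-mails are imported.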
Key Features
- OCR with the Tesseract engine, which supports more than 100 languages
- Full-text search across document contents, with highlighted matches and ranked results
- Automatic assignment of tags, correspondents, and document types, either by explicit matching rules or by a classifier trained on previously tagged documents
- E-mail consumption that imports attachments or whole messages from designated mailbox accounts
- Conversion of office documents and e-mails to PDF via Gotenberg
- Text and metadata extraction from office documents and e-mails via Apache Tika
- Original files kept unmodified alongside searchable archive (PDF/A) versions
Common Use Cases
- Personal paperless office for managing household documents, bills, and receipts
- Small business document archiving with automatic invoice and contract processing
- Home office tax document organization by year and document type
- Legal practice case file management with OCR-searchable document discovery
- Medical office patient record digitization
- Real estate document management for property records and transaction files
- Academic research paper organization with full-text search across literature collections
Prerequisites
- Docker host with minimum 2GB RAM (4GB+ recommended for OCR processing)
- Port 8000 free for the Paperless-ngx web interface
- 50GB+ storage space for document archive growth and PostgreSQL data
- Basic understanding of document scanning workflows and OCR limitations
- Knowledge of email server configuration if using email consumption features
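The disk and memory minimums listed above can be checked before deploying. Below is a minimal bash sketch. It assumes a POSIX `df` and a Linux host for the memory check, since it reads `/proc/meminfo`. The function names are hypothetical. Point it at the filesystem that will hold Docker's volumes; on many Linux installs that is under `/var/lib/docker`, but `docker info` reports the actual "Docker Root Dir".

```shell
# Hypothetical preflight helpers for the prerequisites above.

# Free space in GiB on the filesystem holding directory $1.
# df -Pk prints POSIX output in 1024-byte blocks; column 4 is "Available".
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Total RAM in MiB from /proc/meminfo (Linux only). Prints 0 elsewhere,
# which makes preflight_ok fail unless the RAM minimum is 0.
total_ram_mb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
  else
    echo 0
  fi
}

# Succeeds if both minimums are met.
# $1 = directory, $2 = minimum free GiB, $3 = minimum RAM in MiB
preflight_ok() {
  [ "$(free_gb "$1")" -ge "$2" ] && [ "$(total_ram_mb)" -ge "$3" ]
}
```

Usage: `preflight_ok /var/lib/docker 50 2048 && echo "host meets the minimums"` checks for the 50GB and 2GB figures given above.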
This configuration is intended for development and testing. Review the security settings, change the default credentials, and test thoroughly before production use.
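Docker Compose reads a .env file as literal KEY=value pairs and does not run shell command substitutions such as `$(...)`. Therefore, the credentials must be written into the file as actual values. One way to avoid keeping the placeholder credentials is to generate the file with random values. The following is a minimal bash sketch that writes the same keys as this recipe's .env template; `rand_hex` and `write_env` are hypothetical names.

```shell
# Random hex string of $1 bytes; uses openssl if present, else /dev/urandom.
rand_hex() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$1"
  else
    od -An -N "$1" -tx1 /dev/urandom | tr -d ' \n'
  fi
}

# Hypothetical helper: write a .env with random secrets to $1 (default .env).
write_env() {
  local target="${1:-.env}"
  # Never overwrite an existing file that may hold live credentials.
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  # umask 077 inside a subshell: the file is readable only by its owner.
  (
    umask 077
    cat > "$target" <<EOF
# PostgreSQL
POSTGRES_USER=paperless
POSTGRES_PASSWORD=$(rand_hex 24)
POSTGRES_DB=paperless

# Paperless-ngx
SECRET_KEY=$(rand_hex 32)
ADMIN_USER=admin
ADMIN_PASSWORD=$(rand_hex 16)
EOF
  )
}
```

Run `write_env` before the first `docker compose up -d`, then read the admin password with `grep ADMIN_PASSWORD .env`. The secrets are hex on purpose, because Compose tries to interpolate values containing `$`, and hex output avoids that.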
docker-compose.yml
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: postgres
      PAPERLESS_DBUSER: ${POSTGRES_USER}
      PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
      PAPERLESS_DBNAME: ${POSTGRES_DB}
      PAPERLESS_SECRET_KEY: ${SECRET_KEY}
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_ADMIN_USER: ${ADMIN_USER}
      PAPERLESS_ADMIN_PASSWORD: ${ADMIN_PASSWORD}
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - paperless_export:/usr/src/paperless/export
      - paperless_consume:/usr/src/paperless/consume
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      gotenberg:
        condition: service_started
      tika:
        condition: service_started
    networks:
      - paperless-net
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - paperless-net
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - paperless-net
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-net
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-net
    restart: unless-stopped

volumes:
  paperless_data:
  paperless_media:
  paperless_export:
  paperless_consume:
  postgres_data:
  redis_data:

networks:
  paperless-net:
    driver: bridge
.env Template
.env
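# PostgreSQL
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
POSTGRES_DB=paperless

# Paperless-ngx
# .env values are not run through a shell; generate a key with
# openssl rand -hex 32 and paste the output here
SECRET_KEY=replace_with_output_of_openssl_rand_hex_32
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
Usage Notes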
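- Open Paperless-ngx at http://localhost:8000 and sign in with the ADMIN_USER and ADMIN_PASSWORD values from .env
- Files placed in the consume folder (the paperless_consume volume in this stack) are imported automatically
- Imported documents are OCR-processed automatically
- E-mail import is configured in the web interface by adding a mail account and at least one mail rule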
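In this stack, `paperless_consume` is a Docker named volume rather than a host directory, so there is no convenient folder on the host to drop files into. One option is `docker compose cp`, which copies a local file into a service's running container. Below is a minimal bash sketch; the `consume_files` name and the `DOCKER` override used for dry runs are hypothetical conveniences. Uploading through the web interface is an alternative that needs no volume access at all.

```shell
# Hypothetical helper: copy local files into the paperless consume directory.
# Set DOCKER=echo to print the commands instead of running them.
consume_files() {
  local f rc=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "skipping $f: not a regular file" >&2
      rc=1
      continue
    fi
    # Copy into the container path that the paperless_consume volume is mounted on.
    ${DOCKER:-docker} compose cp "$f" paperless:/usr/src/paperless/consume/ || rc=1
  done
  return "$rc"
}
```

Usage, from the directory that holds docker-compose.yml: `consume_files ~/scans/*.pdf`. The consumer picks the files up once they land in the volume.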
Individual Services (5 services)
Copy individual services to mix and match with your existing compose files.
paperless
paperless:
  image: ghcr.io/paperless-ngx/paperless-ngx:latest
  ports:
    - "8000:8000"
  environment:
    PAPERLESS_REDIS: redis://redis:6379
    PAPERLESS_DBHOST: postgres
    PAPERLESS_DBUSER: ${POSTGRES_USER}
    PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
    PAPERLESS_DBNAME: ${POSTGRES_DB}
    PAPERLESS_SECRET_KEY: ${SECRET_KEY}
    PAPERLESS_TIKA_ENABLED: 1
    PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
    PAPERLESS_TIKA_ENDPOINT: http://tika:9998
    PAPERLESS_OCR_LANGUAGE: eng
    PAPERLESS_ADMIN_USER: ${ADMIN_USER}
    PAPERLESS_ADMIN_PASSWORD: ${ADMIN_PASSWORD}
  volumes:
    - paperless_data:/usr/src/paperless/data
    - paperless_media:/usr/src/paperless/media
    - paperless_export:/usr/src/paperless/export
    - paperless_consume:/usr/src/paperless/consume
  depends_on:
    postgres:
      condition: service_healthy
    redis:
      condition: service_started
    gotenberg:
      condition: service_started
    tika:
      condition: service_started
  networks:
    - paperless-net
  restart: unless-stopped
postgres
postgres:
  image: postgres:16-alpine
  environment:
    POSTGRES_USER: ${POSTGRES_USER}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRES_DB: ${POSTGRES_DB}
  volumes:
    - postgres_data:/var/lib/postgresql/data
  healthcheck:
    test:
      - CMD-SHELL
      - pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}
    interval: 10s
    timeout: 5s
    retries: 5
  networks:
    - paperless-net
  restart: unless-stopped
redis
redis:
  image: redis:7-alpine
  volumes:
    - redis_data:/data
  networks:
    - paperless-net
  restart: unless-stopped
gotenberg
gotenberg:
  image: gotenberg/gotenberg:latest
  command:
    - gotenberg
    - "--chromium-disable-javascript=true"
    - "--chromium-allow-list=file:///tmp/.*"
  networks:
    - paperless-net
  restart: unless-stopped
tika
tika:
  image: apache/tika:latest
  networks:
    - paperless-net
  restart: unless-stopped
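When copying the services above into an existing compose file, the top-level `volumes:` entries (paperless_data, paperless_media, paperless_export, paperless_consume, postgres_data, redis_data) and the `paperless-net` network must be declared as well. Otherwise, Compose rejects the file because a service refers to an undefined volume or network. The hypothetical bash helper below validates the merged files before anything is started. The `check_compose` name and the `DOCKER` dry-run override are assumptions, while `config --quiet` and `config --services` are standard `docker compose` options.

```shell
# Hypothetical helper: validate compose files as Compose would merge them,
# then list the resulting services. Set DOCKER=echo for a dry run.
check_compose() {
  local args=() f
  for f in "$@"; do
    args+=(-f "$f")
  done
  # "config --quiet" only validates; it exits non-zero on errors such as
  # an undeclared volume or network.
  ${DOCKER:-docker} compose "${args[@]}" config --quiet &&
    ${DOCKER:-docker} compose "${args[@]}" config --services
}
```

Usage: `check_compose docker-compose.yml paperless.yml`, where paperless.yml is a hypothetical file holding the copied services and their volume and network declarations.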
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: postgres
      PAPERLESS_DBUSER: ${POSTGRES_USER}
      PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
      PAPERLESS_DBNAME: ${POSTGRES_DB}
      PAPERLESS_SECRET_KEY: ${SECRET_KEY}
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_ADMIN_USER: ${ADMIN_USER}
      PAPERLESS_ADMIN_PASSWORD: ${ADMIN_PASSWORD}
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - paperless_export:/usr/src/paperless/export
      - paperless_consume:/usr/src/paperless/consume
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      gotenberg:
        condition: service_started
      tika:
        condition: service_started
    networks:
      - paperless-net
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - paperless-net
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - paperless-net
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:latest
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-net
    restart: unless-stopped

  tika:
    image: apache/tika:latest
    networks:
      - paperless-net
    restart: unless-stopped

volumes:
  paperless_data:
  paperless_media:
  paperless_export:
  paperless_consume:
  postgres_data:
  redis_data:

networks:
  paperless-net:
    driver: bridge
EOF

# 2. Create the .env file (replace the placeholder passwords)
cat > .env << 'EOF'
# PostgreSQL
POSTGRES_USER=paperless
POSTGRES_PASSWORD=secure_postgres_password
POSTGRES_DB=paperless

# Paperless-ngx
ADMIN_USER=admin
ADMIN_PASSWORD=secure_admin_password
EOF
# The quoted heredoc above is written literally, so generate the key separately
echo "SECRET_KEY=$(openssl rand -hex 32)" >> .env

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/paperless-ngx-stack/run | bash
Troubleshooting
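- OCR fails with a missing-language or Tesseract error: PAPERLESS_OCR_LANGUAGE must contain Tesseract language codes (for example eng, or deu+eng for several languages) whose language packs are installed in the container; the image ships a few common languages, and others can be installed at startup by listing them in PAPERLESS_OCR_LANGUAGES
- Documents stuck in 'Processing' indefinitely: confirm the redis container is running and answers PING, then check docker compose logs paperless for task errors; restarting the paperless container also restarts its background task worker
- Gotenberg PDF conversion timeouts: give the gotenberg container more memory for large or complex documents, or raise Gotenberg's --api-timeout flag; note that --chromium-allow-list is a regular expression of URLs Chromium may load, not a file-permission setting
- Tika text extraction returns empty results: confirm the file format is one Tika supports and that the file is readable inside the container; documents that contain only images (such as a scan pasted into a Word file) have no text for Tika to return
- PostgreSQL connection errors during startup: the paperless service already waits for the postgres healthcheck (condition: service_healthy), so check docker compose ps for the postgres health status; the postgres image applies POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB only when initializing an empty data volume, so changing them in .env later does not change the existing database
- 'CSRF verification failed' or 'CSRF token missing' errors in the web interface: set PAPERLESS_URL to the exact URL used in the browser (for example https://paperless.example.com) when accessing Paperless-ngx through anything other than localhost, and keep PAPERLESS_SECRET_KEY unchanged across restarts, since changing it invalidates existing sessions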
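Several of the checks above can be run in one pass. Below is a minimal bash sketch, assuming it is run from the directory containing docker-compose.yml. `diagnose` is a hypothetical name, and `DOCKER` can be set to `echo` to print the commands instead of running them. `redis-cli` and `pg_isready` ship in the official redis and postgres images used by this stack.

```shell
# Hypothetical helper: run the basic health checks for this stack.
# Each command runs even if an earlier one fails, so all output is collected.
diagnose() {
  local d="${DOCKER:-docker}"
  # Container state, including the postgres healthcheck status
  $d compose ps
  # Redis should answer PONG (-T disables TTY allocation for scripted use)
  $d compose exec -T redis redis-cli ping
  # PostgreSQL readiness; defaults to the user name from the .env template
  $d compose exec -T postgres pg_isready -U "${POSTGRES_USER:-paperless}"
  # Recent log lines from the main container, where consumer and task errors appear
  $d compose logs --tail=50 paperless
}
```

Running `diagnose 2>&1 | tee diagnose.log` keeps a copy of the output for a bug report or forum post.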
Components
paperless-ngx, postgresql, redis, gotenberg, tika
Tags
#paperless #document-management #ocr #paperless-ngx
Category
Productivity & Collaboration