Self-Hosted PDF Redaction API

Deploy and configure the PDF Redaction API on your own infrastructure using Docker

PDF Redaction API: Self-Hosted & Secure Document Masking

Automate PII (Personally Identifiable Information) removal from PDF documents with a high-performance, self-hosted REST API. No data ever leaves your server.

The PDF Redaction API is a containerized solution designed for developers who need to protect sensitive data while maintaining total control over document privacy. It is ideal for GDPR, HIPAA, and CCPA compliance.

Docker Hub: stabrise/pdf-redaction-api

🚀 Key Features

  • Automated Sensitive Data Detection: Remove emails, credit card numbers, tax IDs (DNI/NIE), phone numbers, and other PII automatically.
  • Multi-Language OCR Support: Built-in Tesseract OCR with support for English, Spanish, French, German, Italian, Portuguese, and Russian.
  • Deep Search & Custom Rules: Use predefined and custom rules for advanced redaction patterns.
  • Total Data Privacy: Unlike cloud-based APIs, this Docker image runs on your own infrastructure (on-premises). Your documents are never uploaded to the cloud.
  • RESTful Architecture: Simple JSON-based API integration with any language (Python, JS, Go, PHP, etc.).
  • Advanced Image Processing: Includes ffmpeg and image processing libraries for handling complex PDF structures.

🛠 Quick Start

🚀 Instant Install

Install the self-hosted PDF Redaction API using one of the following methods:

Option A (curl):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/StabRise/pdf-redaction-api/main/install.sh)"

Option B (wget):

/bin/bash -c "$(wget -qO- https://raw.githubusercontent.com/StabRise/pdf-redaction-api/main/install.sh)"

Docker

Pull the image from Docker Hub and run the PDF Redaction API on your VPS using Docker:

docker run -d -p 8002:8002 \
  -e PDF_REDACTION_API_LICENSE=your_key_here \
  stabrise/pdf-redaction-api:latest
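
After starting the container, it can help to wait until the published port accepts connections before sending documents. The sketch below is one way to do that from Python; the function name wait_for_port is hypothetical, and it only checks that TCP port 8002 is open, not that the API itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 8002) right after docker run returns True as soon as the mapped port is reachable, or False after 30 seconds.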

For a more complete setup on a VPS, use Docker Compose with explicit configuration:

services:
  pdf-api:
    image: stabrise/pdf-redaction-api:latest
    restart: unless-stopped
    ports:
      - "8002:8002"
    environment:
      # Required: License key for production use
      - PDF_REDACTION_API_LICENSE=${PDF_REDACTION_API_LICENSE}
      # Server settings (optional; values are the defaults, except HOST, which defaults to 127.0.0.1)
      - PDF_REDACTION_API_HOST=0.0.0.0
      - PDF_REDACTION_API_PORT=8002
      - PDF_REDACTION_API_WORKERS_COUNT=1
      - PDF_REDACTION_API_ENVIRONMENT=production
      - PDF_REDACTION_API_LOG_LEVEL=INFO
      # LLM settings (optional, for custom LLM provider)
      # - PDF_REDACTION_API_LLM_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
      # - PDF_REDACTION_API_LLM_API_KEY=
      # - PDF_REDACTION_API_LLM_API_BASE_URL=
      # Processing limits (optional, defaults shown)
      # - PDF_REDACTION_API_PDF_PII_DETECT_MAX_PAGES=10
      # - PDF_REDACTION_API_PDF_REDACTION_MAX_PAGES=10
    # Optional: load variables from a .env file (remove this block if you have no .env file)
    env_file:
      - .env

📖 API Usage Example

Once the container is running, you can redact a document by sending a POST request to /api/anonymize/pdf/:

curl -X POST http://localhost:8002/api/anonymize/pdf/ \
  -F "file=@document.pdf"
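
The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_multipart and redact_pdf are hypothetical, and the form field name "file" and the endpoint path are taken from the curl example. The response format is not described here, so the function returns the raw response bytes and leaves interpretation to the caller.

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def redact_pdf(pdf_path: str, base_url: str = "http://localhost:8002") -> bytes:
    """POST the PDF to /api/anonymize/pdf/ and return the raw response body."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(pdf_path), fh.read())
    request = urllib.request.Request(
        f"{base_url}/api/anonymize/pdf/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()
```

For example, redact_pdf("document.pdf") uploads the file to the local container. The 120-second timeout is an arbitrary choice for this sketch; OCR on scanned documents may need more.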

For full documentation, visit: PDF Redaction API Docs

⚙️ Configuration

The PDF Redaction API can be configured using environment variables. Key configuration options:

Required

  • PDF_REDACTION_API_LICENSE: License key (required for production use)

Server Settings (Optional)

  • PDF_REDACTION_API_HOST: Server host (default: 127.0.0.1, use 0.0.0.0 for Docker)
  • PDF_REDACTION_API_PORT: Server port (default: 8002)
  • PDF_REDACTION_API_WORKERS_COUNT: Number of worker processes (default: 1)
  • PDF_REDACTION_API_ENVIRONMENT: Environment mode (default: production)
  • PDF_REDACTION_API_LOG_LEVEL: Logging level (default: INFO)

LLM Settings (Optional)

  • PDF_REDACTION_API_LLM_MODEL: LLM model identifier (e.g., meta-llama/llama-4-scout-17b-16e-instruct)
  • PDF_REDACTION_API_LLM_API_KEY: API key for LLM provider
  • PDF_REDACTION_API_LLM_API_BASE_URL: Base URL for LLM API

Processing Limits (Optional)

  • PDF_REDACTION_API_PDF_PII_DETECT_MAX_PAGES: Maximum pages for PII detection (default: 10)
  • PDF_REDACTION_API_PDF_REDACTION_MAX_PAGES: Maximum pages for redaction (default: 10)
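
To illustrate how the variables and their documented defaults fit together, here is a small sketch that resolves them from an environment mapping. It is an illustration only, not the API's actual configuration code: the Settings class and load_settings function are hypothetical, and the LLM variables are omitted because no defaults are documented for them.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

PREFIX = "PDF_REDACTION_API_"


@dataclass(frozen=True)
class Settings:
    license: str                       # required, no default
    host: str = "127.0.0.1"            # use 0.0.0.0 inside Docker
    port: int = 8002
    workers_count: int = 1
    environment: str = "production"
    log_level: str = "INFO"
    pdf_pii_detect_max_pages: int = 10
    pdf_redaction_max_pages: int = 10


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from PDF_REDACTION_API_* variables, applying documented defaults."""
    env = os.environ if env is None else env
    values = {}
    for field in fields(Settings):
        raw = env.get(PREFIX + field.name.upper())
        if not raw:
            continue  # unset or empty: fall back to the default
        # Cast to int where the documented default is an int (port, workers, page limits).
        values[field.name] = int(raw) if isinstance(field.default, int) else raw
    if "license" not in values:
        raise ValueError(PREFIX + "LICENSE is required")
    return Settings(**values)
```

Passing a plain dict instead of os.environ makes it easy to check a planned configuration before writing it to a .env file, for example load_settings({"PDF_REDACTION_API_LICENSE": "your_key_here", "PDF_REDACTION_API_HOST": "0.0.0.0"}).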

🔒 Security & Compliance

This image is designed for high-security environments:

  • GDPR/CCPA Ready: Keep data inside your jurisdiction.
  • Stateless Processing: Documents are processed in memory or temp volumes and are not stored permanently by the API.
  • Resource Efficient: Optimized for VPS deployments with low memory overhead.
  • Multi-Language OCR: Supports OCR in 7 languages (English, Spanish, French, German, Italian, Portuguese, Russian) for international document processing.

💳 License

A valid API key is required to use the service. You can: