Self-Hosted PDF Redaction API
Deploy and configure the PDF Redaction API on your own infrastructure using Docker
PDF Redaction API: Self-Hosted & Secure Document Masking
Automate PII (Personally Identifiable Information) removal from PDF documents with a high-performance, self-hosted REST API. No data ever leaves your server.
The PDF Redaction API is a containerized solution designed for developers who need to protect sensitive data while maintaining total control over document privacy. It is ideal for GDPR, HIPAA, and CCPA compliance.
Docker Hub: stabrise/pdf-redaction-api
🚀 Key Features
- Automated Sensitive Data Detection: Remove emails, credit card numbers, tax IDs (DNI/NIE), phone numbers, and other PII automatically.
- Multi-Language OCR Support: Built-in Tesseract OCR with support for English, Spanish, French, German, Italian, Portuguese, and Russian.
- Deep Search & Custom Rules: Use predefined and custom rules for advanced redaction patterns.
- Total Data Privacy: Unlike cloud-based APIs, this Docker image runs on your own infrastructure (on-premises). Your documents are never uploaded to the cloud.
- RESTful Architecture: Simple JSON-based API integration with any language (Python, JS, Go, PHP, etc.).
- Advanced Image Processing: Includes ffmpeg and image processing libraries for handling complex PDF structures.
🛠 Quick Start
🚀 Instant Install
Install the self-hosted PDF Redaction API using one of the following methods:
Option A (curl):
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/StabRise/pdf-redaction-api/main/install.sh)"
```
Option B (wget):
```bash
/bin/bash -c "$(wget -qO- https://raw.githubusercontent.com/StabRise/pdf-redaction-api/main/install.sh)"
```
Docker
Pull the image from Docker Hub and run the PDF Redaction API on your VPS using Docker:
```bash
docker run -d -p 8002:8002 \
  -e PDF_REDACTION_API_LICENSE=your_key_here \
  -e PDF_REDACTION_API_HOST=0.0.0.0 \
  stabrise/pdf-redaction-api:latest
```
Docker Compose (Recommended)
For a production setup with an automatic restart policy and .env-based configuration (recommended for VPS deployments):
```yaml
services:
  pdf-api:
    image: stabrise/pdf-redaction-api:latest
    restart: unless-stopped
    ports:
      - "8002:8002"
    environment:
      # Required: License key for production use
      - PDF_REDACTION_API_LICENSE=${PDF_REDACTION_API_LICENSE}
      # Server settings (optional; use 0.0.0.0 as HOST inside Docker, the other values are the defaults)
      - PDF_REDACTION_API_HOST=0.0.0.0
      - PDF_REDACTION_API_PORT=8002
      - PDF_REDACTION_API_WORKERS_COUNT=1
      - PDF_REDACTION_API_ENVIRONMENT=production
      - PDF_REDACTION_API_LOG_LEVEL=INFO
      # LLM settings (optional, for a custom LLM provider)
      # - PDF_REDACTION_API_LLM_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
      # - PDF_REDACTION_API_LLM_API_KEY=
      # - PDF_REDACTION_API_LLM_API_BASE_URL=
      # Processing limits (optional, defaults shown)
      # - PDF_REDACTION_API_PDF_PII_DETECT_MAX_PAGES=10
      # - PDF_REDACTION_API_PDF_REDACTION_MAX_PAGES=10
    # Load additional settings from .env (Compose reports an error if this file does not exist)
    env_file:
      - .env
```
📖 API Usage Example
Once the container is running, you can redact a document by sending a POST request to /api/anonymize/pdf/:
```bash
curl -X POST http://localhost:8002/api/anonymize/pdf/ \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"
```
For full documentation, visit: PDF Redaction API Docs
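For scripted use, the curl call above can be wrapped in two small shell functions: one that waits until the container answers HTTP requests, and one that uploads a file and fails on a non-2xx status. This is a sketch, not an official client: PDF_API_URL, wait_for_api and redact_pdf are hypothetical names, and it assumes the endpoint returns its result in the response body (check the API docs for whether that is a redacted PDF or JSON), which is saved unchanged to the output file.

```bash
# Sketch of a minimal client for the PDF Redaction API (not an official tool).
# PDF_API_URL is a hypothetical variable used only by this sketch.
PDF_API_URL="${PDF_API_URL:-http://localhost:8002}"

# Wait until the server answers any HTTP request; the status code is ignored.
# Usage: wait_for_api [max_attempts]
wait_for_api() {
  attempts="${1:-30}"
  i=0
  # curl exits non-zero only when no HTTP response arrives (e.g. connection refused)
  until curl -s -o /dev/null "${PDF_API_URL}/"; do
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      echo "API not reachable at ${PDF_API_URL}" >&2
      return 1
    fi
    sleep 1
  done
}

# Upload one PDF and save the raw response body to dest.
# Usage: redact_pdf input.pdf output_file
redact_pdf() {
  src="$1"
  dest="$2"
  if [ ! -f "$src" ]; then
    echo "No such file: $src" >&2
    return 1
  fi
  # -w prints the HTTP status code to stdout; the body is written to "$dest".
  # -F sends multipart/form-data and implies POST.
  status=$(curl -sS -o "$dest" -w '%{http_code}' \
    -F "file=@${src}" "${PDF_API_URL}/api/anonymize/pdf/") || return 1
  case "$status" in
    2??) return 0 ;;
    *) echo "Request failed with HTTP $status (response saved in $dest)" >&2; return 1 ;;
  esac
}
```

For example, after starting the container, run wait_for_api 60 && redact_pdf document.pdf result.out. The readiness check treats any HTTP response, including a 404, as "up", so it does not depend on a health endpoint that this README does not document.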
⚙️ Configuration
The PDF Redaction API can be configured using environment variables. Key configuration options:
Required
- PDF_REDACTION_API_LICENSE: License key (required for production use)
Server Settings (Optional)
- PDF_REDACTION_API_HOST: Server host (default: 127.0.0.1; use 0.0.0.0 for Docker)
- PDF_REDACTION_API_PORT: Server port (default: 8002)
- PDF_REDACTION_API_WORKERS_COUNT: Number of worker processes (default: 1)
- PDF_REDACTION_API_ENVIRONMENT: Environment mode (default: production)
- PDF_REDACTION_API_LOG_LEVEL: Logging level (default: INFO)
LLM Settings (Optional)
- PDF_REDACTION_API_LLM_MODEL: LLM model identifier (e.g., meta-llama/llama-4-scout-17b-16e-instruct)
- PDF_REDACTION_API_LLM_API_KEY: API key for the LLM provider
- PDF_REDACTION_API_LLM_API_BASE_URL: Base URL for LLM API
Processing Limits (Optional)
- PDF_REDACTION_API_PDF_PII_DETECT_MAX_PAGES: Maximum number of pages for PII detection (default: 10)
- PDF_REDACTION_API_PDF_REDACTION_MAX_PAGES: Maximum number of pages for redaction (default: 10)
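The page limits are enforced by the server, and this README does not describe what it returns for a longer document. If you want to catch oversized files before uploading them, a client-side pre-check could look like the sketch below. It assumes pdfinfo from poppler-utils is installed (an extra tool, not part of the image), and check_page_limit is a hypothetical helper that falls back to the documented default of 10 pages.

```bash
# Hypothetical pre-check: succeed (and print the page count) if the PDF has
# at most max pages. Requires pdfinfo from poppler-utils.
# Usage: check_page_limit file.pdf [max_pages]
check_page_limit() {
  file="$1"
  # Explicit argument, else the same env var the server reads, else the default 10
  max="${2:-${PDF_REDACTION_API_PDF_REDACTION_MAX_PAGES:-10}}"
  # pdfinfo prints a line such as "Pages:          12"
  pages=$(pdfinfo "$file" 2>/dev/null | awk '/^Pages:/ { print $2 }')
  if [ -z "$pages" ]; then
    echo "Could not read the page count of $file" >&2
    return 2
  fi
  if [ "$pages" -gt "$max" ]; then
    echo "$file has $pages pages (limit: $max)" >&2
    return 1
  fi
  echo "$pages"
}
```

For example, check_page_limit contract.pdf 10 || echo "split the document first". The exit codes distinguish "too many pages" (1) from "could not read the file" (2), so a batch script can skip unreadable files separately.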
🔒 Security & Compliance
This image is designed for high-security environments:
- GDPR/CCPA Ready: Keep data inside your jurisdiction.
- Stateless Processing: Documents are processed in memory or temp volumes and are not stored permanently by the API.
- Resource Efficient: Optimized for VPS deployments with low memory overhead.
- Multi-Language OCR: Supports OCR in 7 languages (English, Spanish, French, German, Italian, Portuguese, Russian) for international document processing.
💳 License
A valid API key is required to use the service. You can:
- Generate an API key at pdf-redaction.com/apikeys/
- Check your usage at pdf-redaction.com/apikeys/usage/
- Obtain a free trial or commercial license at pdf-redaction.com/licenses/
- Set the license using the PDF_REDACTION_API_LICENSE environment variable or include it in a .env file
- The setup.sh script will prompt for the license and append it to your .env file if it is missing
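If you manage .env files yourself (for example in provisioning scripts), the "append the license if missing" behavior described above can be sketched as a small shell function. This is not the project's setup.sh; ensure_license_in_env is a hypothetical name, and the sketch only checks for a non-empty PDF_REDACTION_API_LICENSE= line.

```bash
# Illustrative sketch (not the project's setup.sh): make sure the .env file
# contains a non-empty PDF_REDACTION_API_LICENSE entry.
# Usage: ensure_license_in_env [path/to/.env]
ensure_license_in_env() {
  env_file="${1:-.env}"
  # Already configured: a line with at least one character after "="
  if [ -f "$env_file" ] && grep -q '^PDF_REDACTION_API_LICENSE=.' "$env_file"; then
    return 0
  fi
  # Prefer the value from the environment, otherwise prompt for it
  key="${PDF_REDACTION_API_LICENSE:-}"
  if [ -z "$key" ]; then
    printf 'Enter your PDF Redaction API license key: ' >&2
    read -r key || key=""
  fi
  if [ -z "$key" ]; then
    echo "No license key provided" >&2
    return 1
  fi
  printf 'PDF_REDACTION_API_LICENSE=%s\n' "$key" >> "$env_file"
}
```

Running it twice is safe: once the key is present, later calls leave the file unchanged.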