PDF Redaction

Anonymize a PDF from base64 input

Anonymizes a PDF file by detecting and redacting PII (Personally Identifiable Information). Accepts the PDF as a base64-encoded string in the request body. Returns the anonymized PDF, the detected PII entities, and processing metrics. Supports multiple OCR languages, rotated-text detection, and customizable PII tag detection. Only the first page of the PDF is processed.
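Because the endpoint takes the document as base64 text rather than as a file upload, the client has to encode the PDF bytes before sending them. A minimal sketch of that step follows; the helper name `encode_pdf` and the `%PDF-` header check are illustrative assumptions, not part of the API.

```python
import base64


def encode_pdf(pdf_bytes: bytes) -> str:
    """Return the base64 text form of raw PDF bytes, for use as the `pdf` field.

    Hypothetical helper, not part of the service's API.
    """
    # Cheap sanity check: PDF files normally begin with the "%PDF-" marker.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF")
    # b64encode returns bytes; decode to str so the value is JSON-serializable.
    return base64.b64encode(pdf_bytes).decode("ascii")


if __name__ == "__main__":
    # Placeholder bytes for illustration only, not a complete PDF document.
    sample = b"%PDF-1.4\n% placeholder\n"
    print(encode_pdf(sample))
```

For a real file, the bytes would typically come from `open(path, "rb").read()`.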

POST /api/anonymize/pdf

Authorization

APIKeyHeader
X-API-Key: <token>

In: header

Request Body

application/json

pdf (string, required)

Base64-encoded PDF document to be processed.

tags (Tags, optional)

List of predefined PII tags to detect and redact. If empty, all available tags are used.

Default: []

force_ocr (boolean, optional)

Force OCR processing even if text is extractable from the PDF.

Default: false

ocr_langs (Ocr Langs, optional)

List of OCR languages to use for text recognition. Available: ENG, SPA, FRA, DEU, ITA, POR, RUS. Multiple languages can be specified for multilingual documents.

Default: ["eng"]

rotated_text (boolean, optional)

Enable detection and recognition of rotated text in the document.

Default: false

redact_text (boolean, optional)

Enable text redaction using NER. When enabled, detected PII entities are redacted (blacked out) in the output PDF.

Default: true

min_chunk_size (integer, optional)

Minimum chunk size for text processing. Controls text segmentation for NER processing. Larger values may improve accuracy but increase processing time.

Default: 0

custom_tags (optional, nullable)

List of custom tags to detect and redact. These tags are added to the standard PII tags.
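The parameters above can be assembled into a request as in the sketch below. It uses only the Python standard library. The function name `build_request` is hypothetical, the host is taken from the curl example in this page, and lowercasing the OCR language codes is an assumption based on the documented default `["eng"]`; the service may accept other casings.

```python
import json
import urllib.request

# Host as it appears in the curl example on this page.
API_URL = "https://api.pdf-redaction.com/api/anonymize/pdf"
# Language codes listed in the ocr_langs description, lowercased (assumption).
OCR_LANGS = {"eng", "spa", "fra", "deu", "ita", "por", "rus"}


def build_request(pdf_b64, api_key, tags=None, ocr_langs=None, custom_tags=None,
                  force_ocr=False, rotated_text=False, redact_text=True,
                  min_chunk_size=0):
    """Build (but do not send) a POST request using the documented defaults."""
    langs = [lang.lower() for lang in (ocr_langs or ["eng"])]
    unknown = sorted(set(langs) - OCR_LANGS)
    if unknown:
        raise ValueError(f"unsupported OCR languages: {unknown}")
    body = {
        "pdf": pdf_b64,
        "tags": list(tags or []),  # empty list means "all available tags"
        "force_ocr": force_ocr,
        "ocr_langs": langs,
        "rotated_text": rotated_text,
        "redact_text": redact_text,
        "min_chunk_size": min_chunk_size,
    }
    # custom_tags is nullable, so it is only sent when the caller provides it.
    if custom_tags is not None:
        body["custom_tags"] = list(custom_tags)
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Building the request separately from sending it keeps the payload easy to inspect before any network traffic happens.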

Response Body

application/json

curl -X POST "https://api.pdf-redaction.com/api/anonymize/pdf" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: <token>" \
  -d '{
    "custom_tags": [
      "CUSTOM_TAG_1",
      "CUSTOM_TAG_2"
    ],
    "force_ocr": false,
    "min_chunk_size": 0,
    "ocr_langs": [
      "eng"
    ],
    "pdf": "base64_encoded_pdf_string",
    "redact_text": true,
    "rotated_text": false,
    "tags": [
      "DATE",
      "PERSON_NAME",
      "EMAIL",
      "PHONE"
    ]
  }'
{
  "pdf": "base64_encoded_pdf_string",
  "detected_pii": {
    "path": "memory",
    "entities": [
      {
        "entity_group": "PERSON_NAME",
        "score": 0.95,
        "word": "John Doe",
        "start": 0,
        "end": 8,
        "boxes": [
          {
            "text": "John Doe",
            "score": 0.95,
            "x": 100,
            "y": 200,
            "width": 150,
            "height": 25
          }
        ]
      },
      {
        "entity_group": "EMAIL",
        "score": 0.98,
        "word": "john.doe@example.com",
        "start": 0,
        "end": 20,
        "boxes": [
          {
            "text": "john.doe@example.com",
            "score": 0.98,
            "x": 100,
            "y": 250,
            "width": 200,
            "height": 25
          }
        ]
      }
    ],
    "exception": "",
    "json": ""
  },
  "processing_time": {
    "total": 2.9577243328094482,
    "stages": {
      "PdfDataToSingleImage": 0.5639204978942871,
      "PdfDataToDocument": 0.0037207603454589844,
      "Ocr": 0.0004279613494873047,
      "Ner": 1.7787754535675049,
      "ImageDrawBoxes": 0.5246167182922363,
      "SingleImageToPdf": 0.08531355857849121
    }
  }
}
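A successful response like the one above can be unpacked as in the sketch below: it decodes the redacted PDF, lists the detected entities above a confidence threshold, and reports the slowest processing stage. The function name `summarize_response` and the `min_score` parameter are hypothetical; the field names are those shown in the sample response.

```python
import base64


def summarize_response(resp: dict, min_score: float = 0.0):
    """Return (pdf_bytes, entities, slowest_stage) from a success response."""
    # The redacted document comes back base64-encoded, like the input.
    pdf_bytes = base64.b64decode(resp["pdf"])
    # Keep only entities whose confidence score meets the threshold.
    entities = [
        (e["entity_group"], e["word"], e["score"])
        for e in resp["detected_pii"]["entities"]
        if e["score"] >= min_score
    ]
    # Stage timings are a name -> value mapping; pick the largest value.
    stages = resp["processing_time"]["stages"]
    slowest_stage = max(stages, key=stages.get, default=None)
    return pdf_bytes, entities, slowest_stage
```

Filtering on `score` is a client-side choice; nothing in the sample suggests the service applies such a threshold itself.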
{
  "error_code": "LLM_CALL_ERROR",
  "message": "string"
}
Empty
{
  "error_code": "LLM_CALL_ERROR",
  "message": "string"
}
{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
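The error samples above come in two shapes: an object with `error_code` and `message`, and an object with a `detail` list of `loc`/`msg`/`type` entries. A client could normalize both into one readable string, as in this sketch. The function name `describe_error` is hypothetical, and which HTTP status codes carry which shape is not stated here, so the code inspects only the payload.

```python
def describe_error(payload: dict) -> str:
    """Turn either documented error shape into a single readable line."""
    # Shape 1: {"error_code": "...", "message": "..."}
    if "error_code" in payload:
        return f'{payload["error_code"]}: {payload.get("message", "")}'
    # Shape 2: {"detail": [{"loc": [...], "msg": "...", "type": "..."}]}
    if "detail" in payload:
        parts = []
        for item in payload["detail"]:
            # loc is a path into the request; join it with dots for display.
            loc = ".".join(str(p) for p in item.get("loc", []))
            parts.append(f'{loc}: {item.get("msg", "")}')
        return "; ".join(parts)
    return "unrecognized error payload"
```

With `urllib`, the payload would typically be obtained by catching `urllib.error.HTTPError` and parsing its body as JSON.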
Empty