Parse messy files

Convert PDFs to markdown

Find skills for PDF parsing, OCR fallback, table extraction, and clean markdown conversion.


Agent prompt

Find the best skill for converting PDF files into clean markdown while preserving headings, tables, and metadata.

Matched skills

Top stars: 34K

Best first install

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

34K stars, 75 quality, document-processing


Install with one command

$ npx skills add ocrmypdf/OCRmyPDF

Install targets

Install this skill in your agent workflow

Copy the registry command or an agent-specific install prompt for Codex, Claude Code, and Cursor.


OpenAgentSkill CLI

Use the registry command when your workflow supports the OpenAgentSkill installer.

$ npx skills add ocrmypdf/OCRmyPDF

Decision guide

Use and avoid conditions

Success criteria

Handles common PDFs
Keeps headings and tables usable
Reports extraction limits
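
The success criteria above can be spot-checked on the converter's output. The following is a minimal sketch, not part of any listed skill: `summarize_markdown` is a hypothetical helper that counts ATX headings and pipe-table delimiter rows in a markdown string and returns warnings when either is missing. It assumes the converter emits `#`-style headings and pipe tables.

```python
import re

# Hypothetical helper (not provided by any skill on this page).
# Assumes ATX headings ("# Title") and pipe tables in the output.
HEADING = re.compile(r"^#{1,6}\s+\S")
# Delimiter row of a pipe table, e.g. "| --- | :---: |" or "---|---".
# At least one pipe is required so a bare "---" rule is not counted.
TABLE_DELIM = re.compile(r"\s*\|?(\s*:?-+:?\s*\|)+(\s*:?-+:?\s*)?")


def summarize_markdown(md: str) -> dict:
    """Count headings and tables, and list what looks missing."""
    lines = md.splitlines()
    headings = sum(1 for ln in lines if HEADING.match(ln))
    tables = sum(1 for ln in lines if TABLE_DELIM.fullmatch(ln))

    warnings = []
    if not md.strip():
        warnings.append("no text extracted")
    if headings == 0:
        warnings.append("no headings found")
    if tables == 0:
        warnings.append("no tables found")
    return {"headings": headings, "tables": tables, "warnings": warnings}
```

A warning is not proof of failure, since the source PDF may simply contain no tables; it only marks output that deserves a manual look.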

Do not use when

The PDF is encrypted
Scanned documents need manual OCR review
Legal/medical data requires compliance review
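
The first avoid condition can be checked before any tool runs. This is a rough heuristic sketch under stated assumptions: `preflight` is a hypothetical function that looks for the `%PDF-` header near the start of the file and for an `/Encrypt` key anywhere in the raw bytes. A real PDF library would be more reliable, and the byte search can give false positives if that string happens to appear in document content.

```python
# Hypothetical pre-flight check (an illustration, not part of OCRmyPDF).
# Heuristic only: encrypted PDFs reference an /Encrypt dictionary in the
# trailer, so its presence in the raw bytes is treated as a warning sign.


def preflight(pdf_bytes: bytes) -> list:
    """Return reasons to skip automatic conversion; empty list means OK."""
    reasons = []
    # The header is expected near the start of the file.
    if b"%PDF-" not in pdf_bytes[:1024]:
        reasons.append("missing PDF header")
    if b"/Encrypt" in pdf_bytes:
        reasons.append("appears encrypted")
    return reasons
```

Typical use would be `preflight(Path("input.pdf").read_bytes())` with `pathlib.Path`, skipping the file when the returned list is non-empty. The other two conditions (manual OCR review, compliance review) are human decisions and are not something this check can detect.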

Alternatives

Compare before installing


PaddleOCR

971

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

83K stars, document-processing

Stirling PDF

936

#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere

81K stars, document-processing

Docling

932

Get your documents ready for gen AI

63K stars, document-processing

MinerU

927

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

68K stars, document-processing

Markitdown

924

Python tool for converting files and office documents to Markdown.

156K stars, document-processing

Pandoc

900

Universal markup converter

45K stars, document-processing