Parse messy files
Convert PDFs to markdown
Find skills for PDF parsing, OCR fallback, table extraction, and clean markdown conversion.
Agent prompt
Find the best skill for converting PDF files into clean markdown while preserving headings, tables, and metadata.
Best first install
Pdf Inspector
Fast Rust library for PDF inspection, classification, and text extraction. Intelligently detects scanned vs text-based PDFs to enable smart routing decisions.
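The scanned-versus-text routing idea described above can be sketched in a few lines. This is a minimal illustration, not Pdf Inspector's actual detection logic or API: the function names `classify_pdf` and `choose_route`, the route labels, and the 50-character threshold are all hypothetical, and the per-page character counts are assumed to come from whatever text extractor you already use.

```python
from typing import Sequence


def classify_pdf(chars_per_page: Sequence[int], min_chars: int = 50) -> str:
    """Label a PDF from the number of characters extracted on each page.

    min_chars is an arbitrary, assumed threshold: a page yielding fewer
    characters than this is treated as image-only (likely scanned).
    """
    if not chars_per_page:
        return "empty"
    text_pages = sum(1 for n in chars_per_page if n >= min_chars)
    if text_pages == len(chars_per_page):
        return "text"
    if text_pages == 0:
        return "scanned"
    return "mixed"


def choose_route(label: str) -> str:
    """Map a classification label to a (hypothetical) processing route."""
    routes = {
        "text": "direct-extraction",
        "scanned": "ocr",
        "mixed": "per-page-routing",
    }
    # Anything unrecognised, including "empty", goes to a human.
    return routes.get(label, "manual-review")


if __name__ == "__main__":
    for counts in ([1200, 900], [0, 3], [1200, 0], []):
        label = classify_pdf(counts)
        print(counts, label, choose_route(label))
```

A character-count heuristic is cheap but crude: pages with a hidden OCR text layer or very short text pages can be misclassified, so the threshold would need tuning on real documents.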
Install with one command
$ npx skills add firecrawl/pdf-inspector
Install targets
Install this skill in your agent workflow
Copy the registry command or an agent-specific install prompt for Codex, Claude Code, and Cursor.
OpenAgentSkill CLI
Use the registry command when your workflow supports the OpenAgentSkill installer.
$ npx skills add firecrawl/pdf-inspector
Decision guide
Use and avoid conditions
Success criteria
- Handles common PDFs
- Keeps headings and tables usable
- Reports extraction limits
Do not use when
- The PDF is encrypted
- Scanned documents need manual OCR review
- Legal/medical data requires compliance review
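The first avoid condition can be screened for before any extraction runs. The sketch below is a rough pre-flight check under stated assumptions, not part of any skill listed here: `looks_encrypted` and `preflight` are hypothetical names, and scanning the raw bytes for an `/Encrypt` key is only a heuristic. Encrypted PDFs carry that key in their trailer, but the same bytes could also appear elsewhere in an unencrypted file, so a hit means "check further", not "definitely encrypted".

```python
from typing import List


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an /Encrypt key somewhere in the file suggests encryption."""
    return b"/Encrypt" in pdf_bytes


def preflight(pdf_bytes: bytes) -> List[str]:
    """Return a list of reasons to skip automatic conversion (empty if none)."""
    problems: List[str] = []
    # Assumption: require the %PDF- marker at offset 0. Some readers
    # tolerate leading junk, which this strict check would flag.
    if not pdf_bytes.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    if looks_encrypted(pdf_bytes):
        problems.append("possibly encrypted")
    return problems


if __name__ == "__main__":
    sample = b"%PDF-1.7\ntrailer << /Root 1 0 R /Encrypt 5 0 R >>"
    print(preflight(sample))
```

The other two conditions (manual OCR review, legal or medical compliance) depend on human judgment and are not something a byte-level check can decide.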
Alternatives
Compare before installing
Pdf Craft
817
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
PdfPig
813
Read and extract text and other content from PDFs in C# (port of PDFBox)
Retain Pdf
812
Translates PDFs while preserving layout, formulas, and structure; suited to research and technical documents.
PyMuPDF
803
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
MarkdownMonster
800
An extensible Markdown Editor, Viewer and Weblog Publisher for Windows
Milewski Ctfp Pdf
792
Bartosz Milewski's 'Category Theory for Programmers' unofficial PDF and LaTeX source