Parse messy files

Best document processing skills for AI agents

Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.

For builders choosing skills to extract tables from PDFs and convert files to markdown. Ranked from the OpenAgentSkill index by quality, trust, freshness, adoption, and install readiness.

30 ranked · 809K stars · Top trust 100

Workflows

Extract tables from PDFs
Convert files to markdown
Run OCR over scans

#1

MinerU

43 fit · Trust 100 · Excellent 100

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Excellent quality, 67K stars, and a 43 use-case fit score.

67K stars · Jun 11, 2026 push · Production candidate · Python · PDF
$ npx skills add opendatalab/MinerU
#2

Docling

39 fit · Trust 100 · Excellent 100

Get your documents ready for gen AI

Excellent quality, 61K stars, and a 39 use-case fit score.

61K stars · Jun 12, 2026 push · Production candidate · Python · PDF
$ npx skills add docling-project/docling
#3

Llama Cloud Services

37 fit · Trust 100 · Excellent 100

Knowledge Agents and Management in the Cloud

Excellent quality, 4.3K stars, and a 37 use-case fit score.

4.3K stars · May 18, 2026 push · Production candidate · TypeScript · PDF
$ npx skills add run-llama/llama_cloud_services
#4

Unstructured

35 fit · Trust 100 · Excellent 100

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Excellent quality, 15K stars, and a 35 use-case fit score.

15K stars · Jun 11, 2026 push · Production candidate · HTML · PDF
$ npx skills add Unstructured-IO/unstructured
#5

Liteparse

34 fit · Trust 100 · Excellent 100

A fast, helpful, and open-source document parser

Excellent quality, 9.9K stars, and a 34 use-case fit score.

9.9K stars · Jun 11, 2026 push · Production candidate · Rust · PDF
$ npx skills add run-llama/liteparse
#6

Dolphin

34 fit · Trust 100 · Excellent 100

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Excellent quality, 9.0K stars, and a 34 use-case fit score.

9.0K stars · Mar 25, 2026 push · Production candidate · Python · PDF
$ npx skills add bytedance/Dolphin
#7

Pypdf

32 fit · Trust 100 · Excellent 100

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

Excellent quality, 10K stars, and a 32 use-case fit score.

10K stars · Jun 11, 2026 push · Production candidate · Python · PDF
$ npx skills add py-pdf/pypdf
#8

PaddleX

31 fit · Trust 100 · Excellent 100

All-in-One Development Tool based on PaddlePaddle

Excellent quality, 6.2K stars, and a 31 use-case fit score.

6.2K stars · Jun 12, 2026 push · Production candidate · Python · OCR
$ npx skills add PaddlePaddle/PaddleX
#9

Markitdown

31 fit · Trust 100 · Excellent 100

Python tool for converting files and office documents to Markdown.

Excellent quality, 152K stars, and a 31 use-case fit score.

152K stars · May 26, 2026 push · Production candidate · Python · PDF
$ npx skills add microsoft/markitdown
#10

Deepdoctection

30 fit · Trust 100 · Excellent 100

A Repo For Document AI

Excellent quality, 3.2K stars, and a 30 use-case fit score.

3.2K stars · Jun 12, 2026 push · Production candidate · Python · OCR
$ npx skills add deepdoctection/deepdoctection
#11

OCRmyPDF

30 fit · Trust 100 · Excellent 100

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Excellent quality, 34K stars, and a 30 use-case fit score.

34K stars · Jun 12, 2026 push · Production candidate · Python · PDF
$ npx skills add ocrmypdf/OCRmyPDF
#12

Koodo Reader

30 fit · Trust 100 · Excellent 100

A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS, and Web

Excellent quality, 27K stars, and a 30 use-case fit score.

27K stars · Jun 12, 2026 push · Production candidate · JavaScript · PDF
$ npx skills add koodo-reader/koodo-reader
#13

Umi OCR

30 fit · Trust 100 · Excellent 100

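Free, open-source, offline OCR software. Supports screenshot OCR and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning and generation. Includes built-in language packs for multiple languages.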

Excellent quality, 45K stars, and a 30 use-case fit score.

45K stars · Nov 20, 2025 push · Production candidate · Python · OCR
$ npx skills add hiroi-sora/Umi-OCR
#14

Quarkdown

29 fit · Trust 100 · Excellent 100

🪐 Markdown with superpowers: from ideas to papers, presentations, websites, books, and knowledge bases.

Excellent quality, 15K stars, and a 29 use-case fit score.

15K stars · Jun 12, 2026 push · Production candidate · Kotlin · PDF
$ npx skills add iamgio/quarkdown
#15

Xournalpp

29 fit · Trust 100 · Excellent 100

Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.

Excellent quality, 15K stars, and a 29 use-case fit score.

15K stars · Jun 11, 2026 push · Production candidate · C++ · PDF
$ npx skills add xournalpp/xournalpp
#16

KkFileView

29 fit · Trust 100 · Excellent 100

Universal File Online Preview Project based on Spring-Boot

Excellent quality, 14K stars, and a 29 use-case fit score.

14K stars · Jun 11, 2026 push · Production candidate · Java · PDF
$ npx skills add kekingcn/kkFileView
#17

ImageToolbox

29 fit · Trust 100 · Excellent 100

🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options

Excellent quality, 13K stars, and a 29 use-case fit score.

13K stars · Jun 11, 2026 push · Production candidate · Kotlin · PDF
$ npx skills add T8RIN/ImageToolbox
#18

Zettlr

29 fit · Trust 100 · Excellent 100

Your One-Stop Publication Workbench

Excellent quality, 13K stars, and a 29 use-case fit score.

13K stars · Jun 11, 2026 push · Production candidate · TypeScript · PDF
$ npx skills add Zettlr/Zettlr
#19

Chandra

29 fit · Trust 100 · Excellent 100

OCR model that handles complex tables, forms, handwriting with full layout.

Excellent quality, 11K stars, and a 29 use-case fit score.

11K stars · Apr 22, 2026 push · Production candidate · Python · OCR
$ npx skills add datalab-to/chandra
#20

PyMuPDF

28 fit · Trust 100 · Excellent 100

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Excellent quality, 10.0K stars, and a 28 use-case fit score.

10.0K stars · Jun 11, 2026 push · Production candidate · Python · PDF
$ npx skills add pymupdf/PyMuPDF
#21

PHPWord

28 fit · Trust 100 · Excellent 100

A pure PHP library for reading and writing word processing documents

Excellent quality, 7.6K stars, and a 28 use-case fit score.

7.6K stars · May 18, 2026 push · Production candidate · PHP · PDF
$ npx skills add PHPOffice/PHPWord
#22

Pdf Craft

28 fit · Trust 100 · Excellent 100

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

Excellent quality, 5.8K stars, and a 28 use-case fit score.

5.8K stars · Jun 6, 2026 push · Production candidate · Python · PDF
$ npx skills add oomol-lab/pdf-craft
#23

Trpl Zh Cn

28 fit · Trust 100 · Excellent 100

The Rust Programming Language in Chinese (2024 edition, work completed)

Excellent quality, 5.5K stars, and a 28 use-case fit score.

5.5K stars · Jun 10, 2026 push · Production candidate · Markdown · PDF
$ npx skills add KaiserY/trpl-zh-cn
#24

Parsr

28 fit · Trust 100 · Excellent 100

Transforms PDF, Documents and Images into Enriched Structured Data

Excellent quality, 6.2K stars, and a 28 use-case fit score.

6.2K stars · Mar 20, 2026 push · Production candidate · JavaScript · PDF
$ npx skills add axa-group/Parsr
#25

DesktopEditors

28 fit · Trust 100 · Excellent 100

Open-source office suite pack that comprises all the tools you need to work with documents, spreadsheets, presentations, PDFs, and PDF forms on Windows, Linux, and macOS

Excellent quality, 4.9K stars, and a 28 use-case fit score.

4.9K stars · May 21, 2026 push · Production candidate · PDF · Claude Code
$ npx skills add ONLYOFFICE/DesktopEditors
#26

PaddleOCR

28 fit · Trust 100 · Excellent 100

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Excellent quality, 82K stars, and a 28 use-case fit score.

82K stars · Jun 12, 2026 push · Production candidate · Python · RAG
$ npx skills add PaddlePaddle/PaddleOCR
#27

Stirling PDF

28 fit · Trust 100 · Excellent 100

#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere

Excellent quality, 81K stars, and a 28 use-case fit score.

81K stars · Jun 12, 2026 push · Production candidate · TypeScript · PDF
$ npx skills add Stirling-Tools/Stirling-PDF
#28

Tesseract

28 fit · Trust 100 · Excellent 100

Tesseract Open Source OCR Engine (main repository)

Excellent quality, 75K stars, and a 28 use-case fit score.

75K stars · Jun 4, 2026 push · Production candidate · C++ · OCR
$ npx skills add tesseract-ocr/tesseract
#29

Comic Translate

27 fit · Trust 100 · Excellent 100

AI comic and manga translator app/browser extension for automatically translating comics, manga, manhwa, BDs, fumetti, and more in multiple languages and formats (Images, PDF, EPUB, CBR, CBZ etc).

Excellent quality, 2.8K stars, and a 27 use-case fit score.

2.8K stars · Jun 6, 2026 push · Production candidate · Python · OCR
$ npx skills add ogkalu2/comic-translate
#30

PdfPig

27 fit · Trust 100 · Excellent 100

Read and extract text and other content from PDFs in C# (port of PDFBox)

Excellent quality, 2.5K stars, and a 27 use-case fit score.

2.5K stars · Jun 12, 2026 push · Production candidate · C# · PDF
$ npx skills add UglyToad/PdfPig

Selection method

How this list is ranked

OpenAgentSkill scores each candidate against the workflow keywords, then balances fit with GitHub stars, quality signals, trust profile, maintenance freshness, and whether there is a clear install path.
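
One way to picture this balancing is as a weighted sum of the six signals. The sketch below is purely illustrative: the weights, the log scaling of stars, the one-year freshness decay, and every field name are assumptions made for the example, not OpenAgentSkill's actual formula, which this page does not publish.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    fit: float               # use-case fit score, 0-100
    quality: float           # quality score, 0-100
    trust: float             # trust score, 0-100
    stars: int               # GitHub stars
    days_since_push: int     # maintenance freshness input
    has_install_path: bool   # whether a clear install command exists


# Hypothetical weights (they sum to 1.0); chosen only for illustration.
WEIGHTS = {
    "fit": 0.40,
    "quality": 0.15,
    "trust": 0.15,
    "adoption": 0.10,
    "freshness": 0.10,
    "install": 0.10,
}


def composite_score(c: Candidate) -> float:
    """Blend the six signals into a single 0-100 number (illustrative only)."""
    # Log-scale stars so very large repositories do not swamp the other
    # signals; one million stars maps to 100.
    adoption = min(100.0, 100.0 * math.log10(c.stars + 1) / 6.0)
    # Linear decay: pushed today scores 100, a year or more ago scores 0.
    freshness = max(0.0, 100.0 * (1 - c.days_since_push / 365))
    install = 100.0 if c.has_install_path else 0.0
    return (
        WEIGHTS["fit"] * c.fit
        + WEIGHTS["quality"] * c.quality
        + WEIGHTS["trust"] * c.trust
        + WEIGHTS["adoption"] * adoption
        + WEIGHTS["freshness"] * freshness
        + WEIGHTS["install"] * install
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted from highest to lowest composite score."""
    return sorted(candidates, key=composite_score, reverse=True)
```

Because quality and trust sit at 100 for every entry on this page, a scheme like this would let fit, adoption, and freshness decide the order; the real index may weigh them differently.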

How does OpenAgentSkill rank document processing skills?

The ranking combines workflow fit, quality score, trust profile, GitHub adoption, maintenance freshness, and whether a clear install path exists.

Should I install the top skill immediately?

No. Treat the list as a shortlist, open the skill detail page, inspect the repository and license, then test the install command in a sandbox workflow.
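
A trial install along these lines can be scripted. The sketch below is a minimal example that validates the `owner/repo` slug and runs the listed `npx skills add` command from a throwaway directory. The helper names are hypothetical, and a temporary directory is not a security boundary: the CLI may still write elsewhere on disk or reach the network, so a container or VM is the safer place to run it.

```python
import re
import subprocess
import tempfile

# Accept only plain GitHub-style "owner/repo" slugs.
SLUG_RE = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def install_command(repo: str) -> list[str]:
    """Build the install command shown on the listing for a given slug."""
    if not SLUG_RE.fullmatch(repo):
        raise ValueError(f"not an owner/repo slug: {repo!r}")
    # Passing a list (no shell) keeps the slug from being interpreted
    # as shell syntax.
    return ["npx", "skills", "add", repo]


def trial_install(repo: str, timeout: int = 300) -> subprocess.CompletedProcess:
    """Run the install from a temporary working directory and capture output.

    Assumption: running from an empty directory is enough to keep the
    trial separate from a real project; verify this for your setup.
    """
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            install_command(repo),
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # inspect returncode instead of raising
        )
```

Calling `trial_install("opendatalab/MinerU")` returns a result whose `returncode`, `stdout`, and `stderr` can be reviewed before the skill is added to a real agent workflow.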

Can my agent consume this ranking through an API?

Yes. Use /api/skills/search with the related task or /api/agent/rankings?slug=best-document-processing-skills to fetch ranked skill data.
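
A minimal sketch of calling these endpoints from an agent follows. The page gives only the paths, so the host below is a placeholder, the `q` parameter name for the search endpoint is a guess, and the response is assumed to be JSON; check each against the live API before relying on them.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Placeholder host: replace with the real OpenAgentSkill origin.
BASE_URL = "https://openagentskill.example"


def rankings_url(slug: str, base_url: str = BASE_URL) -> str:
    """Build the rankings URL for a list slug, as named on this page."""
    return urljoin(base_url, "/api/agent/rankings") + "?" + urlencode({"slug": slug})


def search_url(task: str, base_url: str = BASE_URL) -> str:
    """Build a search URL for a task description.

    Assumption: the endpoint reads the task from a `q` query parameter.
    """
    return urljoin(base_url, "/api/skills/search") + "?" + urlencode({"q": task})


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(rankings_url("best-document-processing-skills"))` would request the ranked data for this list once `BASE_URL` points at the real service.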