OpenAgentSkill Registry

OpenAgentSkill guide

Best document processing skills for AI agents

Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.

When to use this guide

Start from the job, then shortlist the tools.

Extract tables from PDFs

Convert files to Markdown

Run OCR over scans

Normalize document metadata

For each of these jobs, use quality and freshness signals to decide whether a skill belongs in the workflow.

Shortlist

Top skills to evaluate

#1 MinerU · Excellent · 100 · 68K stars

Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#2 Docling · Excellent · 100 · 63K stars

Get your documents ready for gen AI

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#3 Unstructured · Excellent · 100 · 15K stars

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#4 Liteparse · Excellent · 100 · 10K stars

A fast, helpful, and open-source document parser

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#5 Pypdf · Excellent · 100 · 10K stars

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#6 Markitdown · Excellent · 100 · 156K stars

Python tool for converting files and office documents to Markdown.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#7 PaddleOCR · Excellent · 100 · 83K stars

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#8 Marked · Excellent · 100 · 37K stars

A markdown parser and compiler. Built for speed.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#9 OCRmyPDF · Excellent · 100 · 34K stars

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#10 Koodo Reader · Excellent · 100 · 27K stars

A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS, and Web.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.