Parse messy files

Document processing and PDF extraction skills

Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.


Try this task

I need my agent to read PDFs, extract tables, and turn documents into structured data.

Matched: Strong trust · Install ready · Auto allowed · 500+ stars

Agent should be able to

+ Read uploaded files
+ Extract structured fields
+ Prepare clean context for downstream agents

Resolve

Let the agent pick

Returns the best skill, alternatives, install handoff, risk summary, and safety gate.

Text plan

LLM-readable output

Plain text version for Codex, Claude Code, Cursor, and custom agent runtimes.

Browse

Human shortlist

Open the filtered registry view for this workflow and compare candidates manually.

Workflow map

What to build with these skills

Extract tables from PDFs

Convert files to markdown

Run OCR over scans

Normalize document metadata

Best first installs

Start with high-signal skills

18 matched skills

MinerU

VERIFIED

Transforms complex documents such as PDFs and Office files into LLM-ready Markdown or JSON for agentic workflows.

68K stars · 90 trust · 93 audit · REVIEWED

Install: Human review before install

Risk: Safe to try

Agent fit: Claude Code + CLI

Updated: Jun 17, 2026

$ npx skills add opendatalab/MinerU

Docling

VERIFIED

Get your documents ready for gen AI

63K stars · 89 trust · 94 audit · REVIEWED

Install: Agent install candidate

Risk: Safe to try

Agent fit: Claude Code + CLI

Updated: Jul 6, 2026

$ npx skills add docling-project/docling

Llama Cloud Services

VERIFIED

Knowledge Agents and Management in the Cloud

4.3K stars · 88 trust · 92 audit · VERIFIED

Install: Agent install candidate

Risk: Safe to try

Agent fit: Claude Code + CLI

Updated: May 18, 2026

$ npx skills add run-llama/llama_cloud_services

Skill shortlist

More options for this use case


Pdf Inspector

document-processing · Research

Fast Rust library for PDF inspection, classification, and text extraction. Intelligently detects scanned vs text-based PDFs to enable smart routing decisions.

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

1.5K stars · 87 trust · 91 audit · 75 safety

Unstructured

document-processing · Research

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. The project also offers an enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

15K stars · 93 trust · 95 audit · 87 safety

Liteparse

document-processing · Research

A fast, helpful, and open-source document parser

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

10K stars · 91 trust · 94 audit · 86 safety

Pypdf

document-processing · Research

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

10K stars · 90 trust · 93 audit · 77 safety

PaddleX

document-processing · Research

All-in-One Development Tool based on PaddlePaddle

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

6.2K stars · 91 trust · 94 audit · 86 safety

Markitdown

document-processing · Research

Python tool for converting files and office documents to Markdown.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

156K stars · 90 trust · 92 audit · 84 safety

PaddleOCR

document-processing · Research

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

83K stars · 94 trust · 95 audit · 87 safety

Deepdoctection

document-processing · Research

A Repo For Document AI

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

3.2K stars · 86 trust · 93 audit · 77 safety

Marked

document-processing · Research

A markdown parser and compiler. Built for speed.

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

37K stars · 88 trust · 92 audit · 76 safety

OCRmyPDF

document-processing · Research

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

34K stars · 93 trust · 95 audit · 87 safety

Core

document-processing · Research

A modern PDF library for TypeScript. Parse, modify, and generate PDFs with a clean, intuitive API.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

1.8K stars · 89 trust · 93 audit · 85 safety

Koodo Reader

document-processing · Research

A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS, and Web.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

27K stars · 93 trust · 95 audit · 87 safety

Umi OCR

document-processing · Research

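Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning and generation. Includes built-in libraries for multiple languages.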

Low metadata risk · Review the audit page, then allow agent install in a sandboxed workflow.

45K stars · 89 trust · 89 audit · 81 safety

Markdown It Py

document-processing · Research

Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

1.3K stars · 91 trust · 94 audit · 86 safety

Markdown It

document-processing · Research

Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

22K stars · 92 trust · 93 audit · 85 safety
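
One way to triage the shortlist above is to filter on the published safety score before trying anything in a sandbox. The sketch below is illustrative only: the scores are copied from a few entries on this page, but the `Skill` type, the `sandbox_candidates` helper, and the cutoff of 80 are assumptions, not registry rules.

```python
from typing import NamedTuple


class Skill(NamedTuple):
    name: str
    trust: int
    audit: int
    safety: int


# Scores copied from a few shortlist entries on this page.
SHORTLIST = [
    Skill("Pdf Inspector", 87, 91, 75),
    Skill("Unstructured", 93, 95, 87),
    Skill("Pypdf", 90, 93, 77),
    Skill("PaddleOCR", 94, 95, 87),
    Skill("OCRmyPDF", 93, 95, 87),
]


def sandbox_candidates(skills, min_safety=80):
    """Keep skills at or above min_safety (an assumed cutoff),
    ordered by safety and then trust, highest first."""
    kept = [s for s in skills if s.safety >= min_safety]
    return sorted(kept, key=lambda s: (s.safety, s.trust), reverse=True)


for skill in sandbox_candidates(SHORTLIST):
    print(skill.name, skill.safety, skill.trust)
```

Entries below the cutoff are not rejected outright; they are the ones the page marks for review before production.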

FAQ

How to choose skills for this workflow

These answers are written for both human builders and agents consuming the Registry API.

What are the best AI agent skills for document processing?

Start by comparing MinerU, Docling, and Llama Cloud Services. OpenAgentSkill ranks them by workflow fit, GitHub adoption, trust score, safety gate, and install readiness.

Can an AI agent use this page directly?

Yes. Use the linked Registry API prompt to query /api/skills/search with the task: "I need my agent to read PDFs, extract tables, and turn documents into structured data." and retrieve install handoff links for the top results.
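
As a rough sketch of what such a query might look like, the snippet below builds a search URL for the `/api/skills/search` path named above. The host (`registry.example.com`) and the `q` parameter name are placeholders assumed for illustration; the page does not specify either, so check the Registry API prompt for the real values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host: the page names only the path, not the registry domain.
BASE_URL = "https://registry.example.com"

TASK = (
    "I need my agent to read PDFs, extract tables, "
    "and turn documents into structured data."
)


def build_search_url(task: str, base_url: str = BASE_URL) -> str:
    """Return a GET URL for /api/skills/search carrying the task text.

    The `q` parameter name is an assumption, not a documented field.
    """
    return f"{base_url}/api/skills/search?{urlencode({'q': task})}"


url = build_search_url(TASK)
print(url)
```

An agent would then fetch this URL with its HTTP client of choice and read the install handoff links from the response; the response shape is not described on this page, so it is left out here.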

Should I install every recommended skill?

No. Start with the highest-fit skill, test it in a sandbox workflow, and add companion skills only when the task needs extra coverage.