Document skills

PDF extraction skills for AI agents.

Compare skills for PDF parsing, OCR, table extraction, markdown conversion, document metadata, and agent-ready file processing.

Built for users searching for AI agent skills that can parse PDFs, extract tables, and convert documents into usable context.

Matched: 16
Stars: 527K
Input: PDF
Output: Markdown

Agent jobs

Start from a real workflow, not a keyword.

These pages are built for high-intent search and for agents that need a structured shortlist before installing third-party code.

01 Convert PDFs into clean markdown for agents
02 Extract tables and metadata from reports
03 Prepare legal, finance, and research documents for review
04 Use OCR fallback when scanned pages need text extraction

Ranked shortlist

High-signal skills to inspect first.

67K stars
Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.
Quality 100 · Trust 100 · Fit 77
document-processing · Last push: Jun 11, 2026 · License: Unknown
$ npx skills add opendatalab/MinerU

62K stars
Get your documents ready for gen AI
Quality 100 · Trust 100 · Fit 77
document-processing · Last push: Jun 14, 2026 · License: MIT
$ npx skills add docling-project/docling

Knowledge Agents and Management in the Cloud
Quality 100 · Trust 100 · Fit 67
document-processing · Last push: May 18, 2026 · License: MIT
$ npx skills add run-llama/llama_cloud_services

15K stars
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Quality 100 · Trust 100 · Fit 72
document-processing · Last push: Jun 11, 2026 · License: Apache-2.0
$ npx skills add Unstructured-IO/unstructured

10K stars
A fast, helpful, and open-source document parser
Quality 100 · Trust 100 · Fit 71
document-processing · Last push: Jun 12, 2026 · License: Apache-2.0
$ npx skills add run-llama/liteparse

9.0K stars
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Quality 100 · Trust 100 · Fit 66
document-processing · Last push: Mar 25, 2026 · License: Unknown
$ npx skills add bytedance/Dolphin

Evaluation

How to choose the right skill.

Handles layout, headings, and tables without destroying context

Reports extraction limits and OCR uncertainty

Supports batch or repeatable processing

Documents privacy and local processing assumptions

Questions

Which PDF skill should I choose first?

Choose a skill that supports your document type, preserves tables or headings, and makes extraction failures visible instead of silently guessing.

Can these skills handle scanned PDFs?

Some can, but scanned PDFs usually need OCR and human review for high-stakes documents.
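The OCR-fallback workflow and the advice to make extraction failures visible can be sketched as a triage step that runs before any skill is called. This is a minimal Python sketch, not the behavior of any skill listed above. It assumes you already have each page's text-layer output as a string (from whatever PDF library you use), and the `plan_extraction` helper, the `ExtractionPlan` class, and the 25-character threshold are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: a page whose text layer yields fewer than this many
# non-whitespace characters is treated as scanned (image-only).
MIN_CHARS_PER_PAGE = 25


@dataclass
class ExtractionPlan:
    text_pages: list[int] = field(default_factory=list)  # pages with a usable text layer
    ocr_pages: list[int] = field(default_factory=list)   # pages to send to OCR fallback

    @property
    def needs_review(self) -> bool:
        # Any OCR page is flagged for human review instead of the pipeline
        # silently trusting recognized text.
        return bool(self.ocr_pages)


def plan_extraction(page_texts: list[str], min_chars: int = MIN_CHARS_PER_PAGE) -> ExtractionPlan:
    """Sort 1-based page numbers into text-layer pages and OCR-fallback pages."""
    plan = ExtractionPlan()
    for number, text in enumerate(page_texts, start=1):
        visible = sum(1 for ch in text if not ch.isspace())
        if visible >= min_chars:
            plan.text_pages.append(number)
        else:
            plan.ocr_pages.append(number)
    return plan


if __name__ == "__main__":
    pages = [
        "Sample page with a real text layer and enough characters.",
        "",          # scanned page: no text layer at all
        "   \n  ",   # whitespace only, also treated as scanned
    ]
    plan = plan_extraction(pages)
    print(plan.text_pages, plan.ocr_pages, plan.needs_review)
```

Running the script prints `[1] [2, 3] True`: page 1 keeps its text layer, pages 2 and 3 go to OCR, and the whole document is flagged for review. A character count is a crude signal, so treat the threshold as something to tune against your own documents.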