Parse messy files

Document processing and PDF extraction skills

Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.

Try this task

I need my agent to read PDFs, extract tables, and turn documents into structured data.

Agent should be able to

  • Read uploaded files
  • Extract structured fields
  • Prepare clean context for downstream agents

Workflow map

What to build with these skills

01

Extract tables from PDFs

02

Convert files to markdown

03

Run OCR over scans

04

Normalize document metadata

Best first installs

Start with high-signal skills

18 matched skills

PaddleOCR

VERIFIED

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

78K stars · 77 quality · May 19, 2026 push
$ npx skills add PaddlePaddle/PaddleOCR

OpenDataLoader PDF

PDF parser for AI-ready data. Automates PDF accessibility. Open source.

22K stars · 74 quality · May 22, 2026 push
$ npx skills add opendataloader-project/opendataloader-pdf

MarkItDown

VERIFIED

Convert documents into Markdown for agent-readable context

125K stars · 79 quality · May 22, 2026 push
$ npx skills add microsoft/markitdown

Skill shortlist

More options for this use case


Crawlee Python

web-automation

Crawlee is a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP, in both headful and headless modes, with proxy rotation.

9.1K stars · 69 quality

Firecrawl

agent-frameworks

🔥 Search, scrape, and clean the web for AI agents.

123K stars · 78 quality

Skills

security

A marketplace for AI-assisted security analysis and auditing plugins.

5.3K stars · 69 quality

RAGFlow

data

Build document intelligence and RAG workflows for agents

81K stars · 78 quality

Crawl4AI

web-automation

Open-source LLM-friendly web crawler and scraper

66K stars · 79 quality

Article Extractor

web-automation

Extract the main article from a given URL with Node.js

1.9K stars · 65 quality

Awesome Agent Skills

utility

A curated collection of over 380 agent skills from official teams and the community, enhancing AI capabilities.

23K stars · 73 quality

Kotaemon

data

An open-source RAG-based tool for chatting with your documents.

25K stars · 71 quality

PPT Master

agent-frameworks

Generates natively editable PPTX from any document, with real PowerPoint shapes and native animations rather than images · by Hugo He

20K stars · 73 quality

Katana

web-automation

A next-generation crawling and spidering framework.

17K stars · 73 quality

WeKnora

data

Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.

15K stars · 73 quality

PM Skills

productivity

A comprehensive marketplace for AI-driven product management skills and workflows.

12K stars · 71 quality

Paper QA

data

High-accuracy RAG for answering questions from scientific documents with citations

8.5K stars · 66 quality

Firecrawl

web-automation

Turn any website into LLM-ready markdown or structured data

123K stars · 76 quality

Browser Use

web-automation

Make AI agents interact with websites using natural language

95K stars · 75 quality