Parse messy files

Best document processing skills for AI agents

Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.
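Whatever tool does the parsing, preparing file content for agent workflows usually ends with splitting the extracted text so it fits a model's context window. Below is a minimal sketch. It assumes plain text has already been extracted and that fixed-size character windows with overlap are good enough; many pipelines split on headings, paragraphs, or tokens instead. `chunk_text` and its defaults are hypothetical and not part of any skill listed here.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters. The function name and the
    default sizes are illustrative assumptions, not any listed tool's API.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once a window reaches the end of the text, so no chunk is a pure repeat.
        if start + max_chars >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of each chunk at the start of the next. As a result, any span no longer than `overlap` characters appears whole in at least one chunk, even when it crosses a boundary.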

30 ranked · 1.2M stars · top score 100
#1

PaddleOCR

28 fit · Excellent · 100

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Excellent quality, 78K stars, and a 28 use-case fit score.

78K stars · 10K forks · pushed May 19, 2026 · data
$ npx skills add PaddlePaddle/PaddleOCR
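OCR engines typically return recognized text as boxes, each with a position and a confidence score. Turning those boxes into structured data starts with ordering them into lines. The sketch below rests on stated assumptions: the `OcrBox` shape, the 0.5 confidence cutoff, and the 10-pixel line tolerance are all hypothetical. It does not reproduce PaddleOCR's output format or API.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    """Hypothetical OCR result shape (not PaddleOCR's actual output format)."""
    text: str
    x: float      # left edge of the box, in pixels
    y: float      # vertical centre of the box, in pixels
    score: float  # recognition confidence in [0, 1]


def boxes_to_lines(boxes: list[OcrBox], min_score: float = 0.5, y_tol: float = 10.0) -> list[str]:
    """Drop low-confidence boxes, group the rest into lines by y, and order each line by x."""
    kept = sorted((b for b in boxes if b.score >= min_score), key=lambda b: (b.y, b.x))
    lines: list[list[OcrBox]] = []
    for box in kept:
        # Join the current line if this box is vertically close to the line's first box;
        # otherwise start a new line.
        if lines and abs(box.y - lines[-1][0].y) <= y_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [" ".join(b.text for b in sorted(line, key=lambda b: b.x)) for line in lines]
```

Grouping by the first box's y-coordinate is the simplest option. Skewed scans or multi-column pages would need deskewing or column detection first.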
#2

Opendataloader Pdf

26 fit · Excellent · 100

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

Excellent quality, 22K stars, and a 26 use-case fit score.

22K stars · 2.0K forks · pushed May 22, 2026 · data
$ npx skills add opendataloader-project/opendataloader-pdf
#3

MarkItDown

25 fit · Excellent · 100

Convert documents into Markdown for agent-readable context

Excellent quality, 125K stars, and a 25 use-case fit score.

125K stars · 8.5K forks · pushed May 22, 2026 · data
$ npx skills add microsoft/markitdown
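To show the kind of output a document-to-Markdown converter aims for, here is a stdlib-only sketch that renders CSV text as a Markdown table. It is not MarkItDown's API or implementation. `csv_to_markdown` is a hypothetical helper, and it treats the first row as the header, which is an assumption about the input.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table, using the first row as the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def fmt(cells: list[str]) -> str:
        # Pad short rows, truncate long ones, escape pipes, and flatten embedded
        # newlines so every row stays on one line with the same column count.
        cells = (cells + [""] * width)[:width]
        return "| " + " | ".join(
            c.replace("|", "\\|").replace("\n", " ").strip() for c in cells
        ) + " |"

    lines = [fmt(header), "|" + " --- |" * width]  # header row plus separator row
    lines.extend(fmt(r) for r in body)
    return "\n".join(lines)


print(csv_to_markdown("name,stars\nMarkItDown,125K\n"))
```

Tables are where agent-readable Markdown helps most, because a model can read the column alignment directly instead of inferring it from raw delimiters.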
#4

Crawlee Python

22 fit · Excellent · 100

Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

Excellent quality, 9.1K stars, and a 22 use-case fit score.

9.1K stars · 744 forks · pushed May 22, 2026 · web-automation
$ npx skills add apify/crawlee-python
#5

Firecrawl

22 fit · Excellent · 100

🔥 Search, scrape, and clean the web for AI agents.

Excellent quality, 123K stars, and a 22 use-case fit score.

123K stars · 7.5K forks · pushed May 23, 2026 · agent-frameworks
$ npx skills add firecrawl/firecrawl
#6

Skills

22 fit · Excellent · 100

A marketplace for AI-assisted security analysis and auditing plugins.

Excellent quality, 5.3K stars, and a 22 use-case fit score.

5.3K stars · 472 forks · pushed May 16, 2026 · security
$ npx skills add trailofbits/skills
#7

RAGFlow

22 fit · Excellent · 100

Build document intelligence and RAG workflows for agents

Excellent quality, 81K stars, and a 22 use-case fit score.

81K stars · 9.3K forks · pushed May 22, 2026 · data
$ npx skills add infiniflow/ragflow
#8

Crawl4AI

22 fit · Excellent · 100

Open-source LLM-friendly web crawler and scraper

Excellent quality, 66K stars, and a 22 use-case fit score.

66K stars · 6.8K forks · pushed May 22, 2026 · web-automation
$ npx skills add unclecode/crawl4ai
#9

Article Extractor

21 fit · Excellent · 100

Extract the main article from a given URL with Node.js

Excellent quality, 1.9K stars, and a 21 use-case fit score.

1.9K stars · 160 forks · pushed May 3, 2026 · web-automation
$ npx skills add extractus/article-extractor
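The package itself is a Node.js library. To illustrate the general idea in Python, the sketch below uses a rough heuristic: keep `<p>` text, ignore common boilerplate containers, and prefer paragraphs inside `<article>` when the page has one. This is not article-extractor's algorithm. The skipped-tag list and the `min_chars` threshold are assumptions.

```python
from html.parser import HTMLParser

# Containers treated as boilerplate (an assumption; real extractors use richer rules).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ParagraphCollector(HTMLParser):
    """Collects <p> text and records whether each paragraph sits inside an <article>."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0     # > 0 while inside a boilerplate container
        self.article_depth = 0  # > 0 while inside an <article> element
        self.in_p = False
        self.buf: list[str] = []
        self.paragraphs: list[tuple[bool, str]] = []  # (inside_article, text)

    def flush(self) -> None:
        # Close the current paragraph, collapsing runs of whitespace.
        if self.in_p:
            text = " ".join("".join(self.buf).split())
            self.paragraphs.append((self.article_depth > 0, text))
            self.in_p, self.buf = False, []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "article":
            self.article_depth += 1
        elif tag == "p" and not self.skip_depth:
            self.flush()  # an unclosed <p> ends where the next one starts
            self.in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "article" and self.article_depth:
            self.flush()  # record the paragraph while still counted as inside
            self.article_depth -= 1
        elif tag == "p":
            self.flush()

    def handle_data(self, data):
        if self.in_p and not self.skip_depth:
            self.buf.append(data)


def extract_main_text(html: str, min_chars: int = 20) -> str:
    """Return <article> paragraphs if any exist, else all non-boilerplate paragraphs."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # handle a final paragraph that was never closed
    paras = [(inside, t) for inside, t in parser.paragraphs if len(t) >= min_chars]
    in_article = [t for inside, t in paras if inside]
    return "\n\n".join(in_article or [t for _, t in paras])
```

The `<article>`-first rule and the `min_chars` cutoff are crude. Their main job is to keep navigation links and short captions out of the result.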
#10

Awesome Agent Skills

20 fit · Excellent · 100

A curated collection of over 380 agent skills from official teams and the community, enhancing AI capabilities.

Excellent quality, 23K stars, and a 20 use-case fit score.

23K stars · 2.4K forks · pushed May 10, 2026 · utility
$ npx skills add VoltAgent/awesome-agent-skills
#11

Kotaemon

20 fit · Excellent · 100

An open-source RAG-based tool for chatting with your documents.

Excellent quality, 25K stars, and a 20 use-case fit score.

25K stars · 2.1K forks · pushed Apr 3, 2026 · data
$ npx skills add Cinnamon/kotaemon
#12

Ppt Master

20 fit · Excellent · 100

AI generates natively editable PPTX from any document: real PowerPoint shapes with native animations, not images. By Hugo He.

Excellent quality, 20K stars, and a 20 use-case fit score.

20K stars · 1.8K forks · pushed May 23, 2026 · agent-frameworks
$ npx skills add hugohe3/ppt-master
#13

Katana

20 fit · Excellent · 100

A next-generation crawling and spidering framework.

Excellent quality, 17K stars, and a 20 use-case fit score.

17K stars · 1.1K forks · pushed May 21, 2026 · web-automation
$ npx skills add projectdiscovery/katana
#14

WeKnora

20 fit · Excellent · 100

Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.

Excellent quality, 15K stars, and a 20 use-case fit score.

15K stars · 2.0K forks · pushed May 22, 2026 · data
$ npx skills add Tencent/WeKnora
#15

Pm Skills

20 fit · Excellent · 100

A comprehensive marketplace for AI-driven product management skills and workflows.

Excellent quality, 12K stars, and a 20 use-case fit score.

12K stars · 1.4K forks · pushed May 20, 2026 · productivity
$ npx skills add phuryn/pm-skills
#16

Paper Qa

19 fit · Excellent · 100

High-accuracy RAG for answering questions from scientific documents with citations

Excellent quality, 8.5K stars, and a 19 use-case fit score.

8.5K stars · 870 forks · pushed Mar 20, 2026 · data
$ npx skills add Future-House/paper-qa
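PaperQA's own pipeline does far more than this. The sketch only illustrates the bookkeeping that makes cited answers possible: every passage carries its source and page, and whatever ranks the passages passes that metadata through. Plain word-overlap scoring stands in for real retrieval here. The passage fields (`source`, `page`, `text`) are assumptions, not paper-qa's data model.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric words; no stemming, so "peak" and "peaks" differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_with_citations(question: str, passages: list[dict], k: int = 2) -> list[tuple[str, str]]:
    """Rank passages by word overlap with the question and return (citation, text) pairs.

    Each passage is assumed to be a dict with hypothetical keys
    "source", "page", and "text".
    """
    q = Counter(tokenize(question))
    scored = []
    for p in passages:
        # Multiset intersection: a shared word counts as often as it appears in both.
        overlap = sum((q & Counter(tokenize(p["text"]))).values())
        if overlap:
            scored.append((overlap, p))
    # list.sort is stable, so passages with equal scores keep their input order.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(f'{p["source"]}, p. {p["page"]}', p["text"]) for _, p in scored[:k]]
```

A real system would replace the overlap score with embeddings or an LLM-based relevance check. The citation string survives either way because it travels with the passage.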
#17

Firecrawl

19 fit · Excellent · 100

Turn any website into LLM-ready markdown or structured data

Excellent quality, 123K stars, and a 19 use-case fit score.

123K stars · 7.5K forks · pushed May 22, 2026 · web-automation
$ npx skills add firecrawl/firecrawl
#18

Browser Use

19 fit · Excellent · 100

Make AI agents interact with websites using natural language

Excellent quality, 95K stars, and a 19 use-case fit score.

95K stars · 11K forks · pushed May 23, 2026 · web-automation
$ npx skills add browser-use/browser-use
#19

Awesome Agent Skills

19 fit · Excellent · 98

A comprehensive resource for skills and tools tailored for AI coding agents.

Excellent quality, 5.0K stars, and a 19 use-case fit score.

5.0K stars · 455 forks · pushed Apr 5, 2026 · development
$ npx skills add heilcheng/awesome-agent-skills
#20

Scrapy

18 fit · Excellent · 100

High-throughput crawling and scraping for agent data pipelines

Excellent quality, 62K stars, and an 18 use-case fit score.

62K stars · 12K forks · pushed May 20, 2026 · web-automation
$ npx skills add scrapy/scrapy
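Scrapy ships its own scheduler and duplicate-request filtering, so you would not write this yourself when using it. The sketch below only illustrates, in plain Python, the frontier idea that any crawler relies on:

- normalize each link,
- keep a seen-set so no URL is queued twice,
- stay on the start domain,
- stop at a page budget.

It is not Scrapy's API. `get_links` is a hypothetical callback standing in for "fetch this URL and return its hrefs", which keeps the example runnable without network access.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl that stays on the start domain and never queues a URL twice."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "site" replaces real HTTP fetching.
site = {
    "https://ex.com/": ["/a", "/b#top", "https://other.com/x"],
    "https://ex.com/a": ["/b", "/"],
    "https://ex.com/b": [],
}
print(crawl("https://ex.com/", lambda u: site.get(u, [])))
```

Swapping the `deque` for a stack would give depth-first order. The seen-set and the URL normalization are what keep either order from looping.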
#21

Trafilatura

18 fit · Excellent · 91

Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML

Excellent quality, 6.0K stars, and an 18 use-case fit score.

6.0K stars · 373 forks · pushed Sep 12, 2025 · web-automation
$ npx skills add adbar/trafilatura
#22

Open Design

18 fit · Excellent · 100

Open Design is a powerful, local-first design tool that integrates multiple coding-agent CLIs for generating various design outputs.

Excellent quality, 50K stars, and an 18 use-case fit score.

50K stars · 5.7K forks · pushed May 23, 2026 · productivity
$ npx skills add nexu-io/open-design
#23

Awesome Openclaw Skills

18 fit · Excellent · 100

A comprehensive collection of community-built OpenClaw skills organized by category.

Excellent quality, 49K stars, and an 18 use-case fit score.

49K stars · 4.8K forks · pushed May 20, 2026 · utility
$ npx skills add VoltAgent/awesome-openclaw-skills
#24

Career Ops

18 fit · Excellent · 100

AI-powered job search system built on Claude Code. 14 skill modes, Go dashboard, PDF generation, batch processing.

Excellent quality, 47K stars, and an 18 use-case fit score.

47K stars · 9.8K forks · pushed May 22, 2026 · agent-frameworks
$ npx skills add santifer/career-ops
#25

EasySpider

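18 fit · Excellent · 100

A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual browser-automation testing, data-collection, and web-crawling tool that lets you design and run crawler tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.

Excellent quality, 44K stars, and an 18 use-case fit score.

44K stars · 5.3K forks · pushed May 22, 2026 · web-automation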
$ npx skills add NaiboWang/EasySpider
#26

Hamilton

18 fit · Excellent · 100

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Excellent quality, 2.5K stars, and an 18 use-case fit score.

2.5K stars · 190 forks · pushed May 20, 2026 · development
$ npx skills add apache/hamilton
#27

Antigravity Awesome Skills

18 fit · Excellent · 100

A comprehensive library of over 1,273 agentic skills for various AI coding assistants, featuring clear documentation and installation instructions.

Excellent quality, 38K stars, and an 18 use-case fit score.

38K stars · 6.3K forks · pushed May 22, 2026 · development
$ npx skills add sickn33/antigravity-awesome-skills
#28

WaterCrawl

18 fit · Excellent · 93

Transform Web Content into LLM-Ready Data

Excellent quality, 1.8K stars, and an 18 use-case fit score.

1.8K stars · 224 forks · pushed May 20, 2026 · web-automation
$ npx skills add watercrawl/WaterCrawl
#29

PageIndex

18 fit · Excellent · 100

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

Excellent quality, 32K stars, and an 18 use-case fit score.

32K stars · 2.8K forks · pushed May 15, 2026 · agent-frameworks
$ npx skills add VectifyAI/PageIndex
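One ingredient of vectorless, reasoning-based retrieval can be sketched as a hierarchical table of contents. An agent reasons over that tree, picking a section by its title, instead of looking up embeddings. This minimal sketch assumes Markdown input with ATX headings (`#` to `######`). It is not PageIndex's index format or API. It also does not special-case fenced code, so `#` comments inside code blocks would be misread as headings.

```python
import re


def heading_tree(markdown: str) -> list[dict]:
    """Build a nested table of contents from ATX headings (#, ##, ... ######)."""
    root = {"title": None, "level": 0, "line": 0, "children": []}
    stack = [root]  # path from the root to the most recent heading
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = re.match(r"^(#{1,6})\s+(.*?)\s*#*\s*$", line)
        if not m:
            continue
        node = {"title": m.group(2), "level": len(m.group(1)), "line": lineno, "children": []}
        # Pop until the top of the stack is a shallower heading, then attach as its child.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


doc = "# Intro\ntext\n## Scope\n## Methods\n### Data\n# Results\n"
for top in heading_tree(doc):
    print(top["title"], [c["title"] for c in top["children"]])
```

An agent could show the model only titles and line numbers, let it choose a branch, and then read that section's text. This sketch makes no claim about how PageIndex builds or queries its own index.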
#30

Browserless

18 fit · Excellent · 100

The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.

Excellent quality, 1.8K stars, and an 18 use-case fit score.

1.8K stars · 90 forks · pushed May 22, 2026 · web-automation
$ npx skills add microlinkhq/browserless