OpenAgentSkill guide

Best document processing skills for AI agents

Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.

When to use this guide

Start from the job, then shortlist the tools.

Extract tables from PDFs
Convert files to Markdown
Run OCR over scans
Normalize document metadata

For each of these jobs, use quality and freshness signals to decide whether a skill belongs in the workflow.
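The guide does not define its quality and freshness signals, so the following is only a minimal sketch of how such a filter might look. The `Candidate` fields, the thresholds, and the sample entries are all hypothetical placeholders, not data from this guide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """A skill under evaluation (all fields are illustrative assumptions)."""
    name: str
    quality_score: int   # assumed 0-100 scale
    last_release: date   # assumed freshness signal


def shortlist(candidates, today, min_quality=80, max_age_days=180):
    """Keep candidates that pass both the quality and the freshness threshold."""
    kept = []
    for c in candidates:
        age_days = (today - c.last_release).days
        if c.quality_score >= min_quality and age_days <= max_age_days:
            kept.append(c)
    # Highest quality first; ties are broken by the most recent release.
    return sorted(kept, key=lambda c: (-c.quality_score, -c.last_release.toordinal()))


# Hypothetical sample data, for illustration only.
SAMPLE = [
    Candidate("skill-a", 95, date(2025, 5, 1)),
    Candidate("skill-b", 60, date(2025, 5, 20)),  # fails the quality threshold
    Candidate("skill-c", 90, date(2023, 1, 10)),  # fails the freshness threshold
    Candidate("skill-d", 95, date(2025, 5, 15)),
]

if __name__ == "__main__":
    for c in shortlist(SAMPLE, today=date(2025, 6, 1)):
        print(c.name, c.quality_score, c.last_release.isoformat())
```

The thresholds are defaults you would tune per job: an OCR pipeline over archival scans might tolerate an older release, while a converter feeding a production agent might not.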

Shortlist

Top skills to evaluate

#1 PaddleOCR · Excellent · 100 · 78K stars

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#2 Opendataloader Pdf · Excellent · 100 · 22K stars

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#3 MarkItDown · Excellent · 100 · 125K stars

Convert documents into Markdown for agent-readable context

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#4 Crawlee Python · Excellent · 100 · 9.1K stars

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#5 Firecrawl · Excellent · 100 · 123K stars

🔥 Search, scrape, and clean the web for AI agents.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#6 Skills · Excellent · 100 · 5.3K stars

A marketplace for AI-assisted security analysis and auditing plugins.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#7 RAGFlow · Excellent · 100 · 81K stars

Build document intelligence and RAG workflows for agents

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#8 Crawl4AI · Excellent · 100 · 66K stars

Open-source LLM-friendly web crawler and scraper

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#9 Article Extractor · Excellent · 100 · 1.9K stars

Extracts the main article from a given URL with Node.js.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.

#10 Awesome Agent Skills · Excellent · 100 · 23K stars

A curated collection of over 380 agent skills from official teams and the community, enhancing AI capabilities.

Best fit: High-confidence pick with strong adoption and healthy maintenance signals.
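When comparing entries, it can help to turn the abbreviated star counts into numbers. The sketch below is one possible way to do that, using the names and star counts as read from the entry headers above; `parse_stars` and `by_stars` are hypothetical helper names, and the `M` suffix is an assumption added for completeness.

```python
from decimal import Decimal

# (name, star count) pairs as read from the entry headers above.
SHORTLIST = [
    ("PaddleOCR", "78K"),
    ("Opendataloader Pdf", "22K"),
    ("MarkItDown", "125K"),
    ("Crawlee Python", "9.1K"),
    ("Firecrawl", "123K"),
    ("Skills", "5.3K"),
    ("RAGFlow", "81K"),
    ("Crawl4AI", "66K"),
    ("Article Extractor", "1.9K"),
    ("Awesome Agent Skills", "23K"),
]

# Multipliers for abbreviated counts ("M" is an assumed extension).
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_stars(text):
    """Convert an abbreviated count such as '9.1K' into an integer."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        # Decimal avoids float rounding surprises on values like 9.1.
        return int(Decimal(text[:-1]) * _SUFFIXES[text[-1]])
    return int(Decimal(text))


def by_stars(entries):
    """Return entries sorted from most to fewest stars."""
    return sorted(entries, key=lambda e: parse_stars(e[1]), reverse=True)


if __name__ == "__main__":
    for name, stars in by_stars(SHORTLIST):
        print(f"{name}: {parse_stars(stars)}")
```

Sorted this way, MarkItDown, Firecrawl, and RAGFlow come first, which differs from the order of the list above. Star count is one adoption signal among several, not the ranking itself.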