PaddleOCR
Verified
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
$ npx skills add PaddlePaddle/PaddleOCR
Parse messy files
Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.
Try this task
I need my agent to read PDFs, extract tables, and turn documents into structured data.
The agent should be able to:
Extract tables from PDFs
Convert files to markdown
Run OCR over scans
Normalize document metadata
Best first installs
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
$ npx skills add PaddlePaddle/PaddleOCR
PDF parser for AI-ready data. Automate PDF accessibility. Open-source.
$ npx skills add opendataloader-project/opendataloader-pdf
Convert documents into Markdown for agent-readable context.
$ npx skills add microsoft/markitdown
Skill shortlist
Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.
🔥 Search, scrape, and clean the web for AI agents.
A marketplace for AI-assisted security analysis and auditing plugins.
Build document intelligence and RAG workflows for agents
Open-source, LLM-friendly web crawler and scraper
Extract the main article from a given URL with Node.js
A curated collection of over 380 agent skills from official teams and the community, enhancing AI capabilities.
An open-source RAG-based tool for chatting with your documents.
AI generates natively editable PPTX files from any document: real PowerPoint shapes with native animations, not images. By Hugo He
A next-generation crawling and spidering framework.
Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.
A comprehensive marketplace for AI-driven product management skills and workflows.
High-accuracy RAG for answering questions about scientific documents, with citations
Turn any website into LLM-ready markdown or structured data
Make AI agents interact with websites using natural language