Parse messy files
Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.
For builders choosing skills to extract tables from PDFs and convert files to Markdown. Ranked from the OpenAgentSkill index using quality, trust, freshness, adoption, and install readiness.
Workflow: Extract tables from PDFs
Workflow: Convert files to markdown
Workflow: Run OCR over scans
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
Excellent quality, 67K stars, and a 43 use-case fit score.
$ npx skills add opendatalab/MinerU
Get your documents ready for gen AI
Excellent quality, 61K stars, and a 39 use-case fit score.
$ npx skills add docling-project/docling
Knowledge Agents and Management in the Cloud
Excellent quality, 4.3K stars, and a 37 use-case fit score.
$ npx skills add run-llama/llama_cloud_services
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.
Excellent quality, 15K stars, and a 35 use-case fit score.
$ npx skills add Unstructured-IO/unstructured
A fast, helpful, and open-source document parser
Excellent quality, 9.9K stars, and a 34 use-case fit score.
$ npx skills add run-llama/liteparse
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Excellent quality, 9.0K stars, and a 34 use-case fit score.
$ npx skills add bytedance/Dolphin
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Excellent quality, 10K stars, and a 32 use-case fit score.
$ npx skills add py-pdf/pypdf
All-in-One Development Tool based on PaddlePaddle
Excellent quality, 6.2K stars, and a 31 use-case fit score.
$ npx skills add PaddlePaddle/PaddleX
Python tool for converting files and office documents to Markdown.
Excellent quality, 152K stars, and a 31 use-case fit score.
$ npx skills add microsoft/markitdown
A Repo For Document AI
Excellent quality, 3.2K stars, and a 30 use-case fit score.
$ npx skills add deepdoctection/deepdoctection
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Excellent quality, 34K stars, and a 30 use-case fit score.
$ npx skills add ocrmypdf/OCRmyPDF
A modern ebook manager and reader with sync and backup capacities for Windows, macOS, Linux, Android, iOS and Web
Excellent quality, 27K stars, and a 30 use-case fit score.
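$ npx skills add koodo-reader/koodo-reader
OCR software, free and offline. Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multilingual libraries.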
Excellent quality, 45K stars, and a 30 use-case fit score.
$ npx skills add hiroi-sora/Umi-OCR
🪐 Markdown with superpowers: from ideas to papers, presentations, websites, books, and knowledge bases.
Excellent quality, 15K stars, and a 29 use-case fit score.
$ npx skills add iamgio/quarkdown
Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.
Excellent quality, 15K stars, and a 29 use-case fit score.
$ npx skills add xournalpp/xournalpp
Universal File Online Preview Project based on Spring-Boot
Excellent quality, 14K stars, and a 29 use-case fit score.
$ npx skills add kekingcn/kkFileView
🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options
Excellent quality, 13K stars, and a 29 use-case fit score.
$ npx skills add T8RIN/ImageToolbox
Your One-Stop Publication Workbench
Excellent quality, 13K stars, and a 29 use-case fit score.
$ npx skills add Zettlr/Zettlr
OCR model that handles complex tables, forms, handwriting with full layout.
Excellent quality, 11K stars, and a 29 use-case fit score.
$ npx skills add datalab-to/chandra
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Excellent quality, 10.0K stars, and a 28 use-case fit score.
$ npx skills add pymupdf/PyMuPDF
A pure PHP library for reading and writing word processing documents
Excellent quality, 7.6K stars, and a 28 use-case fit score.
$ npx skills add PHPOffice/PHPWord
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
Excellent quality, 5.8K stars, and a 28 use-case fit score.
$ npx skills add oomol-lab/pdf-craft
The Rust Programming Language (2024 edition complete)
Excellent quality, 5.5K stars, and a 28 use-case fit score.
$ npx skills add KaiserY/trpl-zh-cn
Transforms PDF, Documents and Images into Enriched Structured Data
Excellent quality, 6.2K stars, and a 28 use-case fit score.
$ npx skills add axa-group/Parsr
Open-source office suite pack that comprises all the tools you need to work with documents, spreadsheets, presentations, PDFs, and PDF forms on Windows, Linux, and macOS
Excellent quality, 4.9K stars, and a 28 use-case fit score.
$ npx skills add ONLYOFFICE/DesktopEditors
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Excellent quality, 82K stars, and a 28 use-case fit score.
$ npx skills add PaddlePaddle/PaddleOCR
#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
Excellent quality, 81K stars, and a 28 use-case fit score.
$ npx skills add Stirling-Tools/Stirling-PDF
Tesseract Open Source OCR Engine (main repository)
Excellent quality, 75K stars, and a 28 use-case fit score.
$ npx skills add tesseract-ocr/tesseract
AI comic and manga translator app/browser extension for automatically translating comics, manga, manhwa, BDs, fumetti, and more in multiple languages and formats (Images, PDF, EPUB, CBR, CBZ etc).
Excellent quality, 2.8K stars, and a 27 use-case fit score.
$ npx skills add ogkalu2/comic-translate
Read and extract text and other content from PDFs in C# (port of PDFBox)
Excellent quality, 2.5K stars, and a 27 use-case fit score.
$ npx skills add UglyToad/PdfPig
Selection method
OpenAgentSkill scores each candidate against the workflow keywords, then balances fit with GitHub stars, quality signals, trust profile, maintenance freshness, and whether there is a clear install path.
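One way to picture how several signals can be balanced into a single ranking is a weighted sum. The sketch below is illustrative only: the weights, the 0-to-1 normalization, the `Candidate` fields, and the sample entries are all assumptions made for this example, and the actual OpenAgentSkill formula is not described in this page.

```python
from dataclasses import dataclass

# Hypothetical weights chosen for illustration; the real weighting is not given here.
WEIGHTS = {
    "fit": 0.4,        # match against the workflow keywords
    "quality": 0.2,    # quality signals
    "trust": 0.1,      # trust profile
    "adoption": 0.1,   # GitHub stars, normalized
    "freshness": 0.1,  # maintenance freshness
    "install": 0.1,    # whether a clear install path exists
}


@dataclass(frozen=True)
class Candidate:
    """One skill, with every signal assumed to be pre-normalized to 0..1."""
    name: str
    fit: float
    quality: float
    trust: float
    adoption: float
    freshness: float
    has_install_path: bool


def score(c: Candidate) -> float:
    """Weighted sum of the signals; higher means a better overall match."""
    return (
        WEIGHTS["fit"] * c.fit
        + WEIGHTS["quality"] * c.quality
        + WEIGHTS["trust"] * c.trust
        + WEIGHTS["adoption"] * c.adoption
        + WEIGHTS["freshness"] * c.freshness
        + WEIGHTS["install"] * (1.0 if c.has_install_path else 0.0)
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Made-up entries, not real index data.
    sample = [
        Candidate("popular-but-loose-fit", 0.4, 0.9, 0.9, 1.0, 0.9, True),
        Candidate("strong-fit", 0.9, 0.8, 0.7, 0.5, 0.6, True),
    ]
    for c in rank(sample):
        print(f"{c.name}: {score(c):.2f}")
```

With weights like these, a candidate with strong workflow fit can outrank a more widely starred repository, which is consistent with some very popular projects appearing lower in the list above.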
Treat the list as a shortlist: open the skill detail page, inspect the repository and license, then test the install command in a sandbox workflow.
Ranked skill data is also available through the API: use /api/skills/search with the related task, or /api/agent/rankings?slug=best-document-processing-skills to fetch ranked skill data.
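A minimal sketch of calling these two endpoints from Python follows. Only the two paths and the `slug` value come from the text above. The host (`BASE_URL` is a placeholder), the `q` query parameter name for the search endpoint, and the assumption that responses are JSON are all hypothetical and should be checked against the real API before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host: the page does not state the API's base URL.
BASE_URL = "https://example.invalid"


def rankings_url(slug: str, base_url: str = BASE_URL) -> str:
    """Build the rankings URL for a given page slug."""
    return f"{base_url}/api/agent/rankings?{urlencode({'slug': slug})}"


def search_url(task: str, base_url: str = BASE_URL, param: str = "q") -> str:
    """Build the search URL; the parameter name `q` is an assumption."""
    return f"{base_url}/api/skills/search?{urlencode({param: task})}"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body, assuming the endpoint returns JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URLs only; call fetch_json(...) once the real host is known.
    print(rankings_url("best-document-processing-skills"))
    print(search_url("extract tables from pdfs"))
```

Keeping URL construction separate from the network call makes it easy to confirm the request shape before sending anything.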