OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
$ npx skills add ocrmypdf/OCRmyPDF
Alternatives
Compare similar skills by workflow fit, trust score, quality, GitHub adoption, maintenance, and install readiness.
Current skill
Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymizes documents, removes PII, and converts any document or picture to structured JSON or Markdown.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
$ npx skills add ocrmypdf/OCRmyPDF
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
$ npx skills add opendatalab/MinerU
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
$ npx skills add pymupdf/PyMuPDF
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
$ npx skills add oomol-lab/pdf-craft
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
$ npx skills add bytedance/Dolphin
A web interface to extract tabular data from PDFs
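$ npx skills add camelot-dev/excalibur
Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Ships with built-in language packs for multiple languages.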
$ npx skills add hiroi-sora/Umi-OCR
A high-quality PDF to Markdown tool based on large language model visual recognition.
$ npx skills add MarkPDFdown/markpdfdown
A community-supported supercharged document management system: scan, index and archive all your documents
$ npx skills add paperless-ngx/paperless-ngx
A developer-friendly API for converting many document formats into PDF files, and more!
$ npx skills add gotenberg/gotenberg
Python tool for converting files and office documents to Markdown.
$ npx skills add microsoft/markitdown
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
$ npx skills add QuivrHQ/MegaParse
A fast, helpful, and open-source document parser
$ npx skills add run-llama/liteparse
Get your documents ready for gen AI
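$ npx skills add docling-project/docling
A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based subtitle extraction framework covering subtitle region detection and subtitle content extraction.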
$ npx skills add YaoFANGUK/video-subtitle-extractor
Transforms PDF, Documents and Images into Enriched Structured Data
$ npx skills add axa-group/Parsr
How to choose
Use an alternative when it has a clearer install path, higher trust score, fresher maintenance, or better platform fit for your current agent stack. Keep Text Extract API if it already passes your workflow test and repository review.
Next step
Open the compare page, test the install commands in a sandbox, and check each repository before using a skill in production.