Document shortlist
A practical guide to PDF parsing skills for Claude Code users: extract tables, convert PDFs to markdown, prepare documents for RAG, and review audit risk before installing.
Decision prompt
I need Claude Code skills to parse PDFs, extract tables, convert documents to markdown, and prepare files for RAG.
Recommended shortlist
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
$ npx skills add opendatalab/MinerU
Get your documents ready for gen AI
$ npx skills add docling-project/docling
Knowledge Agents and Management in the Cloud
$ npx skills add run-llama/llama_cloud_services
A fast, helpful, and open-source document parser
$ npx skills add run-llama/liteparse
How to use this guide
Test one simple PDF and one messy real document with tables, scans, or long sections.
Look for table quality, markdown structure, source traceability, and visible failure modes.
RAG, data analysis, or legal review skills should consume clean extracted content, not raw broken text.
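One way to make the output check repeatable is a small script that counts coarse structural signals in the extracted markdown. The sketch below is a minimal illustration, not part of any listed skill: the function name `markdown_quality_report` and the specific signals (heading count, pipe-table count, tables with uneven row widths, Unicode replacement characters) are assumptions chosen for this example, and it only understands simple pipe-delimited tables.

```python
import re

# A markdown heading: one to six '#' characters followed by text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def markdown_quality_report(markdown: str) -> dict:
    """Summarize coarse structural signals in extracted markdown.

    Hypothetical helper for illustration; thresholds and signals
    are assumptions, not a standard.
    """
    headings = 0
    tables = 0
    ragged_tables = 0
    widths = []  # cell counts for the table currently being read

    def close_table():
        # Finish the current table and record whether its rows disagree in width.
        nonlocal tables, ragged_tables
        if widths:
            tables += 1
            if len(set(widths)) > 1:
                ragged_tables += 1
            widths.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            headings += 1
        is_row = len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|")
        if is_row:
            # Drop the outer pipes, then count the cells between the inner ones.
            widths.append(len(stripped[1:-1].split("|")))
        else:
            close_table()
    close_table()

    return {
        "headings": headings,
        "tables": tables,
        "ragged_tables": ragged_tables,
        # U+FFFD usually marks text that could not be decoded.
        "replacement_chars": markdown.count("\ufffd"),
    }
```

Running it on the output for both the simple PDF and the messy one gives a quick comparison: zero headings on a long report, or a nonzero `ragged_tables` count, is a hint to open the file and look before passing it to a downstream skill.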
Evaluation notes
The best PDF skills do not just extract text. They preserve layout, headings, tables, metadata, and uncertainty so an agent can reason over the document safely.
Install one document skill, run it against a representative PDF, inspect output quality, then pair it with RAG or data-analysis only after extraction works.
FAQ
Claude Code can reason over provided context, but a dedicated skill can make extraction, table handling, OCR, and repeatable conversion more reliable.
The main risk is silent extraction errors. Good workflows expose missing text, OCR uncertainty, table failures, and document privacy boundaries.
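A simple guard against silent failures is to flag pages whose extracted text is nearly empty or full of undecodable characters. The sketch below assumes you already have the extracted text as a list with one string per page; how you obtain that list depends on the tool you install. The function name `flag_suspect_pages` and both thresholds are hypothetical values picked for illustration.

```python
def flag_suspect_pages(page_texts, min_chars=40, max_bad_ratio=0.01):
    """Return (page_number, reason) pairs for pages that look badly extracted.

    Hypothetical helper: min_chars and max_bad_ratio are illustrative
    assumptions and should be tuned on your own documents.
    """
    suspects = []
    for number, text in enumerate(page_texts, start=1):
        # Ignore whitespace so a page of blank lines counts as empty.
        visible = "".join(text.split())
        if len(visible) < min_chars:
            suspects.append((number, "little or no text"))
        elif visible.count("\ufffd") / len(visible) > max_bad_ratio:
            # U+FFFD usually marks characters that could not be decoded.
            suspects.append((number, "many undecodable characters"))
    return suspects
```

A page flagged as "little or no text" is often a scan that needs OCR or a full-page figure, so the flag is a prompt to inspect the page, not proof of an error.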
More candidates
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
All-in-One Development Tool based on PaddlePaddle
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Python tool for converting files and office documents to Markdown.
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
🪐 Markdown with superpowers: from ideas to papers, presentations, websites, books, and knowledge bases.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.