Docext
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Install with one command
$ npx skills add NanoNets/docext
Best for
Document processing
Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.
Choose it when
- You want a GitHub-backed skill with 2.0K stars.
- You need a reusable install command for agents.
- You want to compare it with related marketplace skills.
Check before install
- Last GitHub push: Mar 17, 2026
- License: Apache-2.0
- Review the repository README and examples.
Quality profile
Excellent candidate for agent workflows
High-confidence pick with strong adoption and healthy maintenance signals.
Workflow fit
Use this skill in these scenarios
Parse messy files
Document processing
I need my agent to read PDFs, extract tables, and turn documents into structured data.
Search private knowledge
RAG and knowledge
I need my agent to build a RAG workflow over documents and retrieve reliable context.
Build and ship code
Coding agents
I need a coding agent that can understand a repository, edit code, and review pull requests.
Stack fit
Add it to a complete workflow
Ingest, retrieve, and cite
RAG knowledge base
A stack for document-heavy agents that ingest files, create searchable knowledge, retrieve relevant context, and answer with grounded sources.
Scrape, clean, and reuse web data
Web data pipeline
A practical stack for agents that crawl public pages, extract clean content, normalize data, and hand it to downstream research or RAG workflows.
Inspect, patch, and verify code
Coding review agent
A stack for software agents that inspect repositories, review pull requests, generate tests, and turn findings into shippable patches.
Overview
This skill was imported by the skill-only GitHub discovery pipeline because it matches agent skill, automation, RAG, or developer-tool signals. Protocol-server projects are excluded from automated imports.
Technical Details
- Version: 1.0.0
- License: Apache-2.0
- Last Updated: 5/24/2026
- Published: 5/23/2026
Author
NanoNets✓
@nanonets
Health Signals
- GitHub stars: 2.0K
- Quality score: 62/100
- Last GitHub push: Mar 17, 2026
- Framework hints: 2
Trust & Safety
- Open source (public GitHub repo)
- AI static analysis passed
- License: Apache-2.0
- Manually verified by team
Related Skills
MarkItDown
Convert documents into Markdown for agent-readable context
124.9K stars · 0 installs
Awesome Llm Apps
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
111.5K stars · 0 installs
RAGFlow
Build document intelligence and RAG workflows for agents
81.1K stars · 0 installs
PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
78.4K stars · 0 installs