Unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Install with one command
$ npx skills add Unstructured-IO/unstructured
Decision summary
Production-ready for document processing
Use this as a leading candidate, then validate the README and install path in your own agent stack.
Best for
- Document processing workflows
- Claude Code teams
- teams that value GitHub adoption signals
Not ideal for
- teams that need a vendor-supported SLA
- high-compliance environments without internal security review
Risk notes
- No major risk signals from current metadata
Quality profile
Excellent candidate for agent workflows
High-confidence pick with strong adoption and healthy maintenance signals.
Workflow fit
Use this skill in these scenarios
Parse messy files
Document processing
I need my agent to read PDFs, extract tables, and turn documents into structured data.
Search private knowledge
RAG and knowledge
I need my agent to build a RAG workflow over documents and retrieve reliable context.
Collect structured data
Web scraping
I need my agent to scrape websites and extract structured data from pages.
Stack fit
Add it to a complete workflow
Scrape, clean, and reuse web data
Web data pipeline
A practical stack for agents that crawl public pages, extract clean content, normalize data, and hand it to downstream research or RAG workflows.
Ingest, retrieve, and cite
RAG knowledge base
A stack for document-heavy agents that ingest files, create searchable knowledge, retrieve relevant context, and answer with grounded sources.
Turn skills into distribution
Content growth agent
A stack for turning newly indexed skills into SEO briefs, social drafts, comparison pages, and reusable publishing workflows.
Overview
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Imported by the skill-only GitHub discovery pipeline because it matches agent skill, automation, RAG, or developer-tool signals. Protocol-server projects are excluded from automated imports.
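The partitioning and chunking steps named in the overview can be illustrated with a small standard-library sketch. This is a toy under stated assumptions, not the Unstructured library's API: the `Element` class, `toy_partition`, `toy_chunk_by_title`, the title heuristic, and the sample text are all hypothetical names and rules chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical typed element; not a class from the Unstructured library."""
    category: str  # "Title" or "NarrativeText"
    text: str


def toy_partition(raw: str) -> list[Element]:
    """Split raw text on blank lines and label each block with a toy heuristic."""
    elements = []
    for block in raw.split("\n\n"):
        text = " ".join(block.split())  # collapse internal whitespace
        if not text:
            continue
        # Assumption: short blocks without a trailing period are titles.
        is_title = len(text) <= 60 and not text.endswith(".")
        elements.append(Element("Title" if is_title else "NarrativeText", text))
    return elements


def toy_chunk_by_title(elements: list[Element], max_chars: int = 500) -> list[str]:
    """Start a new chunk at each title, or when adding text would exceed max_chars."""
    chunks, current = [], ""
    for el in elements:
        too_long = len(current) + len(el.text) + 1 > max_chars
        if current and (el.category == "Title" or too_long):
            chunks.append(current)
            current = ""
        current = f"{current}\n{el.text}" if current else el.text
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample document with two titled sections.
sample = "Invoice\n\nTotal due is 40 USD.\n\nTerms\n\nPayment is due in 30 days."
elements = toy_partition(sample)
chunks = toy_chunk_by_title(elements)
```

Each resulting chunk keeps a title together with the text that follows it, which is the kind of unit a downstream embedding or retrieval step would typically consume. A real pipeline would rely on the library's own document parsers rather than a blank-line heuristic.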
Technical Details
- Version: 1.0.0
- License: Apache-2.0
- Last Updated: 6/6/2026
- Published: 6/5/2026
Author
Unstructured-IO✓
@unstructured-io
Health Signals
- GitHub stars: 14.8K
- Quality score: 72/100
- Last GitHub push: Jun 6, 2026
- Framework hints: 2
- OpenAgentSkill views: 5
- Install copies: 0
- Outbound clicks: 0
Trust & Safety
- Open source (public GitHub repo)
- AI static analysis passed
- License: Apache-2.0
- Manually verified by team
Related Skills
Stirling PDF
#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
80.3K stars · 0 installs
Tesseract
Tesseract Open Source OCR Engine (main repository)
74.6K stars · 0 installs
MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
66.8K stars · 0 installs
Docling
Get your documents ready for gen AI
61.1K stars · 0 installs