PDFLayoutTextStripper
Converts a PDF file into a text file while preserving the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. It is a subclass of the PDFTextStripper class from the Apache PDFBox library.
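To make "keeping the layout" concrete, here is a minimal Java sketch of the general idea: each piece of text is placed on a character grid according to its page coordinates, so table columns stay aligned in the plain-text output. It does not call PDFBox or this skill, and it is not the library's actual implementation. The class LayoutGridSketch, the Fragment type, the render method, and the cell sizes CHAR_WIDTH and LINE_HEIGHT are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only: these names are not part of
// PDFLayoutTextStripper or Apache PDFBox.
class LayoutGridSketch {

    /** A run of text and the position of its left edge, in points. */
    static final class Fragment {
        final double x;    // distance from the left page edge
        final double y;    // distance from the top page edge (assumed to grow downward)
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Assumed size of one output character cell, in points.
    static final double CHAR_WIDTH = 6.0;
    static final double LINE_HEIGHT = 12.0;

    /** Places each fragment on a character grid and returns the grid as text. */
    static String render(List<Fragment> fragments) {
        // Process fragments left to right so each row is built in reading order.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.x));

        // TreeMap keeps rows ordered from top to bottom.
        TreeMap<Integer, StringBuilder> rows = new TreeMap<>();
        for (Fragment f : sorted) {
            int row = (int) Math.round(f.y / LINE_HEIGHT);
            int col = (int) Math.round(f.x / CHAR_WIDTH);
            StringBuilder line = rows.computeIfAbsent(row, r -> new StringBuilder());
            // Pad with spaces so the fragment starts at its column.
            // A fragment that would overlap earlier text is simply appended.
            while (line.length() < col) {
                line.append(' ');
            }
            line.append(f.text);
        }
        if (rows.isEmpty()) {
            return "";
        }

        // Emit every row between the first and last, including empty ones,
        // so vertical gaps survive as blank lines.
        StringBuilder out = new StringBuilder();
        for (int row = rows.firstKey(); row <= rows.lastKey(); row++) {
            StringBuilder line = rows.get(row);
            if (line != null) {
                out.append(line);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A two-column table given as loose fragments, deliberately out of order.
        List<Fragment> table = Arrays.asList(
                new Fragment(60, 0, "Qty"),
                new Fragment(0, 0, "Name"),
                new Fragment(60, 12, "4"),
                new Fragment(0, 12, "Bolt"));
        System.out.print(render(table));
    }
}
```

In this sketch the two "Qty" column entries start at the same character column because they share an x coordinate, which is the property that makes table content recoverable from the text output. For real usage of the skill, follow the repository README.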
Install with one command
$ npx skills add JonathanLink/PDFLayoutTextStripper
Best for
Document processing
Find skills for parsing PDFs, extracting tables, running OCR, converting documents, and preparing file content for agent workflows.
Choose it when
- You want a GitHub-backed skill with 1.6K stars.
- You need a reusable install command for agents.
- You want to compare it with related marketplace skills.
Check before install
- Pushed 2y ago
- License: Apache-2.0
- Review the repository README and examples.
Quality profile
Strong candidate for agent workflows
Solid option that is likely worth shortlisting for production workflows.
Workflow fit
Use this skill in these scenarios
Parse messy files
Document processing
I need my agent to read PDFs, extract tables, and turn documents into structured data.
Search private knowledge
RAG and knowledge
I need my agent to build a RAG workflow over documents and retrieve reliable context.
Build and ship code
Coding agents
I need a coding agent that can understand a repository, edit code, and review pull requests.
Stack fit
Add it to a complete workflow
Ingest, retrieve, and cite
RAG knowledge base
A stack for document-heavy agents that ingest files, create searchable knowledge, retrieve relevant context, and answer with grounded sources.
Turn skills into distribution
Content growth agent
A stack for turning newly indexed skills into SEO briefs, social drafts, comparison pages, and reusable publishing workflows.
Inspect, patch, and verify code
Coding review agent
A stack for software agents that inspect repositories, review pull requests, generate tests, and turn findings into shippable patches.
Overview
Imported by the skill-only GitHub discovery pipeline because it matches agent skill, automation, RAG, or developer-tool signals. Protocol-server projects are excluded from automated imports.
Technical Details
- Version: 1.0.0
- License: Apache-2.0
- Last Updated: 5/24/2026
- Published: 5/23/2026
Author
JonathanLink✓
@jonathanlink
Health Signals
- GitHub stars: 1.6K
- Quality score: 49/100
- Last GitHub push: Dec 17, 2023
- Framework hints: 2
Trust & Safety
- Open source (public GitHub repo)
- AI static analysis passed
- License: Apache-2.0
- Manually verified by team
Related Skills
Pikepdf
A Python library for reading and writing PDF, powered by QPDF
2.7K stars · 0 installs
Maroto
A maroto way to create PDFs. Maroto is inspired in Bootstrap and uses gofpdf. Fast and simple.
2.7K stars · 0 installs
PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
2.4K stars · 0 installs
Decktape
PDF exporter for HTML presentations
2.4K stars · 0 installs