Decision filters

Choose skills by scenario, quality, and trust signals.

9 skills matching "parsing"

Best blend of quality, stars, freshness, and agent usage

1

PaddleOCR

VERIFIED · EXCELLENT · 100

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

$ npx skills add PaddlePaddle/PaddleOCR
78.4K stars · 77 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · rag
by PaddlePaddle
2

Opendataloader Pdf

VERIFIED · EXCELLENT · 100

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

$ npx skills add opendataloader-project/opendataloader-pdf
21.5K stars · 74 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
java · rag
by opendataloader-project
3

OpenOCR

VERIFIED · EXCELLENT · 100

OpenOCR is an open-source toolkit for general OCR research and applications. It integrates a unified training and evaluation benchmark, commercial-grade OCR and document parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.

$ npx skills add Topdu/OpenOCR
1.4K stars · 64 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · ocr
by Topdu
4

Php Svg Lib

VERIFIED · EXCELLENT · 87

SVG file parsing / rendering library

$ npx skills add dompdf/php-svg-lib
1.4K stars · 57 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
php · pdf
by dompdf
5

Skrape.It

EXCELLENT · 87

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

$ npx skills add skrapeit/skrape.it
871 stars · 54 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
kotlin · crawler
by skrapeit
6

Dedoc

EXCELLENT · 86

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

$ npx skills add ispras/dedoc
704 stars · 54 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · ocr
by ispras
7

Parser Avito

STRONG · 80

Avito Parser: a free parser for automatically monitoring new Avito listings and/or exporting listings to a file

$ npx skills add Duff89/parser_avito
616 stars · 53 quality · Claude Code + Browser agents
Solid option that is likely worth shortlisting for production workflows.
python · playwright
by Duff89
8

ExtractThinker

VERIFIED · EXCELLENT · 85

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

$ npx skills add enoch3712/ExtractThinker
1.5K stars · 53 quality · Claude Code + LangChain
High-confidence pick with strong adoption and healthy maintenance signals.
python · pdf
by enoch3712
9

Docstrange

VERIFIED · EXCELLENT · 85

Extract and convert data from any document (images, PDFs, Word documents, PowerPoint files, or URLs) into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

$ npx skills add NanoNets/docstrange
1.5K stars · 53 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · ocr
by NanoNets