250 skills matching "extraction"
Best blend of quality, stars, freshness, and agent usage
Web crawling built for AI
$ npx skills add unclecode/crawl4ai

🔥 Search, scrape, and clean the web for AI agents.
$ npx skills add firecrawl/firecrawl

High-throughput crawling and scraping for agent data pipelines
$ npx skills add scrapy/scrapy

A visual no-code web crawler/spider. EasySpider: visual software for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.
$ npx skills add NaiboWang/EasySpider

Elegant Scraper and Crawler Framework for Golang
$ npx skills add gocolly/colly

Python ProxyPool for web spider
$ npx skills add jhao104/proxy_pool

A next-generation crawling and spidering framework.
$ npx skills add projectdiscovery/katana

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
$ npx skills add codelucas/newspaper

👾 Fast and simple video download library and CLI tool written in Go
$ npx skills add iawia002/lux

Python scripts: simulated Zhihu login, crawlers, Excel manipulation, WeChat official accounts, remote power-on.
$ npx skills add injetlee/Python

A team of "cloud workhorses" that earns money for you online 24/7.
$ npx skills add TeamWiseFlow/wiseflow

Declarative web scraping
$ npx skills add MontFerret/ferret

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀
$ npx skills add hect0x7/JMComic-Crawler-Python

Redis-based components for Scrapy.
$ npx skills add rmax/scrapy-redis

Structured data extraction and instruction calling with ML, LLM and Vision LLM
$ npx skills add katanaml/sparrow

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
$ npx skills add niespodd/browser-fingerprinting

Sina Weibo crawler: scrapes Sina Weibo data with Python and downloads Weibo images and videos.
$ npx skills add dataabc/weibo-crawler

Headless Chrome .NET API
$ npx skills add hardkoded/puppeteer-sharp

🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It ships four spider types (AirSpider, Spider, TaskSpider, BatchSpider) for different scenarios and supports resumable crawling, monitoring and alerting, browser rendering, and large-scale deduplication. The companion crawler management system, feaplat, handles deployment and scheduling.
$ npx skills add Boris-code/feapder

Every web site provides APIs.
$ npx skills add elliotgao2/toapi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
$ npx skills add edoardottt/cariddi

https://spatie.be/docs/crawler
$ npx skills add spatie/crawler

All In One Web Recon
$ npx skills add thewhiteh4t/FinalRecon

A Python library for reading and writing PDF, powered by QPDF
$ npx skills add pikepdf/pikepdf

A maroto way to create PDFs. Maroto is inspired by Bootstrap and uses gofpdf. Fast and simple.
$ npx skills add johnfercher/maroto

:spider: The elegant, progressive PHP crawler framework!
$ npx skills add jae-jae/QueryList

Read and extract text and other content from PDFs in C# (port of PDFBox)
$ npx skills add UglyToad/PdfPig

PDF exporter for HTML presentations
$ npx skills add astefanutti/decktape

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
$ npx skills add JayBizzle/Crawler-Detect

A WeChat official account crawler interface based on Sogou WeChat search.
$ npx skills add chyroc/WechatSogou

iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
$ npx skills add itext/itext-java

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.
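$ npx skills add eikek/docspell

Document scanning app
$ npx skills add ossappscollective/OSS-DocumentScanner

Incredibly fast crawler designed for OSINT.
$ npx skills add s0md3v/Photon

Videodl: A lightweight video downloader written in pure Python. (Prefers HD, watermark-free video; supports a large number of streaming platforms, including Douyin, Kuaishou, Xiaohongshu, Bilibili, TikTok, YouTube, FIFA+, Youku, Tencent Video, iQIYI, 1905 Movie Network, LeTV, Mango TV, Migu, PPTV, Sohu, Facebook, Twitter, Sina Weibo, Toutiao, NetEase Open Courses, WeSing, CCTV Yangshipin, Kugou Music MV, Xinpianchang, Zhihu, Baidu Tieba, and TED.)
$ npx skills add CharlesPikachu/videodl

SkyCaiji (蓝天采集器) is a free, open-source crawler system that collects data using point-and-click rule editing. It runs locally, on shared hosting, or on cloud servers, can scrape almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for web big-data collection.
$ npx skills add zorlan/skycaiji

🏳️‍🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, BlueSky, TikTok, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
$ npx skills add AAndyProgram/SCrawler

Web crawling framework based on asyncio.
$ npx skills add elliotgao2/gain

Distributed web crawler admin platform for managing spiders, regardless of language or framework.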
$ npx skills add crawlab-team/crawlab

A search engine that "just works" for Obsidian. Supports OCR and PDF indexing.
$ npx skills add scambier/obsidian-omnisearch

📐⚙ 2D vector line drawing and shape modeling for CNC and laser cutters.
$ npx skills add microsoft/maker.js

PDF editor for Windows. Install or run portable. GPLv3. No account, no subscription, no telemetry.
$ npx skills add SteveTheKiller/KillerPDF

iText for .NET is the .NET version of the iText library, formerly known as iTextSharp, which it replaces. iText represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enha
$ npx skills add itext/itext-dotnet

A scalable web crawler framework for Java.
$ npx skills add code4craft/webmagic

To extract main article from given URL with Node.js
$ npx skills add extractus/article-extractor

Comic and Manga reader, written with Node.js and using Electron
$ npx skills add ollm/OpenComic

NewPipe's core library for extracting data from streaming sites
$ npx skills add TeamNewPipe/NewPipeExtractor

Flexible Node.js AI-assisted crawler library
$ npx skills add coder-hxl/x-crawl

PHP PDF Library (official TCPDF successor)
$ npx skills add tecnickcom/tc-lib-pdf

Transform Web Content into LLM-Ready Data
$ npx skills add watercrawl/WaterCrawl

Minimal PDF creation library. <400 LOC, zero dependencies, makes real PDFs.
$ npx skills add Lulzx/tinypdf

Vector graphics in Go
$ npx skills add tdewolff/canvas

A web interface to extract tabular data from PDFs
$ npx skills add camelot-dev/excalibur

A <Pdf /> component for react-native
$ npx skills add wonday/react-native-pdf

Rust Bindings for the Skia Graphics Library
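$ npx skills add rust-skia/rust-skia

The SILE Typesetter: Simon’s Improved Layout Engine
$ npx skills add sile-typesetter/sile

Translates PDFs while preserving layout, formulas, and structure; suited to scientific and technical documents.
$ npx skills add wxyhgk/retain-pdf

A collection of legal cases in China involving web crawlers. This project compiles news, materials, laws, and regulations concerning litigation and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines.
$ npx skills add hiddendevj/Crawler_Illegal_Cases_In_China

A modern PDF library for TypeScript. Parse, modify, and generate PDFs with a clean, intuitive API.
$ npx skills add LibPDF-js/core

An open-source manga translation tool based on manga-image-translator. Supports automatic translation of Japanese, Korean, and American comics, includes 5 translation engines such as OpenAI and Gemini, and provides a visual editor for freely adjusting text styles. One-click install, works out of the box. If you like it, a ⭐ Star is welcome!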
$ npx skills add hgmzhn/manga-translator-ui

An extensible Markdown Editor, Viewer and Weblog Publisher for Windows
$ npx skills add RickStrahl/MarkdownMonster

Rust library to read, manipulate and write PDF files.
$ npx skills add pdf-rs/pdf

Database Reporting Tool and Tasks (.Net)
$ npx skills add ariacom/Seal-Report

A lightning fast image processing and resizing library for Go
$ npx skills add davidbyttow/govips

E-books for learning computer science.
$ npx skills add tolerious/Programming_learning_resource

Douyin crawler: collects public data such as account home pages, likes, favorites, original sounds, topics, search results, collections, posts, followings, and followers.
$ npx skills add erma0/douyin

DotnetSpider, a .NET standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.
$ npx skills add dotnetcore/DotnetSpider

MORT translator project - Real-time game translator with OCR
$ npx skills add killkimno/MORT

A lightweight 2D graphics library for modern GPUs, delivering high-performance text, image, and vector rendering across major platforms.
$ npx skills add Tencent/tgfx

Declarative way to run AI models in React Native on device, powered by ExecuTorch.
$ npx skills add software-mansion/react-native-executorch

Python wrapper for the arXiv API
$ npx skills add lukasschwab/arxiv.py

Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python
$ npx skills add robocorp/rpaframework

ScopeSentry-Cyberspace mapping, subdomain enumeration, port scanning, sensitive information discovery, vulnerability scanning, distributed nodes
$ npx skills add Autumn-27/ScopeSentry

Download tool for comics and novels. Comic sites: 腾讯漫画, 大角虫漫画, 有妖气, 咪咕, SF漫画, 哦漫画, 看漫画, 漫画柜, 汗汗酷漫, 動漫伊甸園, 快看漫画, 微博动漫, 733动漫网, 大古漫画网, 漫画DB, 無限動漫, 動漫狂, 卡推漫画, 动漫之家, 动漫屋, 古风漫画网, 36漫画网, 亲亲漫画网, 乙女漫画, webtoons, 咚漫, ニコニコ静画, ComicWalker, ヤングエースUP, モアイ, pixivコミック, サイコミ. Novel sites (exported to EPUB): アルファポリス, カクヨム, ハーメルン, 小説家になろう, 起点中文网, 八一中文网, 顶点小说, 落霞小说网, 努努书坊, 笔趣阁.
$ npx skills add kanasimi/work_crawler

Elasticsearch File System Crawler (FS Crawler)
$ npx skills add dadoonet/fscrawler

Give your AI the power to browse, scrape, and extract structured data from complex websites, with faster execution, lower cost, and more reliable results.
$ npx skills add browser-act/skills

YomiToku is a Python package providing an AI-powered document image analysis engine designed specifically for the Japanese language.
$ npx skills add kotaro-kinoshita/yomitoku

A web privacy measurement framework
$ npx skills add openwpm/OpenWPM

Versatile PDF creation and manipulation for Ruby
$ npx skills add gettalong/hexapdf

📰 Binary distribution of PDFium
$ npx skills add bblanchon/pdfium-binaries

Open source PDF editor.
$ npx skills add JakubMelka/PDF4QT

OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.
$ npx skills add Topdu/OpenOCR

A self-hosted file conversion server & share tool that supports 445 file formats in 13 languages.
$ npx skills add zelon88/HRConvert2

JasperReports® - Free Java Reporting Library
$ npx skills add Jaspersoft/jasperreports

JavaScript-based business reporting platform :rocket:
$ npx skills add jsreport/jsreport

An iOS OCR Server Using Apple’s Vision Framework
$ npx skills add riddleling/iOS-OCR-Server

A curated collection of practical AI projects implementing OCR systems, RAG, AI agents, and other AI use cases.
$ npx skills add Sumanth077/Hands-On-AI-Engineering

Mouseover Translate Any Language At Once - Chrome Extension: PDF Translator, EBOOK, EPUB, OCR, TTS, NETFLIX, YOUTUBE DUAL SUBTITLES, GOOGLE DOCS, AI, VIEWER, GMAIL, WRITING, IMAGE, DUAL SUBS, MANGA, HOVER, DICTIONARY, WEBTOON, EDGE, JAPANESE, ENGLISH
$ npx skills add ttop32/MouseTooltipTranslator

🌝 MLKit is a powerful, easy-to-use toolkit. With ML Kit you can easily implement text recognition, barcode scanning, image labeling, face detection, object detection, and more.
$ npx skills add jenly1314/MLKit

A simple Mihomo GUI desktop client.
$ npx skills add snakem982/Pandora-Box

A group of notebooks and other files which can help you learn AI from scratch.
$ npx skills add Ramakm/ai-hands-on

Fess is a very powerful and easily deployable Enterprise Search Server.
$ npx skills add codelibs/fess

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
$ npx skills add AndyTheFactory/newspaper4k

Node.js scraper to get data from Google Play
$ npx skills add facundoolano/google-play-scraper

Run a high-fidelity browser-based web archiving crawler in a single Docker container
$ npx skills add webrecorder/browsertrix-crawler

PDF references add-on for Zotero.
$ npx skills add MuiseDestiny/zotero-reference

OCR engine for all the languages
$ npx skills add mittagessen/kraken

Cross-platform desktop GUI app to clean image metadata
$ npx skills add szTheory/exifcleaner

news-please - an integrated web crawler and information extractor for news that just works
$ npx skills add fhamborg/news-please

Converts binary PDF to JSON and text, for server-side PDF processing and command-line use. Zero dependency.
$ npx skills add modesty/pdf2json

Website Cloner - Utilizes powerful Go routines to clone websites to your computer within seconds.
$ npx skills add goclone-dev/goclone

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
$ npx skills add NanoNets/docext

Create chatbots with ease
$ npx skills add n4ze3m/dialoqbase

Diskover Community Edition - Open source file indexer, file search engine and data management and analytics powered by Elasticsearch
$ npx skills add diskoverdata/diskover-community

Some interesting, beginner-friendly Python crawler examples, mainly scraping sites such as Taobao, Tmall, WeChat, WeChat Reading, Douban, and QQ.
$ npx skills add shengqiangzhang/examples-of-web-crawlers

Selfhosted PDF manager, viewer and editor offering a seamless user experience on multiple devices.
$ npx skills add mrmn2/PdfDing

Enjoy reading with your favorite style.
$ npx skills add jesselau76/ebook-GPT-translator

Document reader
$ npx skills add baskerville/plato

Chinese Ancient eBooks Generator: a tool for producing vertically typeset e-books in the style of woodblock-printed Chinese classics.
$ npx skills add shanleiguang/vRain

Get clean data from tricky documents, powered by vision-language models ⚡
$ npx skills add emcf/thepipe

Pdf creation module for dart/flutter
$ npx skills add DavBfr/dart_pdf

A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.
$ npx skills add nazdridoy/kokoro-tts

Convert a pdf to an image
$ npx skills add spatie/pdf-to-image

A roundup of excellent reverse-engineering articles the author has read; worth a look.
$ npx skills add darbra/sperm

Display paginated content in the browser and generate print books using web technology
$ npx skills add pagedjs/pagedjs

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
$ npx skills add felipecsl/wombat

A multi-threaded PDF password cracking utility equipped with commonly encountered password format builders and dictionary attacks.
$ npx skills add mufeedvh/pdfrip

The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
$ npx skills add 0xInfection/XSRFProbe

An Appium-based tool for automatically traversing apps.
$ npx skills add seveniruby/AppCrawler

A community-driven way to read and chat with AI bots - powered by ChatGPT.
$ npx skills add myreader-io/myGPTReader

Dark Web OSINT Tool
$ npx skills add DedSecInside/TorBot

Easy-to-use lightweight web crawler
$ npx skills add xtuhcy/gecco

A library for converting HTML into PDFs using ReportLab
$ npx skills add xhtml2pdf/xhtml2pdf

An Open source app to download and read books from shadow library (Anna’s Archive)
$ npx skills add dstark5/Openlib

Specify a github or local repo, github pull request, arXiv or Sci-Hub paper, Youtube transcript or documentation URL on the web and scrape into a text file and clipboard for easier LLM ingestion
$ npx skills add jimmc414/onefilellm

Web Crawler/Spider for NodeJS + server-side jQuery ;-)
$ npx skills add bda-research/node-crawler

Hackable CLI tool for converting Markdown files to PDF using Node.js and headless Chrome.
$ npx skills add simonhaenisch/md-to-pdf

Offline markdown to pdf, choose -> edit -> transform 🥂
$ npx skills add realdennis/md2pdf

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
$ npx skills add gettalong/kramdown

A high-quality PDF to Markdown tool based on large language model visual recognition.
$ npx skills add MarkPDFdown/markpdfdown

Read Japanese manga inside browser with selectable text.
$ npx skills add kha-white/mokuro

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
$ npx skills add adbar/trafilatura

SVG file parsing / rendering library
$ npx skills add dompdf/php-svg-lib

Xiaohongshu data collection and batch download tool for site images and video resources; a data collection tool with a polished UI (batch download, video extraction, images). Telegram: https://t.me/+ZtLSwuIKTo44MDY1
$ npx skills add xisuo67/XHS-Spider

Free offline OCR: an offline SDK for Chinese text detection and recognition.
$ npx skills add myhub/tr

📄 PDF Viewer Component for Angular
$ npx skills add VadimDez/ng2-pdf-viewer

A next-generation crawler platform that defines crawl workflows graphically, so you can build crawlers without writing code.
$ npx skills add ssssssss-team/spider-flow

Kibana Alert & Report App for Elasticsearch
$ npx skills add sentinl/sentinl

An app to convert images to PDF file!
$ npx skills add Swati4star/Images-to-PDF

Easily download all the photos/videos from Tumblr blogs.
$ npx skills add dixudx/tumblr-crawler

Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
$ npx skills add MikeChongCan/scylla

Movie metadata scraper
$ npx skills add sqzw-x/mdcx

Open Source Document Management System for Digital Archives (Scanned Documents)
$ npx skills add ciur/papermerge

Adult video management system; crawlers for avmoo, javbus, and javlibrary; online adult video library and magnet link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)
$ npx skills add guyueyingmu/avbook

RMT (RuoMengTu) is a free, open-source macro tool built on AHKv2. Let the code handle the tedious work; you have more meaningful things to do.
$ npx skills add zclucas/RMT

A scalable, mature and versatile web crawler based on Apache Storm
$ npx skills add apache/stormcrawler

SnapX is a free, open-source, cross-platform tool that lets you capture or record any area of your screen and instantly share it with a single keypress. Upload images, videos, text, and more to multiple supported destinations, all with ease. ShareX fork
$ npx skills add SnapXL/SnapX

A tool for scraping and analyzing X (Twitter) user data and tweets.
$ npx skills add xiaoxiunique/x-kit

Short-video watermark removal in Golang: Douyin, Pipixia, Huoshan, Weishi, Zuiyou, Kuaishou, Quanmin Xiaoshipin, Pipi Gaoxiao, Xigua Video, Huya, Pear Video, AcFun, Haokan Video...
$ npx skills add wujunwei928/parse-video

PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation
$ npx skills add microsoft/PIKE-RAG

CCExtractor - Official version maintained by the core team
$ npx skills add CCExtractor/ccextractor

📜 A Cheat-Sheet Collection from the WWW
$ npx skills add sk3pp3r/cheat-sheet-pdf

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
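$ npx skills add skrapeit/skrape.it

Crawler reverse-engineering case studies. Completed: TLS fingerprinting | 瑞数 | 震坤行 | NetEase Yidun | WeChat mini-program decompilation and reversing (百达星系) | 同花顺 | RPC decryption | 加速乐 | GeeTest slider CAPTCHA | 巨量算数 | Boss直聘 | 企查查 | China Minmetals | QQ Music | 产业政策大数据平台 | 企知道 | Xueqiu (acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | 欧科云链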
$ npx skills add 0xAllenChen/spider_reverse

Open-source screenshot and screen recording for macOS. The free, native alternative to CleanShot X. Built with Swift 6.0 and SwiftUI.
$ npx skills add lzhgus/Capso

Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
$ npx skills add eclaire-labs/eclaire

ArrowDL (Arrow Downloader) is a download manager for Windows, MacOS and Linux
$ npx skills add setvisible/ArrowDL

Note Companion: AI assistant for Obsidian that goes beyond just a chat. (prev File Organizer 2000)
$ npx skills add Nexus-JPF/note-companion

PDF++: the most Obsidian-native PDF annotation & viewing tool ever. Comes with optional Vim keybindings.
$ npx skills add RyotaUshio/obsidian-pdf-plus

A minimalist SOTA LaTeX OCR model with only 20M parameters, running in browser. Full training pipeline available for self-reproduction.
$ npx skills add alephpi/Texo

An ergonomic Rust HTTP Client with TLS fingerprint
$ npx skills add 0x676e67/wreq

Free Open Source Document Management System (mirror, no pull request or issues)
$ npx skills add mayan-edms/Mayan-EDMS

Download your resume from resume.io as PDF
$ npx skills add felipeall/resumeio-to-pdf

Java API For Chrome and Firefox
$ npx skills add fanyong920/jvppeteer

CnSTD: a Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis.
$ npx skills add breezedeus/CnSTD

Zotero Plugin for OCR
$ npx skills add UB-Mannheim/zotero-ocr

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.
$ npx skills add scribeocr/scribeocr

A lightweight web crawler framework for Java.
$ npx skills add xuxueli/xxl-crawler

A local, offline (after initial setup), portable OCR software that can process images and PDF files, using DeepSeek-OCR AI (running directly on your machine).
$ npx skills add th1nhhdk/local_ai_ocr

Quick, painless, intuitive OCR platform written in Rust and TypeScript. Modern UI with modern API, with an emphasis on intuitive user experience.
$ npx skills add readur/readur

The best PTT library
$ npx skills add PyPtt/PyPtt

A Tumblr and Twitter Blog Backup Application
$ npx skills add TumblThreeApp/TumblThree

A collection of awesome web crawler,spider in different languages
$ npx skills add BruceDone/awesome-crawler

Open source SEO audit tool.
$ npx skills add StJudeWasHere/seonaut

Wscan is a web security scanner that focuses on web security, dedicated to making web security accessible to everyone.
$ npx skills add chushuai/wscan

Dedoc is a library (service) for automate documents parsing and bringing to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
$ npx skills add ispras/dedoc

Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
$ npx skills add raphael-seo/Versatile-OCR-Program

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
$ npx skills add enoch3712/ExtractThinker

Distributed crawler powered by Headless Chrome
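$ npx skills add yujiosaka/headless-chrome-crawler

:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis
$ npx skills add SpiderClub/haipproxy

Hands-on 🐍 crawlers 🕷 for many websites and e-commerce data. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Alibaba tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Ibaotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, text collection for machine learning, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directories, Toutiao, Douban film reviews, Ctrip, the Xiaomi App Store, Anjuke, and Tujia homestays ❤️❤️❤️. WeChat crawler demo project: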
$ npx skills add DropsDevopsOrg/ECommerceCrawlers

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
$ npx skills add NanoNets/docstrange

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
$ npx skills add faustomorales/keras-ocr

Krawl is a customizable, lightweight, cloud-native web deception server and anti-crawler that creates fake web applications with low-hanging vulnerabilities using realistic, randomly generated decoy data and AI-generated HTML templates.
$ npx skills add BlessedRebuS/Krawl

Open source web infrastructure for AI. Scrape, crawl, and automate the web, clean markdown, browser sessions, ready for your agents.
$ npx skills add vakra-dev/reader

A book about JavaScript Promises
$ npx skills add azu/promises-book

Crawl and extract (regular or onion) webpages through TOR network
$ npx skills add MikeMeliz/TorCrawl.py

Translate scientific papers in LaTeX, especially arXiv papers
$ npx skills add SUSYUSTC/MathTranslate

(eBook,PDFs Translation) A multilingual eBook processing tool supporting all eBook formats. Features online and offline translation while preserving original layouts. Compatible with both scanned and digital PDFs. Elegant user interface. The world's highest-performing open-source layout-preserving eBook translator.
$ npx skills add CBIhalsen/PolyglotPDF

A handy Bilibili Manga downloader with a graphical interface. Supports keyword search and QR-code login, downloading of locked chapters, multithreaded downloads, multiple save formats, local manga management, and one-click update checks!
$ npx skills add Zeal-L/BiliBili-Manga-Downloader

PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
$ npx skills add frotms/PaddleOCR2Pytorch

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS :performing_arts:
$ npx skills add constverum/ProxyBroker

Crawly, a high-level web crawling & scraping framework for Elixir.
$ npx skills add elixir-crawly/crawly

Automatically crawls proxy nodes on the public internet, de-duplicates and tests for usability and then provides a list of nodes
$ npx skills add zu1k/proxypool

Use Web Scraper API to extract data from Google Finance, including stock titles, pricing, and price changes in percentages.
$ npx skills add oxylabs/how-to-scrape-google-finance

All in one tool for Information Gathering, Vulnerability Scanning and Crawling. A must have tool for all penetration testers
$ npx skills add Tuhinshubhra/RED_HAWK

SpiderSuite releases, wiki and roadmap
$ npx skills add spidersuite/SpiderSuite

ONNX Model Exporter for PaddlePaddle
$ npx skills add PaddlePaddle/Paddle2ONNX

Hands-on Python crawlers: simulated logins for major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️
$ npx skills add wkunzhi/Python3-Spider

HTTP API for Scrapy spiders
$ npx skills add scrapinghub/scrapyrt

A powerful browser crawler for web vulnerability scanners
$ npx skills add Qianlitp/crawlergo

Gospider - Fast web spider written in Go
$ npx skills add jaeles-project/gospider

DecryptLogin: APIs for logging in to some websites using requests.
$ npx skills add CharlesPikachu/DecryptLogin

owllook: a novel search engine
$ npx skills add howie6879/owllook

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
$ npx skills add NikolaiT/GoogleScraper

SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization, ideal for developers, DevOps, QA engineers, and consultants. Supports Windows, macOS, and Linux (x64 and arm64).
$ npx skills add janreges/siteone-crawler

:newspaper: Let ChatGPT Summarize Hacker News for You
$ npx skills add polyrabbit/hacker-news-digest

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.
$ npx skills add geziyor/geziyor

:blue_book: E-book: a distilled summary of "Real-Time Rendering 3rd" | More than 97,000 Chinese characters in total. You can treat it as an accessible Chinese edition of "Real-Time Rendering 3rd", as a commentary and study companion to it, or as preparatory reading for "Real-Time Rendering 4th".
$ npx skills add QianMo/Real-Time-Rendering-3rd-CN-Summary-Ebook

🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs
$ npx skills add esbenp/pdf-bot

AI VTuber with LLM, ASR, TTS, OCR, CV and more technologies to live stream or play Minecraft with you.
$ npx skills add AkagawaTsurunaki/ZerolanLiveRobot

Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
$ npx skills add zhoubear/open-paperless

High-performance asynchronous API for Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, Twitter (X), Captcha Solver, and Temp Mail.
$ npx skills add TikHub/TikHub-API-Python-SDK

Leaked GPTs prompts. Bypass the 25-message limit or try out GPTs without a Plus subscription.
$ npx skills add friuns2/Leaked-GPTs

Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
$ npx skills add chezou/tabula-py

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
$ npx skills add sjdirect/abot

vue.js pdf viewer
$ npx skills add FranckFreiburger/vue-pdf

Moodle-DL downloads course content fast from Moodle (e.g. lecture PDFs)
$ npx skills add C0D3D3V/Moodle-DL

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
$ npx skills add WZBSocialScienceCenter/pdftabextract

books pdf
$ npx skills add huyubing/books-pdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!
$ npx skills add danfickle/openhtmltopdf

100% free and fully open-source edge Firecrawl alternative with better link extraction for agents, which you can deploy to Cloudflare or Vercel yourself.
$ npx skills add lumpinif/deepcrawl

vulnx 🕷️ is an intelligent bot and shell that can perform automatic injection and help researchers detect security vulnerabilities in CMS systems. It can perform quick CMS security detection, information collection (including subdomain names, IP address, country information, organizational information, time zone, etc.), and vulnerability scanning.
$ npx skills add anouarbensaad/vulnx

Polite, slim and concurrent web crawler.
$ npx skills add PuerkitoBio/gocrawl

Find web directories without bruteforce
$ npx skills add Nekmo/dirhunt

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
$ npx skills add Belval/pdf2image

A collection of crawler examples, including but not limited to Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, various indexes, 维普 and 万方, Zlibraty, Oalib, novels, tender sites, procurement sites, Xiaohongshu, Dianping, Twitter, Maimai, and Zhihu.
$ npx skills add lixi5338619/lxSpider

A browser memory-roaming solution (work in progress...)
$ npx skills add JSREI/ast-hook-for-js-RE

Open Source Virtual (Network) Printer for Windows that allows you to create PDFs, OCR text, and print images, with advanced features usually available only in enterprise solutions.
$ npx skills add clawsoftware/clawPDF

Introduction to the magnet-link site U3C3 and domain updates.
$ npx skills add u3c3/BT-btt

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
$ npx skills add xianhu/PSpider

[Crawler framework (Golang)] An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.
$ npx skills add hu17889/go_spider

Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text
$ npx skills add sajari/docconv

Python tool for grabbing text via screenshot
$ npx skills add ianzhao/textshot

Vision utilities for web interaction agents 👀
$ npx skills add reworkd/tarsier

CAJ to PDF converter (GUI version)
$ npx skills add sainnhe/caj2pdf-qt

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
$ npx skills add yobix-ai/extractous

Async Python 3.6+ web scraping micro-framework based on asyncio
$ npx skills add howie6879/ruia

A plugin for reading and annotating PDFs and EPUBs in obsidian.
$ npx skills add elias-sundqvist/obsidian-annotator

Android widget that can render PDF documents stored on SD card, linked as assets, or downloaded from a remote URL.
$ npx skills add voghDev/PdfViewPager

Google, Naver multiprocess image web crawler (Selenium)
$ npx skills add YoongiKim/AutoCrawler

MixTeX: multimodal LaTeX, Chinese-English, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.
$ npx skills add RQLuo/MixTeX-Latex-OCR

Python crawlers. Currently includes: NetEase Cloud Music song scraping, Bilibili video scraping, Zhihu Q&A scraping, wallpaper scraping, xvideos video scraping, audiobook scraping, a Weibo crawler, Anjuke listing scraping with data visualization, a Bilibili video cover extractor, an IP proxy pool wrapper, a million-user Zhihu crawler with data analysis, and a GitHub user crawler.
$ npx skills add srx-2000/spider_collection

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
$ npx skills add JonathanLink/PDFLayoutTextStripper

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
$ npx skills add ArchiveTeam/grab-site

A "Proof of Concept or GTFO" mirror with an extensive index, plus whole issues and individual articles as clean PDFs.
$ npx skills add angea/pocorgtfo

A curated list of resources for Document Understanding (DU) topic
$ npx skills add tstanislawek/awesome-document-understanding

An offline OCR command-line program for Windows that recognizes text in images and outputs results as JSON strings so other programs can call it easily. Provides APIs for various languages. Compiled from PaddleOCR C++.
$ npx skills add hiroi-sora/PaddleOCR-json

List of Elixir books
$ npx skills add sger/ElixirBooks

Open-source platform for extracting structured data from documents using AI.
$ npx skills add DocumindHQ/documind