109 skills matching "extract"
Best blend of quality, stars, freshness, and agent usage
Web crawling built for AI
$ npx skills add unclecode/crawl4ai
🔥 Search, scrape, and clean the web for AI agents.
$ npx skills add firecrawl/firecrawl
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
$ npx skills add PaddlePaddle/PaddleOCR
High-throughput crawling and scraping for agent data pipelines
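$ npx skills add scrapy/scrapy
A visual no-code/code-free web crawler/spider. EasySpider: a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.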
$ npx skills add NaiboWang/EasySpider
Give your AI agent a web browser
$ npx skills add browser-use/browser-use
Extract web data with LLM-guided scraping graphs
$ npx skills add ScrapeGraphAI/Scrapegraph-ai
Elegant Scraper and Crawler Framework for Golang
$ npx skills add gocolly/colly
Python ProxyPool for web spider
$ npx skills add jhao104/proxy_pool
A next-generation crawling and spidering framework.
$ npx skills add projectdiscovery/katana
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
$ npx skills add codelucas/newspaper
👾 Fast and simple video download library and CLI tool written in Go
$ npx skills add iawia002/lux
Python scripts: simulated Zhihu login, crawlers, Excel manipulation, WeChat official accounts, remote power-on
$ npx skills add injetlee/Python
Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
$ npx skills add apify/crawlee-python
A team of "cloud workhorses" that stays online 24/7 making money for you
$ npx skills add TeamWiseFlow/wiseflow
Trail of Bits Claude Code skills for security research, vulnerability detection, and audit workflows
$ npx skills add trailofbits/skills
Declarative web scraping
$ npx skills add MontFerret/ferret
Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile endpoints | JMComic downloader via GitHub Actions 🚀
$ npx skills add hect0x7/JMComic-Crawler-Python
Redis-based components for Scrapy.
$ npx skills add rmax/scrapy-redis
Structured data extraction and instruction calling with ML, LLM and Vision LLM
$ npx skills add katanaml/sparrow
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
$ npx skills add niespodd/browser-fingerprinting
Sina Weibo crawler: scrape Sina Weibo data with Python and download Weibo images and videos
$ npx skills add dataabc/weibo-crawler
Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, review count, latitude and longitude, reviews, email, and more for each place
$ npx skills add gosom/google-maps-scraper
Headless Chrome .NET API
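$ npx skills add hardkoded/puppeteer-sharp
🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It ships four built-in spider types (AirSpider, Spider, TaskSpider, BatchSpider) for different scenarios, and supports resumable crawls, monitoring and alerting, browser rendering, and large-scale data deduplication. The companion crawler management system feaplat provides convenient deployment and scheduling.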
$ npx skills add Boris-code/feapder
Every web site provides APIs.
$ npx skills add elliotgao2/toapi
Take a list of domains, crawl URLs, and scan for endpoints, secrets, API keys, file extensions, tokens, and more
$ npx skills add edoardottt/cariddi
Free Trial Amazon Scraper API for extracting search, product, offer listing, reviews, questions and answers, best sellers, and sellers data.
$ npx skills add oxylabs/amazon-scraper
https://spatie.be/docs/crawler
$ npx skills add spatie/crawler
All In One Web Recon
$ npx skills add thewhiteh4t/FinalRecon
🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
$ npx skills add jae-jae/QueryList
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
$ npx skills add JayBizzle/Crawler-Detect
WeChat official account crawler API based on Sogou WeChat search
$ npx skills add chyroc/WechatSogou
Incredibly fast crawler designed for OSINT.
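$ npx skills add s0md3v/Photon
Videodl: A lightweight video downloader written in pure Python. Prefers HD, watermark-free sources and supports a large number of streaming platforms, including Douyin, Kuaishou, Xiaohongshu, Bilibili, TikTok, YouTube, FIFA+, Youku, Tencent Video, iQIYI, 1905.com, LeTV, Mango TV, Migu, PPTV, Sohu, Facebook, Twitter, Sina Weibo, Toutiao, NetEase Open Courses, WeSing, CCTV Yangshipin, Kugou Music MV, Xinpianchang, Zhihu, Baidu Tieba, and TED.
$ npx skills add CharlesPikachu/videodl
SkyCaiji (蓝天采集器) is a free, open-source crawler system. You collect data simply by pointing and clicking to edit rules. It runs locally, on shared hosting, or on cloud servers, can scrape almost any type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for large-scale web data collection.
$ npx skills add zorlan/skycaiji
🏳️‍🌈 Media downloader for any site, including Twitter, Reddit, Instagram, BlueSky, TikTok, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid, etc.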
$ npx skills add AAndyProgram/SCrawler
Web crawling framework based on asyncio.
$ npx skills add elliotgao2/gain
Distributed web crawler admin platform for spider management, regardless of language or framework.
$ npx skills add crawlab-team/crawlab
A scalable web crawler framework for Java.
$ npx skills add code4craft/webmagic
Extract the main article from a given URL with Node.js
$ npx skills add extractus/article-extractor
NewPipe's core library for extracting data from streaming sites
$ npx skills add TeamNewPipe/NewPipeExtractor
Flexible Node.js AI-assisted crawler library
$ npx skills add coder-hxl/x-crawl
Transform Web Content into LLM-Ready Data
$ npx skills add watercrawl/WaterCrawl
The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.
$ npx skills add microlinkhq/browserless
A collection of illegal web crawler cases in China. This project compiles news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines.
$ npx skills add hiddendevj/Crawler_Illegal_Cases_In_China
Douyin crawler: collects public data such as account home pages, likes, favorites, original sounds, topics, search results, collections, posts, followings, and followers.
$ npx skills add erma0/douyin
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling and scraping framework
$ npx skills add dotnetcore/DotnetSpider
ScopeSentry: cyberspace mapping, subdomain enumeration, port scanning, sensitive information discovery, vulnerability scanning, distributed nodes
$ npx skills add Autumn-27/ScopeSentry
Download tool for comics and novels: 腾讯漫画 大角虫漫画 有妖气 咪咕 SF漫画 哦漫画 看漫画 漫画柜 汗汗酷漫 動漫伊甸園 快看漫画 微博动漫 733动漫网 大古漫画网 漫画DB 無限動漫 動漫狂 卡推漫画 动漫之家 动漫屋 古风漫画网 36漫画网 亲亲漫画网 乙女漫画 webtoons 咚漫 ニコニコ静画 ComicWalker ヤングエースUP モアイ pixivコミック サイコミ; アルファポリス カクヨム ハーメルン 小説家になろう 起点中文网 八一中文网 顶点小说 落霞小说网 努努书坊 笔趣阁 → epub.
$ npx skills add kanasimi/work_crawler
Elasticsearch File System Crawler (FS Crawler)
$ npx skills add dadoonet/fscrawler
Give your AI the power to browse, scrape, and extract structured data from complex websites, with faster execution, lower cost, and more reliable results.
$ npx skills add browser-act/skills
A web privacy measurement framework
$ npx skills add openwpm/OpenWPM
AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale. Includes REST API, Python and JavaScript SDKs, browser debugger.
$ npx skills add tinyfish-io/agentql
Node.js scraper to get data from Google Play
$ npx skills add facundoolano/google-play-scraper
news-please - an integrated web crawler and information extractor for news that just works
$ npx skills add fhamborg/news-please
Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.
$ npx skills add goclone-dev/goclone
Diskover Community Edition - Open source file indexer, file search engine and data management and analytics powered by Elasticsearch
$ npx skills add diskoverdata/diskover-community
Some interesting examples of Python crawlers that are friendly to beginners, mainly scraping sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.
$ npx skills add shengqiangzhang/examples-of-web-crawlers
A collection of excellent reverse-engineering articles the author has read; worth a look
$ npx skills add darbra/sperm
AI Map is an AI-powered website mapping tool by Oxylabs AI Studio that uses natural language prompts to intelligently discover and extract relevant URLs from any website.
$ npx skills add oxylabs/ai-map-py
A community-driven way to read and chat with AI bots - powered by ChatGPT.
$ npx skills add myreader-io/myGPTReader
Dark Web OSINT Tool
$ npx skills add DedSecInside/TorBot
Learn step-by-step how to scrape Google Trends data and make a result comparison using Python and Oxylabs SERP API. Extract keywords, their popularity, breakdown by region, related queries, and more.
$ npx skills add oxylabs/how-to-scrape-google-trends
Easy-to-use lightweight web crawler
$ npx skills add xtuhcy/gecco
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
$ npx skills add bda-research/node-crawler
A guide for extracting titles, authors, and citations from Google Scholar using Python and Oxylabs SERP Scraper API.
$ npx skills add oxylabs/how-to-scrape-google-scholar
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
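$ npx skills add adbar/trafilatura
Xiaohongshu data collection and batch download tool for website images and video resources; a great-looking data collection tool (batch download, video extraction, images). Telegram: https://t.me/+ZtLSwuIKTo44MDY1
$ npx skills add xisuo67/XHS-Spider
A next-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.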
$ npx skills add ssssssss-team/spider-flow
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
$ npx skills add MikeChongCan/scylla
Movie metadata scraper
$ npx skills add sqzw-x/mdcx
The process of extracting product data from Amazon using Python, including titles, ratings, prices, images, and descriptions.
$ npx skills add oxylabs/how-to-scrape-amazon-product-data
AV movie management system with crawlers for avmoo, javbus, and javlibrary; an online Japanese adult video library and adult video magnet link database
$ npx skills add guyueyingmu/avbook
A collection of awesome web crawlers and spiders in different languages
$ npx skills add BruceDone/awesome-crawler
Code for extracting best-selling items, search results, and currently available deals from Amazon using Python and Oxylabs E-Commerce Scraper API.
$ npx skills add oxylabs/how-to-scrape-amazon-prices
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
$ npx skills add ArchiveTeam/grab-site
Distributed crawler powered by Headless Chrome
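$ npx skills add yujiosaka/headless-chrome-crawler
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
$ npx skills add SpiderClub/haipproxy
Hands-on 🐍 crawlers 🕷 for a variety of websites and e-commerce data. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Alibaba tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Ibaotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, FOFA asset collection, Autohome, National Bureau of Statistics, Baidu keyword index counts, spider pan-directories, Toutiao, Douban film reviews, Ctrip, Xiaomi App Store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler demo project: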
$ npx skills add DropsDevopsOrg/ECommerceCrawlers
Open source web infrastructure for AI. Scrape, crawl, and automate the web, clean markdown, browser sessions, ready for your agents.
$ npx skills add vakra-dev/reader
Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭
$ npx skills add constverum/ProxyBroker
Automatically crawls proxy nodes on the public internet, de-duplicates and tests for usability and then provides a list of nodes
$ npx skills add zu1k/proxypool
Use Web Scraper API to extract data from Google Finance, including stock titles, pricing, and price changes in percentages.
$ npx skills add oxylabs/how-to-scrape-google-finance
All in one tool for Information Gathering, Vulnerability Scanning and Crawling. A must have tool for all penetration testers
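$ npx skills add Tuhinshubhra/RED_HAWK
Hands-on Python crawlers: simulated login to major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️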
$ npx skills add wkunzhi/Python3-Spider
A powerful browser crawler for web vulnerability scanners
$ npx skills add Qianlitp/crawlergo
Gospider - Fast web spider written in Go
$ npx skills add jaeles-project/gospider
DecryptLogin: APIs for logging in to some websites using requests.
$ npx skills add CharlesPikachu/DecryptLogin
owllook: a novel search engine
$ npx skills add howie6879/owllook
A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...). Includes asynchronous networking support.
$ npx skills add NikolaiT/GoogleScraper
Geziyor, a blazing fast web crawling & scraping framework for Go. Supports JS rendering.
$ npx skills add geziyor/geziyor
Leaked GPTs prompts. Bypass the 25-message limit or try out GPTs without a Plus subscription.
$ npx skills add friuns2/Leaked-GPTs
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
$ npx skills add sjdirect/abot
100% free and fully open-source edge Firecrawl alternative with better link extraction for agents, which you can deploy to Cloudflare or Vercel yourself.
$ npx skills add lumpinif/deepcrawl
vulnx 🕷️ is an intelligent bot and shell that can perform automatic injection and help researchers detect security vulnerabilities in CMS systems. It can perform quick CMS security detection, information collection (including subdomain name, IP address, country information, organizational information, time zone, etc.), and vulnerability scanning.
$ npx skills add anouarbensaad/vulnx
Polite, slim and concurrent web crawler.
$ npx skills add PuerkitoBio/gocrawl
Find web directories without bruteforce
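$ npx skills add Nekmo/dirhunt
A collection of crawler examples, including but not limited to Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, various indexes, VIP and Wanfang, Zlibraty, Oalib, novels, bidding sites, procurement sites, Xiaohongshu, Dianping, Twitter, Maimai, and Zhihu.
$ npx skills add lixi5338619/lxSpider
A browser memory-roaming solution (still exploring...)
$ npx skills add JSREI/ast-hook-for-js-RE
Introduction to the magnet-link site U3C3 and domain updates
$ npx skills add u3c3/BT-btt
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
$ npx skills add xianhu/PSpider
[Crawler framework (Golang)] An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.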
$ npx skills add hu17889/go_spider
Async Python 3.6+ web scraping micro-framework based on asyncio
$ npx skills add howie6879/ruia
Google, Naver multiprocess image web crawler (Selenium)
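$ npx skills add YoongiKim/AutoCrawler
Python crawlers. Current inventory: NetEase Cloud Music song scraping, Bilibili video scraping, Zhihu Q&A scraping, wallpaper scraping, xvideos video scraping, audiobook scraping, Weibo crawler, Anjuke listing scraping with data visualization, Bilibili video cover extractor, IP proxy pool wrapper, Zhihu million-user crawler with data analysis, GitHub user crawler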
$ npx skills add srx-2000/spider_collection
🤖 Scrape data from HTML websites automatically by just providing examples
$ npx skills add lorey/mlscraper
🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.
$ npx skills add ScrapeGraphAI/scrapecraft
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
$ npx skills add z0m31en7/Uscrapper