29 skills matching "crawling"
Best blend of quality, stars, freshness, and agent usage
Web crawling built for AI
$ npx skills add unclecode/crawl4ai
High-throughput crawling and scraping for agent data pipelines
$ npx skills add scrapy/scrapy
Web data for AI applications
$ npx skills add firecrawl/firecrawl
Elegant Scraper and Crawler Framework for Golang
$ npx skills add gocolly/colly
A next-generation crawling and spidering framework.
$ npx skills add projectdiscovery/katana
newspaper3k is a news, full-text, and article metadata extraction library for Python 3.
$ npx skills add codelucas/newspaper
Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.
$ npx skills add apify/crawlee-python
Declarative web scraping
$ npx skills add MontFerret/ferret
Headless Chrome .NET API
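$ npx skills add hardkoded/puppeteer-sharp
Takes a list of domains, crawls URLs, and scans for endpoints, secrets, API keys, file extensions, tokens, and more.
$ npx skills add edoardottt/cariddi
List of libraries, tools, and APIs for web scraping and data processing.
$ npx skills add lorien/awesome-web-scraping
SkyCaiji (蓝天采集器) is a free, open-source crawler system. Data is collected simply by pointing, clicking, and editing rules. It runs locally, on a virtual host, or on a cloud server, can collect almost any type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for large-scale web data collection.
$ npx skills add zorlan/skycaiji
Web crawling framework based on asyncio.
$ npx skills add elliotgao2/gain
Distributed web crawler admin platform for managing spiders regardless of language or framework.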
$ npx skills add crawlab-team/crawlab
Transform Web Content into LLM-Ready Data
$ npx skills add watercrawl/WaterCrawl
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling and scraping framework.
$ npx skills add dotnetcore/DotnetSpider
A Chrome DevTools Protocol driver for web automation and scraping.
$ npx skills add go-rod/rod
Structured data gathering from any website using an AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio Python SDK for intelligent web data gathering.
$ npx skills add oxylabs/oxylabs-ai-studio-py
Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
$ npx skills add adbar/trafilatura
Scalable Python web scraping scripts for 40+ popular domains
$ npx skills add scrapfly/scrapfly-scrapers
Web Scraping Framework
$ npx skills add lorien/grab
Distributed crawler powered by Headless Chrome
$ npx skills add yujiosaka/headless-chrome-crawler
The complete web scraping toolkit for PHP.
$ npx skills add roach-php/core
All-in-one tool for information gathering, vulnerability scanning, and crawling. A must-have tool for all penetration testers.
$ npx skills add Tuhinshubhra/RED_HAWK
Geziyor, a blazing fast web crawling and scraping framework for Go. Supports JS rendering.
$ npx skills add geziyor/geziyor
100% free, fully open-source edge alternative to Firecrawl with better link extraction for agents, which you can deploy to Cloudflare or Vercel yourself.
$ npx skills add lumpinif/deepcrawl
Async Python 3.6+ web scraping micro-framework based on asyncio
$ npx skills add howie6879/ruia
🤖 Scrape data from HTML websites automatically by just providing examples
$ npx skills add lorey/mlscraper
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
$ npx skills add rebrowser/rebrowser-patches