Collect structured data

Best web scraping skills for AI agents

Find skills for crawling websites, extracting structured data, monitoring pages, and turning messy web content into agent-ready inputs.

A guide for builders choosing skills to extract product data from websites and monitor competitor pages, ranked from the OpenAgentSkill index by quality, trust, freshness, adoption, and install readiness.

30 skills ranked · 721K GitHub stars combined · Top trust score: 100

Workflows: Extract product data from websites · Monitor competitor pages · Turn HTML into clean markdown

#1

Crawlee

Fit 37 · Trust 100 · Quality 100 (Excellent)

Crawlee: a Node.js web scraping and browser automation library for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless mode, with proxy rotation.

Excellent quality, 24K stars, and a 37 use-case fit score.

24K stars · Pushed Jun 12, 2026 · Production candidate · TypeScript · Playwright
$ npx skills add apify/crawlee
#2

Crawlee Python

Fit 36 · Trust 100 · Quality 100 (Excellent)

Crawlee: a Python web scraping and browser automation library for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless mode, with proxy rotation.

Excellent quality, 9.2K stars, and a 36 use-case fit score.

9.2K stars · Pushed Jun 11, 2026 · Production candidate · Python · Playwright
$ npx skills add apify/crawlee-python
#3

Firecrawl

Fit 34 · Trust 100 · Quality 100 (Excellent)

The API to search, scrape, and interact with the web at scale. 🔥

Excellent quality, 132K stars, and a 34 use-case fit score.

132K stars · Pushed Jun 12, 2026 · Production candidate · TypeScript · AI Agents
$ npx skills add firecrawl/firecrawl
#4

Scrapegraph AI

Fit 33 · Trust 100 · Quality 100 (Excellent)

Python scraper based on AI

Excellent quality, 27K stars, and a 33 use-case fit score.

27K stars · Pushed Jun 11, 2026 · Production candidate · Python · Web Automation
$ npx skills add ScrapeGraphAI/Scrapegraph-ai
#5

Maxun

Fit 31 · Trust 100 · Quality 100 (Excellent)

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

Excellent quality, 16K stars, and a 31 use-case fit score.

16K stars · Pushed Jun 11, 2026 · Production candidate · TypeScript · Browser Automation
$ npx skills add getmaxun/maxun
#6

Crawl4AI

Fit 31 · Trust 100 · Quality 100 (Excellent)

Open-source LLM-friendly web crawler and scraper

Excellent quality, 66K stars, and a 31 use-case fit score.

66K stars · Pushed May 22, 2026 · Production candidate · Claude · GPT-4
$ npx skills add unclecode/crawl4ai
#7

Lux

Fit 30 · Trust 100 · Quality 100 (Excellent)

👾 Fast and simple video download library and CLI tool written in Go

Excellent quality, 31K stars, and a 30 use-case fit score.

31K stars · Pushed Mar 29, 2026 · Production candidate · Go · Crawler
$ npx skills add iawia002/lux
#8

Colly

Fit 30 · Trust 100 · Quality 100 (Excellent)

Elegant Scraper and Crawler Framework for Golang

Excellent quality, 25K stars, and a 30 use-case fit score.

25K stars · Pushed May 25, 2026 · Production candidate · Go · Crawler
$ npx skills add gocolly/colly
#9

Scrapling

Fit 29 · Trust 100 · Quality 100 (Excellent)

Adaptive web scraping for agent data collection

Excellent quality, 54K stars, and a 29 use-case fit score.

54K stars · Pushed May 18, 2026 · Production candidate · Python · Web Automation
$ npx skills add D4Vinci/Scrapling
#10

Newspaper

Fit 29 · Trust 100 · Quality 100 (Excellent)

newspaper3k: news, full-text, and article metadata extraction in Python 3.

Excellent quality, 15K stars, and a 29 use-case fit score.

15K stars · Pushed May 13, 2026 · Production candidate · Python · Crawler
$ npx skills add codelucas/newspaper
#11

AnyCrawl

Fit 27 · Trust 100 · Quality 100 (Excellent)

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

Excellent quality, 3.2K stars, and a 27 use-case fit score.

3.2K stars · Pushed Jun 9, 2026 · Production candidate · MDX · RAG
$ npx skills add any4ai/AnyCrawl
#12

Amazon Scraper

Fit 27 · Trust 100 · Quality 100 (Excellent)

Free-trial Amazon Scraper API for extracting search, product, offer listing, review, Q&A, best-seller, and seller data.

Excellent quality, 3.0K stars, and a 27 use-case fit score.

3.0K stars · Pushed Jun 8, 2026 · Production candidate · Python · Web Automation
$ npx skills add oxylabs/amazon-scraper
#13

How To Scrape Amazon Product Data

Fit 27 · Trust 100 · Quality 100 (Excellent)

The process of extracting product data from Amazon using Python, including titles, ratings, prices, images, and descriptions.

Excellent quality, 2.9K stars, and a 27 use-case fit score.

2.9K stars · Pushed Jun 8, 2026 · Production candidate · Web Automation · Claude Code
$ npx skills add oxylabs/how-to-scrape-amazon-product-data
#14

Google Play Scraper

Fit 27 · Trust 100 · Quality 100 (Excellent)

Node.js scraper to get data from Google Play

Excellent quality, 2.9K stars, and a 27 use-case fit score.

2.9K stars · Pushed May 31, 2026 · Production candidate · JavaScript · Crawler
$ npx skills add facundoolano/google-play-scraper
#15

QueryList

Fit 27 · Trust 100 · Quality 100 (Excellent)

The elegant, progressive PHP crawler framework.

Excellent quality, 2.7K stars, and a 27 use-case fit score.

2.7K stars · Pushed Jun 10, 2026 · Production candidate · PHP · Crawler
$ npx skills add jae-jae/QueryList
#16

EasySpider

Fit 27 · Trust 100 · Quality 100 (Excellent)

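A visual no-code web crawler/spider. EasySpider (易采集) is visual software for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically, without code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.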

Excellent quality, 44K stars, and a 27 use-case fit score.

44K stars · Pushed May 22, 2026 · Production candidate · JavaScript · Crawler
$ npx skills add NaiboWang/EasySpider
#17

Ferret

Fit 27 · Trust 100 · Quality 100 (Excellent)

Declarative web scraping

Excellent quality, 6.0K stars, and a 27 use-case fit score.

6.0K stars · Pushed Jun 13, 2026 · Production candidate · Go · Crawler
$ npx skills add MontFerret/ferret
#18

Goclone

Fit 27 · Trust 100 · Quality 100 (Excellent)

Website cloner: uses goroutines to clone websites to your computer within seconds.

Excellent quality, 2.1K stars, and a 27 use-case fit score.

2.1K stars · Pushed Jun 2, 2026 · Production candidate · Go · Crawler
$ npx skills add goclone-dev/goclone
#19

Pydoll

Fit 25 · Trust 100 · Quality 100 (Excellent)

Pydoll is a library for automating Chromium-based browsers without a WebDriver, offering realistic interactions.

Excellent quality, 6.9K stars, and a 25 use-case fit score.

6.9K stars · Pushed May 24, 2026 · Production candidate · Python · Browser Automation
$ npx skills add autoscrape-labs/pydoll
#20

Obscura

Fit 25 · Trust 100 · Quality 100 (Excellent)

The headless browser for AI agents and web scraping

Excellent quality, 16K stars, and a 25 use-case fit score.

16K stars · Pushed Jun 12, 2026 · Production candidate · Rust · Browser Automation
$ npx skills add h4ckf0r0day/obscura
#21

Trafilatura

Fit 25 · Trust 100 · Quality 100 (Excellent)

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Excellent quality, 6.1K stars, and a 25 use-case fit score.

6.1K stars · Pushed Jun 10, 2026 · Production candidate · Python · Web Automation
$ npx skills add adbar/trafilatura
#22

Browser Use

Fit 25 · Trust 100 · Quality 100 (Excellent)

Make AI agents interact with websites using natural language

Excellent quality, 97K stars, and a 25 use-case fit score.

97K stars · Pushed Jun 1, 2026 · Production candidate · Claude · GPT-4
$ npx skills add browser-use/browser-use
#23

MediaCrawler

Fit 25 · Trust 100 · Quality 100 (Excellent)

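Crawlers for posts and comments on Xiaohongshu (notes), Douyin (videos), Kuaishou (videos), Bilibili (videos), Weibo (posts), Baidu Tieba (posts), and Zhihu (Q&A and articles). Supports content scraping across multiple social media platforms and provides a complete data collection solution.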

Excellent quality, 50K stars, and a 25 use-case fit score.

50K stars · Pushed May 25, 2026 · Production candidate · Python
$ npx skills add NanmiCoder/MediaCrawler
#24

Toapi

Fit 24 · Trust 100 · Quality 100 (Excellent)

Every web site provides APIs.

Excellent quality, 3.6K stars, and a 24 use-case fit score.

3.6K stars · Pushed Jun 4, 2026 · Production candidate · Python · Crawler
$ npx skills add elliotgao2/toapi
#25

Awesome Web Scraping

Fit 24 · Trust 100 · Quality 100 (Excellent)

List of libraries, tools and APIs for web scraping and data processing.

Excellent quality, 7.9K stars, and a 24 use-case fit score.

7.9K stars · Pushed May 28, 2026 · Production candidate · Makefile · Web Automation
$ npx skills add lorien/awesome-web-scraping
#26

Fingerprint Suite

Fit 24 · Trust 100 · Quality 100 (Excellent)

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

Excellent quality, 2.4K stars, and a 24 use-case fit score.

2.4K stars · Pushed May 20, 2026 · Production candidate · TypeScript · Playwright
$ npx skills add apify/fingerprint-suite
#27

Proxy Pool

Fit 23 · Trust 100 · Quality 100 (Excellent)

Python proxy pool for web spiders.

Excellent quality, 23K stars, and a 23 use-case fit score.

23K stars · Pushed Jun 9, 2026 · Production candidate · Python · Crawler
$ npx skills add jhao104/proxy_pool
#28

Katana

Fit 23 · Trust 100 · Quality 100 (Excellent)

A next-generation crawling and spidering framework.

Excellent quality, 17K stars, and a 23 use-case fit score.

17K stars · Pushed Jun 8, 2026 · Production candidate · Go · Crawler
$ npx skills add projectdiscovery/katana
#29

SeleniumBase

Fit 23 · Trust 100 · Quality 100 (Excellent)

📊 Python's all-in-one framework for web crawling, scraping, and testing. Supports pytest. CDP Mode provides stealth. Includes many tools.

Excellent quality, 13K stars, and a 23 use-case fit score.

13K stars · Pushed Jun 12, 2026 · Production candidate · Python · Playwright
$ npx skills add seleniumbase/SeleniumBase
#30

Python

Fit 22 · Trust 100 · Quality 96 (Excellent)

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat Official Accounts, and remote power-on.

Excellent quality, 11K stars, and a 22 use-case fit score.

11K stars · Pushed May 9, 2026 · Production candidate · Python · Crawler
$ npx skills add injetlee/Python

Selection method

How this list is ranked

OpenAgentSkill scores each candidate against the workflow keywords, then balances fit with GitHub stars, quality signals, trust profile, maintenance freshness, and whether there is a clear install path.
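To make the balancing concrete, here is a minimal Python sketch of a ranker of this kind. It is not OpenAgentSkill's formula, which this page does not publish. It sorts by fit first, which matches the order of the 30 cards above (fit never increases down the list), and breaks ties with a weighted composite of the other signals. The `Candidate` fields, the weights, the 90-day freshness half-life, and the 100K-star saturation point are all hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    name: str
    fit: float         # use-case fit score shown on each card
    quality: float     # quality score, 0-100
    trust: float       # trust score, 0-100
    stars: int         # GitHub stars
    last_push: date    # date of the most recent push
    has_install: bool  # True if a clear install command exists


# Hypothetical tie-breaker weights; the real index does not publish its formula.
WEIGHTS = {"quality": 0.3, "trust": 0.25, "adoption": 0.2, "freshness": 0.15, "install": 0.1}


def freshness(last_push: date, today: date, half_life_days: float = 90.0) -> float:
    """100 for a push today, halving every `half_life_days` (assumed decay curve)."""
    age = max((today - last_push).days, 0)
    return 100.0 * 0.5 ** (age / half_life_days)


def adoption(stars: int, cap: int = 100_000) -> float:
    """Log-scaled stars mapped onto 0-100, saturating at `cap` stars."""
    return 100.0 * min(math.log10(stars + 1) / math.log10(cap + 1), 1.0)


def composite(c: Candidate, today: date) -> float:
    """Weighted blend of the non-fit signals, each on a 0-100 scale."""
    return (WEIGHTS["quality"] * c.quality
            + WEIGHTS["trust"] * c.trust
            + WEIGHTS["adoption"] * adoption(c.stars)
            + WEIGHTS["freshness"] * freshness(c.last_push, today)
            + WEIGHTS["install"] * (100.0 if c.has_install else 0.0))


def rank(cands: list[Candidate], today: date) -> list[Candidate]:
    """Sort by fit first, then by the composite score, both descending."""
    return sorted(cands, key=lambda c: (c.fit, composite(c, today)), reverse=True)
```

Sorting on a `(fit, composite)` tuple keeps a strong use-case match ahead of a popular but less relevant project, which is consistent with Browser Use (97K stars, fit 25) appearing below Colly (25K stars, fit 30).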

How does OpenAgentSkill rank web scraping?

The ranking combines workflow fit, quality score, trust profile, GitHub adoption, maintenance freshness, and whether a clear install path exists.

Should I install the top skill immediately?

No. Treat the list as a shortlist, open the skill detail page, inspect the repository and license, then test the install command in a sandbox workflow.
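The license check can be scripted before any install. The Python sketch below uses only the standard library and assumes that the `owner/repo` slug in each `npx skills add` command is also the project's GitHub repository, which appears to hold for entries such as `apify/crawlee` but should be verified per skill. It queries GitHub's REST endpoint `GET /repos/{owner}/{repo}/license` and returns the SPDX identifier. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def repo_from_install_command(cmd: str) -> str:
    """Return 'owner/repo' from a line such as '$ npx skills add apify/crawlee'."""
    parts = cmd.strip().lstrip("$").split()
    if len(parts) < 4 or parts[:3] != ["npx", "skills", "add"]:
        raise ValueError(f"not an 'npx skills add' command: {cmd!r}")
    return parts[3]  # assumption: the skill slug doubles as the GitHub repo


def license_url(repo: str) -> str:
    """GitHub REST endpoint describing a repository's detected license."""
    return f"{GITHUB_API}/repos/{repo}/license"


def spdx_id(payload: dict) -> Optional[str]:
    """Pull the SPDX identifier (e.g. 'MIT') out of the endpoint's JSON body."""
    lic = payload.get("license") or {}
    return lic.get("spdx_id")


def fetch_spdx_id(repo: str, timeout: float = 10.0) -> Optional[str]:
    """Network call; urlopen raises urllib.error.HTTPError on a 404 (no license found)."""
    req = urllib.request.Request(
        license_url(repo), headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return spdx_id(json.load(resp))
```

For example, `fetch_spdx_id(repo_from_install_command("$ npx skills add apify/crawlee"))` checks the first-ranked entry. GitHub reports `NOASSERTION` when it finds a license file it cannot classify, and unauthenticated requests are rate-limited, so an unusual or missing result still calls for reading the license yourself.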

Can my agent consume this ranking through an API?

Yes. Use /api/skills/search with the related task or /api/agent/rankings?slug=best-web-scraping-skills to fetch ranked skill data.
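A minimal Python client for these two endpoints might look like the following. The page gives only the paths, so the base URL (`https://openagentskill.example`), the search parameter name (`q`), and the shape of the JSON response are assumptions to replace with values from the real API documentation.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host: only the endpoint paths are documented on this page.
BASE_URL = "https://openagentskill.example"


def rankings_url(slug: str, base: str = BASE_URL) -> str:
    """Build the /api/agent/rankings URL for a ranking page slug."""
    return f"{base.rstrip('/')}/api/agent/rankings?" + urllib.parse.urlencode({"slug": slug})


def search_url(task: str, base: str = BASE_URL, param: str = "q") -> str:
    """Build the /api/skills/search URL; the parameter name `q` is a guess."""
    return f"{base.rstrip('/')}/api/skills/search?" + urllib.parse.urlencode({param: task})


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (response schema unknown)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

An agent would call `get_json(rankings_url("best-web-scraping-skills"))` for this list, or `get_json(search_url("monitor competitor pages"))` to search by task.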