Codex web data workflows

Best Codex skills for web scraping

Ranked OpenAgentSkill shortlist for Codex users crawling websites, extracting structured data, monitoring pages, and preparing web content for agents.

For Codex users building web scraping, monitoring, RAG ingestion, and data extraction workflows. Ranked from the OpenAgentSkill index by quality, trust, freshness, adoption, and install readiness.

30 ranked · 902K stars · Top trust 96

Search intent

Find Codex-ready skills for web scraping, crawling, browser automation, markdown extraction, and structured data collection.

These pages are generated from real registry records. The list below is not a generic article; every row links to a skill profile with install, trust, audit, and risk fields.

#1

Crawlee

37 fit · Trust 96 · Excellent 100 · Audit 95 · Safe to try

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Excellent quality, 24K stars, and a 37 use-case fit score.

Best suited scenario

Crawl target URLs

24K stars · Jun 12, 2026 push · Production candidate · TypeScript · Playwright
$ npx skills add apify/crawlee
#2

Crawlee Python

36 fit · Trust 94 · Excellent 100 · Audit 95 · Safe to try

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Excellent quality, 9.2K stars, and a 36 use-case fit score.

Best suited scenario

Crawl target URLs

9.2K stars · Jun 15, 2026 push · Production candidate · Python · Playwright
$ npx skills add apify/crawlee-python
#3

Firecrawl

34 fit · Trust 95 · Excellent 100 · Audit 95 · Safe to try

The API to search, scrape, and interact with the web at scale. 🔥

Excellent quality, 132K stars, and a 34 use-case fit score.

Best suited scenario

Crawl target URLs

132K stars · Jun 12, 2026 push · Production candidate · TypeScript · AI Agents
$ npx skills add firecrawl/firecrawl
#4

Scrapegraph AI

33 fit · Trust 93 · Excellent 100 · Audit 95 · Safe to try

Python scraper based on AI

Excellent quality, 27K stars, and a 33 use-case fit score.

Best suited scenario

Crawl target URLs

27K stars · Jun 15, 2026 push · Production candidate · Python · Web Automation
$ npx skills add ScrapeGraphAI/Scrapegraph-ai
#5

Maxun

31 fit · Trust 97 · Excellent 100 · Audit 96 · Safe to try

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

Excellent quality, 16K stars, and a 31 use-case fit score.

Best suited scenario

Crawl target URLs

16K stars · Jun 11, 2026 push · Production candidate · TypeScript · Browser Automation
$ npx skills add getmaxun/maxun
#6

Crawl4AI

31 fit · Trust 97 · Excellent 100 · Audit 96 · Safe to try

Open-source LLM-friendly web crawler and scraper

Excellent quality, 66K stars, and a 31 use-case fit score.

Best suited scenario

Crawl target URLs

66K stars · May 22, 2026 push · Production candidate · Claude · GPT-4
$ npx skills add unclecode/crawl4ai
#7

Lux

30 fit · Trust 91 · Excellent 100 · Audit 92 · Safe to try

👾 Fast and simple video download library and CLI tool written in Go

Excellent quality, 31K stars, and a 30 use-case fit score.

Best suited scenario

Crawl target URLs

31K stars · Mar 29, 2026 push · Production candidate · Go · Crawler
$ npx skills add iawia002/lux
#8

Colly

30 fit · Trust 96 · Excellent 100 · Audit 96 · Safe to try

Elegant Scraper and Crawler Framework for Golang

Excellent quality, 25K stars, and a 30 use-case fit score.

Best suited scenario

Crawl target URLs

25K stars · May 25, 2026 push · Production candidate · Go · Crawler
$ npx skills add gocolly/colly
#9

Scrapling

29 fit · Trust 88 · Excellent 100 · Audit 91 · Needs review

Adaptive web scraping for agent data collection

Excellent quality, 54K stars, and a 29 use-case fit score.

Best suited scenario

Crawl target URLs

54K stars · May 18, 2026 push · Production candidate · Python · Web Automation
$ npx skills add D4Vinci/Scrapling
#10

Newspaper

29 fit · Trust 97 · Excellent 100 · Audit 95 · Safe to try

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

Excellent quality, 15K stars, and a 29 use-case fit score.

Best suited scenario

Crawl target URLs

15K stars · May 13, 2026 push · Production candidate · Python · Crawler
$ npx skills add codelucas/newspaper
#11

EasySpider

27 fit · Trust 99 · Excellent 100 · Audit 97 · Safe to try

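A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual application for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.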

Excellent quality, 44K stars, and a 27 use-case fit score.

Best suited scenario

Crawl target URLs

44K stars · May 22, 2026 push · Production candidate · JavaScript · Crawler
$ npx skills add NaiboWang/EasySpider
#12

Ferret

27 fit · Trust 88 · Excellent 100 · Audit 93 · Safe to try

Declarative web scraping

Excellent quality, 6.0K stars, and a 27 use-case fit score.

Best suited scenario

Crawl target URLs

6.0K stars · Jun 13, 2026 push · Production candidate · Go · Crawler
$ npx skills add MontFerret/ferret
#13

Pydoll

25 fit · Trust 95 · Excellent 100 · Audit 95 · Safe to try

Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

Excellent quality, 6.9K stars, and a 25 use-case fit score.

Best suited scenario

Crawl target URLs

6.9K stars · May 24, 2026 push · Production candidate · Python · Browser Automation
$ npx skills add autoscrape-labs/pydoll
#14

Obscura

25 fit · Trust 95 · Excellent 100 · Audit 95 · Safe to try

The headless browser for AI agents and web scraping

Excellent quality, 16K stars, and a 25 use-case fit score.

Best suited scenario

Crawl target URLs

16K stars · Jun 13, 2026 push · Production candidate · Rust · Browser Automation
$ npx skills add h4ckf0r0day/obscura
#15

Trafilatura

25 fit · Trust 95 · Excellent 100 · Audit 95 · Safe to try

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Excellent quality, 6.1K stars, and a 25 use-case fit score.

Best suited scenario

Crawl target URLs

6.1K stars · Jun 10, 2026 push · Production candidate · Python · Web Automation
$ npx skills add adbar/trafilatura
#16

Browser Use

25 fit · Trust 92 · Excellent 100 · Audit 94 · Safe to try

Make AI agents interact with websites using natural language

Excellent quality, 97K stars, and a 25 use-case fit score.

Best suited scenario

Crawl target URLs

97K stars · Jun 1, 2026 push · Production candidate · Claude · GPT-4
$ npx skills add browser-use/browser-use
#17

Awesome Web Scraping

24 fit · Trust 91 · Excellent 100 · Audit 93 · Safe to try

List of libraries, tools and APIs for web scraping and data processing.

Excellent quality, 7.9K stars, and a 24 use-case fit score.

Best suited scenario

Crawl target URLs

7.9K stars · May 28, 2026 push · Production candidate · Makefile · Web Automation
$ npx skills add lorien/awesome-web-scraping
#18

Proxy Pool

23 fit · Trust 94 · Excellent 100 · Audit 95 · Safe to try

Python ProxyPool for web spider

Excellent quality, 23K stars, and a 23 use-case fit score.

Best suited scenario

Crawl target URLs

23K stars · Jun 9, 2026 push · Production candidate · Python · Crawler
$ npx skills add jhao104/proxy_pool
#19

Katana

23 fit · Trust 93 · Excellent 100 · Audit 94 · Safe to try

A next-generation crawling and spidering framework.

Excellent quality, 17K stars, and a 23 use-case fit score.

Best suited scenario

Crawl target URLs

17K stars · Jun 15, 2026 push · Production candidate · Go · Crawler
$ npx skills add projectdiscovery/katana
#20

Python

22 fit · Trust 88 · Excellent 96 · Audit 89 · Needs review

Python scripts: simulated Zhihu login, web crawlers, Excel operations, WeChat official accounts, and remote power-on.

Excellent quality, 11K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

11K stars · May 9, 2026 push · Production candidate · Python · Crawler
$ npx skills add injetlee/Python
#21

Wiseflow

22 fit · Trust 88 · Excellent 100 · Audit 92 · Needs review

An AI money-making team for everyone that helps you turn your experience and methods into a business.

Excellent quality, 8.2K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

8.2K stars · Jun 5, 2026 push · Production candidate · TypeScript · Crawler
$ npx skills add TeamWiseFlow/wiseflow
#22

Llm Scraper

22 fit · Trust 92 · Excellent 100 · Audit 94 · Safe to try

Turn any webpage into structured data using LLMs

Excellent quality, 6.8K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

6.8K stars · Jun 1, 2026 push · Production candidate · TypeScript · Browser Automation
$ npx skills add mishushakov/llm-scraper
#23

Node Crawler

22 fit · Trust 95 · Excellent 100 · Audit 96 · Safe to try

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

Excellent quality, 6.8K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

6.8K stars · Jun 4, 2026 push · Production candidate · TypeScript · Crawler
$ npx skills add bda-research/node-crawler
#24

JMComic Crawler Python

22 fit · Trust 95 · Excellent 100 · Audit 95 · Safe to try

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), with support for both the web and mobile clients | JMComic GitHub Actions downloader 🚀

Excellent quality, 6.0K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

6.0K stars · Jun 11, 2026 push · Production candidate · Python · Crawler
$ npx skills add hect0x7/JMComic-Crawler-Python
#25

Curl Cffi

22 fit · Trust 94 · Excellent 100 · Audit 95 · Safe to try

Python binding for a curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

Excellent quality, 5.8K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

5.8K stars · Jun 7, 2026 push · Production candidate · Python · Web Automation
$ npx skills add lexiforest/curl_cffi
#26

Scrapy Redis

22 fit · Trust 94 · Excellent 100 · Audit 95 · Safe to try

Redis-based components for Scrapy.

Excellent quality, 5.6K stars, and a 22 use-case fit score.

Best suited scenario

Crawl target URLs

5.6K stars · May 19, 2026 push · Production candidate · Python · Crawler
$ npx skills add rmax/scrapy-redis
#27

Playwright

22 fit · Trust 97 · Excellent 100 · Audit 96 · Safe to try

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

Excellent quality, 91K stars, and a 22 use-case fit score.

Best suited scenario

Navigate pages

91K stars · Jun 14, 2026 push · Production candidate · TypeScript · Playwright
$ npx skills add microsoft/playwright
#28

SeleniumBase

22 fit · Trust 96 · Excellent 100 · Audit 95 · Safe to try

SeleniumBase is a framework for UI Testing, Web Scraping, and Stealth. Passes every bot-detection test with CDP Mode, and extends Playwright.

Excellent quality, 13K stars, and a 22 use-case fit score.

Best suited scenario

Navigate pages

13K stars · Jun 13, 2026 push · Production candidate · Python · Testing
$ npx skills add seleniumbase/SeleniumBase
#29

Scrapy

21 fit · Trust 96 · Excellent 100 · Audit 96 · Safe to try

Scrapy, a fast high-level web crawling & scraping framework for Python.

Excellent quality, 62K stars, and a 21 use-case fit score.

Best suited scenario

Crawl target URLs

62K stars · Jun 11, 2026 push · Production candidate · Python · Web Automation
$ npx skills add scrapy/scrapy
#30

Docling

21 fit · Trust 92 · Excellent 100 · Audit 95 · Safe to try

Get your documents ready for gen AI

Excellent quality, 62K stars, and a 21 use-case fit score.

Best suited scenario

Read uploaded files

62K stars · Jun 15, 2026 push · Production candidate · Python · PDF
$ npx skills add docling-project/docling

Selection method

How this list is ranked

OpenAgentSkill scores each candidate against the workflow keywords, then balances fit with GitHub stars, quality signals, trust profile, maintenance freshness, and whether there is a clear install path.
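
As a rough illustration of the balancing described above, the sketch below blends the named signals into one score. The weights, the log scaling of stars, the 90-day freshness window, the 0-100 scale for fit, and every field name are assumptions made for illustration; the page does not publish OpenAgentSkill's actual formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    fit: float              # use-case fit score from the row (assumed 0-100 scale)
    quality: float          # quality score, 0-100
    trust: float            # trust score, 0-100
    stars: int              # GitHub stars
    days_since_push: int    # maintenance freshness
    has_install: bool       # True when a clear install command exists


# Hypothetical weights that sum to 1.0; not taken from OpenAgentSkill.
WEIGHTS = {"fit": 0.40, "quality": 0.15, "trust": 0.15,
           "stars": 0.15, "fresh": 0.10, "install": 0.05}


def composite_score(s: Skill) -> float:
    """Blend the six signals into a single 0-100 score (illustrative only)."""
    # Log-scale stars so 132K does not swamp 6K; capped at one million stars.
    stars = min(math.log10(s.stars + 1) / 6.0, 1.0) * 100.0
    # Linear decay to zero over an assumed 90-day window.
    fresh = max(0.0, 1.0 - s.days_since_push / 90.0) * 100.0
    install = 100.0 if s.has_install else 0.0
    return (WEIGHTS["fit"] * s.fit
            + WEIGHTS["quality"] * s.quality
            + WEIGHTS["trust"] * s.trust
            + WEIGHTS["stars"] * stars
            + WEIGHTS["fresh"] * fresh
            + WEIGHTS["install"] * install)


def rank(skills: list[Skill]) -> list[Skill]:
    """Return skills ordered from highest to lowest composite score."""
    return sorted(skills, key=composite_score, reverse=True)


# Fit, quality, trust, and stars come from the rows above; the day counts are made up.
crawlee = Skill("Crawlee", 37, 100, 96, 24_000, 3, True)
scrapy = Skill("Scrapy", 21, 100, 96, 62_000, 4, True)
print([s.name for s in rank([scrapy, crawlee])])
```

With these assumed weights, fit dominates, so a skill with fewer stars but a closer workflow match can outrank a more popular one, which is consistent with the ordering shown in the list.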

How does OpenAgentSkill rank Codex web scraping skills?

The ranking combines workflow fit, quality score, trust profile, GitHub adoption, maintenance freshness, and whether a clear install path exists.

Should I install the top skill immediately?

No. Treat the list as a shortlist, open the skill detail page, inspect the repository and license, then test the install command in a sandbox workflow.

Can my agent consume this ranking through an API?

Yes. Use /api/skills/search with the related task or /api/agent/rankings?slug=best-codex-web-scraping-skills to fetch ranked skill data.
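
A minimal sketch of calling those two endpoints from Python follows. The page gives only relative paths, so `BASE_URL` is a placeholder to replace with the real site origin. The search parameter name `q` and the assumption that responses are JSON are also guesses, not documented behavior.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Placeholder origin: substitute the actual OpenAgentSkill host.
BASE_URL = "https://example.invalid"


def rankings_url(slug: str, base_url: str = BASE_URL) -> str:
    """Build the rankings endpoint URL for a given ranking slug."""
    return urljoin(base_url, "/api/agent/rankings") + "?" + urlencode({"slug": slug})


def search_url(task: str, base_url: str = BASE_URL) -> str:
    """Build the search endpoint URL; the parameter name "q" is an assumption."""
    return urljoin(base_url, "/api/skills/search") + "?" + urlencode({"q": task})


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body, assuming the API returns JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(rankings_url("best-codex-web-scraping-skills"))
print(search_url("web scraping"))
```

Once `BASE_URL` points at the real host, `fetch_json(rankings_url("best-codex-web-scraping-skills"))` would retrieve the ranked data. Inspect the response shape before relying on any field names.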