🔥 Search, scrape, and clean the web for AI agents.
$ npx skills add firecrawl/firecrawl
Scrape, clean, and reuse web data
A practical stack for agents that crawl public pages, extract clean content, normalize data, and hand it to downstream research or RAG workflows.
Built for growth, research, and data teams that need repeatable web collection workflows.
Outcomes
Workflow map
Start with a crawler or browser skill that can discover and fetch target pages.
Use extraction skills to turn HTML, tables, and page metadata into structured text.
Add checks for freshness, duplicates, blocked pages, and schema consistency.
Send clean output into reports, databases, or knowledge-base ingestion.
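The quality-check step in the workflow above can be sketched in a few lines of Python. This is a minimal illustration, not part of any listed skill: the record fields (`url`, `status`, `fetched_at`, `text`), the set of "blocked" HTTP statuses, and the seven-day freshness window are all assumptions chosen for the example.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Hypothetical record schema; adjust to whatever your extractor emits.
REQUIRED_FIELDS = ("url", "status", "fetched_at", "text")
# Assumption: these HTTP statuses indicate a blocked or rate-limited fetch.
BLOCKED_STATUSES = {401, 403, 429}


def check_records(records, max_age=timedelta(days=7), now=None):
    """Split records into (clean, rejected); rejected holds (record, reason) pairs."""
    now = now or datetime.now(timezone.utc)
    seen_hashes = set()
    clean, rejected = [], []
    for rec in records:
        # Schema consistency: every required field must be present.
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            rejected.append((rec, "schema: missing " + ", ".join(missing)))
            continue
        # Blocked pages: drop responses that look like access denials.
        if rec["status"] in BLOCKED_STATUSES:
            rejected.append((rec, "blocked"))
            continue
        # Freshness: drop pages fetched longer ago than max_age.
        if now - rec["fetched_at"] > max_age:
            rejected.append((rec, "stale"))
            continue
        # Duplicates: hash whitespace-normalized, lowercased text.
        normalized = " ".join(rec["text"].split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            rejected.append((rec, "duplicate"))
            continue
        seen_hashes.add(digest)
        clean.append(rec)
    return clean, rejected
```

Keeping the rejection reason next to each dropped record makes it easier to see whether a run is failing because of blocking, stale sources, or extractor changes before anything reaches a report or knowledge base.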
Recommended stack
Ranked by workflow relevance, quality score, GitHub adoption, and maintenance freshness.
🔥 Search, scrape, and clean the web for AI agents.
$ npx skills add firecrawl/firecrawl
Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
$ npx skills add apify/crawlee-python
$ npx skills add watercrawl/WaterCrawl
Open-source LLM-friendly web crawler and scraper
$ npx skills add unclecode/crawl4ai
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
$ npx skills add NikolaiT/GoogleScraper
👾 Fast and simple video download library and CLI tool written in Go
$ npx skills add iawia002/lux
Elegant Scraper and Crawler Framework for Golang
$ npx skills add gocolly/colly
Open source web infrastructure for AI. Scrape, crawl, and automate the web, clean markdown, browser sessions, ready for your agents.
$ npx skills add vakra-dev/reader
Ideal for
Avoid when