Collect structured data

Web scraping and data extraction skills

Find skills for crawling websites, extracting structured data, monitoring pages, and turning messy web content into agent-ready inputs.

Try this task

I need my agent to scrape websites and extract structured data from pages.

Agent should be able to

  • Crawl target URLs
  • Extract tables and metadata
  • Normalize messy page content

Recommended stack

Turn this use case into a workflow

Workflow map

What to build with these skills

01  Extract product data from websites

02  Monitor competitor pages

03  Turn HTML into clean markdown

04  Feed crawled content into RAG
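
For the page-monitoring workflow, one common approach is to store a fingerprint of each page's visible text and compare it on the next crawl. The sketch below assumes that approach; it is not how any listed skill is documented to work. The function names `fingerprint` and `has_changed` are hypothetical, and the regex-based tag stripping is deliberately crude.

```python
import hashlib
import re


def fingerprint(html):
    """Return a SHA-256 hex digest of the page's text, ignoring markup and whitespace."""
    # Drop script and style blocks first, then every remaining tag.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Collapse whitespace so formatting-only edits do not count as changes.
    text = " ".join(text.split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, html):
    """Compare a stored fingerprint with the freshly fetched HTML."""
    return fingerprint(html) != previous_fingerprint
```

Hashing the normalized text rather than the raw HTML keeps markup-only changes, such as a new CSS class or an injected script, from triggering an alert, while a changed price or heading still does.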

Best first installs

Start with high-signal skills

18 matched skills

Crawlee

A web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

9.1K stars | 69 quality | May 22, 2026 push
$ npx skills add apify/crawlee-python

Firecrawl

VERIFIED

🔥 Search, scrape, and clean the web for AI agents.

123K stars | 78 quality | May 23, 2026 push
$ npx skills add firecrawl/firecrawl

Crawl4AI

VERIFIED

Open-source LLM-friendly web crawler and scraper

66K stars | 79 quality | May 22, 2026 push
$ npx skills add unclecode/crawl4ai

Skill shortlist

More options for this use case

WaterCrawl

web-automation

Transform Web Content into LLM-Ready Data

1.8K stars | 65 quality

Lux

web-automation

👾 Fast and simple video download library and CLI tool written in Go

31K stars | 72 quality

Colly

web-automation

Elegant Scraper and Crawler Framework for Golang

25K stars | 74 quality

GoogleScraper

web-automation

A Python module to scrape several search engines (such as Google, Yandex, Bing, and DuckDuckGo), with asynchronous networking support.

2.8K stars | 51 quality

Scrapling

web-automation

Adaptive web scraping for agent data collection

53K stars | 77 quality

Newspaper

web-automation

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

15K stars | 72 quality

Mlscraper

web-automation

🤖 Scrape data from HTML websites automatically by just providing examples

1.4K stars | 49 quality

Scrapecraft

web-automation

🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.

641 stars | 46 quality

Google Maps Scraper

web-automation

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, review count, latitude and longitude, reviews, email, and more for each place.

4.1K stars | 67 quality

Awesome Crawler

web-automation

A collection of awesome web crawlers and spiders in different languages

7.2K stars | 54 quality

Amazon Scraper

web-automation

Free-trial Amazon Scraper API for extracting search, product, offer listing, review, question-and-answer, best seller, and seller data.

3.0K stars | 66 quality

Headless Chrome Crawler

web-automation

Distributed crawler powered by Headless Chrome

5.6K stars | 53 quality

QueryList

web-automation

🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

2.7K stars | 66 quality

EasySpider

web-automation

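A visual no-code web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.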

44K stars | 76 quality

Google Play Scraper

web-automation

Node.js scraper to get data from Google Play

2.9K stars | 63 quality