Collect structured data
Crawl a documentation site
Find skills for crawling docs, converting HTML to markdown, preserving links, and preparing agent-ready source material.
Agent prompt
Find the best skill for crawling a documentation website and converting pages into clean markdown with useful metadata.
Best first install
WaterCrawl
Transform Web Content into LLM-Ready Data
Install with one command
$ npx skills add watercrawl/WaterCrawl
Install targets
Install this skill in your agent workflow
Copy the registry command or an agent-specific install prompt for Codex, Claude Code, and Cursor.
OpenAgentSkill CLI
Use the registry command when your workflow supports the OpenAgentSkill installer.
$ npx skills add watercrawl/WaterCrawl
Decision guide
Use and avoid conditions
Success criteria
- Preserves source URLs
- Produces clean markdown
- Can limit crawl scope
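The success criteria above can be checked with a few small helpers. The following is a minimal sketch using only the Python standard library, not WaterCrawl's own API: the function names `in_scope`, `resolve_link`, and `with_source_metadata`, the front-matter keys `title` and `source_url`, and the `docs.example.com` URLs are all hypothetical and chosen for illustration.

```python
import json
from urllib.parse import urldefrag, urljoin, urlparse


def in_scope(url: str, root: str) -> bool:
    """Return True when url is on the same host as root and under its path prefix."""
    u, r = urlparse(url), urlparse(root)
    # Compare with trailing slashes so "/guide" does not match "/guidelines".
    prefix = r.path if r.path.endswith("/") else r.path + "/"
    path = u.path if u.path.endswith("/") else u.path + "/"
    return (
        u.scheme in ("http", "https")
        and u.netloc == r.netloc
        and path.startswith(prefix)
    )


def resolve_link(href: str, page_url: str) -> str:
    """Make a link absolute relative to its page and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    return absolute


def with_source_metadata(markdown: str, source_url: str, title: str) -> str:
    """Prefix a markdown body with front matter that records the source URL.

    Values are JSON-quoted so that colons or quotes in a title stay parseable.
    """
    header = (
        "---\n"
        f"title: {json.dumps(title)}\n"
        f"source_url: {json.dumps(source_url)}\n"
        "---\n\n"
    )
    return header + markdown
```

A crawler built on these helpers would resolve each link with `resolve_link`, follow it only when `in_scope` returns True, and pass each converted page through `with_source_metadata` so the source URL travels with the markdown.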
Do not use when
- Docs block crawling
- The content is private without authorization
- You need pixel-perfect browser state
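The first condition, docs that block crawling, is usually expressed in the site's `robots.txt`. The sketch below shows one way to test a URL against those rules before fetching, using Python's standard `urllib.robotparser`. The helper name `build_robots_checker`, the user agent string, and the sample rules are assumptions for illustration; a real crawl would load the site's actual `robots.txt` and would still need to respect the site's terms and any authorization requirements.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Return a function that reports whether user_agent may fetch a URL.

    robots_txt is the already-downloaded text of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical rules: everything is crawlable except the /private/ section.
sample_rules = "User-agent: *\nDisallow: /private/\n"
allowed = build_robots_checker(sample_rules, "docs-crawler-example")
```

With these sample rules, `allowed("https://docs.example.com/guide/")` is permitted while any URL under `/private/` is refused, so the crawler can skip disallowed pages before sending a request.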
Alternatives
Compare before installing
MarkdownMonster
700
An extensible Markdown Editor, Viewer and Weblog Publisher for Windows
Firecrawl
674
The API to search, scrape, and interact with the web at scale. 🔥
AnyCrawl
659
AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.
Quarkdown
642
🪐 Markdown with superpowers: from ideas to papers, presentations, websites, books, and knowledge bases.
Trafilatura
639
Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Crawler Illegal Cases In China
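633
A collection of legal cases in China involving web crawlers. This project compiles news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines.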