LLMWebCrawler
An LLM-based web crawler implemented with Ray and Hugging Face. Page embeddings are saved to a vector database for fast clustering and retrieval. Use it to feed your RAG pipeline.
Supply asset profile
Research and knowledge work
Deep research, source comparison, literature review, RAG, knowledge search, and reports.
Scenario
RAG and knowledge
I need my agent to build a RAG workflow over documents and retrieve reliable context.
Agent fit
Claude Code + CLI + Codex
Codex, Claude Code, Cursor, CLI, or custom agents.
Install
Ready
npx skills add Aavache/LLMWebCrawler
Maintenance
stale
3y since push
Risk
Risky
License is unclear
GitHub quality
101
41/100 quality · 66/100 trust
Review notes
License is unclear · Permission surface may require sandboxing
Agent adoption scorecard
Trust, audit, and install readiness at a glance
These scores combine public repository metadata, OpenAgentSkill review signals, maintenance freshness, and install readiness. They are a shortlist signal, not a replacement for human review.
Quality
Needs review
Inspect the repository carefully before adding it to an agent workflow.
Trust
Do not auto-install
Trust Score v5 found insufficient evidence for agent installation. Treat this as discovery material, not an executable recommendation.
Audit
Risky
Install readiness, security metadata, maintenance, and adoption risk.
Trust Score v5
Human review before install
Choose a stronger alternative or inspect the source manually before any install attempt.
Stars
101 GitHub stars
Repo activity
101 stars, 12 forks
Maintenance
3y since push
License
Unknown
Install
npx skills add Aavache/LLMWebCrawler
Install safety
standard package or runtime install path
Permission surface
filesystem or document access, network or browser access
Agent outcomes
No agent outcome data yet
Docs
Strong README/SKILL.md context
Risk summary
Review before production
- License is unclear
- Repository looks stale
- Quality score needs review
- Permission surface needs review: filesystem or document access, network or browser access
Install readiness
Install path available
- Install path is available
- Repository evidence is available
- License is unclear
- No Agent Proven outcome evidence yet
Agent-readable metadata
Machine-readable decision data for this skill.
Use this block or the embedded JSON to decide whether an agent should install this skill, choose an alternative, or ask for human review first.
Suited tasks
- RAG and knowledge workflows
- Claude Code teams
- builders willing to evaluate younger projects
- Chunk documents
Install decision
- Command: npx skills add Aavache/LLMWebCrawler
- Policy: block
- Human review: yes
Trust and risk
- Trust: 58/100
- Audit: 57/100
- Risk level: Risky
Outcome loop
- Endpoint: /api/agent/outcome
- Event ID: resolve
- Outcomes: 5
Install command
npx skills add Aavache/LLMWebCrawler
Do not use when
- teams that require actively maintained dependencies
- production agents without a repository review
- Repository looks stale
- No OpenAgentSkill engagement data yet
- Audit risk risky exceeds max_risk=medium
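The block decision above ("Audit risk risky exceeds max_risk=medium") can be read as a simple threshold check. The following is a minimal sketch of that idea, not the registry's actual logic: the `RISK_ORDER` list, the "low" label, the "allow" result, and the function name `install_policy` are all assumptions made for illustration.

```python
# Assumed ordering of risk labels, lowest to highest. Only "medium" and
# "risky" appear in this listing; "low" and the ordering are assumptions.
RISK_ORDER = ["low", "medium", "risky"]


def install_policy(audit_risk: str, max_risk: str) -> str:
    """Return "block" when the audit risk exceeds the caller's ceiling.

    "allow" is a hypothetical label for the non-blocking case.
    """
    audit_rank = RISK_ORDER.index(audit_risk.lower())
    ceiling_rank = RISK_ORDER.index(max_risk.lower())
    return "block" if audit_rank > ceiling_rank else "allow"


# The listing reports risk level "Risky" against max_risk=medium.
print(install_policy("Risky", "medium"))
```

Under these assumptions the call returns "block", matching the policy shown in the install decision.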
Alternative
Generative AI For Beginners
112.2K stars
npx skills add microsoft/generative-ai-for-beginners
Alternative
Elasticsearch
77.0K stars
npx skills add elastic/elasticsearch
Alternative
Graphify
76.9K stars
npx skills add safishamsi/graphify
Alternative
Understand Anything
70.4K stars
npx skills add Egonex-AI/Understand-Anything
Agent safety v2
37/100 · Avoid automatic install
This skill should not be selected by an agent without explicit human security review.
Do not auto-install. Inspect the source, dependencies, and permission surface first.
- Network access (medium): Skill likely fetches remote pages, APIs, repositories, or external services.
- Filesystem access (medium): Skill may read or write project files, documents, generated artifacts, or local workspace state.
- Database access (medium): Skill may inspect schemas, query databases, or work with persistent stores.
- Audit risk risky exceeds max_risk=medium
- License is unclear
Install targets
Install this skill in your agent workflow
Copy the registry command or an agent-specific install prompt for Codex, Claude Code, and Cursor.
OpenAgentSkill CLI
Use the registry command when your workflow supports the OpenAgentSkill installer.
$ npx skills add Aavache/LLMWebCrawler
Agent resolve plan
Let an agent verify fit before installing.
The Resolve API returns the selected skill, alternatives, safety policy, audit notes, install target, and copy-paste prompt an agent can follow without scraping this page.
Resolve JSON
/api/agent/resolve?task=Use%20LLMWebCrawler%20for%20an%20agent%20workflow&agent=codex&max_risk=medium
Resolve text
/api/agent/resolve?task=Use%20LLMWebCrawler%20for%20an%20agent%20workflow&agent=codex&max_risk=medium&format=text
Install handoff
/api/skills/aavache-llmwebcrawler/install
Agent should check
- Task fit and alternatives from Resolve API.
- Audit score, trust score, and safety policy warnings.
- Install target compatibility for Codex, Claude Code, Cursor, or CLI.
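As a rough illustration of how an agent might assemble the Resolve API request shown above, here is a small sketch that only builds the URL and performs no network call. The host is taken from the copy prompt on this page; the helper name `build_resolve_url` and its defaults are hypothetical, and the response schema is not documented here, so the sketch stops at the URL.

```python
from urllib.parse import quote, urlencode

# Host as it appears in the copy prompt on this page.
BASE_URL = "https://www.openagentskill.com"


def build_resolve_url(task, agent="codex", max_risk="medium", fmt=None):
    """Build a Resolve API URL using the query parameters shown in this listing."""
    params = {"task": task, "agent": agent, "max_risk": max_risk}
    if fmt is not None:
        # The listing shows format=text as the plain-text variant.
        params["format"] = fmt
    # quote_via=quote encodes spaces as %20, matching the listed URLs.
    return f"{BASE_URL}/api/agent/resolve?{urlencode(params, quote_via=quote)}"


print(build_resolve_url("Use LLMWebCrawler for an agent workflow"))
print(build_resolve_url("Use LLMWebCrawler for an agent workflow", fmt="text"))
```

Keeping URL construction separate from fetching lets an agent log or review the exact request before anything is sent.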
Copy prompt
Task: Use LLMWebCrawler in this workspace.
Resolve first: https://www.openagentskill.com/api/agent/resolve?task=Use%20LLMWebCrawler%20for%20an%20agent%20workflow&agent=codex&max_risk=medium
Review install handoff: https://www.openagentskill.com/api/skills/aavache-llmwebcrawler/install
Install command: npx skills add Aavache/LLMWebCrawler
Before running it, summarize audit warnings, required permissions, and the fallback skill if install is risky.
Agent handoff
Give an agent the install path, not another directory page.
Use the public install endpoint to fetch the command, safety checklist, target prompts, and canonical links for this skill.
Install handoff
/api/skills/aavache-llmwebcrawler/install
LLM text format
/api/skills/aavache-llmwebcrawler/install?format=text
Find alternatives
/api/skills/search?q=LLMWebCrawler&limit=3
Agent prompt
Use LLMWebCrawler for this task. Review https://www.openagentskill.com/api/skills/aavache-llmwebcrawler/install, then install with: npx skills add Aavache/LLMWebCrawler
Registry metadata
Agent-readable profile for automatic skill selection.
This page exposes the same decision, trust, audit, use-case, and install signals through the Registry API, so agents can rank this skill without scraping the UI.
Manifest
/api/registry/manifest/aavache-llmwebcrawler
LLM text
/api/registry/manifest/aavache-llmwebcrawler?format=text
Install alias
/api/registry/install/aavache-llmwebcrawler
Recommend
/api/registry/recommend?task=Use%20LLMWebCrawler%20in%20an%20agent%20workflow&limit=3
Agent fit
RAG and knowledge
Platforms
Python, RAG, Claude Code
Audit report
Risky · 57/100
Review install readiness, maintenance, trust, quality, and metadata warnings before adding this skill to an agent workflow.
Agent decision cockpit
Needs validation for RAG and knowledge
Do a manual repository review before adding this to an agent workflow.
Role in stack
Needs validation
Primary fit
RAG and knowledge
Trust label
Needs manual review
Install path
Command ready
Use when
- RAG and knowledge workflows
- Claude Code teams
- builders willing to evaluate younger projects
Evidence
- install command or GitHub repo available
- 41/100 quality profile
Review first
- Repository looks stale
- No OpenAgentSkill engagement data yet
Implementation path
1. Install it in a sandbox agent and run one RAG and knowledge task end to end.
2. Compare output quality, latency, and failure behavior against at least one alternative.
3. Promote it into production only after reviewing repository permissions, license, and maintenance signals.
Trust profile
Do not auto-install
Trust Score v5 found insufficient evidence for agent installation. Treat this as discovery material, not an executable recommendation.
GitHub adoption
INFO: 101 GitHub stars
Stars/forks activity
CHECK: 101 stars, 12 forks; issue activity unavailable in current metadata
Recent maintenance
FIX: 3y since push
License clarity
CHECK: Unknown
Good signals
- AI review approved
- Install path is available
- Repository evidence is available
- Install command has no obvious high-risk pattern
- Outcome loop is ready but needs first real agent run
Review before install
- License is unclear
- Repository looks stale
- Quality score needs review
- Permission surface needs review: filesystem or document access, network or browser access
- Stars/forks activity: 101 stars, 12 forks; issue activity unavailable in current metadata
- Recent maintenance: 3y since push
- License clarity: Unknown
- Permission surface: filesystem or document access, network or browser access
- No real agent outcome reports yet
- Human review required before unattended installation
Recommended action
Choose a stronger alternative or inspect the source manually before any install attempt.
Quality profile
Needs-review candidate for agent workflows
Inspect the repository carefully before adding it to an agent workflow.
Workflow fit
Use this skill in these scenarios
Search private knowledge
RAG and knowledge
I need my agent to build a RAG workflow over documents and retrieve reliable context.
Collect structured data
Web scraping
I need my agent to scrape websites and extract structured data from pages.
Build and ship code
Coding agents
I need a coding agent that can understand a repository, edit code, and review pull requests.
Stack fit
Add it to a complete workflow
Ingest, retrieve, and cite
RAG knowledge base
A stack for document-heavy agents that ingest files, create searchable knowledge, retrieve relevant context, and answer with grounded sources.
Scrape, clean, and reuse web data
Web data pipeline
A practical stack for agents that crawl public pages, extract clean content, normalize data, and hand it to downstream research or RAG workflows.
Turn skills into distribution
Content growth agent
A stack for turning newly indexed skills into SEO briefs, social drafts, comparison pages, and reusable publishing workflows.
Alternative shortlist
Compare before you install
Similar skills in this category, ranked with the same readiness and quality signals.
Generative AI For Beginners
21 Lessons, Get Started Building with Generative AI
Elasticsearch
Free and Open Source, Distributed, RESTful Search Engine
Graphify
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.
Understand Anything
Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more.
Overview
An LLM-based web crawler implemented with Ray and Hugging Face. Page embeddings are saved to a vector database for fast clustering and retrieval. Use it to feed your RAG pipeline.
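To make the "embeddings in a vector database for retrieval" idea concrete, here is a toy sketch of the store-and-search step. It is not the repository's code: `embed` is a bag-of-words stand-in for a real Hugging Face model embedding, `InMemoryVectorStore` is a hypothetical substitute for an actual vector database, and the example.com URLs and page texts are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a model embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (url, vector) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, url, text):
        self.items.append((url, embed(text)))

    def search(self, query, k=1):
        """Return the URLs of the k stored pages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [url for url, _ in ranked[:k]]


store = InMemoryVectorStore()
store.add("https://example.com/ray", "ray schedules distributed python tasks")
store.add("https://example.com/cats", "cats sleep most of the day")
print(store.search("distributed tasks in python"))
```

In a real pipeline the crawler would supply the page text, a model would produce dense vectors, and an indexed vector database would replace the linear scan.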
Imported by the skill-only GitHub discovery pipeline because it matches agent skill, automation, domain workflow, RAG, document-processing, data, finance, security, or developer-tool signals. Protocol-server projects are excluded from automated imports.
Technical Details
- Version: 1.0.0
- License: Unknown
- Last Updated: 6/21/2026
- Published: 6/21/2026
Decision snapshot
Needs validation
install command or GitHub repo available
Audit snapshot
Install review
Install and adoption review
- Security: 77/100
- Maintenance: 20/100
- Install: 92/100
Agent-proven evidence
Outcome reports after resolve, review, install, and one narrow run.
- Success rate
- —
- Recent failure
- —
- Outcomes
- 0
- Output quality
- —
- Failed
- 0
- Not relevant
- 0
- Installs
- 0
- Risk blocked
- 0
- Setup needed
- 0
- Production
- 0
No agent outcome data yet. The first agent run can report success, setup needs, risk blocks, failure, or not-relevant through /api/agent/outcome.
Install
Add to agent workflow
Free and open source. Review the audit before production use.
Listing source
Community indexed
This listing was indexed from public sources and is not marked official until a maintainer claim is approved.
- Creator: Aavache
- Source: Aavache/LLMWebCrawler
- Indexed by: OpenAgentSkill community index
Attribution links to the public repository or creator profile. Creators can claim the listing to update ownership signals.
Owner claim
Claim this skill listing
This community indexed listing is attributed to Aavache but is not marked official yet. Claim it to add a verified owner signal and make future launch, install, and audit updates easier to trust.
Author
Aavache
@aavache
Health Signals
- GitHub stars: 101
- Quality score: 33/100
- Last GitHub push: Oct 15, 2023
- Framework hints: 2
- OpenAgentSkill views: 0
- Install copies: 0
- Outbound clicks: 0
Trust & Safety
Do not auto-install
- GitHub adoption: 101 GitHub stars (INFO)
- Stars/forks activity: 101 stars, 12 forks; issue activity unavailable in current metadata (CHECK)
- Recent maintenance: 3y since push (FIX)
- License clarity: Unknown (CHECK)
- README/SKILL.md completeness: Metadata includes enough usage and workflow context (PASS)
- Dependency/runtime risk: network or browser surface, database surface (INFO)