LLMWebCrawler

REVIEW · 58

Community indexed

A web crawler based on LLMs, implemented with Ray and Hugging Face. Its embeddings are stored in a vector database for fast clustering and retrieval, so the output can serve as a context source for RAG.

Downloads: 0

Stars: 101

Version: 1.0.0

Quality41/100 · Needs review

Trust58/100 · Do not auto-install

Audit57/100 · Risky

Supply asset profile

Research and knowledge work

Deep research, source comparison, literature review, RAG, knowledge search, and reports.

Scenario

RAG and knowledge

I need my agent to build a RAG workflow over documents and retrieve reliable context.

Agent fit

Claude Code + CLI + Codex

Codex, Claude Code, Cursor, CLI, or custom agents.

Install

Ready

npx skills add Aavache/LLMWebCrawler

Maintenance

stale

3y since push

Risk

Risky

License is unclear

GitHub quality

101

41/100 quality · 66/100 trust

Coverage tags

Research · RAG and knowledge · rag-knowledge · rag · retrieval

Review notes

License is unclear · Permission surface may require sandboxing

Agent adoption scorecard

Trust, audit, and install readiness at a glance

These scores combine public repository metadata, OpenAgentSkill review signals, maintenance freshness, and install readiness. They are a shortlist signal, not a replacement for human review.

Quality

Needs review

Inspect the repository carefully before adding it to an agent workflow.

Trust

Do not auto-install

Trust Score v5 found insufficient evidence for agent installation. Treat this as discovery material, not an executable recommendation.

Audit

Risky

Install readiness, security metadata, maintenance, and adoption risk.

Trust Score v5

Human review before install

Choose a stronger alternative or inspect the source manually before any install attempt.

Python · RAG · Codex · Claude Code · Cursor

Stars

101 GitHub stars

Repo activity

101 stars, 12 forks

Maintenance

3y since push

License

Unknown

Install

npx skills add Aavache/LLMWebCrawler

Install safety

standard package or runtime install path

Permission surface

filesystem or document access, network or browser access

Agent outcomes

No agent outcome data yet

Docs

Strong README/SKILL.md context

Risk summary

Review before production

License is unclear
Repository looks stale
Quality score needs review
Permission surface needs review: filesystem or document access, network or browser access

Install readiness

Install path available

Install path is available
Repository evidence is available
License is unclear
No Agent Proven outcome evidence yet

Agent-readable metadata

Machine-readable decision data for this skill.

Use this block or the embedded JSON to decide whether an agent should install this skill, choose an alternative, or ask for human review first.

Suited tasks

RAG and knowledge workflows
Claude Code teams
builders willing to evaluate younger projects
Chunk documents

Suited agents

Python · RAG · Codex · Claude Code · Cursor · OpenAgentSkill CLI · CLI

Install decision

Command: npx skills add Aavache/LLMWebCrawler
Policy: block
Human review: yes

Trust and risk

Trust: 58/100
Audit: 57/100
Risk level: Risky

Outcome loop

Endpoint: /api/agent/outcome
Event ID: resolve
Outcomes: 5

Install command

npx skills add Aavache/LLMWebCrawler

Do not use when

teams that require actively maintained dependencies
production agents without a repository review
Repository looks stale
No OpenAgentSkill engagement data yet
Audit risk risky exceeds max_risk=medium

Alternative

Generative AI For Beginners

112.2K stars

npx skills add microsoft/generative-ai-for-beginners

Alternative

Elasticsearch

77.0K stars

npx skills add elastic/elasticsearch

Alternative

Graphify

76.9K stars

npx skills add safishamsi/graphify

Alternative

Understand Anything

70.4K stars

npx skills add Egonex-AI/Understand-Anything

Agent safety v2

37/100 · Avoid automatic install

Blocked for auto-install · Policy: block

This skill should not be selected by an agent without explicit human security review.

Do not auto-install. Inspect the source, dependencies, and permission surface first.

Network access · medium

Skill likely fetches remote pages, APIs, repositories, or external services.

Filesystem access · medium

Skill may read or write project files, documents, generated artifacts, or local workspace state.

Database access · medium

Skill may inspect schemas, query databases, or work with persistent stores.

Audit risk risky exceeds max_risk=medium
License is unclear
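The `max_risk` warning above reads as a threshold comparison between the audit risk label and the caller's limit. A minimal Python sketch of such a gate, assuming a `low < medium < risky` ordering (only `medium` and `risky` appear on this page; `low` is a guess) and using hypothetical helper names `exceeds_max_risk` and `install_decision`:

```python
# Assumed ordering of risk labels; only "medium" and "risky" appear on this listing.
RISK_ORDER = {"low": 0, "medium": 1, "risky": 2}


def exceeds_max_risk(audit_risk: str, max_risk: str) -> bool:
    """Return True when the audit risk label ranks above the caller's max_risk."""
    return RISK_ORDER[audit_risk.lower()] > RISK_ORDER[max_risk.lower()]


def install_decision(audit_risk: str, max_risk: str, policy: str, human_review: bool) -> str:
    """Hypothetical gate: never auto-install when the listing blocks it or the risk is too high."""
    if policy == "block" or human_review or exceeds_max_risk(audit_risk, max_risk):
        return "ask-human"
    return "auto-install"


# Values from this listing: risk level Risky, max_risk=medium, Policy: block, Human review: yes.
print(install_decision("Risky", "medium", policy="block", human_review=True))
```

Any one of the three conditions is enough to route the decision to a person, which matches the "do not auto-install" stance of this listing.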

Install targets

Install this skill in your agent workflow

Copy the registry command or an agent-specific install prompt for Codex, Claude Code, and Cursor.

skill install

OpenAgentSkill CLI

Use the registry command when your workflow supports the OpenAgentSkill installer.

$ npx skills add Aavache/LLMWebCrawler

Agent resolve plan

Let an agent verify fit before installing.

The Resolve API returns the selected skill, alternatives, safety policy, audit notes, install target, and copy-paste prompt an agent can follow without scraping this page.

Resolve JSON

/api/agent/resolve?task=Use%20LLMWebCrawler%20for%20an%20agent%20workflow&agent=codex&max_risk=medium

Resolve text

/api/agent/resolve?task=Use%20LLMWebCrawler%20for%20an%20agent%20workflow&agent=codex&max_risk=medium&format=text

Install handoff

/api/skills/aavache-llmwebcrawler/install
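The Resolve links above are ordinary query strings. A minimal Python sketch that rebuilds them, assuming only the `task`, `agent`, `max_risk`, and `format` parameters shown on this page (the helper `resolve_path` is hypothetical, not an official client):

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base path taken from the Resolve JSON / Resolve text links on this page.
RESOLVE_PATH = "/api/agent/resolve"


def resolve_path(task: str, agent: str = "codex", max_risk: str = "medium",
                 fmt: Optional[str] = None) -> str:
    """Build a Resolve API path with the query parameters used on this listing."""
    params = {"task": task, "agent": agent, "max_risk": max_risk}
    if fmt is not None:
        params["format"] = fmt  # "text" selects the plain-text plan
    # quote_via=quote encodes spaces as %20; urlencode's default (quote_plus) would use '+'.
    return f"{RESOLVE_PATH}?{urlencode(params, quote_via=quote)}"


task = "Use LLMWebCrawler for an agent workflow"
print(resolve_path(task))              # JSON plan
print(resolve_path(task, fmt="text"))  # text plan
```

Passing `quote_via=quote` is what makes the output match the `%20` encoding in the links above.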

Agent should check

Task fit and alternatives from Resolve API.
Audit score, trust score, and safety policy warnings.
Install target compatibility for Codex, Claude Code, Cursor, or CLI.

Copy prompt

Task: Use LLMWebCrawler in this workspace.
Resolve first: https://www.openagentskill.com/api/agent/resolve?task=Use%20LLMWebCrawler%20for%20an%20agent%20workflow&agent=codex&max_risk=medium
Review install handoff: https://www.openagentskill.com/api/skills/aavache-llmwebcrawler/install
Install command: npx skills add Aavache/LLMWebCrawler
Before running it, summarize audit warnings, required permissions, and the fallback skill if install is risky.

Agent handoff

Give an agent the install path, not another directory page.

Use the public install endpoint to fetch the command, safety checklist, target prompts, and canonical links for this skill.

Install handoff

/api/skills/aavache-llmwebcrawler/install

LLM text format

/api/skills/aavache-llmwebcrawler/install?format=text

Find alternatives

/api/skills/search?q=LLMWebCrawler&limit=3
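An agent can derive these handoff paths from the repository name. A minimal Python sketch, assuming the slug is the lowercased `owner/name` with `/` replaced by `-` (inferred from `Aavache/LLMWebCrawler` → `aavache-llmwebcrawler`, not a documented rule); `handoff_paths` and `fetch_text` are hypothetical helpers:

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.openagentskill.com"


def skill_slug(repo: str) -> str:
    # Assumption: slug = lowercase "owner/name" with "/" replaced by "-".
    return repo.lower().replace("/", "-")


def handoff_paths(repo: str, limit: int = 3) -> Dict[str, str]:
    """Build the install, text-format install, and alternatives paths for one skill."""
    slug = skill_slug(repo)
    name = repo.split("/", 1)[1]
    return {
        "install": f"/api/skills/{slug}/install",
        "install_text": f"/api/skills/{slug}/install?format=text",
        "alternatives": "/api/skills/search?" + urlencode({"q": name, "limit": limit}),
    }


def fetch_text(path: str, timeout: float = 10.0) -> str:
    # Performs a real HTTP GET; the response body format is not documented on this page.
    with urlopen(BASE_URL + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


paths = handoff_paths("Aavache/LLMWebCrawler")
print(paths["install_text"])
```

Calling `fetch_text(paths["install_text"])` would retrieve the plain-text handoff, which an agent should read before running the install command.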

Agent prompt

Use LLMWebCrawler for this task. Review https://www.openagentskill.com/api/skills/aavache-llmwebcrawler/install, then install with: npx skills add Aavache/LLMWebCrawler

Registry metadata

Agent-readable profile for automatic skill selection.

This page exposes the same decision, trust, audit, use-case, and install signals through the Registry API, so agents can rank this skill without scraping the UI.

Manifest

/api/registry/manifest/aavache-llmwebcrawler

LLM text

/api/registry/manifest/aavache-llmwebcrawler?format=text

Install alias

/api/registry/install/aavache-llmwebcrawler

Recommend

/api/registry/recommend?task=Use%20LLMWebCrawler%20in%20an%20agent%20workflow&limit=3

Agent fit

31/100

RAG and knowledge

Use-case tags

RAG and knowledge Web scraping Coding agents

Platforms

Python, RAG, Claude Code

Audit report

Risky · 57/100

Review install readiness, maintenance, trust, quality, and metadata warnings before adding this skill to an agent workflow.

Agent decision cockpit

Needs validation for RAG and knowledge

Do a manual repository review before adding this to an agent workflow.

Readiness

Review

Stage

Role in stack

Needs validation

Primary fit

RAG and knowledge

Trust label

Needs manual review

Install path

Command ready

Use when

RAG and knowledge workflows
Claude Code teams
builders willing to evaluate younger projects

Evidence

install command or GitHub repo available
41/100 quality profile

Review first

Repository looks stale
No OpenAgentSkill engagement data yet

Implementation path

1. Install it in a sandbox agent and run one RAG and knowledge task end to end.
2. Compare output quality, latency, and failure behavior against at least one alternative.
3. Promote it into production only after reviewing repository permissions, license, and maintenance signals.
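Step 2 can be scripted with a small harness. A minimal Python sketch, assuming each candidate skill is wrapped in a plain `task -> output` callable (the wrappers and the `run_once`/`compare` helpers are hypothetical, not part of LLMWebCrawler); output quality still has to be judged separately:

```python
import time
from typing import Callable, Dict, Optional


def run_once(fn: Callable[[str], str], task: str) -> Dict[str, object]:
    """Run one candidate on one task, recording output, error, and wall-clock latency."""
    start = time.perf_counter()
    output: Optional[str] = None
    error: Optional[str] = None
    try:
        output = fn(task)
    except Exception as exc:  # keep the failure instead of aborting the comparison
        error = repr(exc)
    return {"output": output, "error": error, "latency_s": time.perf_counter() - start}


def compare(candidates: Dict[str, Callable[[str], str]], task: str) -> Dict[str, Dict[str, object]]:
    """Run every candidate on the same task so results are directly comparable."""
    return {name: run_once(fn, task) for name, fn in candidates.items()}


def failing_stub(task: str) -> str:
    raise RuntimeError("crawl timed out")  # stand-in for a real failure


results = compare({"stub_ok": lambda t: t.upper(), "stub_fail": failing_stub},
                  "find docs on Ray")
for name, r in results.items():
    print(name, r["error"], round(r["latency_s"], 4))
```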

Trust profile

Do not auto-install

Trust Score v5 found insufficient evidence for agent installation. Treat this as discovery material, not an executable recommendation.

Trust score

GitHub adoption

INFO

101 GitHub stars

Stars/forks activity

CHECK

101 stars, 12 forks; issue activity unavailable in current metadata

Recent maintenance

FIX

3y since push

License clarity

CHECK

Unknown

Good signals

AI review approved
Install path is available
Repository evidence is available
Install command has no obvious high-risk pattern
Outcome loop is ready but needs first real agent run

Review before install

License is unclear
Repository looks stale
Quality score needs review
Permission surface needs review: filesystem or document access, network or browser access
Stars/forks activity: 101 stars, 12 forks; issue activity unavailable in current metadata
Recent maintenance: 3y since push
License clarity: Unknown
Permission surface: filesystem or document access, network or browser access
No real agent outcome reports yet
Human review required before unattended installation

Recommended action

Choose a stronger alternative or inspect the source manually before any install attempt.

Quality profile

Needs review candidate for agent workflows

Inspect the repository carefully before adding it to an agent workflow.

GitHub stars

101

Freshness

3y ago

Install ready

Yes

License

Unknown

Check before install: Repository looks stale

Workflow fit

Use this skill in these scenarios

Search private knowledge

RAG and knowledge

I need my agent to build a RAG workflow over documents and retrieve reliable context.

Collect structured data

Web scraping

I need my agent to scrape websites and extract structured data from pages.

Build and ship code

Coding agents

I need a coding agent that can understand a repository, edit code, and review pull requests.

Stack fit

Add it to a complete workflow

Ingest, retrieve, and cite

RAG knowledge base

A stack for document-heavy agents that ingest files, create searchable knowledge, retrieve relevant context, and answer with grounded sources.

Scrape, clean, and reuse web data

Web data pipeline

A practical stack for agents that crawl public pages, extract clean content, normalize data, and hand it to downstream research or RAG workflows.

Turn skills into distribution

Content growth agent

A stack for turning newly indexed skills into SEO briefs, social drafts, comparison pages, and reusable publishing workflows.

Alternative shortlist

Compare before you install

Similar skills in this category, ranked with the same readiness and quality signals.

Generative AI For Beginners

21 Lessons, Get Started Building with Generative AI

Elasticsearch

Free and Open Source, Distributed, RESTful Search Engine

Graphify

AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

Understand Anything

Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more.

Overview

A web crawler based on LLMs, implemented with Ray and Hugging Face. Its embeddings are stored in a vector database for fast clustering and retrieval, so the output can serve as a context source for RAG.

Imported by the skill-only GitHub discovery pipeline because it matches agent skill, automation, domain workflow, RAG, document-processing, data, finance, security, or developer-tool signals. Protocol-server projects are excluded from automated imports.

Platform Compatibility

Python: full

RAG: full

Technical Details

Version: 1.0.0
License: Unknown
Last Updated: 6/21/2026
Published: 6/21/2026

Frameworks & Tools

Python · RAG

Decision snapshot

Needs validation

Ready

Review

Stage

install command or GitHub repo available

Audit snapshot

Install review

Install and adoption review

Risky

Security: 77/100
Maintenance: 20/100
Install: 92/100

Agent-proven evidence

Agent Proven evidence

Outcome reports after resolve, review, install, and one narrow run.

Proven

Needs first agent run · Auto-install: review first · Last: Unknown

Success rate: —
Recent failure: —
Outcomes: 0
Output quality: —
Failed: 0
Not relevant: 0
Installs: 0
Risk blocked: 0
Setup needed: 0
Production: 0

No agent outcome data yet. The first agent run can report success, setup needs, risk blocks, failure, or not-relevant through /api/agent/outcome.
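A first outcome report could be sent as a JSON POST to `/api/agent/outcome`. The real payload schema is not shown on this page, so this minimal Python sketch uses hypothetical field names (`skill`, `outcome`, `note`) and hypothetical wire values for the five outcome kinds listed above; it only builds the request and does not send it:

```python
import json
from urllib.request import Request

BASE_URL = "https://www.openagentskill.com"
# Outcome kinds named on this page; these exact string values are an assumption.
OUTCOMES = {"success", "setup_needed", "risk_blocked", "failed", "not_relevant"}


def build_outcome_request(skill: str, outcome: str, note: str = "") -> Request:
    """Build (but do not send) a POST carrying one outcome report."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    body = {"skill": skill, "outcome": outcome, "note": note}  # hypothetical schema
    return Request(
        BASE_URL + "/api/agent/outcome",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_outcome_request("aavache-llmwebcrawler", "risk_blocked", "audit exceeds max_risk")
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)`, but only after the actual contract has been confirmed.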

Install

Add to agent workflow

Free and open source. Review the audit before production use.

Growth loop

Share kit

Scenario-led draft for LLMWebCrawler, ready for a manual X post.

Curator note

Most web agents fail at the boring parts: messy pages, missing context, and repeatable extraction.

LLMWebCrawler gives agents a cleaner path to browse, extract, and monitor web pages.

101 stars

https://www.openagentskill.com/skills/aavache-llmwebcrawler?ref=x
#AIAgents

Optional reply with install command

Listing + install path for LLMWebCrawler:
https://www.openagentskill.com/skills/aavache-llmwebcrawler?ref=x

Install: npx skills add Aavache/LLMWebCrawler

Listing source

Community indexed

Claimable

This listing was indexed from public sources and is not marked official until a maintainer claim is approved.

Creator: Aavache
Source: Aavache/LLMWebCrawler
Indexed by: OpenAgentSkill community index

Attribution links to the public repository or creator profile. Creators can claim the listing to update ownership signals.

Owner claim

Claim this skill listing

This community indexed listing is attributed to Aavache but is not marked official yet. Claim it to add a verified owner signal and make future launch, install, and audit updates easier to trust.

README badge

Add this badge to your GitHub README to show the listing, trust score, and install handoff.

[![OpenAgentSkill](https://www.openagentskill.com/api/badge/aavache-llmwebcrawler)](https://www.openagentskill.com/skills/aavache-llmwebcrawler)

Author

Aavache

@aavache

Platform Fit

Claude Code

Health Signals

GitHub stars: 101
Quality score: 33/100
Last GitHub push: Oct 15, 2023
Framework hints: 2
OpenAgentSkill views: 0
Install copies: 0
Outbound clicks: 0

Community Signal

Share whether this skill looks useful for your agent workflow. Aggregated feedback improves rankings over time.

Trust & Safety

Do not auto-install

GitHub adoption: 101 GitHub stars (INFO)
Stars/forks activity: 101 stars, 12 forks; issue activity unavailable in current metadata (CHECK)
Recent maintenance: 3y since push (FIX)
License clarity: Unknown (CHECK)
README/SKILL.md completeness: Metadata includes enough usage and workflow context (PASS)
Dependency/runtime risk: network or browser surface, database surface (INFO)

Related Skills

Generative AI For Beginners

21 Lessons, Get Started Building with Generative AI

112.2K stars · 0 installs

Elasticsearch

Free and Open Source, Distributed, RESTful Search Engine

77.0K stars · 0 installs

Graphify

76.9K stars · 0 installs

Understand Anything

70.4K stars · 0 installs
