{"slug":"onejune2018-awesome-llm-eval","name":"Awesome LLM Eval","description":"Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. A curated list of tools, benchmarks/data, demos, leaderboards, and large models, mainly for evaluating foundation LLMs and aimed at exploring the technical boundaries of generative AI.","tagline":"Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. A curated list of tools, benchmarks/data, demos, leaderboards, and large models, mainly for evaluating foundation LLMs and aimed at exploring the technical boundaries of generative AI.","category":"rag-knowledge","tags":["rag","retrieval","knowledge","awsome-list","awsome-lists","benchmark","bert","chatglm","chatgpt","dataset"],"author":{"name":"onejune2018","verified":false,"url":"https://github.com/onejune2018"},"attribution":{"status":"community_indexed","statusLabel":"Community indexed","shortLabel":"COMMUNITY INDEXED","sourceLabel":"GitHub star discovery","sourceDetail":"onejune2018/Awesome-LLM-Eval","creatorName":"onejune2018","creatorUrl":"https://github.com/onejune2018","sourceUrl":"https://github.com/onejune2018/Awesome-LLM-Eval","indexedBy":"OpenAgentSkill community index","claimUrl":"https://www.openagentskill.com/skills/onejune2018-awesome-llm-eval#claim-this-skill","claimCta":"Claim this skill","trustNote":"This listing was indexed from public sources and is not marked official until a maintainer claim is approved.","publicNote":"Attribution links to the public repository or creator profile. 
Creators can claim the listing to update ownership signals."},"stats":{"stars":642,"forks":76,"downloads":0,"rating":0,"review_count":0,"quality_score":42.36},"quality":{"score":67,"tier":"promising","label":"Promising","summary":"Useful candidate, but compare it with alternatives before adopting.","signals":[{"label":"GitHub stars","value":"642","tone":"positive"},{"label":"Freshness","value":"7mo ago","tone":"positive"},{"label":"Install ready","value":"Yes","tone":"positive"},{"label":"License","value":"MIT","tone":"neutral"}],"warnings":[]},"trust":{"version":"trust-score-v3","score":80,"tier":"strong","label":"Strong shortlist","summary":"Good trust signals with a few areas worth checking before rollout.","recommendedAction":"Test in a sandbox workflow and compare its install path with close alternatives.","dimensions":[{"id":"github_adoption","label":"GitHub adoption","score":76,"weight":0.13,"status":"info","detail":"642 GitHub stars"},{"id":"repo_activity","label":"Stars/forks activity","score":71,"weight":0.08,"status":"info","detail":"642 stars, 76 forks; issue activity unavailable in current metadata"},{"id":"maintenance","label":"Recent maintenance","score":62,"weight":0.14,"status":"info","detail":"7mo since push"},{"id":"license","label":"License clarity","score":86,"weight":0.09,"status":"pass","detail":"MIT"},{"id":"documentation","label":"README/SKILL.md completeness","score":90,"weight":0.14,"status":"pass","detail":"Metadata includes enough usage and workflow context"},{"id":"dependency_risk","label":"Dependency/runtime risk","score":90,"weight":0.12,"status":"pass","detail":"no major dependency risk hints in public metadata"},{"id":"installability","label":"Install availability","score":92,"weight":0.1,"status":"pass","detail":"npx skills add onejune2018/Awesome-LLM-Eval"},{"id":"install_safety","label":"Install command safety","score":68,"weight":0.1,"status":"info","detail":"dynamic command execution, standard package or runtime install 
path"},{"id":"permission_surface","label":"Permission surface","score":86,"weight":0.07,"status":"pass","detail":"filesystem or document access"},{"id":"repository","label":"Repository evidence","score":86,"weight":0.04,"status":"pass","detail":"https://github.com/onejune2018/Awesome-LLM-Eval"},{"id":"review_status","label":"Review status","score":88,"weight":0.05,"status":"pass","detail":"AI review data available"}],"checks":[{"status":"info","label":"GitHub adoption","detail":"642 GitHub stars"},{"status":"info","label":"Stars/forks activity","detail":"642 stars, 76 forks; issue activity unavailable in current metadata"},{"status":"info","label":"Recent maintenance","detail":"7mo since push"},{"status":"pass","label":"License clarity","detail":"MIT"},{"status":"pass","label":"README/SKILL.md completeness","detail":"Metadata includes enough usage and workflow context"},{"status":"pass","label":"Dependency/runtime risk","detail":"no major dependency risk hints in public metadata"},{"status":"pass","label":"Install availability","detail":"npx skills add onejune2018/Awesome-LLM-Eval"},{"status":"info","label":"Install command safety","detail":"dynamic command execution, standard package or runtime install path"},{"status":"pass","label":"Permission surface","detail":"filesystem or document access"},{"status":"pass","label":"Repository evidence","detail":"https://github.com/onejune2018/Awesome-LLM-Eval"},{"status":"pass","label":"Review status","detail":"AI review data available"},{"status":"warn","label":"Ownership","detail":"No approved owner claim yet"},{"status":"info","label":"OpenAgentSkill usage","detail":"No local usage activity yet"}],"strengths":["AI review approved","Install path is available","Repository evidence is available","Meaningful GitHub adoption signal","Install command has no obvious high-risk pattern"],"warnings":["Quality score needs review"],"evidence":{"stars":"642 GitHub stars","repoActivity":"642 stars, 76 forks","lastPushed":"7mo since 
push","license":"MIT","repository":"https://github.com/onejune2018/Awesome-LLM-Eval","install":"npx skills add onejune2018/Awesome-LLM-Eval","installSafety":"dynamic command execution, standard package or runtime install path","permissionSurface":"filesystem or document access","documentation":"Strong README/SKILL.md context"},"installReadiness":{"ready":true,"command":"npx skills add onejune2018/Awesome-LLM-Eval","policy":"human_review_before_install","label":"Human review before install","notes":["Install path is available","Repository evidence is available","License is declared","7mo since push"]},"agentCompatibility":["RAG","Codex","Claude Code","Cursor","OpenAgentSkill CLI"],"riskSummary":{"level":"medium","label":"Review before production","notes":["Quality score needs review"]}},"safety":{"score":60,"level":"review_before_install","label":"Review before install","safety_tier":{"tier":"reviewed","label":"Reviewed with permission notes","badge":"REVIEWED","summary":"Usable candidate, but the agent should surface permission and audit notes before installation.","recommended_action":"Require human approval before installing into a real workspace.","auto_install_policy":"review","reasons":["Quality score needs review","60/100 agent safety score"]},"auto_install_allowed":false,"human_review_required":true,"blocked":false,"audit_risk":"needs_review","permission_hints":[{"id":"network","label":"Network access","reason":"Skill likely fetches remote pages, APIs, repositories, or external services.","severity":"medium"},{"id":"filesystem","label":"Filesystem access","reason":"Skill may read or write project files, documents, generated artifacts, or local workspace state.","severity":"medium"}],"policy_warnings":["Quality score needs review"],"constraints_applied":{"max_risk":"medium","needs_install_command":true,"min_stars":0}},"safety_gate":{"tier":"reviewed","label":"Reviewed with permission 
notes","badge":"REVIEWED","auto_install_policy":"review","auto_install_allowed":false,"human_review_required":true,"blocked":false,"recommended_action":"Require human approval before installing into a real workspace.","reasons":["Quality score needs review","60/100 agent safety score"]},"supply_profile":{"track":{"slug":"research","label":"Research and knowledge work","shortLabel":"Research","description":"Deep research, source comparison, literature review, RAG, knowledge search, and reports."},"scenario":{"label":"RAG and knowledge","description":"I need my agent to build a RAG workflow over documents and retrieve reliable context.","useCases":[{"slug":"rag-knowledge","title":"RAG and knowledge"},{"slug":"coding-agents","title":"Coding agents"},{"slug":"workflow-automation","title":"Workflow automation"}]},"applicableAgents":["Claude Code","OpenAI Agents","CLI","Codex","Cursor"],"install":{"ready":true,"command":"npx skills add onejune2018/Awesome-LLM-Eval","primaryTarget":"CLI","targetCount":4},"githubQuality":{"stars":642,"starsLabel":"642","forks":76,"license":"MIT","qualityScore":67,"trustScore":80,"auditScore":76},"maintenance":{"status":"stable","label":"7mo since push","daysSincePush":205,"lastPushedAt":"2025-11-24T01:59:12+00:00"},"risk":{"level":"needs_review","label":"Needs review","requiresReview":true,"notes":["Quality score needs review","Needs review"]},"coverageTags":["Research","RAG and knowledge","rag-knowledge","rag","retrieval","knowledge","awsome-list","awsome-lists"]},"audit":{"audit_score":76,"risk_level":"needs_review","risk_label":"Needs review","warnings":["Quality score needs review"]},"decision":{"readiness_score":69,"readiness_label":"Prototype first","headline":"Fallback candidate for RAG and knowledge","role":"Fallback candidate","primary_fit":"RAG and knowledge","best_for":["RAG and knowledge workflows","Claude Code teams","teams that value GitHub adoption signals"],"risks":["No OpenAgentSkill engagement data 
yet"],"next_steps":["Install it in a sandbox agent and run one RAG and knowledge task end to end.","Compare output quality, latency, and failure behavior against at least one alternative.","Promote it into production only after reviewing repository permissions, license, and maintenance signals."]},"platforms":["RAG","Claude Code","OpenAI Agents"],"use_cases":[{"slug":"rag-knowledge","title":"RAG and knowledge","url":"https://www.openagentskill.com/use-cases/rag-knowledge"},{"slug":"coding-agents","title":"Coding agents","url":"https://www.openagentskill.com/use-cases/coding-agents"},{"slug":"workflow-automation","title":"Workflow automation","url":"https://www.openagentskill.com/use-cases/workflow-automation"},{"slug":"sports-analytics","title":"Sports analytics","url":"https://www.openagentskill.com/use-cases/sports-analytics"}],"install":"npx skills add onejune2018/Awesome-LLM-Eval","install_targets":[{"id":"openagentskill-cli","label":"CLI","title":"OpenAgentSkill CLI","kind":"command","value":"npx skills add onejune2018/Awesome-LLM-Eval","description":"Use the registry command when your workflow supports the OpenAgentSkill installer.","copyLabel":"Copy command"},{"id":"codex","label":"Codex","title":"Codex install prompt","kind":"agent-prompt","value":"Install the \"Awesome LLM Eval\" agent skill from https://github.com/onejune2018/Awesome-LLM-Eval. Read its SKILL.md or equivalent instructions first, install only the files needed for this workspace, and summarize any required setup before using it. Skill purpose: Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. 
A curated list of tools, benchmarks/data, demos, leaderboards, and large models, mainly for evaluating foundation LLMs and aimed at exploring the technical boundaries of generative AI.","description":"Give Codex a repo-aware install prompt when the skill is not available through a local CLI.","copyLabel":"Copy prompt"},{"id":"claude-code","label":"Claude Code","title":"Claude Code skill prompt","kind":"agent-prompt","value":"Add \"Awesome LLM Eval\" as a Claude Code skill from https://github.com/onejune2018/Awesome-LLM-Eval. Inspect the skill instructions, place the reusable skill files in the appropriate local skills location for this project, and report the activation steps. Skill purpose: Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. A curated list of tools, benchmarks/data, demos, leaderboards, and large models, mainly for evaluating foundation LLMs and aimed at exploring the technical boundaries of generative AI.","description":"Use this prompt to ask Claude Code to add the skill and explain the local activation steps.","copyLabel":"Copy prompt"},{"id":"cursor","label":"Cursor","title":"Cursor rule prompt","kind":"agent-prompt","value":"Turn \"Awesome LLM Eval\" from https://github.com/onejune2018/Awesome-LLM-Eval into a reusable Cursor project rule or agent instruction. Preserve the core workflow, adapt paths to this repo, and keep the rule scoped to tasks where it is relevant. Skill purpose: Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. 
A curated list of tools, benchmarks/data, demos, leaderboards, and large models, mainly for evaluating foundation LLMs and aimed at exploring the technical boundaries of generative AI.","description":"Use this when installing as Cursor project rules or reusable agent instructions.","copyLabel":"Copy prompt"}],"repository":"https://github.com/onejune2018/Awesome-LLM-Eval","github_repo":"onejune2018/Awesome-LLM-Eval","version":"1.0.0","license":"MIT","updated_at":"2026-06-15T04:00:43.743103+00:00","canonical_key":"onejune2018/awesome-llm-eval","recommendation_reasons":["Useful GitHub adoption: 642 stars","Install handoff is available","Repository freshness signal is available"],"urls":{"web":"https://www.openagentskill.com/skills/onejune2018-awesome-llm-eval","api":"https://www.openagentskill.com/api/agent/skills/onejune2018-awesome-llm-eval","install_api":"https://www.openagentskill.com/api/skills/onejune2018-awesome-llm-eval/install","audit":"https://www.openagentskill.com/skills/onejune2018-awesome-llm-eval/audit","repository":"https://github.com/onejune2018/Awesome-LLM-Eval"},"meta":{"endpoint":"/api/registry/manifest/{slug}","canonical_agent_endpoint":"/api/agent/skills/onejune2018-awesome-llm-eval","agent_friendly":true,"api_version":"1.0","generated_at":"2026-06-17T12:40:10.902Z"}}