High-intent entry points

Start from the task, not a keyword list.

These shortcuts use the same trust, supply, and relevance signals as the registry API, so humans and agents land on a useful shortlist faster.

Agent-readable index

Finance: Stock analysis · Quant: Trading research · Slides: PPT generation · Docs: PDF parsing · Extract: Web scraping · Creative: Design workflow · Sports: Football analytics

Decision filters

Choose by scenario, quality, and trust signals.

Use case · Platform fit · Quality tier · Trust profile · Safety gate · GitHub adoption

Showing 1-16 of 828 ranked candidates matching "benchmark"

Best blend of quality, stars, freshness, and agent usage

DynamicMap Benchmark

PROMISING · 66 | TRUST · 77 | SAFE · REVIEWED | CODING

The First Dynamic Map Removal Benchmark | Included 8 SOTA methods | Continuous updating

$ npx skills add KTH-RPL/DynamicMap_Benchmark

425 stars · 41 quality · 77 trust · Reviewed with permission notes · Claude Code · 10mo since push · Needs review

Quality: Useful candidate, but compare it with alternatives before adopting.

Trust: Good trust signals with a few areas worth checking before rollout. Review: Quality score needs review

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

jupyter-notebook · robotics

by KTH-RPL

Deep Visual Geo Localization Benchmark

STRONG · 71 | TRUST · 79 | SAFE · REVIEWED | CODING

Official code for CVPR 2022 (Oral) paper "Deep Visual Geo-localization Benchmark"

$ npx skills add gmberton/deep-visual-geo-localization-benchmark

256 stars · 44 quality · 79 trust · Reviewed with permission notes · Claude Code · 4mo since push · Needs review

Quality: Solid option that is likely worth shortlisting for production workflows.

Trust: Good trust signals with a few areas worth checking before rollout. Review: Quality score needs review

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

python · computer-vision

by gmberton

Speech To Text Benchmark

STRONG · 80 | TRUST · 79 | SAFE · REVIEWED | CODING

Speech-to-text benchmark framework

$ npx skills add Picovoice/speech-to-text-benchmark

693 stars · 51 quality · 79 trust · Reviewed with permission notes · Claude Code · 4mo since push · Safe to try

Quality: Solid option that is likely worth shortlisting for production workflows.

Trust: Good trust signals with a few areas worth checking before rollout. Review: Quality score needs review

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

python · speech

by Picovoice

Browsers Benchmark

STRONG · 82 | TRUST · 80 | SAFE · REVIEWED | CODING

Browser automation engine benchmark - Test bypass rates, performance & stealth against Cloudflare, DataDome, reCAPTCHA, Kasada, Imperva, Akamai, PerimeterX and other bot…

$ npx skills add techinz/browsers-benchmark

335 stars · 51 quality · 80 trust · Reviewed with permission notes · Claude Code + Browser agents · 30d since push · Safe to try

Quality: Solid option that is likely worth shortlisting for production workflows.

Trust: Good trust signals with a few areas worth checking before rollout. Review: Quality score needs review

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Testing and QA · I need my agent to test a web app, reproduce bugs, and verify fixes.

Claude Code + Browser agents · 4 targets

python · playwright

by techinz

Roboflow 100 Benchmark

NEEDS REVIEW · 51 | TRUST · 74 | SAFE · EXPERIMENTAL | CODING

Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets

$ npx skills add roboflow/roboflow-100-benchmark

299 stars · 36 quality · 74 trust · Experimental · Claude Code · 2y since push · Needs review

Quality: Inspect the repository carefully before adding it to an agent workflow. Check: Repository looks stale

Trust: Good trust signals with a few areas worth checking before rollout. Review: Repository looks stale

Safety gate: Sparse or mixed signals. Useful for discovery, but not for autonomous installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

jupyter-notebook · machine-learning

by roboflow

Street Tryon Benchmark

NEEDS REVIEW · 43 | TRUST · 71 | SAFE · EXPERIMENTAL | DESIGN

[WACV'25] StreetTryOn: A Benchmark for In-the-Wild Virtual Try-On and Cross-Domain Virtual Try-On

$ npx skills add cuiaiyu/street-tryon-benchmark

159 stars · 34 quality · 71 trust · Experimental · Claude Code · 2y since push · Needs review

Quality: Inspect the repository carefully before adding it to an agent workflow. Check: Repository looks stale

Trust: Potentially useful, but at least one trust signal needs human inspection. Review: License is unclear

Safety gate: Sparse or mixed signals. Useful for discovery, but not for autonomous installation.

Scenario: Design and creative · I need my agent to produce design assets, UI directions, presentations, or creative media workflows.

Claude Code + CLI · 4 targets

jupyter-notebook · image-generation

by cuiaiyu

BLINK Benchmark

PROMISING · 61 | TRUST · 77 | SAFE · REVIEWED | CODING

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]

$ npx skills add zeyofu/BLINK_Benchmark

169 stars · 38 quality · 77 trust · Reviewed with permission notes · Claude Code · 10mo since push · Needs review

Quality: Useful candidate, but compare it with alternatives before adopting.

Trust: Good trust signals with a few areas worth checking before rollout. Review: Quality score needs review

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

python · computer-vision

by zeyofu

Chinese LLM Benchmark

VERIFIED | EXCELLENT · 100 | TRUST · 87 | SAFE · REVIEWED | CODING

NoneLinear (非线智能) - ReLE evaluation: a capability benchmark for Chinese AI large language models (continuously updated). Currently covers 374 models, including chatgpt, gpt-5.4, Google gemini-3.1-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3.6-max, qwen3.6-plus, Baichuan, iFlytek Spark, SenseTime senseChat…

$ npx skills add jeinlee1991/chinese-llm-benchmark

6.2K stars · 68 quality · 87 trust · Reviewed · Claude Code + OpenAI Agents · 1mo since push · Safe to try

Quality: High-confidence pick with strong adoption and healthy maintenance signals.

Trust: Strong OpenAgentSkill Trust Score across adoption, recent maintenance, license clarity, documentation, dependency/… Review: License is unclear

Safety gate: Good audit and safety signals with no high-risk permission hints in public metadata.

Scenario: GitHub automation · I need my agent to triage GitHub issues, review pull requests, and summarize repository changes.

Claude Code + OpenAI Agents · 4 targets

llm

by jeinlee1991

Deep Text Recognition Benchmark

VERIFIED | STRONG · 76 | TRUST · 83 | SAFE · REVIEWED | RESEARCH

Text recognition (optical character recognition) with deep learning methods, ICCV 2019

$ npx skills add clovaai/deep-text-recognition-benchmark

3.9K stars · 52 quality · 83 trust · Reviewed with permission notes · Claude Code · 2y since push · Needs review

Quality: Solid option that is likely worth shortlisting for production workflows. Check: Repository looks stale

Trust: Good trust signals with a few areas worth checking before rollout. Review: Repository looks stale

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Document processing · I need my agent to read PDFs, extract tables, and turn documents into structured data.

Claude Code + CLI · 4 targets

jupyter-notebook · ocr

by clovaai

Benchmarks

PROMISING · 55 | TRUST · 72 | SAFE · EXPERIMENTAL | RESEARCH

Benchmarking PDF libraries

$ npx skills add py-pdf/benchmarks

337 stars · 40 quality · 72 trust · Experimental · Claude Code · 1y since push · Needs review

Quality: Useful candidate, but compare it with alternatives before adopting. Check: Repository looks stale

Trust: Good trust signals with a few areas worth checking before rollout. Review: Repository looks stale

Safety gate: Sparse or mixed signals. Useful for discovery, but not for autonomous installation.

Scenario: Document processing · I need my agent to read PDFs, extract tables, and turn documents into structured data.

Claude Code + CLI · 4 targets

python · pdf

by py-pdf

Powerful Benchmarker

NEEDS REVIEW · 48 | TRUST · 67 | SAFE · EXPERIMENTAL | CODING

A library for ML benchmarking. It's powerful.

$ npx skills add KevinMusgrave/powerful-benchmarker

441 stars · 37 quality · 67 trust · Experimental · Claude Code · 2y since push · Needs review

Quality: Inspect the repository carefully before adding it to an agent workflow. Check: Repository looks stale

Trust: Potentially useful, but at least one trust signal needs human inspection. Review: License is unclear

Safety gate: Sparse or mixed signals. Useful for discovery, but not for autonomous installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

jupyter-notebook · computer-vision

by KevinMusgrave

Caliper Benchmarks

NEEDS REVIEW · 47 | TRUST · 75 | SAFE · EXPERIMENTAL | CODING

Sample benchmark files for Hyperledger Caliper https://wiki.hyperledger.org/display/caliper

$ npx skills add hyperledger-caliper/caliper-benchmarks

117 stars · 33 quality · 75 trust · Experimental · Claude Code · 1y since push · Needs review

Quality: Inspect the repository carefully before adding it to an agent workflow. Check: Repository looks stale

Trust: Good trust signals with a few areas worth checking before rollout. Review: Repository looks stale

Safety gate: Sparse or mixed signals. Useful for discovery, but not for autonomous installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

javascript · blockchain

by hyperledger-caliper

MTEB

VERIFIED | EXCELLENT · 100 | TRUST · 89 | SAFE · VERIFIED | RESEARCH

MTEB: Massive Text Embedding Benchmark

$ npx skills add embeddings-benchmark/mteb

3.3K stars · 66 quality · 89 trust · Verified · Claude Code · 20d since push · Safe to try

Quality: High-confidence pick with strong adoption and healthy maintenance signals.

Trust: Strong OpenAgentSkill Trust Score across adoption, recent maintenance, license clarity, documentation, dependency/…

Safety gate: Strong metadata, audit, install, and review signals. Suitable for agent shortlists after normal workspace review.

Scenario: RAG and knowledge · I need my agent to build a RAG workflow over documents and retrieve reliable context.

Claude Code + CLI · 4 targets

python · semantic-search

by embeddings-benchmark

MMMU

STRONG · 75 | TRUST · 81 | SAFE · REVIEWED | CODING

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

$ npx skills add MMMU-Benchmark/MMMU

578 stars · 46 quality · 81 trust · Reviewed with permission notes · Claude Code · 5mo since push · Needs review

Quality: Solid option that is likely worth shortlisting for production workflows.

Trust: Good trust signals with a few areas worth checking before rollout. Review: Quality score needs review

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: Coding agents · I need a coding agent that can understand a repository, edit code, and review pull requests.

Claude Code + CLI · 4 targets

python · computer-vision

by MMMU-Benchmark

CLUEDatasetSearch

VERIFIED | STRONG · 72 | TRUST · 77 | SAFE · EXPERIMENTAL | RESEARCH

Search all Chinese NLP datasets, with commonly used English NLP datasets included

$ npx skills add CLUEbenchmark/CLUEDatasetSearch

4.5K stars · 52 quality · 77 trust · Experimental · Claude Code · 4y since push · Needs review

Quality: Solid option that is likely worth shortlisting for production workflows. Check: Repository looks stale

Trust: Good trust signals with a few areas worth checking before rollout. Review: License is unclear

Safety gate: Sparse or mixed signals. Useful for discovery, but not for autonomous installation.

Scenario: RAG and knowledge · I need my agent to build a RAG workflow over documents and retrieve reliable context.

Claude Code + CLI · 4 targets

python · knowledge-graph

by CLUEbenchmark

On Policy

VERIFIED | STRONG · 73 | TRUST · 83 | SAFE · REVIEWED | CODING

This is the official implementation of Multi-Agent PPO (MAPPO).

$ npx skills add marlbenchmark/on-policy

2.0K stars · 50 quality · 83 trust · Reviewed with permission notes · Claude Code · 2y since push · Needs review

Quality: Solid option that is likely worth shortlisting for production workflows. Check: Repository looks stale

Trust: Good trust signals with a few areas worth checking before rollout. Review: Repository looks stale

Safety gate: Usable candidate, but the agent should surface permission and audit notes before installation.

Scenario: GitHub automation · I need my agent to triage GitHub issues, review pull requests, and summarize repository changes.

Claude Code + CLI · 4 targets

python · multi-agent

by marlbenchmark


Showing the strongest 16 results to keep the registry fast for humans and agents. Refine by use case, platform, stars, or search query for a narrower shortlist.
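
The safety gate notes in the listing suggest different handling per label. A minimal sketch of how an agent might act on them, assuming the labels and trust scores are copied by hand from four of the results: the `Candidate` type, the `gate` and `plan` helpers, and the action names are hypothetical and not part of the registry or its API. The action chosen for the plain "Reviewed" label is also an assumption, since the listing gives no explicit instruction for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    repo: str    # owner/name as used in the install command
    trust: int   # trust score shown in the listing
    safety: str  # safety label shown in the listing


# Hypothetical action names paraphrasing the "Safety gate" notes.
ACTIONS = {
    "Verified": "shortlist_after_workspace_review",
    "Reviewed": "shortlist_after_workspace_review",  # assumption: treated like Verified
    "Reviewed with permission notes": "surface_notes_before_install",
    "Experimental": "discovery_only",
}


def gate(candidate: Candidate) -> str:
    """Map a safety label to an action; unknown labels fall back to the most cautious one."""
    return ACTIONS.get(candidate.safety, "discovery_only")


def plan(candidates: list[Candidate]) -> dict[str, str]:
    """Return the action for each repository, keyed by owner/name."""
    return {c.repo: gate(c) for c in candidates}


# Values copied from the listing.
CANDIDATES = [
    Candidate("embeddings-benchmark/mteb", 89, "Verified"),
    Candidate("jeinlee1991/chinese-llm-benchmark", 87, "Reviewed"),
    Candidate("techinz/browsers-benchmark", 80, "Reviewed with permission notes"),
    Candidate("py-pdf/benchmarks", 72, "Experimental"),
]

if __name__ == "__main__":
    for repo, action in plan(CANDIDATES).items():
        print(f"{repo}: {action}")
```

Falling back to `discovery_only` for unrecognized labels is a deliberately conservative choice: a label the agent does not understand should never lead to an unattended install.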


AI Agent Skills Directory

Real skills, grouped by the work your agent needs to finish.

Build the registry by domain, not just by count.

Coding and developer agents

Research and knowledge work

Presentation and deck workflows

Finance and quant workflows

Marketing and growth automation

Design and creative production

Data, BI, and analytics

Legal, policy, and compliance

Education and tutoring

Football and World Cup analytics
