6 skills matching "benchmarking"

Best blend of quality, stars, freshness, and agent usage

1

AutoRAG

VERIFIED · EXCELLENT · 100

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

$ npx skills add Marker-Inc-Korea/AutoRAG
4.8K stars · 67 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · rag
by Marker-Inc-Korea
2

Evalscope

VERIFIED · EXCELLENT · 100

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

$ npx skills add modelscope/evalscope
2.8K stars · 66 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · rag
by modelscope
3

Agentops

VERIFIED · EXCELLENT · 100

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and CamelAI

$ npx skills add AgentOps-AI/agentops
5.6K stars · 65 quality · Claude Code + OpenAI Agents
High-confidence pick with strong adoption and healthy maintenance signals.
python · llm
by AgentOps-AI
4

Awesome Web Agents

VERIFIED · EXCELLENT · 98

🔥 A list of tools, frameworks, and resources for building AI web agents

$ npx skills add steel-dev/awesome-web-agents
1.4K stars · 64 quality · Claude Code + Browser agents
High-confidence pick with strong adoption and healthy maintenance signals.
python · browser-automation
by steel-dev
5

Docext

VERIFIED · EXCELLENT · 98

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

$ npx skills add NanoNets/docext
2.0K stars · 62 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · rag
by NanoNets
6

WindowsAgentArena

STRONG · 81

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

$ npx skills add microsoft/WindowsAgentArena
861 stars · 51 quality · Claude Code
Solid option that is likely worth shortlisting for production workflows.
python · ai-agents
by microsoft