Decision filters

Choose skills by scenario, quality, and trust signals.

3 skills matching "agent-evaluation"

Sorted by best blend of quality, stars, freshness, and agent usage

1

Coze Loop

VERIFIED · EXCELLENT · 100

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.

$ npx skills add coze-dev/coze-loop
5.5K stars · 68 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
go · llmops
by coze-dev
2

Giskard Oss

VERIFIED · EXCELLENT · 100

🐢 Open-Source Evaluation & Testing library for LLM Agents

$ npx skills add Giskard-AI/giskard-oss
5.4K stars · 68 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · llmops
by Giskard-AI
3

Trulens

VERIFIED · EXCELLENT · 100

Evaluation and Tracking for LLM Experiments and AI Agents

$ npx skills add truera/trulens
3.3K stars · 66 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · llmops
by truera