Decision filters

Choose skills by scenario, quality, and trust signals.

4 skills matching "evals"

Best blend of quality, stars, freshness, and agent usage

1

Phoenix

VERIFIED · EXCELLENT · 100

AI Observability & Evaluation

$ npx skills add Arize-ai/phoenix
9.8K stars · 70 quality · Claude Code + LangChain
High-confidence pick with strong adoption and healthy maintenance signals.
python, llmops
by Arize-ai
2

Trulens

VERIFIED · EXCELLENT · 100

Evaluation and Tracking for LLM Experiments and AI Agents

$ npx skills add truera/trulens
3.3K stars · 66 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python, llmops
by truera
3

Lmnr

VERIFIED · EXCELLENT · 100

Laminar - open-source observability platform purpose-built for AI agents. YC S24.

$ npx skills add lmnr-ai/lmnr
2.9K stars · 66 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
typescript, llmops
by lmnr-ai
4

Agent Skills Eval

STRONG · 84

A test runner for agentskills.io-style AI agent skills

$ npx skills add darkrishabh/agent-skills-eval
522 stars · 53 quality · Claude Code + OpenAI Agents
Solid option that is likely worth shortlisting for production workflows.
typescript, ai-agents
by darkrishabh