AgentBench

VERIFIED

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Downloads 0
Stars 3.4K
Version 1.0.0
Quality 97/100 · Excellent

Install with one command

$ npx skills add THUDM/AgentBench

Best for

Coding agents

Discover skills for code generation, repository analysis, pull-request review, testing, debugging, and agentic software engineering.

Choose it when

  • You want a GitHub-backed skill with 3.4K stars.
  • You need a reusable install command for agents.
  • You want to compare it with related marketplace skills.

Check before install

  • Pushed 3mo ago
  • License: Apache-2.0
  • Review the repository README and examples.

Quality profile

Excellent candidate for agent workflows

High-confidence pick with strong adoption and healthy maintenance signals.

Quality score: 97
GitHub stars: 3.4K
Freshness: 3mo ago
Install ready: Yes
License: Apache-2.0

Overview

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Imported by the skill-only GitHub discovery pipeline because it matches agent skill, automation, RAG, or developer-tool signals. Protocol-server projects are excluded from automated imports.

Platform Compatibility

python: FULL
llm: FULL

Technical Details

Version: 1.0.0
License: Apache-2.0
Last Updated: 5/23/2026
Published: 5/23/2026

Frameworks & Tools

Python, LLM

Author

THUDM

@thudm

Health Signals

GitHub stars: 3.4K
Quality score: 59/100
Last GitHub push: Feb 8, 2026
Framework hints: 2

Trust & Safety

  • Open source (public GitHub repo)
  • AI static analysis passed
  • License: Apache-2.0
  • Manually verified by team