OpenAgentSkill Registry Manifest
Skill: Evaluation Guidebook
Slug: huggingface-evaluation-guidebook
Category: ml-automation
Description: Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Agent fit:
- Decision: 84/100 Production-ready
- Primary fit: RAG and knowledge
- Role: Primary pick
Supply profile:
- Track: Data, BI, and analytics
- Scenario: Data analysis
- Applicable agents: Claude Code, CLI, Codex, Cursor, Jupyter Notebook
- Maintenance: 7mo since push
- Risk: Needs review
Trust:
- Trust score: 83/100 Strong shortlist
- Audit: 80/100 Needs review
Attribution:
- Status: Community indexed
- Source: GitHub star discovery
- Creator: huggingface
- Claim URL: https://www.openagentskill.com/skills/huggingface-evaluation-guidebook#claim-this-skill
Install: npx skills add huggingface/evaluation-guidebook
URLs:
- Web: https://www.openagentskill.com/skills/huggingface-evaluation-guidebook
- API: https://www.openagentskill.com/api/agent/skills/huggingface-evaluation-guidebook
- Install API: https://www.openagentskill.com/api/skills/huggingface-evaluation-guidebook/install
- Repository: https://github.com/huggingface/evaluation-guidebook