Alternatives

Promptfoo alternatives for AI agents.

Compare similar skills by workflow fit, trust score, quality, GitHub adoption, maintenance, and install readiness.

Current skill

Promptfoo

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

Quality 100 | Trust 100 | Stars 22K
#1

Opik

Similarity 136 | Trust 100 | Excellent 100

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

19K stars | Jun 5, 2026 push | development | Python | LLMOps
$ npx skills add comet-ml/opik
#2

Giskard Oss

Similarity 134 | Trust 100 | Excellent 100

🐒 Open-Source Evaluation & Testing library for LLM Agents

5.4K stars | Jun 5, 2026 push | development | Python | LLMOps
$ npx skills add Giskard-AI/giskard-oss
#3

Helicone

Similarity 132 | Trust 100 | Excellent 100

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

5.8K stars | May 18, 2026 push | development | TypeScript | LLMOps
$ npx skills add Helicone/helicone
#4

Agenta

Similarity 131 | Trust 100 | Excellent 100

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

4.2K stars | Jun 9, 2026 push | development | TypeScript | LLMOps
$ npx skills add Agenta-AI/agenta
#5

Langwatch

Similarity 131 | Trust 100 | Excellent 100

The platform for LLM evaluations and AI agent testing

3.3K stars | Jun 6, 2026 push | development | TypeScript | LLMOps
$ npx skills add langwatch/langwatch
#6

Mlflow

Similarity 128 | Trust 100 | Excellent 100

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

26K stars | Jun 5, 2026 push | development | Python | LLMOps
$ npx skills add mlflow/mlflow
#7

Lmnr

Similarity 123 | Trust 100 | Excellent 100

Laminar - open-source observability platform purpose-built for AI agents. YC S24.

3.0K stars | Jun 6, 2026 push | development | TypeScript | LLMOps
$ npx skills add lmnr-ai/lmnr
#8

Rig

Similarity 118 | Trust 100 | Excellent 100

⚙️🦀 Build modular and scalable LLM Applications in Rust

7.6K stars | Jun 9, 2026 push | development | Rust | LLMOps
$ npx skills add 0xPlaygrounds/rig
#9

Coze Loop

Similarity 118 | Trust 100 | Excellent 100

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.

5.5K stars | Jun 6, 2026 push | development | Go | LLMOps
$ npx skills add coze-dev/coze-loop
#10

Bisheng

Similarity 117 | Trust 100 | Excellent 100

BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation, SFT, Dataset Management, Enterprise-level System Management, Observability and more.

11K stars | Jun 9, 2026 push | development | TypeScript | LLMOps
$ npx skills add dataelement/bisheng
#11

Observal

Similarity 116 | Trust 100 | Excellent 100

Observal is a unified platform for agent distribution, observability and insights.

2.1K stars | Jun 9, 2026 push | development | Python | LLMOps
$ npx skills add BlazeUp-AI/Observal
#12

RagaAI Catalyst

Similarity 112 | Trust 100 | Excellent 100

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, LLM, and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views.

16K stars | Feb 11, 2026 push | development | Python | LLMOps
$ npx skills add raga-ai-hub/RagaAI-Catalyst
#13

Metaflow

Similarity 111 | Trust 100 | Excellent 100

Build, Manage and Deploy AI/ML Systems

10K stars | Jun 5, 2026 push | development | Python | LLMOps
$ npx skills add Netflix/metaflow
#14

Phoenix

Similarity 111 | Trust 100 | Excellent 100

AI Observability & Evaluation

10K stars | Jun 9, 2026 push | development | Python | LLMOps
$ npx skills add Arize-ai/phoenix
#15

Evidently

Similarity 110 | Trust 100 | Excellent 100

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

7.6K stars | May 2, 2026 push | development | Jupyter Notebook | LLMOps
$ npx skills add evidentlyai/evidently
#16

Plano

Similarity 110 | Trust 100 | Excellent 100

Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic.

6.6K stars | Jun 9, 2026 push | development | Rust | LLMOps
$ npx skills add katanemo/plano

How to choose

When should you switch?

Use an alternative when it has a clearer install path, higher trust score, fresher maintenance, or better platform fit for your current agent stack. Keep Promptfoo if it already passes your workflow test and repository review.
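
The criteria above can be turned into a rough first-pass filter. The sketch below is illustrative only: the similarity, trust, and last-push values are copied from a few entries in this listing, while the `Candidate` class, the `shortlist` function, and the thresholds (`max_age_days`, `min_trust`) are hypothetical names and assumptions, not part of any tool mentioned here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """One alternative from the listing (hypothetical helper type)."""
    name: str
    similarity: int
    trust: int
    last_push: date


# Values copied from the listing; only a few entries are shown.
CANDIDATES = [
    Candidate("Opik", 136, 100, date(2026, 6, 5)),
    Candidate("Helicone", 132, 100, date(2026, 5, 18)),
    Candidate("RagaAI Catalyst", 112, 100, date(2026, 2, 11)),
    Candidate("Evidently", 110, 100, date(2026, 5, 2)),
]


def shortlist(candidates, as_of, max_age_days=30, min_trust=90):
    """Keep recently pushed, trusted candidates, most similar first.

    The 30-day and trust-90 thresholds are assumptions for illustration;
    pick values that match your own review policy.
    """
    kept = [
        c for c in candidates
        if (as_of - c.last_push).days <= max_age_days and c.trust >= min_trust
    ]
    return sorted(kept, key=lambda c: c.similarity, reverse=True)


if __name__ == "__main__":
    for c in shortlist(CANDIDATES, as_of=date(2026, 6, 9)):
        print(c.name, c.similarity, c.last_push.isoformat())
```

A filter like this only narrows the list. Install path and platform fit still need a manual check against each repository.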

Next step

Compare top candidates side by side

Open the compare page, test the install commands in a sandbox, and check each repository before using a skill in production.