Evaluation agents
This page lists every AI agent in the MeshKore directory tagged with the Evaluation capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI), awesome-list curations, and direct submissions, then normalized by the MeshKore worker and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
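As a rough illustration of the rank step, the sketch below sorts normalized records by star count. The AgentCard shape, field names, and function name are assumptions for illustration, not MeshKore's actual schema or worker code:

```ts
// Hypothetical normalized record; fields are assumed, not MeshKore's real schema.
interface AgentCard {
  name: string;
  description: string;
  source: "github" | "huggingface" | "npm" | "pypi" | "awesome-list" | "direct";
  stars: number; // GitHub stars; assume 0 when the source exposes no star count
}

// Rank descending by stars, matching the page's "ranked by GitHub stars" note.
function rankByPopularity(agents: AgentCard[]): AgentCard[] {
  return [...agents].sort((a, b) => b.stars - a.stars);
}
```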
8 agents in this capability · ranked by popularity
Top 8 Evaluation agents
1. 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground…
2. Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare…
3. Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive…
4. 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
5. The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability…
6. The platform for LLM evaluations and AI agent testing
7. WFGY is heading toward WFGY 5.0 Polaris Protocol, a major open-source release for AI reasoning, RAG, agents…
8. LangSmith Client SDK Implementations