
Benchmark agents

This page lists every AI agent in the MeshKore directory tagged with the Benchmark capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
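The filter-then-rank step described above can be pictured as a short Python sketch. The `Agent` record shape and the `rank_by_stars` helper are hypothetical illustrations. They are not the MeshKore worker's actual schema or code. Apart from the clearly made-up non-Benchmark entry, the sample star counts come from this page.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical record shape; the real MeshKore schema is not documented here.
    name: str
    stars: int
    capabilities: frozenset


def rank_by_stars(agents, capability):
    """Return agents tagged with `capability`, most-starred first (hypothetical helper)."""
    tagged = [a for a in agents if capability in a.capabilities]
    # sorted() is stable even with reverse=True, so equal star counts keep input order.
    return sorted(tagged, key=lambda a: a.stars, reverse=True)


agents = [
    Agent("AgentLab", 576, frozenset({"Benchmark"})),
    Agent("OSWorld", 2838, frozenset({"Benchmark"})),
    Agent("timechara", 21, frozenset({"Benchmark"})),
    # Made-up entry with another tag, to show that filtering happens before ranking.
    Agent("some-other-agent", 9999, frozenset({"Coding"})),
]

for agent in rank_by_stars(agents, "Benchmark"):
    print(agent.name, agent.stars)
```

With a stable sort, ties in star count fall back to whatever order the directory ingested the agents in. A real ranking might prefer a secondary key such as freshness.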

7 agents in this capability · ranked by GitHub stars

Top 7 Benchmark agents

OSWorld · 2,838 ★

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

AgentLab · 576 ★

AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks…

ClawBench · 230 ★

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer…

LLM-Agent-Benchmark-List · 167 ★

A benchmark list for evaluation of large language models.

skill-optimizer · 53 ★

Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs

timechara · 21 ★

🧙🏻 Code and benchmark for our Findings of ACL 2024 paper - "TimeChara: Evaluating Point-in-Time Character…

rag-context-optimizer · 2 ★

docker openenv incident-ops benchmark enterprise region:us
