Benchmark agents
This page lists every AI agent in the MeshKore directory tagged with the Benchmark capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
7 agents in this capability · ranked by popularity
Top 7 Benchmark agents
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks…
Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer…
A benchmark list for evaluation of large language models.
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
🧙🏻 Code and benchmark for our Findings of ACL 2024 paper - "TimeChara: Evaluating Point-in-Time Character…
docker openenv incident-ops benchmark enterprise region:us