Agent Architecture Rubric

All agents are scored on 7 architectural dimensions, independent of the underlying LLM.

Agent Scores Overview

The 7 Dimensions

Each dimension measures a specific aspect of agent architecture quality. The maximum score varies with the dimension's importance; the seven maxima sum to 100 points.

Multi-Agent Orchestration

Max: 20pts

Task decomposition, parallel sub-agents, Coordinator mode, agent isolation. Highest impact on complex multi-file tasks.

Memory & Context

Max: 15pts

Cross-session persistence, memory types (user/feedback/project/reference), auto-consolidation, retrieval quality.

Tool System

Max: 20pts

Number and quality of tools, MCP support, lifecycle management, code-split loading, extensibility.

Prompt Cache & Cost

Max: 10pts

Token optimization, prompt cache strategy, static/dynamic split, cache-break tracking, cost efficiency.

Safety & Permissions

Max: 15pts

Permission chain depth, side-model classification, anti-distillation, command vetting, attestation.

Reliability & Recovery

Max: 10pts

Error handling, retry logic, timeout management, graceful degradation, failure cascades.

Community & Ecosystem

Max: 10pts

GitHub stars, update frequency, documentation quality, plugin/MCP ecosystem, community responsiveness.
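
The rubric above can be sketched as a small scoring helper. This is a minimal illustration, not the leaderboard's actual implementation: the per-dimension maxima come from the list above, while the names `RUBRIC_MAX`, `total_score`, and the example scores for a made-up agent are assumptions introduced here.

```python
# Maximum points per dimension, copied from the rubric above.
RUBRIC_MAX = {
    "Multi-Agent Orchestration": 20,
    "Memory & Context": 15,
    "Tool System": 20,
    "Prompt Cache & Cost": 10,
    "Safety & Permissions": 15,
    "Reliability & Recovery": 10,
    "Community & Ecosystem": 10,
}


def total_score(scores):
    """Return the architecture total after validating each dimension score.

    Hypothetical helper: requires exactly one score per rubric dimension,
    each between 0 and that dimension's maximum.
    """
    missing = set(RUBRIC_MAX) - set(scores)
    unknown = set(scores) - set(RUBRIC_MAX)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    for name, points in scores.items():
        if not 0 <= points <= RUBRIC_MAX[name]:
            raise ValueError(f"{name}: {points} is outside 0..{RUBRIC_MAX[name]}")
    return sum(scores.values())


# Illustrative scores for a made-up agent (not real leaderboard data).
example_agent = {
    "Multi-Agent Orchestration": 16,
    "Memory & Context": 12,
    "Tool System": 17,
    "Prompt Cache & Cost": 7,
    "Safety & Permissions": 11,
    "Reliability & Recovery": 8,
    "Community & Ecosystem": 6,
}

print(total_score(example_agent), "/", sum(RUBRIC_MAX.values()))
```

Because the maxima sum to 100, the total can be read directly as a percentage of the best possible architecture score.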

Note: Agent scores on this page reflect architectural quality only; they do not include LLM benchmark performance. The combined score on the leaderboard factors in both architecture and SWE-bench for a complete picture. Open-source agent scores are verified through source code analysis; proprietary agent scores are estimated from published documentation.