How AgentRanks Scores Work

Combined Score Formula

Each stack's Combined Score = (LLM SWE-bench % × Agent Architecture Score) / 100, where the Agent Architecture Score is the agent's 100-point total mapped to a 0-70 range. Keeping the LLM term and the agent term separate lets users see the marginal value of upgrading either one.
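As a rough illustration, the formula can be sketched in Python. The function names are hypothetical, and the sketch assumes the mapping from the 100-point agent total to the 0-70 range is linear; the guide does not state how the mapping is done, so treat that as an assumption rather than the site's actual method.

```python
def map_agent_score(agent_points: float, scale_max: float = 70.0) -> float:
    """Rescale a 0-100 agent total to 0-scale_max.

    ASSUMPTION: the rescaling is linear. The guide only says "mapped to 0-70".
    """
    if not 0.0 <= agent_points <= 100.0:
        raise ValueError("agent_points must be between 0 and 100")
    return agent_points * scale_max / 100.0


def combined_score(swe_bench_pct: float, agent_points: float) -> float:
    """Combined Score = LLM SWE-bench % x mapped agent score / 100."""
    if not 0.0 <= swe_bench_pct <= 100.0:
        raise ValueError("swe_bench_pct must be between 0 and 100")
    return swe_bench_pct * map_agent_score(agent_points) / 100.0


if __name__ == "__main__":
    # Made-up inputs, not real leaderboard values.
    base = combined_score(80.0, 90.0)
    better_llm = combined_score(85.0, 90.0)    # upgrade only the LLM
    better_agent = combined_score(80.0, 95.0)  # upgrade only the agent
    print(f"baseline:         {base:.2f}")
    print(f"LLM upgrade gain:   {better_llm - base:.2f}")
    print(f"agent upgrade gain: {better_agent - base:.2f}")
```

Comparing the two gains against the same baseline is one way to read the "marginal value of each upgrade" that the formula is meant to expose.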

LLM Score

We use SWE-bench Verified (pass@1 on 500 human-validated instances) as the primary LLM coding benchmark. Terminal-Bench 2.0 scores are also tracked. Scores are sourced from official leaderboards and verified against multiple third-party sources.

Agent Score

Each agent is evaluated on seven architectural dimensions worth 100 points in total: Orchestration, Memory, Tools, Cache, Safety, Reliability, and Community. Open-source agents (Aider, Cline, Continue, OpenClaw, Goose, Hermes) are scored from direct source code analysis. For proprietary agents without public source code, scores are estimated from published documentation and community analysis.
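One possible way to represent such a scorecard is sketched below. The class name, field names, and example numbers are hypothetical. The guide does not say how the 100 points are split across dimensions, so the sketch only checks that all seven dimensions are present and that the total does not exceed 100.

```python
from dataclasses import dataclass

# The seven dimensions named in the guide.
DIMENSIONS = (
    "Orchestration", "Memory", "Tools", "Cache",
    "Safety", "Reliability", "Community",
)


@dataclass(frozen=True)
class AgentScorecard:
    """Hypothetical container for one agent's architecture score."""
    name: str
    points: dict[str, float]
    estimated: bool  # True if scored from docs/community analysis, not source code

    def __post_init__(self) -> None:
        if set(self.points) != set(DIMENSIONS):
            raise ValueError("points must cover exactly the seven dimensions")
        if any(p < 0 for p in self.points.values()):
            raise ValueError("dimension points must be non-negative")
        if self.total > 100:
            raise ValueError("total must not exceed 100 points")

    @property
    def total(self) -> float:
        """Sum of the seven dimension scores (0-100)."""
        return sum(self.points.values())


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not a real agent's scores.
    card = AgentScorecard(
        name="example-agent",
        points={"Orchestration": 15, "Memory": 15, "Tools": 15, "Cache": 10,
                "Safety": 15, "Reliability": 15, "Community": 10},
        estimated=True,
    )
    print(card.name, card.total, "(estimated)" if card.estimated else "(from source)")
```

Carrying an `estimated` flag alongside the total keeps the distinction between source-analyzed and documentation-estimated scores visible wherever the number is used.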

What's Not Included

We don't use benchmarks that are easily gamed or have known contamination issues. We don't use subjective user reviews. We don't factor in brand recognition or marketing spend.
