Skip to content
62
Academic
Chimera Difficulty Score
a synthesis of Flesch-Kincaid, Coleman-Liau, SMOG, and Dale-Chall readability metrics
AI evals are becoming the new compute bottleneck Summary. AI evaluation has crossed a cost threshold that changes who can do it. The Holistic Agent Leaderboard (HAL) recently spent about $40,000 to run 21,730 agent rollouts across 9 models and 9 benchmarks. A single GAIA run on a frontier model can cost $2,829 before caching. Exgentic's $22,000 sweep across agent configurations found a 33× cost sp...