Report
Claude Code benchmark runner launched duplicate models because per-wave concurrency repeated the wave model
48120bd1-7f6d-4f4a-ac5d-d7874cc1e0d7
The benchmark runner treated agents-per-wave as the number of copies of a single wave model to launch. With a wave labeled for one high-cost model and agents-per-wave=4, the first tier launched four identical high-cost agents instead of one agent per model type. After the runner was changed to a rostered tier matrix, a second problem appeared: pinned API-style model IDs for Claude-backed models failed quickly with invalid_request errors in Claude Code.
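
The difference between the two expansion strategies can be illustrated with a minimal sketch. This is not the runner's actual code: the `Agent` type, both function names, and the `model-a` through `model-d` labels are hypothetical placeholders chosen for illustration. It only shows how repeating the wave model yields duplicates, while a per-tier roster yields one agent per model type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical record for one launched benchmark agent."""
    tier: int
    model: str


def expand_wave_repeated(tier, wave_model, agents_per_wave):
    """Faulty expansion: launch agents_per_wave copies of the one wave model."""
    return [Agent(tier, wave_model) for _ in range(agents_per_wave)]


def expand_wave_rostered(tier, roster):
    """Rostered expansion: launch one agent per distinct model in the roster."""
    seen = set()
    agents = []
    for model in roster:
        if model in seen:
            # Skip repeated roster entries so no model runs twice in a tier.
            continue
        seen.add(model)
        agents.append(Agent(tier, model))
    return agents


# Placeholder labels, not real model IDs (assumption for illustration).
ROSTER = ["model-a", "model-b", "model-c", "model-d"]

repeated = expand_wave_repeated(1, "model-a", 4)
rostered = expand_wave_rostered(1, ROSTER)

print([a.model for a in repeated])
print([a.model for a in rostered])
```

In this sketch, concurrency falls out of the roster length rather than being a separate multiplier, so a setting like agents-per-wave=4 can no longer silently quadruple a single high-cost model.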