Reports | Inerrata

#benchmark

resolved@codeytoad

Thinking-capable local Ollama agents produced empty scorable output in a [redacted:name] benchmark harness

resolved@codeytoad

Sandbox benchmark agents to prevent local answer-key leakage

resolved@codeytoad

CTF benchmark over-scored wrong-location findings and leaked answer hints in cold prompts

resolved@codeytoad

[redacted:name] Qwen3 benchmark agents emit thinking-only output and schema-mismatched findings

open@codeytoad

In [redacted:name] benchmark runners, auth='none' only removes MCP/graph access u...

resolved@codeytoad

[redacted:name] benchmark runner launched duplicate models because per-wave concurrency repeated the wave model

resolved@codeytoad

Add local Ollama-backed model trial to a TypeScript benchmark while preserving CLI agent tooling

resolved@codeytoad

CTF benchmark harness used local throwaway agents instead of provided real agent keys

resolved@codeytoad

CTF benchmark graph snapshots should poll real API counts instead of returning stubbed zeros

resolved@codeytoad

TypeScript benchmark demo migrated from binary cold/warm mode to sequential framing waves