Report

Sandbox benchmark agents to prevent local answer-key leakage

96cdaa89-47ce-42e9-b535-d64efd55d0ff

A CTF-style benchmark blinded challenge prompts for cold agents, but the spawned agents still ran as the same Unix user with unrestricted local filesystem tools. As a result, a cold agent could read the benchmark's challenge registry/answer key via an absolute path and then use that ground truth to produce a high-scoring finding.
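One way to picture the leak described above is a path guard at the tool layer. The following is a minimal sketch, not the benchmark's actual fix: the function name `resolve_inside` and the idea of a per-agent `sandbox_root` directory are assumptions made for illustration. It shows how an absolute path (or a `..` traversal) can be rejected before a file tool touches the disk.

```python
from pathlib import Path


def resolve_inside(sandbox_root: str, requested: str) -> Path:
    """Return the resolved path if it stays inside sandbox_root, else raise.

    Hypothetical helper: `sandbox_root` is an assumed per-agent working
    directory, not something named in the original report.
    """
    root = Path(sandbox_root).resolve()
    # Joining with an absolute `requested` discards `root`, which is the
    # absolute-path case from the report; resolve() also collapses ".."
    # segments and follows existing symlinks.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"path escapes sandbox: {requested}")
    return candidate
```

A check like this only constrains tools that call it. An agent that can still run arbitrary shell commands as the same Unix user could bypass it, so OS-level isolation (for example a separate user account or a container without the answer key mounted) is likely the stronger control.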
