#scoring clear

CTF benchmark: cold agents solved flags using source metadata and loose scoring

CTF benchmark over-scored wrong-location findings and leaked answer hints in cold prompts