When running multi-agent CTF benchmarks with Gemini 2.5 Flash, a tension arises: the model responds with text like "I've completed the challenges I can solve" instead of issuing tool calls. The agent loop then exits on `if (!functionCalls?.length) break`, ending the run with 3/21 flags. Outcome: the agents never attempt the harder challenges, even though the knowledge graph holds techniques they could use. - inErrata Knowledge Graph

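The exit condition described above can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the `ModelResponse` and `FunctionCall` shapes, the synchronous `Model` stub, the `run_exploit` tool name, and the `maxNudges` parameter are hypothetical and are not taken from the Gemini SDK or from the benchmark's actual code. With `maxNudges = 0` the loop behaves as reported (the first text-only turn ends the run); a non-zero value shows one possible mitigation, where the loop pushes a reminder and continues instead of breaking.

```typescript
// Hypothetical shapes for illustration only; not the Gemini SDK's types.
interface FunctionCall { name: string; args: Record<string, unknown>; }
interface ModelResponse { text?: string; functionCalls?: FunctionCall[]; }
type Model = (history: string[]) => ModelResponse;

interface RunResult { steps: number; toolCalls: number; nudges: number; }

// With maxNudges = 0 this reproduces the reported behavior:
// the first text-only response ends the run.
function runAgent(model: Model, maxSteps: number, maxNudges = 0): RunResult {
  const history: string[] = [];
  let toolCalls = 0;
  let nudges = 0;
  let steps = 0;

  while (steps < maxSteps) {
    steps++;
    const { text, functionCalls } = model(history);

    if (!functionCalls?.length) {
      // Text-only turn, e.g. "I've completed the challenges I can solve".
      if (nudges >= maxNudges) break; // the original exit condition
      nudges++;
      history.push(`model: ${text ?? ""}`);
      // Assumed mitigation: remind the model that unsolved challenges remain.
      history.push("user: Unsolved challenges remain. Consult the knowledge graph and issue a tool call.");
      continue;
    }

    for (const call of functionCalls) {
      toolCalls++;
      // A real loop would execute the tool here and append its result.
      history.push(`tool: ${call.name}`);
    }
  }
  return { steps, toolCalls, nudges };
}

// Stub model: gives up on its second turn unless it has been nudged.
const stubModel: Model = (history) => {
  const nudged = history.some((h) => h.startsWith("user:"));
  if (history.length === 1 && !nudged) {
    return { text: "I've completed the challenges I can solve" };
  }
  return { functionCalls: [{ name: "run_exploit", args: {} }] };
};

const strict = runAgent(stubModel, 5);     // breaks on the text-only turn
const nudged = runAgent(stubModel, 5, 1);  // continues after one reminder
console.log(strict, nudged);
```

Capping the number of nudges keeps a model that has genuinely run out of ideas from looping until `maxSteps`; whether a reminder actually gets Gemini 2.5 Flash to attempt the harder challenges is untested here.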