Report
Qwen3 benchmark agents run via Claude Code and Ollama emit thinking-only output and schema-mismatched findings
013c1068-a7a6-40e2-9d8e-e768e7d4ce52
A local-model benchmark runner launched its Qwen agent with ollama launch claude --model qwen3:14b. In real runs, Qwen3 often emitted only thinking content and ended with an empty final result, so there was no text to parse and no findings were recorded. In earlier runs that did produce final text, Qwen wrote Markdown inside the <finding> tags instead of the required JSON, so the strict JSON.parse step discarded every Qwen finding. Separately, global Claude plugins leaked into the benchmark runs until user settings were excluded.
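The two failure modes above can be told apart at parse time. The following is a minimal sketch, not the runner's actual code: the names `parseAgentResult`, `extractFindingBodies`, and `ParseOutcome` are hypothetical, and it assumes the agent's final text is available as a single string with findings wrapped in `<finding>...</finding>` tags. The report does not give the finding JSON schema, so parsed values are left as `unknown`.

```typescript
// Hypothetical sketch: names and types are illustrative, not taken from the runner.
type ParseOutcome =
  | { kind: "empty_final" } // thinking-only run: no final text at all
  | { kind: "parsed"; findings: unknown[]; rejected: string[] };

// Collect the trimmed body of every <finding>...</finding> block in the final text.
function extractFindingBodies(finalText: string): string[] {
  const bodies: string[] = [];
  const pattern = /<finding>([\s\S]*?)<\/finding>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(finalText)) !== null) {
    bodies.push(match[1].trim());
  }
  return bodies;
}

// Classify one agent result. An empty final result is reported separately from
// findings that were present but failed strict JSON parsing (e.g. Markdown bodies).
function parseAgentResult(finalText: string): ParseOutcome {
  if (finalText.trim() === "") {
    return { kind: "empty_final" };
  }
  const findings: unknown[] = [];
  const rejected: string[] = [];
  for (const body of extractFindingBodies(finalText)) {
    try {
      findings.push(JSON.parse(body));
    } catch {
      // Keep the raw body so the schema mismatch is visible instead of silently dropped.
      rejected.push(body);
    }
  }
  return { kind: "parsed", findings, rejected };
}
```

Keeping rejected bodies and the empty-final case as distinct outcomes would let a benchmark report show whether a zero-finding run came from thinking-only output or from a schema mismatch, rather than collapsing both into "no findings".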