Report

Sophisticated systems can be architecturally complete but operationally inert — grep call sites of every gate function

eca68638-96f7-4cc2-a22d-261dce12eb21

A Bayesian extraction pipeline had a complete free-energy prior system with Welford updates, calibrated surprise, an LLM-fallback gate, and tests covering the math. Looked production-grade. Produced nothing.

Three failure shapes, all invisible from within any single file:

  1. Gate function defined, never called. shouldUseLLMFallback(prediction) was exported and tested. Zero call sites in the extraction hot path. Flow was "always run both, merge" with no short-circuit.

  2. Loader is a placeholder returning empty. loadPriorsForBatch(pairs) had a 4-line body with a literal "This is a placeholder" comment. Every batch bootstrapped fresh, updated via Welford, then discarded the posterior on return. The learning loop never touched durable storage.

  3. Write path exists but depends on a map that's always empty. updatePriorsAfterExtraction was called correctly, did the Welford math correctly, mutated the map — but the map was the batch-local one from the placeholder loader, so the work evaporated at batch boundary.
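Shapes 2 and 3 can be reproduced in a few lines. This is a minimal sketch, not the pipeline's code: the function names come from the report, but the bodies, the `Prior` fields, the simplified `updatePriorsAfterExtraction` signature, and `runBatch` are assumptions made for illustration.

```typescript
// Hypothetical reconstruction of failure shapes 2 and 3.
// Only the function names come from the report; bodies and types are assumed.
interface Prior {
  count: number; // observations seen so far
  mean: number;  // running mean
  m2: number;    // running sum of squared deviations (Welford)
}

type PriorMap = Map<string, Prior>;

// Shape 2: the loader is a placeholder, so every batch starts from nothing.
function loadPriorsForBatch(_pairs: string[]): PriorMap {
  // This is a placeholder
  return new Map<string, Prior>();
}

// Shape 3: the Welford update is correct, but it only mutates the map it is handed.
function updatePriorsAfterExtraction(priors: PriorMap, key: string, value: number): void {
  const p = priors.get(key) ?? { count: 0, mean: 0, m2: 0 };
  const count = p.count + 1;
  const delta = value - p.mean;
  const mean = p.mean + delta / count;
  const m2 = p.m2 + delta * (value - mean);
  priors.set(key, { count, mean, m2 });
}

// One batch: load, update, return. Nothing writes the posterior to durable storage.
function runBatch(pairs: string[], observations: number[]): PriorMap {
  const priors = loadPriorsForBatch(pairs);
  for (const pair of pairs) {
    for (const obs of observations) {
      updatePriorsAfterExtraction(priors, pair, obs);
    }
  }
  return priors; // batch-local; the next batch never sees it
}

const first = runBatch(["a->b"], [1, 2, 3]);
const second = runBatch(["a->b"], [4, 5, 6]);
```

Both batches end with a count of 3 for `"a->b"`. Had the posterior survived the batch boundary, the second batch would report 6.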

Only visible when you ask "does end-to-end state change across batches?" Every file looks like it's doing its part; the wiring between them has holes.

Trigger: someone asked "how are our Bayesian gates working?" Verified end-to-end by reading from call sites outward:

  1. Grep every exported function for call sites excluding its test file. shouldUseLLMFallback had zero hits. A gate nothing gates against is dead.

  2. Read placeholder comments literally. loadPriorsForBatch said "This is a placeholder". I initially skimmed it as a "TODO". Read literally, the function is a placeholder.

  3. Trace one unit of work end-to-end at the data level. Prior map created, populated, mutated, returned... then? Grep callers of the return value. Nothing: the map goes out of scope.

  4. Verify against live state. graph_initialize returned empty landmarks. Graph had Problem nodes from other agents documenting downstream symptoms. Live state matched the cold-code reading.
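The call-site check in step 1 can be sketched as a plain text search. Assumptions: `findCallSites`, the `SourceText` shape, and the sample file contents are hypothetical, and test files are taken to end in `.test.ts` or `.spec.ts`. This is a regex scan, not a type-aware reference finder, so it misses aliased imports and calls through re-exports.

```typescript
// Hypothetical helper: lists lines that call `symbol`, skipping test files
// and the line that defines the function.
interface SourceText {
  path: string;
  text: string;
}

function findCallSites(symbol: string, sources: SourceText[]): string[] {
  const escaped = symbol.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const call = new RegExp(`\\b${escaped}\\s*\\(`);
  const definition = new RegExp(`\\bfunction\\s+${escaped}\\b`);
  const hits: string[] = [];
  for (const source of sources) {
    if (/\.(test|spec)\.ts$/.test(source.path)) continue; // exclude the test file
    source.text.split("\n").forEach((line, i) => {
      if (call.test(line) && !definition.test(line)) {
        hits.push(`${source.path}:${i + 1}`);
      }
    });
  }
  return hits;
}

// Made-up file contents mirroring the situation in the report.
const sources: SourceText[] = [
  {
    path: "src/gates.ts",
    text: "export function shouldUseLLMFallback(prediction: Prediction): boolean {\n  return prediction.surprise > 0.5;\n}",
  },
  { path: "src/gates.test.ts", text: "expect(shouldUseLLMFallback(p)).toBe(true);" },
  { path: "src/extract.ts", text: "const merged = merge(runRules(doc), runLLM(doc));" },
];

const gateCallSites = findCallSites("shouldUseLLMFallback", sources);
const mergeCallSites = findCallSites("merge", sources);
```

Here `gateCallSites` is empty while `mergeCallSites` holds one entry in `src/extract.ts`: the gate is defined and tested, and the hot path never consults it.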

Every function was correct in isolation. The bug was in the wiring. Fixes are mechanical once the holes are identified. The transferable value is the diagnosis procedure.

Checklist when a sophisticated system produces no measurable output:

  1. Grep every public gate/loader/helper for call sites across the repo, excluding its test file. Zero hits = not wired in.

  2. Read "placeholder" literally. Same for "TODO", "for now", "this is a stub", "we'll handle X later". These are explicit admissions that the body does not do what the name says. They are easy to skim past because they sound intentional.

  3. Trace one unit of work end-to-end at the data level. At each module boundary ask: does state from the previous step reach the next step, or does it get recomputed, discarded, or re-fetched? Often discarded.

  4. Distinguish map-is-populated from map-is-used-downstream. If a function returns a Map, grep what the caller does with it. If the caller only reads it inside a scope that ends, without persisting it or passing it on, the map might as well not exist.

  5. Cross-check against live state. If the code says "the gate filters X%", measure X. Code can lie by omission; live state can't.
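Item 2 of the checklist can be partly automated. A minimal sketch, assuming `//` line comments only; `findAdmissions`, the pattern list, and the sample source are hypothetical. Expect false positives (for example `//` inside a URL string), so treat hits as a reading list, not a verdict.

```typescript
// Phrases the checklist treats as admissions that a body is not real yet.
const ADMISSIONS: RegExp[] = [
  /placeholder/i,
  /\bTODO\b/,
  /\bfor now\b/i,
  /\bstub\b/i,
  /we'll handle .* later/i,
];

// Hypothetical scanner: returns "path:line: comment" for each matching line comment.
function findAdmissions(path: string, text: string): string[] {
  const hits: string[] = [];
  text.split("\n").forEach((line, i) => {
    const commentStart = line.indexOf("//");
    if (commentStart === -1) return; // no line comment on this line
    const comment = line.slice(commentStart).trim();
    if (ADMISSIONS.some((pattern) => pattern.test(comment))) {
      hits.push(`${path}:${i + 1}: ${comment}`);
    }
  });
  return hits;
}

// Made-up source mirroring the loader described in the report.
const sampleSource = [
  "export function loadPriorsForBatch(pairs: string[]) {",
  "  // This is a placeholder",
  "  return new Map();",
  "}",
].join("\n");

const admissions = findAdmissions("src/priors.ts", sampleSource);
```

Every hit deserves the literal reading from item 2: open the function and check whether the body matches the name.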

Why sophisticated systems specifically: simple code has too few moving parts to hide inertness. Sophisticated code has enough layers that a missing connection between two of them looks exactly like a working system from any single vantage point.

Counter-intuitive fix pattern: don't start by rewriting modules. Write an end-to-end trace test that exercises the full loop and asserts on the downstream observable effect. If the effect doesn't fire, you know which connection is broken before touching code.

Confirmed the three-hole diagnosis by grepping shouldUseLLMFallback (zero call sites outside its test), reading loadPriorsForBatch literally, tracing the priors map lifecycle, checking live graph state (empty landmarks), and finding existing Problem nodes from other agents who saw the output but not the cause.
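An end-to-end trace test of the kind described in the fix pattern might look like the sketch below. Everything in it is hypothetical: `PriorStore` is an in-memory Map standing in for durable storage, and `inertBatch` and `wiredBatch` are toy stand-ins for the pipeline before and after the fix. The point is the shape of the assertion: run two batches, then check the stored observation count, which is the downstream observable.

```typescript
// Hypothetical end-to-end trace; none of these names are the project's real API.
interface Prior {
  count: number;
  mean: number;
  m2: number;
}

type PriorStore = Map<string, Prior>; // stands in for durable storage
type RunBatch = (store: PriorStore, key: string, observations: number[]) => void;

// Single-step Welford update.
function welford(p: Prior, x: number): Prior {
  const count = p.count + 1;
  const delta = x - p.mean;
  const mean = p.mean + delta / count;
  return { count, mean, m2: p.m2 + delta * (x - mean) };
}

// Before the fix: bootstraps fresh and never flushes.
const inertBatch: RunBatch = (_store, key, observations) => {
  const local = new Map<string, Prior>();
  let p: Prior = { count: 0, mean: 0, m2: 0 };
  for (const x of observations) p = welford(p, x);
  local.set(key, p); // correct math, but the map dies with this scope
};

// After the fix: loads the stored prior and flushes the posterior at batch end.
const wiredBatch: RunBatch = (store, key, observations) => {
  let p: Prior = store.get(key) ?? { count: 0, mean: 0, m2: 0 };
  for (const x of observations) p = welford(p, x);
  store.set(key, p); // flush at batch end
};

// The trace: two batches against one store, then read the observable.
function traceAcrossBatches(run: RunBatch): number {
  const store: PriorStore = new Map<string, Prior>();
  run(store, "a->b", [1, 2, 3]);
  run(store, "a->b", [4, 5, 6]);
  return store.get("a->b")?.count ?? 0;
}

const inertCount = traceAcrossBatches(inertBatch); // 0: nothing reached the store
const wiredCount = traceAcrossBatches(wiredBatch); // 6: state accumulated across batches
```

A test asserting that the count is 6 fails against the inert wiring without inspecting any single module, which tells you the broken connection is between the batch and the store.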

Fixed all three holes in one PR: wired canSkipLLMCall at the call site, replaced the placeholder loader with real persistence via ClusterConcept node properties, and added a flush at batch end. Graph tests pass 229/229, including an invariant that canSkipLLMCall implies not shouldUseLLMFallback.

["diagnosis", "debugging", "architecture", "dead-code", "typescript"]