Silent timeout-fallback dominating prod calls to OpenAI Chat API — diagnosis via per-fallback structured logging - inErrata Knowledge Graph

A Node.js (undici) service calls OpenAI's /v1/chat/completions with fetch() and AbortSignal.timeout(2500), awaiting the response in the request path so users get immediate feedback on a privacy/safety scan. Telemetry showed that the call almost always returned the empty-fallback result, but not why: until per-fallback instrumentation was added, all six exit paths (no API key, non-2xx response, no content, JSON parse failure, empty results array, caught exception) returned the same empty object. The dominant cause was therefore invisible; every call looked like "Layer 4 came back empty."

Production model: gpt-4o-mini, JSON mode, no tool use, temperature 0, a single short system prompt plus variable-length user content. Expected p50 latency: 300–1500ms. Actual behavior: silently timing out at the 2500ms cliff on most calls, on both organic prod traffic and a synthetic 700-word test post.
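The per-fallback instrumentation described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the reason tags, the `classifyCompletion`, `fallback`, and `scanContent` names, the log field names, the system prompt text, and the `results` response schema are all assumptions introduced here. The idea is simply that each of the six exit paths logs its own reason before returning the same empty object.

```javascript
// Hypothetical reason tags, one per exit path listed above.
const REASON = Object.freeze({
  NO_API_KEY: "no_api_key",
  NON_2XX: "non_2xx",
  NO_CONTENT: "no_content",
  JSON_PARSE_FAIL: "json_parse_fail",
  EMPTY_RESULTS: "empty_results",
  EXCEPTION: "exception",
});

// Pure classifier: maps an HTTP status and decoded response body to either a
// parsed value or a fallback reason. The `results` key is an assumed schema.
function classifyCompletion(status, body) {
  if (status < 200 || status > 299) return { reason: REASON.NON_2XX };
  const content = body?.choices?.[0]?.message?.content;
  if (!content) return { reason: REASON.NO_CONTENT };
  let parsed;
  try {
    parsed = JSON.parse(content);
  } catch {
    return { reason: REASON.JSON_PARSE_FAIL };
  }
  if (!Array.isArray(parsed?.results) || parsed.results.length === 0) {
    return { reason: REASON.EMPTY_RESULTS };
  }
  return { reason: null, value: parsed };
}

// Emits one structured log line per fallback, then returns the same empty
// object that callers already expect.
function fallback(reason, startedAt, extra, log) {
  log(JSON.stringify({
    event: "scan_fallback",
    reason,
    elapsed_ms: Date.now() - startedAt,
    ...extra,
  }));
  return { results: [] };
}

async function scanContent(text, { apiKey, fetchImpl = fetch, log = console.log } = {}) {
  const startedAt = Date.now();
  if (!apiKey) return fallback(REASON.NO_API_KEY, startedAt, {}, log);
  try {
    const res = await fetchImpl("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        temperature: 0,
        response_format: { type: "json_object" },
        messages: [
          // Placeholder prompt; the real system prompt is not shown in this write-up.
          { role: "system", content: "Return JSON with a `results` array." },
          { role: "user", content: text },
        ],
      }),
      signal: AbortSignal.timeout(2500),
    });
    const body = res.ok ? await res.json() : null;
    const { reason, value } = classifyCompletion(res.status, body);
    if (reason) return fallback(reason, startedAt, { status: res.status }, log);
    return value;
  } catch (err) {
    // The signal from AbortSignal.timeout() aborts with a DOMException named
    // "TimeoutError"; logging the name separates timeouts from other failures.
    return fallback(REASON.EXCEPTION, startedAt, {
      error_name: err?.name ?? "unknown",
      timed_out: err?.name === "TimeoutError",
    }, log);
  }
}
```

Grouping the resulting log lines by `reason` (and, for exceptions, by `error_name`) shows which exit path dominates, and `elapsed_ms` shows whether those exits cluster at the 2500ms cliff.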

Three contributing factors, none individually obvious without instrumentation:

1. No HTTP keep-alive: every fetch() runs fresh DNS + TLS (~300–800ms penalty)
2. Input-length scaling: gpt-4o-mini's prompt-eval pass crosses 2.5s on long inputs
3. OpenAI tail latency: occasional 2–5s spikes even on warm short calls
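To see how these factors eat into the 2500ms budget, the ranges quoted above can be combined in a rough back-of-the-envelope calculation. The sketch below assumes the components are simply additive and that the expected 300–1500ms figure describes a warm call; both are simplifying assumptions, and the constant and function names are illustrative only.

```javascript
// Figures are the ranges quoted above. Treating them as simply additive is an
// assumption made for illustration, not a measured model.
const BUDGET_MS = 2500;
const WARM_CALL_MS = { min: 300, max: 1500 };   // expected latency of a warm call
const COLD_CONNECT_MS = { min: 300, max: 800 }; // fresh DNS + TLS without keep-alive
const TAIL_SPIKE_MS = { min: 2000, max: 5000 }; // occasional spike on a warm short call

// Sum any number of { min, max } ranges component-wise.
function addRanges(...ranges) {
  return ranges.reduce(
    (acc, r) => ({ min: acc.min + r.min, max: acc.max + r.max }),
    { min: 0, max: 0 },
  );
}

// Milliseconds left before the timeout fires; negative means the call is aborted.
function headroom(range, budgetMs = BUDGET_MS) {
  return { best: budgetMs - range.min, worst: budgetMs - range.max };
}

const scenarios = {
  warm: WARM_CALL_MS,
  cold: addRanges(WARM_CALL_MS, COLD_CONNECT_MS),
  warmSpike: TAIL_SPIKE_MS,
  coldSpike: addRanges(TAIL_SPIKE_MS, COLD_CONNECT_MS),
};

for (const [name, range] of Object.entries(scenarios)) {
  console.log(name, range, headroom(range));
}
```

Under these assumptions a cold connection on its own still fits (worst case 2300ms, leaving 200ms of slack), but it leaves almost no room for a slow prompt-eval pass or a tail spike, which is consistent with no single factor looking like the culprit in isolation.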