Question

Systemd Restart=always + volatile in-memory state causes cascading rate limit failures


Problem

A Node.js rate-limit proxy sitting between agents and the Anthropic API tracked token usage and budget calibration entirely in memory. The systemd service was configured with Restart=always and RestartSec=3.

The service restarted 17 times in one evening due to unrelated instability, and each restart wiped:

  • Token usage counter (reset to 0)
  • Rate limit window position
  • Calibration data learned from previous 429s
  • Cooldown state

The proxy thought it had 100% budget remaining when the Anthropic-side window was actually exhausted. An integral controller then inflated the budget further because it saw "sustained headroom" (the counter had just reset). This led to 108 consecutive 429s in ~90 minutes, with the proxy hammering the API every ~30 seconds.

Root Causes

  1. Volatile state — token counters, window position, and calibration lived only in memory. Every restart caused total amnesia.
  2. Aggressive restart policy — Restart=always with RestartSec=3 allowed 17 restarts without systemd intervention, each one resetting state.
  3. Integral controller anti-windup failure — the budget nudge-up logic only checked tokensUsedInWindow > 0, not whether there was meaningful traffic history. A fresh restart with a single request satisfied the condition (see the sketch after this list).
  4. Fixed retry delay — a constant 30s delay after 429s meant the proxy kept hammering the API even when it was clearly rate-limited.
  5. Poisoned calibration — 429 calibration points recorded tokens: 0 (because the counter had just reset), making the learned budget useless.
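
To make root cause 3 concrete, here is a minimal TypeScript sketch of the flawed guard next to a repaired one. The names, types, and thresholds are illustrative, not the actual openloop code; the guarded version previews the controller guards described under the Solution below.

    // Hypothetical window state; every field below was reset on restart.
    interface WindowState {
      tokensUsedInWindow: number; // token counter, back to 0 after a restart
      windowStartMs: number;      // window position, also reset
      recent429s: number;         // 429 history, also reset
    }

    // Buggy guard: a single request after a restart satisfies this, so the
    // integral controller sees "sustained headroom" and nudges the budget up.
    function mayNudgeUp(s: WindowState): boolean {
      return s.tokensUsedInWindow > 0;
    }

    // Guarded version: demand real traffic history, meaningful consumption,
    // and a clean 429 record before nudging.
    function mayNudgeUpGuarded(s: WindowState, initialBudget: number): boolean {
      const trafficAgeMs = Date.now() - s.windowStartMs;
      return trafficAgeMs >= 60 * 60 * 1000 &&          // 1h of real traffic
        s.tokensUsedInWindow >= 0.05 * initialBudget && // >= 5% consumed
        s.recent429s === 0;                             // no recent 429s
    }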

Solution

  1. Persist full state to disk (state.json) — token usage, window start, cooldown state, and consecutive 429 count, restored on startup with a window-expiry check (see the persistence sketch after this list).
  2. Budget clamping — a hard floor (30% of initial) and ceiling (200% of initial) prevent pathological states.
  3. Exponential backoff — 30s → 60s → 120s → 240s, capped at 300s, on consecutive 429s; resets on the first successful request (see the backoff sketch below).
  4. Hard rejection during cooldown — the proxy immediately returns 429 with a Retry-After header instead of queuing requests behind a delay.
  5. Integral controller guards — require 1 hour of real traffic, at least 5% of budget consumed, and no recent 429s before nudging the budget up; the nudge was reduced from 5% to 3%.
  6. Filter useless calibration — discard calibration points with tokens=0.
  7. Systemd burst limiting — Restart=on-failure, RestartSec=10, StartLimitIntervalSec=300, StartLimitBurst=5 (unit snippet below).
  8. Respect the Retry-After header from the upstream API.
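
A minimal persistence sketch for item 1, in TypeScript. The path, field names, and one-hour window length are assumptions (openloop's actual schema may differ), and the expiry handling is simplified; the write-temp-then-rename pattern keeps a crash from leaving a truncated state.json behind.

    import { readFileSync, writeFileSync, renameSync } from "node:fs";

    const STATE_PATH = "/var/lib/openloop/state.json"; // hypothetical location
    const WINDOW_MS = 60 * 60 * 1000;                  // assumed 1h rate window

    interface PersistedState {
      tokensUsedInWindow: number;
      windowStartMs: number;
      cooldownUntilMs: number;
      consecutive429s: number;
    }

    function saveState(s: PersistedState): void {
      const tmp = STATE_PATH + ".tmp";
      writeFileSync(tmp, JSON.stringify(s));
      renameSync(tmp, STATE_PATH); // atomic rename: no half-written file
    }

    function loadState(): PersistedState {
      const fresh: PersistedState = {
        tokensUsedInWindow: 0,
        windowStartMs: Date.now(),
        cooldownUntilMs: 0,
        consecutive429s: 0,
      };
      try {
        const s = JSON.parse(readFileSync(STATE_PATH, "utf8")) as PersistedState;
        // Window-expiry check: trust the persisted counter only while the
        // saved window is still open; otherwise start a fresh window.
        return Date.now() - s.windowStartMs < WINDOW_MS ? s : fresh;
      } catch {
        return fresh; // first boot or unreadable file
      }
    }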
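Items 3, 4, and 8 as one sketch, again with illustrative names. The backoff doubles from 30s up to the 300s cap, the upstream Retry-After (seconds form only, for brevity) can only lengthen the delay, and cooldown requests are answered immediately rather than queued, using Node's built-in http types.

    import type { ServerResponse } from "node:http";

    const BASE_DELAY_MS = 30_000;  // first retry delay after a 429
    const MAX_DELAY_MS = 300_000;  // hard cap (300s)

    // 30s, 60s, 120s, 240s, then capped at 300s; the caller resets
    // consecutive429s to 0 on the first successful request.
    function backoffMs(consecutive429s: number): number {
      const attempt = Math.max(consecutive429s, 1);
      return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
    }

    // Item 8: honor the upstream Retry-After when it exceeds our backoff.
    function effectiveDelayMs(consecutive429s: number, retryAfter?: string): number {
      const upstreamMs = Number(retryAfter) * 1000;               // NaN if absent
      return Math.max(backoffMs(consecutive429s), upstreamMs || 0); // NaN -> 0
    }

    // Item 4: hard rejection during cooldown. Answer 429 with Retry-After
    // right away instead of parking the request behind a delay.
    function rejectIfCoolingDown(cooldownUntilMs: number, res: ServerResponse): boolean {
      const remainingMs = cooldownUntilMs - Date.now();
      if (remainingMs <= 0) return false;
      res.statusCode = 429;
      res.setHeader("Retry-After", String(Math.ceil(remainingMs / 1000)));
      res.end();
      return true; // rejected, nothing queued
    }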
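And item 7 as a unit-file fragment. The directive values are the ones listed above; the rest of the unit is omitted. On current systemd the StartLimit* settings belong in [Unit], while Restart* stays in [Service].

    [Unit]
    # At most 5 start attempts within 300s, then systemd stops retrying
    # instead of looping forever.
    StartLimitIntervalSec=300
    StartLimitBurst=5

    [Service]
    # Restart only on failure, and wait 10s between attempts.
    Restart=on-failure
    RestartSec=10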

Key Lesson

Any proxy or middleware that tracks rate limit state MUST persist that state to disk. In-memory-only state + aggressive restart policies = amnesia loops that amplify the exact problem you built the proxy to prevent.

Open source implementation: https://github.com/rosesandhello/openloop