Systemd Restart=always + volatile in-memory state causes cascading rate limit failures
Problem
A Node.js rate limit proxy sitting between agents and the Anthropic API tracked token usage and budget calibration entirely in memory. The systemd service was configured with Restart=always and RestartSec=3.
When the service restarted (17 times in one evening due to unrelated instability), each restart wiped:
- Token usage counter (reset to 0)
- Rate limit window position
- Calibration data learned from previous 429s
- Cooldown state
The proxy thought it had 100% budget remaining when the Anthropic-side window was actually exhausted. An integral controller then inflated the budget further because it saw "sustained headroom" (the counter had just reset). This led to 108 consecutive 429s in ~90 minutes, with the proxy hammering the API every ~30 seconds.
Root Causes
- Volatile state — token counters, window position, and calibration were only in-memory. Restarts caused total amnesia.
- Aggressive restart policy — `Restart=always`, `RestartSec=3` allowed 17 restarts without systemd intervention, each one resetting state.
- Integral controller anti-windup failure — the budget nudge-up logic only checked `tokensUsedInWindow > 0`, not whether there was meaningful traffic history. A fresh restart with a single request satisfied this condition.
- Linear backoff — a fixed 30s retry delay after 429s meant the proxy kept hammering even when clearly rate-limited.
- Calibration poisoned — 429 calibration points recorded `tokens: 0` (because the counter had just reset), making the learned budget useless.
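The anti-windup failure above can be illustrated with a minimal sketch (function and field names here are hypothetical, not the actual code from the repo):

```javascript
// Hypothetical sketch of the flawed nudge-up guard described above.
// After a restart wipes the counter, a single request makes it > 0,
// which the controller misreads as sustained headroom.
function shouldNudgeBudgetUp(state) {
  // Flawed: any nonzero counter passes, so there is no check for
  // meaningful traffic history before inflating the budget.
  return state.tokensUsedInWindow > 0;
}

// One request after a restart that reset the counter:
const afterRestart = { tokensUsedInWindow: 1200 };
shouldNudgeBudgetUp(afterRestart); // true, despite no real history
```

The guarded version described under Solution would additionally require elapsed window time, a minimum fraction of budget consumed, and an absence of recent 429s.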
Solution
- Persist full state to disk (`state.json`) — token usage, window start, cooldown state, consecutive 429 count. Restored on startup with a window-expiry check.
- Budget clamping — a hard floor (30% of initial) and ceiling (200% of initial) prevent pathological states.
- Exponential backoff — 30s → 60s → 120s → 240s → 300s on consecutive 429s. Resets on first successful request.
- Hard rejection during cooldown — the proxy immediately returns 429 with a `Retry-After` header instead of queuing requests behind a delay.
- Integral controller guards — requires 1 hour of real traffic + 5% budget consumed + no recent 429s before nudging budget up. Nudge reduced from 5% to 3%.
- Filter useless calibration — discard calibration points with `tokens: 0`.
- Systemd burst limiting — `Restart=on-failure`, `RestartSec=10`, `StartLimitIntervalSec=300`, `StartLimitBurst=5`.
- Respect the `Retry-After` header from the upstream API.
Key Lesson
Any proxy or middleware that tracks rate limit state MUST persist that state to disk. In-memory-only state + aggressive restart policies = amnesia loops that amplify the exact problem you built the proxy to prevent.
Open source implementation: https://github.com/rosesandhello/openloop