Answer

## Async embedding queue: tradeoffs and failure handling

### Timer-flush queue vs. pg-boss

| | In-process timer queue | pg-boss (persistent job queue) |
|---|---|---|
| Durability | Lost on process crash | Survives restarts — jobs persisted in Postgres |
| Complexity | Simple, zero deps | Requires schema, migration, worker setup |
| Visibility | Hard to inspect | Jobs visible in DB, queryable |
| Retry logic | Must implement yourself | Built-in retry + backoff |
| Scale-out | Single process only | Multiple workers compete safely |

For a small deployment, an in-process queue with a 5s flush is fine. For production, **pg-boss or similar is strongly preferred** — you get retries, dead-letter queues, and durability for free.

### Failure handling

With a timer queue, a simple retry approach:

```ts
// On flush failure, re-enqueue with attempt count
if (attempts < MAX_ATTEMPTS) {
  queue.enqueue({ ...item, attempts: attempts + 1 });
} else {
  logger.error('Embedding permanently failed', { id: item.id });
  // Mark the row so you know it needs manual remediation
}
```

With pg-boss you get this automatically via `retryLimit` and `retryDelay`.

### Zero-vector ranking risk

Yes — new content with a zero embedding has no useful similarity score (cosine similarity is undefined for an all-zeros vector, and most stores effectively score it ~0), so it sinks to the bottom of vector results. Mitigations:

1. **Hybrid search saves you**: if your search is 60% BM25 + 40% vector (as in inErrata), keyword-matching new content will still surface via BM25 even before embeddings arrive.
2. **Flag unembedded rows**: add an `embeddingStatus: 'pending' | 'ready' | 'failed'` column. Exclude `pending` rows from vector ranking entirely, relying only on BM25 for them.
3. **Prioritize fresh content**: boost recency in your ranking formula so new content doesn't need a strong vector signal to appear.

### Streaming / webhook alternatives

Most embedding APIs (OpenAI, Cohere) don't offer webhooks. Streaming responses are for LLM completions, not embeddings.
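Tying the timer-queue and retry pieces together, here is a minimal self-contained sketch. `TimerFlushQueue`, `EmbedJob`, and the callback names are illustrative, not from any library; the 5s interval and attempt counting follow the retry snippet above.

```typescript
type EmbedJob = { id: string; text: string; attempts: number };

const MAX_ATTEMPTS = 3;
const FLUSH_INTERVAL_MS = 5000;

class TimerFlushQueue {
  private items: EmbedJob[] = [];
  private timer?: ReturnType<typeof setInterval>;

  constructor(
    // Hypothetical batch-embedding call; throwing signals a failed flush.
    private embedBatch: (jobs: EmbedJob[]) => Promise<void>,
    // Called once a job has exhausted its attempts (e.g. mark the row).
    private onPermanentFailure: (job: EmbedJob) => void,
  ) {}

  enqueue(job: EmbedJob): void {
    this.items.push(job);
  }

  start(): void {
    this.timer = setInterval(() => void this.flush(), FLUSH_INTERVAL_MS);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
  }

  async flush(): Promise<void> {
    // Take everything queued so far; new enqueues go into the next batch.
    const batch = this.items.splice(0);
    if (batch.length === 0) return;
    try {
      await this.embedBatch(batch);
    } catch {
      // On flush failure, re-enqueue each job with an incremented
      // attempt count, or hand it off once MAX_ATTEMPTS is reached.
      for (const job of batch) {
        const attempts = job.attempts + 1;
        if (attempts < MAX_ATTEMPTS) {
          this.enqueue({ ...job, attempts });
        } else {
          this.onPermanentFailure(job);
        }
      }
    }
  }
}
```

Note that everything here lives in process memory, which is exactly the durability tradeoff in the table above: a crash between enqueue and flush loses the batch.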
The async queue approach is the de facto standard.
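The `embeddingStatus` mitigation can be sketched as a scoring function. The row shape and the assumption that both scores are normalized to [0, 1] are illustrative; only the 60/40 weights come from the hybrid-search setup described above.

```typescript
// Hypothetical search result row; score normalization is assumed.
type SearchRow = {
  id: string;
  bm25Score: number;   // keyword relevance, normalized to [0, 1]
  vectorScore: number; // cosine similarity, normalized to [0, 1]
  embeddingStatus: 'pending' | 'ready' | 'failed';
};

const BM25_WEIGHT = 0.6;
const VECTOR_WEIGHT = 0.4;

// Rows without a ready embedding are ranked on BM25 alone, so a
// zero/missing vector never drags their combined score down.
function hybridScore(row: SearchRow): number {
  if (row.embeddingStatus !== 'ready') return row.bm25Score;
  return BM25_WEIGHT * row.bm25Score + VECTOR_WEIGHT * row.vectorScore;
}
```

One design choice to note: giving pending rows their full BM25 score (rather than `0.6 * bm25Score`) keeps keyword-matching new content competitive with fully embedded rows, and `failed` rows get the same fallback until they are remediated.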
