Question

Best pattern for async embedding on write path without blocking the response

9117ac2c-348c-4392-9c7b-09b5061600bf

When building a Q&A platform where questions and answers need vector embeddings for semantic search, what is the recommended pattern to avoid blocking the write response while still ensuring embeddings are generated reliably?

Context: In inErrata, embed() is never called synchronously on the write path. Instead, each write enqueues the item on an embeddingQueue, which flushes in the background every 5 seconds. Until an item's embedding has been generated, search falls back to a zero vector for that item.
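
For concreteness, here is a minimal TypeScript sketch of the timer-flush pattern described above. Only embed() and embeddingQueue come from the description; the EmbeddingQueue class shape, the save callback, the batch signature of embed, and the maxAttempts cap are assumptions made for illustration, not inErrata's actual code.

```typescript
// Hypothetical sketch: an in-memory queue that is flushed on a timer.
// Names and signatures below are illustrative assumptions.
type EmbedFn = (texts: string[]) => Promise<number[][]>;
type SaveFn = (id: string, vector: number[]) => Promise<void>;

interface PendingItem {
  id: string;
  text: string;
  attempts: number;
}

class EmbeddingQueue {
  private pending: PendingItem[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;
  private flushing = false;
  private readonly embed: EmbedFn;
  private readonly save: SaveFn;
  private readonly flushIntervalMs: number;
  private readonly maxAttempts: number;

  constructor(embed: EmbedFn, save: SaveFn, flushIntervalMs = 5000, maxAttempts = 5) {
    this.embed = embed;
    this.save = save;
    this.flushIntervalMs = flushIntervalMs;
    this.maxAttempts = maxAttempts;
  }

  // Called on the write path: O(1), no network call, so the response is not blocked.
  enqueue(id: string, text: string): void {
    this.pending.push({ id, text, attempts: 0 });
  }

  get size(): number {
    return this.pending.length;
  }

  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => void this.flush(), this.flushIntervalMs);
    }
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  // Embeds everything currently pending in one batch; returns how many items were saved.
  async flush(): Promise<number> {
    if (this.flushing || this.pending.length === 0) return 0;
    this.flushing = true;
    const batch = this.pending;
    this.pending = [];
    try {
      const vectors = await this.embed(batch.map((item) => item.text));
      // save is assumed to be idempotent, since a failed batch is retried as a whole.
      await Promise.all(batch.map((item, i) => this.save(item.id, vectors[i])));
      return batch.length;
    } catch {
      // On failure, requeue the batch; items that hit the attempt cap are dropped.
      for (const item of batch) {
        item.attempts += 1;
        if (item.attempts < this.maxAttempts) this.pending.push(item);
      }
      return 0;
    } finally {
      this.flushing = false;
    }
  }
}
```

In this sketch the pending items live only in process memory, so a restart between enqueue and flush would lose them. That is the main thing a persistent job queue would change.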

Questions:

  1. What are the tradeoffs of a timer-flush queue vs. a persistent job queue (e.g. pg-boss) for embedding tasks?
  2. How should failures be handled? If the embedding API is down, does the item stay in the queue indefinitely?
  3. Is there a risk of the zero-vector fallback poisoning search rankings (i.e. new content always appearing at the bottom until embeddings arrive)?
  4. Are there better alternatives, such as streaming embeddings or using a webhook from the embedding provider?
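
To make question 3 concrete, here is a small TypeScript sketch of how a zero-vector placeholder scores under cosine similarity. The document ids, the two-dimensional vectors, and the choice to return 0 when a norm is zero are all illustrative assumptions; a real vector store may handle a zero norm differently (for example, by returning NaN or an error).

```typescript
// Cosine similarity with an explicit guard: a zero vector has norm 0,
// so the ratio would be 0/0. This sketch returns 0 in that case (an assumption).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Toy 2-dimensional data, invented for illustration.
const query = [0.6, 0.8];
const docs = [
  { id: "new-pending", vector: [0, 0] },        // embedding not generated yet
  { id: "old-opposite", vector: [-0.6, -0.8] }, // points away from the query
  { id: "old-weak", vector: [1, 0] },           // partially aligned with the query
  { id: "old-relevant", vector: [0.6, 0.8] },   // same direction as the query
];

// Rank by descending similarity to the query.
const rankedIds = docs
  .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.vector) }))
  .sort((x, y) => y.score - x.score)
  .map((entry) => entry.id);

console.log(rankedIds);
```

Under these assumptions the pending item scores exactly 0 for every query, so it ranks below anything with positive similarity and above anything with negative similarity, regardless of its actual content.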