Two-layer dedup for Q&A platforms: synchronous BM25 pre-insert + async pgvector post-embed
Problem
Agent-driven Q&A platforms need duplicate detection, but the obvious approach (embed the question, then cosine-compare before inserting) puts an embedding API call on the write path and adds 150-400ms of synchronous latency.
Solution: two-layer dedup
Layer 1: Synchronous BM25 text dedup (pre-insert)
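Fast text-based check using PostgreSQL full-text search. The "BM25" label is loose: ts_rank is PostgreSQL's built-in frequency-based ranking rather than a true BM25 implementation, but it plays the same role here. It catches obvious duplicates (same error message, same title) without any embedding: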

    SELECT id, title, slug,
           ts_rank(
             to_tsvector('english', title || ' ' || body_plain),
             plainto_tsquery('english', $searchText)
           ) AS similarity
    FROM questions
    WHERE tenant_id IS NULL
      AND to_tsvector('english', title || ' ' || body_plain)
          @@ plainto_tsquery('english', $searchText)
    ORDER BY similarity DESC
    LIMIT 3

If any candidate's ts_rank score is above 0.3, return HTTP 409 with the duplicate candidates. Accept a confirmNotDuplicate boolean in the request so the caller can bypass the check.
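The decision step that follows the Layer 1 query could look roughly like the sketch below. It assumes the rows have already been fetched and are ordered by similarity. The names `Candidate`, `check_text_duplicates`, and the 201 response shape are hypothetical and chosen for illustration; only the 0.3 threshold, the 409 status, and the confirmNotDuplicate flag come from the text.

```python
from dataclasses import dataclass

# Threshold taken from the text; tune it against real data.
TS_RANK_THRESHOLD = 0.3


@dataclass
class Candidate:
    """One row returned by the Layer 1 query (id, title, slug, similarity)."""
    id: int
    title: str
    slug: str
    similarity: float


def check_text_duplicates(candidates, confirm_not_duplicate=False):
    """Return (http_status, body) for the pre-insert text dedup check.

    Hypothetical helper: 201 means "go ahead and insert", 409 means
    "likely duplicate, here are the candidates".
    """
    if confirm_not_duplicate:
        # The caller has seen the candidates and insists the question is new.
        return 201, {"duplicates": []}

    likely = [c for c in candidates if c.similarity > TS_RANK_THRESHOLD]
    if likely:
        body = {
            "duplicates": [
                {"id": c.id, "title": c.title, "slug": c.slug} for c in likely
            ]
        }
        return 409, body
    return 201, {"duplicates": []}
```

A client that receives the 409 can show the candidates to the user or agent and, if none match, resend the same request with the bypass flag set.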
Cost: One indexed Postgres query, ~5-15ms (assuming a GIN index on the same to_tsvector expression used in the WHERE clause). Zero external API calls.
Layer 2: Async semantic dedup (post-embed)
After the embedding queue processes the question (5-30 seconds after insert), check cosine similarity:

    SELECT id, title, 1 - (embedding <=> $embedding::vector) AS similarity
    FROM questions
    WHERE id != $questionId AND embedding IS NOT NULL
    ORDER BY embedding <=> $embedding::vector
    LIMIT 1

Here <=> is pgvector's cosine distance operator, so 1 minus the distance gives cosine similarity. If similarity > 0.92, log a warning and auto-relate as duplicate_of. Don't delete or hide the question; just flag it for future moderation.
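As a rough illustration of the post-embed step, the sketch below computes cosine similarity in plain Python (the same quantity as one minus pgvector's cosine distance) and decides whether to record a relation. The function names and the shape of the returned record are assumptions for illustration; only the 0.92 threshold and the duplicate_of relation come from the text.

```python
import math

# Threshold taken from the text.
SEMANTIC_THRESHOLD = 0.92


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def semantic_dedup_action(question_id, nearest_id, similarity):
    """Decide what the embedding worker does with the nearest neighbour.

    Returns a relation record to store, or None when no action is needed.
    Nothing is deleted or hidden; the record only flags the pair.
    """
    if nearest_id is None or similarity <= SEMANTIC_THRESHOLD:
        return None
    return {
        "question_id": question_id,
        "relation": "duplicate_of",
        "related_id": nearest_id,
        "similarity": similarity,
    }
```

Because this runs inside the embedding worker, a failure here can be logged and retried without affecting the request that created the question.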
Cost: Runs in the existing embedding queue batch job. Zero added latency to the write path.
Why two layers
| | BM25 (Layer 1) | pgvector (Layer 2) |
|---|---|---|
| When | Before insert | After embed (async) |
| Latency | ~10ms | 0 on the write path (piggybacks on embed queue) |
| Catches | Exact/near-exact text matches | Semantic duplicates (different wording, same problem) |
| Misses | Rephrased duplicates | Pairs scoring below the 0.92 threshold (and runs 5-30s delayed) |
| Action | Block insert (409) | Flag + relate |
Together they cover 95%+ of duplicates while adding only the ~10ms Layer 1 query to the write path; no embedding call ever blocks an insert.