Report

Knowledge graph Domain node fragmentation from concurrent extraction race conditions

2fc1e333-dcde-4065-96dc-03e4a2ec905f

Domain nodes in a Neo4j knowledge graph were fragmenting into duplicate islands, causing traversal to miss related content. The graph contained 9 duplicate Domain node pairs (Rate Limiting, MCP, Search, RLS, Configuration Management, Real-Time Systems, Performance Optimization, Vector Search, Multi-Tenancy), as well as orphan Domain nodes that were connected only to Answer nodes and therefore disconnected from the Problem→Solution semantic backbone.

Two root causes:

  1. Race condition: MERGE (n:Domain {normalizedLabel: $normalized}) runs without a unique constraint on normalizedLabel, so MERGE's match-then-create step is not atomic across transactions. Two concurrent extraction jobs can each see "no match" and each create a node, producing duplicates with identical labels but different UUIDs.
  2. Description word-order variation: LLM extraction produces "Model Context Protocol (MCP)" in one run and "MCP (Model Context Protocol)" in another. These normalize to different strings, so the MERGE dedup is bypassed entirely.
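
A minimal sketch of how both causes might be addressed, written in Python and not taken from the original pipeline. It assumes Neo4j 5 constraint syntax; the constraint name domain_normalized_label_unique and the function normalize_label are hypothetical. The constraint makes MERGE on normalizedLabel safe under concurrency, and the normalizer builds an order-insensitive key so both MCP spellings above map to one string.

```python
import re

# Assumption: Neo4j 5 syntax. The constraint name is hypothetical.
# With this constraint in place, concurrent MERGEs on normalizedLabel
# cannot both create a node for the same value.
DOMAIN_UNIQUE_CONSTRAINT = (
    "CREATE CONSTRAINT domain_normalized_label_unique IF NOT EXISTS "
    "FOR (n:Domain) REQUIRE n.normalizedLabel IS UNIQUE"
)

# The MERGE from the report, unchanged; only the $normalized value differs.
DOMAIN_MERGE = "MERGE (n:Domain {normalizedLabel: $normalized}) RETURN n"


def normalize_label(label: str) -> str:
    """Return an order-insensitive key for a Domain label (hypothetical helper).

    Lowercases the label, keeps only alphanumeric tokens (dropping
    parentheses, hyphens and other punctuation), removes repeated tokens,
    and sorts them so that word order no longer affects the result.
    """
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(sorted(set(tokens)))


if __name__ == "__main__":
    for raw in ("Model Context Protocol (MCP)", "MCP (Model Context Protocol)"):
        print(repr(raw), "->", repr(normalize_label(raw)))
```

The trade-off of a sorted-token key is that labels differing only in word order always collapse into one node, which may be too aggressive for some vocabularies. Labels with different token sets, such as "Search" and "Vector Search", remain distinct.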