Answer

The cluster fix (broadening MATCH to all semantic types) is correct and now ships in PRs #95 and #97. But there's a deeper issue worth flagging: the pipeline's dependency on GDS is itself the bottleneck.

The real problem: AuraDB blocks GDS

AuraDB Free/Standard doesn't include the GDS plugin. Without it, the pipeline falls back to weakly connected components (WCC), which only groups nodes that are reachable from one another and does not produce modularity-optimized communities. Even with the cluster MATCH fix, WCC communities are too coarse to yield meaningful ClusterConcept centroids.
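To make "too coarse" concrete, here is a minimal sketch of WCC on a toy graph. It is a plain union-find written for illustration, not the pipeline's actual fallback code, and the node names and edge list are hypothetical. Two tightly knit triangles joined by a single bridge edge come out as one component:

```typescript
// Hypothetical toy graph: two triangles (a-b-c and x-y-z) joined by one bridge edge c-x.
type Edge = [string, string];

const edges: Edge[] = [
  ['a', 'b'], ['b', 'c'], ['a', 'c'], // first dense group
  ['x', 'y'], ['y', 'z'], ['x', 'z'], // second dense group
  ['c', 'x'],                         // single bridge between the groups
];

// Weakly connected components via union-find. Edge direction is ignored,
// so any path at all puts two nodes in the same component.
function weaklyConnectedComponents(edgeList: Edge[]): Map<string, string> {
  const parent = new Map<string, string>();

  const find = (node: string): string => {
    // Walk up to the representative of this node's set.
    let root = node;
    while ((parent.get(root) ?? root) !== root) {
      root = parent.get(root) ?? root;
    }
    // Path compression: point every visited node directly at the root.
    let current = node;
    while (current !== root) {
      const next = parent.get(current) ?? root;
      parent.set(current, root);
      current = next;
    }
    return root;
  };

  for (const [source, target] of edgeList) {
    if (!parent.has(source)) parent.set(source, source);
    if (!parent.has(target)) parent.set(target, target);
    const sourceRoot = find(source);
    const targetRoot = find(target);
    if (sourceRoot !== targetRoot) parent.set(sourceRoot, targetRoot);
  }

  // Map every node to the representative of its component.
  const componentOf = new Map<string, string>();
  for (const node of Array.from(parent.keys())) {
    componentOf.set(node, find(node));
  }
  return componentOf;
}

const componentOf = weaklyConnectedComponents(edges);
const componentCount = new Set(Array.from(componentOf.values())).size;
console.log(componentCount); // 1
```

One bridge edge is enough to merge both groups, so a centroid computed over this "community" would average two unrelated themes.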

The fix: in-process graphology

PR #97 replaces both GDS algorithms with in-process JS implementations:

```typescript
import Graph from 'graphology'
import louvain from 'graphology-communities-louvain'

// Pull graph into memory
const graph = new Graph({ type: 'undirected' })
// ... add nodes + edges from Neo4j ...

// Run Louvain (modularity-optimized, ~95% of Leiden quality)
const communities = louvain(graph, { resolution: 1.0 })

// Batch-write back to Neo4j
```

Same approach for PageRank using graphology-metrics/centrality/pagerank.
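The "batch-write back to Neo4j" step from the snippet above might look roughly like the sketch below. It assumes the node-to-community mapping that `louvain(graph)` returns. The `toBatches` helper, the `id` and `communityId` property names, and the Cypher text are all hypothetical and are not taken from the PR:

```typescript
// Shape assumed for the Louvain result: node key -> community index.
type CommunityMapping = Record<string, number>;

interface CommunityRow {
  id: string;
  community: number;
}

// Split the mapping into fixed-size batches so each write is a single UNWIND query.
function toBatches(mapping: CommunityMapping, batchSize: number): CommunityRow[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const rows: CommunityRow[] = Object.keys(mapping).map((id) => ({
    id,
    community: mapping[id] as number,
  }));
  const batches: CommunityRow[][] = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    batches.push(rows.slice(start, start + batchSize));
  }
  return batches;
}

// Hypothetical Cypher. The `id` and `communityId` properties are assumptions,
// and a real query would normally match on a label to avoid a full scan.
const WRITE_COMMUNITIES = `
UNWIND $rows AS row
MATCH (n {id: row.id})
SET n.communityId = row.community
`;

const exampleBatches = toBatches({ a: 0, b: 0, c: 1, d: 1, e: 2 }, 2);
console.log(exampleBatches.map((batch) => batch.length));
```

Each batch would then be passed as the `rows` parameter of one query, for example `session.run(WRITE_COMMUNITIES, { rows: batch })` with the official Neo4j JavaScript driver, which keeps the number of round trips proportional to the number of batches and not to the number of nodes.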

Why this matters for ClusterConcepts

The 0-ClusterConcept problem had two causes:

  1. The MATCH only checking Problem nodes (bosh's answer covers this)
  2. WCC producing communities that were nothing more than connected subgraphs. Even with the broader MATCH, centroids computed from WCC communities are semantically meaningless, because WCC doesn't optimize for modularity.

Louvain fixes the second cause. Communities are now groups of nodes that are more connected to each other than to the rest of the graph: actual thematic clusters, not just connectivity artifacts. ClusterConcept centroids from Louvain communities are meaningful for search diversification.
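"More connected to each other than to the rest of the graph" is what the modularity score measures. The sketch below computes the standard Newman modularity for an unweighted, undirected graph, Q = Σ_c (L_c / m − (d_c / 2m)²), where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. It is a hand-rolled illustration on a hypothetical toy graph, not the graphology implementation:

```typescript
type Edge = [string, string];

// Same hypothetical toy graph as before: two triangles joined by one bridge edge.
const edges: Edge[] = [
  ['a', 'b'], ['b', 'c'], ['a', 'c'],
  ['x', 'y'], ['y', 'z'], ['x', 'z'],
  ['c', 'x'],
];

// Newman modularity for an undirected, unweighted graph and a given partition.
function modularity(edgeList: Edge[], communityOf: Record<string, number>): number {
  const edgeCount = edgeList.length;
  if (edgeCount === 0) return 0;

  const internalEdges = new Map<number, number>(); // L_c per community
  const totalDegree = new Map<number, number>();   // d_c per community

  for (const [source, target] of edgeList) {
    const sourceCommunity = communityOf[source];
    const targetCommunity = communityOf[target];
    if (sourceCommunity === undefined || targetCommunity === undefined) {
      throw new Error('every node needs a community assignment');
    }
    // Each edge contributes one unit of degree to each endpoint's community.
    totalDegree.set(sourceCommunity, (totalDegree.get(sourceCommunity) ?? 0) + 1);
    totalDegree.set(targetCommunity, (totalDegree.get(targetCommunity) ?? 0) + 1);
    if (sourceCommunity === targetCommunity) {
      internalEdges.set(sourceCommunity, (internalEdges.get(sourceCommunity) ?? 0) + 1);
    }
  }

  let q = 0;
  totalDegree.forEach((degree, community) => {
    const internal = internalEdges.get(community) ?? 0;
    const degreeShare = degree / (2 * edgeCount);
    q += internal / edgeCount - degreeShare * degreeShare;
  });
  return q;
}

// WCC-style partition: everything in one community.
const oneCommunity = { a: 0, b: 0, c: 0, x: 0, y: 0, z: 0 };
// Louvain-style partition: one community per triangle.
const twoCommunities = { a: 0, b: 0, c: 0, x: 1, y: 1, z: 1 };

console.log(modularity(edges, oneCommunity).toFixed(3));   // "0.000"
console.log(modularity(edges, twoCommunities).toFixed(3)); // "0.357"
```

The single WCC-style community scores zero, while splitting along the bridge scores higher, which is the partition a modularity-optimizing algorithm such as Louvain is designed to prefer.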

Cost

~0.5s for graphs under 10K nodes. Zero LLM calls. Graphology packages add ~200KB to the bundle. Function signatures are unchanged, so pipeline.ts and bootstrap work without changes.