GDS pipeline writes 0 ClusterConcept nodes on sparse graph despite communities existing
f964b77e-f5ca-482e-ae22-8980e3d86e7f
The Situation
You run the nightly GDS pipeline. Community detection says "Labeled 38 nodes into 18 communities." You feel good. Then:
[gds:clusters] Wrote 0 ClusterConcept nodes
Zero. Goose egg. The pipeline technically succeeded. Nothing is on fire. And yet.
Root Cause
The cluster centroid query only matched Problem nodes:
MATCH (p:Problem)
WHERE p.community IS NOT NULL AND p.embedding IS NOT NULL
WITH p.community AS communityId, collect(p) AS problems
WHERE size(problems) >= 3
Meanwhile, community detection labeled all semantic node types: Problem, RootCause, FailurePattern, DesignPattern, Solution. So a community could have 4 nodes and still have 0 Problems in it. Or 2. Either way, it never clears the threshold.
On a mature graph this is fine because you've got tons of Problems. On a sparse/early-stage graph, you get communities that are majority RootCauses and Solutions with barely any Problems, and the cluster stage quietly produces nothing.
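To see the failure mode without touching the database, here is a minimal Python sketch of the same threshold check. The sample data, the `qualifying_communities` helper, and the constant names are all made up for illustration; only the label set and the threshold of 3 come from the queries in this note.

```python
from collections import defaultdict

# Hypothetical sample: (community id, node label) pairs from a sparse graph.
NODES = [
    (0, "Problem"), (0, "RootCause"), (0, "Solution"), (0, "Solution"),
    (1, "Problem"), (1, "Problem"), (1, "RootCause"),
    (2, "RootCause"), (2, "FailurePattern"), (2, "DesignPattern"),
]

# The label set that community detection covers, per this note.
SEMANTIC_LABELS = {"Problem", "RootCause", "FailurePattern", "DesignPattern", "Solution"}
MIN_MEMBERS = 3  # mirrors `size(...) >= 3` in the Cypher query


def qualifying_communities(nodes, allowed_labels, min_members=MIN_MEMBERS):
    """Return sorted community ids with at least min_members nodes of an allowed label."""
    counts = defaultdict(int)
    for community_id, label in nodes:
        if label in allowed_labels:
            counts[community_id] += 1
    return sorted(cid for cid, n in counts.items() if n >= min_members)


problem_only = qualifying_communities(NODES, {"Problem"})
all_semantic = qualifying_communities(NODES, SEMANTIC_LABELS)

print(problem_only)  # [] -> no community has 3 Problems, so nothing gets written
print(all_semantic)  # [0, 1, 2] -> every community has 3+ semantic nodes
```

Every community in the sample has at least 3 members, yet none has 3 Problems, so the Problem-only filter returns nothing.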
The Fix
Broaden the MATCH to include all semantic node types — same set community detection uses:
MATCH (n)
WHERE (n:Problem OR n:RootCause OR n:FailurePattern OR n:DesignPattern OR n:Solution)
AND n.community IS NOT NULL AND n.embedding IS NOT NULL
WITH n.community AS communityId, collect(n) AS members
WHERE size(members) >= 3
RETURN communityId,
[m IN members | m.embedding] AS embeddings,
[m IN members[0..3] | m.description] AS topDescriptions,
size(members) AS memberCount
The centroid is computed the same way. You're just counting all semantic nodes toward the minimum instead of gatekeeping on node type.
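This note doesn't show how the pipeline turns the returned `embeddings` list into a centroid. A common choice is the element-wise mean, so the sketch below assumes that; the `centroid` function is hypothetical, not the pipeline's actual code.

```python
def centroid(embeddings):
    """Element-wise mean of equal-length embedding vectors.

    Assumption: the centroid is a plain arithmetic mean. The real pipeline
    may normalize or weight the vectors differently.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


# Three toy 2-d embeddings, standing in for one community's members.
example = centroid([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(example)  # [1.0, 1.0]
```

Under that assumption the node label never enters the math, which is why broadening the MATCH changes which communities qualify but not how each centroid is calculated.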