Question

GDS pipeline writes 0 ClusterConcept nodes on sparse graph despite communities existing

The Situation

You run the nightly GDS pipeline. Community detection says "Labeled 38 nodes into 18 communities." You feel good. Then:

[gds:clusters] Wrote 0 ClusterConcept nodes

Zero. Goose egg. The pipeline technically succeeded. Nothing is on fire. And yet.

Root Cause

The cluster centroid query only matched Problem nodes:

MATCH (p:Problem)
WHERE p.community IS NOT NULL AND p.embedding IS NOT NULL
WITH p.community AS communityId, collect(p) AS problems
WHERE size(problems) >= 3

Meanwhile, community detection labeled all semantic node types: Problem, RootCause, FailurePattern, DesignPattern, Solution. So a community could have 4 nodes and still have 0 Problems in it. Or 2. Either way, it never clears the size(problems) >= 3 threshold.

On a mature graph the mismatch stays hidden because you've got tons of Problems. On a sparse/early-stage graph, you get communities that are mostly RootCauses and Solutions with barely any Problems, and the cluster stage quietly produces nothing.
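To make the mismatch concrete, here is a small Python sketch of the counting logic. The node list is made-up toy data and the names (NODES, qualifying_communities) are hypothetical, not part of the pipeline; only the five labels and the minimum of 3 come from the queries in this post.

```python
from collections import Counter

# Hypothetical (label, community) pairs standing in for a sparse graph.
NODES = [
    ("Problem", 0), ("RootCause", 0), ("Solution", 0), ("Solution", 0),
    ("Problem", 1), ("Problem", 1), ("RootCause", 1),
    ("RootCause", 2), ("Solution", 2),
]

# Labels that community detection covers, per the post.
SEMANTIC_LABELS = {"Problem", "RootCause", "FailurePattern", "DesignPattern", "Solution"}
MIN_MEMBERS = 3  # mirrors the size(...) >= 3 filter in the Cypher query


def qualifying_communities(nodes, labels, min_members=MIN_MEMBERS):
    """Return sorted community ids with at least min_members nodes whose label is in labels."""
    counts = Counter(community for label, community in nodes if label in labels)
    return sorted(cid for cid, n in counts.items() if n >= min_members)


problem_only = qualifying_communities(NODES, {"Problem"})
all_semantic = qualifying_communities(NODES, SEMANTIC_LABELS)
print(problem_only)  # []
print(all_semantic)  # [0, 1]
```

With only Problems counted, community 0 (4 nodes, 1 Problem) and community 1 (3 nodes, 2 Problems) both fall short; counting every semantic label lets both through.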

The Fix

Broaden the MATCH to include all semantic node types — same set community detection uses:

MATCH (n)
WHERE (n:Problem OR n:RootCause OR n:FailurePattern OR n:DesignPattern OR n:Solution)
  AND n.community IS NOT NULL AND n.embedding IS NOT NULL
WITH n.community AS communityId, collect(n) AS members
WHERE size(members) >= 3
RETURN communityId,
       [m IN members | m.embedding] AS embeddings,
       [m IN members[0..3] | m.description] AS topDescriptions,
       size(members) AS memberCount

The centroid computation itself doesn't change. It now runs over the embeddings of every semantic node in the community, and all of those nodes count toward the minimum of 3 instead of gatekeeping on node type.
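For reference, a minimal sketch of what the caller might do with the returned embeddings list. The post doesn't show the centroid code, so this assumes the centroid is a plain element-wise mean; the function name centroid is hypothetical.

```python
def centroid(embeddings):
    """Element-wise mean of equal-length embedding vectors (assumed centroid definition)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("embeddings must share one dimension")
    # zip(*embeddings) walks the vectors column by column.
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


# Three toy 2-d vectors, as one row's `embeddings` value might look.
example = centroid([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(example)  # [3.0, 4.0]
```

Nothing here cares which label a member carries, which is why broadening the MATCH doesn't require touching this step.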