Answer

This is a cascade failure with three links:

1. **ETL model omits Pattern nodes.** If your extraction prompt includes a rule like "omit nodes you're not confident about" or treats Pattern as optional, weaker models (e.g. Claude Haiku) will skip them almost entirely, especially when the extraction is batched and the model is optimizing for token efficiency. Problem, RootCause, and Solution nodes feel concrete; Pattern is abstract, so it gets skipped.

2. **No Pattern nodes = no INSTANCE_OF edges.** The `Problem INSTANCE_OF Pattern` edges are what the pageRank pipeline scores against: the GDS pageRank query scores Pattern nodes by their inbound `INSTANCE_OF` edges. If no Pattern nodes exist, there are no such edges and no scores get written, so every node stays at pageRank 0.0.

3. **pageRank 0 = no landmarks.** The `isLandmark` promotion step takes the top N% of nodes by pageRank. If every score is 0, nothing gets promoted, and `graph_initialize` returns an empty landmark set.
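Links 2 and 3 can be reproduced in a few lines of plain Python. This is only a rough sketch, not the real GDS pipeline: it uses the inbound `INSTANCE_OF` edge count as a stand-in for pageRank, and it assumes the promotion step ignores nodes whose score is 0. The function names, the `top_fraction` parameter, and the `CAUSED_BY` / `FIXED_BY` edge types are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def score_patterns(nodes, edges):
    """Score each Pattern node by its inbound INSTANCE_OF edge count.

    Assumption: this count is a simplified stand-in for the GDS pageRank score.
    """
    inbound = Counter(dst for (_src, rel, dst) in edges if rel == "INSTANCE_OF")
    return {
        node_id: float(inbound[node_id])
        for node_id, label in nodes.items()
        if label == "Pattern"
    }


def promote_landmarks(scores, top_fraction=0.1):
    """Return the ids in the top `top_fraction` of scored nodes.

    Assumption: nodes with a score of 0 are never promoted.
    """
    ranked = sorted(
        (node_id for node_id, score in scores.items() if score > 0.0),
        key=lambda node_id: scores[node_id],
        reverse=True,
    )
    if not ranked:
        return set()
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return set(ranked[:keep])


# Graph produced by an extraction run that skipped Pattern nodes.
# CAUSED_BY and FIXED_BY are hypothetical edge types.
broken_nodes = {"p1": "Problem", "r1": "RootCause", "s1": "Solution"}
broken_edges = [("p1", "CAUSED_BY", "r1"), ("r1", "FIXED_BY", "s1")]

# Same graph after re-extraction with Pattern nodes required.
fixed_nodes = {**broken_nodes, "p2": "Problem", "pat1": "Pattern"}
fixed_edges = broken_edges + [
    ("p1", "INSTANCE_OF", "pat1"),
    ("p2", "INSTANCE_OF", "pat1"),
]

broken_landmarks = promote_landmarks(score_patterns(broken_nodes, broken_edges))
fixed_landmarks = promote_landmarks(score_patterns(fixed_nodes, fixed_edges))

print(broken_landmarks)  # set()
print(fixed_landmarks)   # {'pat1'}
```

With no Pattern nodes there is nothing to score, so the landmark set is empty; a single Pattern node with two inbound `INSTANCE_OF` edges is enough to produce a landmark.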

**Fix:** Make Pattern nodes **required** in your extraction prompt, not optional. Include a concrete worked example showing the full `Problem → RootCause → Solution → Pattern` chain with an explicit `INSTANCE_OF` edge. Change the rule from "omit if uncertain" to "always abstract the problem to its general pattern: ask yourself 'what class of problem is this?'".
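One way to enforce the "required" rule is to validate each extraction result before it is loaded into the graph. The sketch below assumes the model returns a dict with `nodes` and `edges` lists; that JSON shape, the `find_missing_patterns` helper, the `CAUSED_BY` / `FIXED_BY` edge types, and the example texts are all invented for illustration and may not match your schema.

```python
def find_missing_patterns(extraction):
    """Return ids of Problem nodes with no INSTANCE_OF edge to a Pattern node."""
    labels = {node["id"]: node["label"] for node in extraction["nodes"]}
    covered = {
        edge["source"]
        for edge in extraction["edges"]
        if edge["type"] == "INSTANCE_OF" and labels.get(edge["target"]) == "Pattern"
    }
    return sorted(
        node_id
        for node_id, label in labels.items()
        if label == "Problem" and node_id not in covered
    )


# A worked example of the full chain, in the hypothetical output shape.
worked_example = {
    "nodes": [
        {"id": "p1", "label": "Problem", "text": "Checkout requests time out under load"},
        {"id": "r1", "label": "RootCause", "text": "Connection pool capped at 5"},
        {"id": "s1", "label": "Solution", "text": "Raise pool size and add a queue timeout"},
        {"id": "pat1", "label": "Pattern", "text": "Resource pool exhaustion"},
    ],
    "edges": [
        {"source": "p1", "type": "CAUSED_BY", "target": "r1"},
        {"source": "r1", "type": "FIXED_BY", "target": "s1"},
        {"source": "p1", "type": "INSTANCE_OF", "target": "pat1"},
    ],
}

# What a model that skips Pattern nodes would return for the same input.
incomplete = {
    "nodes": worked_example["nodes"][:3],
    "edges": worked_example["edges"][:2],
}

print(find_missing_patterns(worked_example))  # []
print(find_missing_patterns(incomplete))      # ['p1']
```

A non-empty result can be used to reject the batch or to re-prompt the model for the missing Pattern, rather than letting the gap surface later as an empty landmark set.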

Also switch to a stronger model (Sonnet rather than Haiku) for extraction; the quality difference on structured output is significant.

After fixing the prompt and re-running ETL, the GDS pipeline will score Pattern nodes by `INSTANCE_OF` edge count, and the top percentile will be promoted to `isLandmark=true`.