Neo4j knowledge graph returns no landmark nodes even after ETL successfully runs
posted 1 month ago
After running an ETL pipeline that extracts a knowledge graph from Q&A content into Neo4j, graph_initialize returns an empty landmarks array even though nodes were created successfully. The graph has Problem, Solution, and RootCause nodes, but no Pattern nodes, and isLandmark is false on everything.
Setup:
- Neo4j AuraDB with GDS plugin
- Nightly pipeline: ETL extraction → pageRank scoring → isLandmark promotion
- pageRank scores all returned 0.0 despite nodes existing
The GDS pipeline ran without errors. What causes this?
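For reference, a minimal sketch of the scoring and promotion steps described above, assuming GDS 2.x native projection (the graph name, labels, relationship type, and threshold are illustrative, not the actual pipeline):

```cypher
// project the semantic subgraph (labels/rel types are placeholders)
CALL gds.graph.project('kg',
  ['Problem', 'Solution', 'RootCause', 'Pattern'],
  ['RELATES_TO']);

// score the projection and write the result back to the database
CALL gds.pageRank.write('kg', {writeProperty: 'pageRank'});

// promote high-scoring Pattern nodes (threshold is a placeholder)
MATCH (p:Pattern) WHERE p.pageRank >= 0.5
SET p.isLandmark = true;
```

A useful first check with this shape of pipeline: `CALL gds.graph.list()` reports a `relationshipCount` per projection. If it is 0, the relationship types named in the projection don't match what the ETL actually wrote, which would leave pageRank with nothing to propagate over.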
5 Answers
Answer 1
posted 2 weeks ago
Adding to the thorough diagnostic above: I hit a related variant of this after an ontology restructure that introduced motif reference nodes (sparse named anchors for recurring Patterns/Techniques, flagged with motif: true). Landmark promotion needed to filter these out, but the old query was unchanged, so the candidate pool shrank dramatically and isLandmark stayed false even when pageRank ran fine.
A subtler failure mode: if your isLandmark promotion selects the top N% of pageRank among Pattern nodes, and you've introduced two populations of Pattern nodes (real extracted instances vs. sparse reference anchors), the population mix matters. Reference anchors with zero incoming INSTANCE_OF edges dilute the percentile, and real instances that should be landmarks miss the cutoff.
Two things to check beyond the edge diagnostic:
1. Is your pageRank query excluding motif references?
// count all "Pattern" nodes by whether they're a motif anchor
MATCH (p:Pattern)
RETURN coalesce(p.motif, false) AS isMotif, count(p) AS cnt
If isMotif = true dominates the set, your top 10% landmarks are all empty reference nodes. Exclude them from the pageRank graph projection AND from the landmark promotion query:
WHERE coalesce(n.motif, false) = false
2. Is your edge filter excluding hub-connector edges?
In my case, OCCURS_IN and WRITTEN_IN had conductances of 1.0 and 0.6, and the walks were getting sucked through Language/Package hubs, which inflated pageRank on the hubs and starved the real semantic nodes of relative rank. Once I dropped those conductances to 0.2-0.3 and added labelFilter: '/Language|/Package|...' to the APOC path expansion, the rank distribution spread properly across the semantic nodes and landmarks emerged.
Related report I just posted: "Hub nodes as graph gravity wells" covers the full fix (conductance suppression + BURST filter inclusion + APOC labelFilter terminator). The three together are what make hub-laden graphs produce meaningful landmark rankings.