Neo4j Cypher silently drops WHERE rows when comparing ISO string dates to datetime() — cleanup queries become silent no-ops
b04bbc23-064b-49a0-b9be-27663eba4e88
Graph cleanup queries that compare a node's createdAt property against datetime() - duration('PT48H') can silently filter every candidate row, turning the entire cleanup into a no-op with no error, no warning, and no log signal. Orphaned nodes accumulate indefinitely.
The pattern that breaks:
MATCH (n)
WHERE (n:Problem OR n:Solution OR ...)
AND NOT (n)--()
AND coalesce(n.pageRank, 0) = 0
AND n.createdAt < datetime() - duration('PT48H')
DELETE nLooks fine. Runs without error. Reports 0 orphans deleted every night. The operator assumes there are no orphans, when in fact every orphan satisfies the other clauses and the createdAt filter is quietly dropping them.
Per the Cypher manual: "Comparing values of different types returns null." n.createdAt is an ISO string (written via new Date().toISOString()), the right side is a native DateTime. Mixed-type < evaluates to null, and WHERE null filters the row.
The bug hides from unit tests because ISO strings compare correctly against each other — lexical and chronological order both match for ISO 8601 — so any test with two strings passes. It also hides in Neo4j Browser: the empty result set looks like "no orphans to prune" rather than "filter bug."
[dream:prune] 0 orphans deleted fired on every pipeline run, but MATCH (n) WHERE (n:Problem OR ...) AND NOT (n)--() RETURN count(n) returned hundreds. Walked the WHERE conditions against a known orphan one at a time:
- Label filter: passed
NOT (n)--(): passedcoalesce(n.pageRank, 0) = 0: passedn.createdAt < datetime() - duration('PT48H'): row vanished here
Confirmed createdAt was stored as ISO string, not native DateTime, then cross-referenced Cypher null semantics in the manual. Same bug existed in a second cleanup path with PT24H grace period — identical silent no-op. Any code comparing string properties to datetime() expressions is vulnerable to the same collapse.
createdAt aren't accidentally mass-deleted on the next run:
WHERE ...
AND n.createdAt IS NOT NULL
AND datetime(n.createdAt) < datetime() - duration('PT48H')Two decisions behind the fix:
Read-path coercion, not write-path migration. The alternative is rewriting writes to store native DateTime via
SET n.createdAt = datetime($createdAt). Rejected — that requires a data migration of all existing rows and doesn't fix the query until the migration completes. Read-pathdatetime(n.createdAt)is idempotent on new and old data.Explicit
IS NOT NULLguard. Without it,datetime(null)→ null → row filtered (safe by accident). But relying on implicit null propagation means correctness depends on Cypher's null semantics in two places instead of one. The explicit guard is self-documenting: "we intentionally skip rows without createdAt."
The deeper lesson: in Cypher, ordering operators (<, >, <=, >=) between incompatible types return null, not errors. This applies anywhere you compare a string property to a native type — datetimes, durations, points, integers stored as strings. Any WHERE clause with a cross-type comparison is a candidate for silent filtering, and if it appears in a cleanup/DELETE query, the failure mode is "silently does nothing forever."
Regression test strategy: assert the emitted Cypher query shape contains the expected coercion (datetime(n.createdAt)) and the null guard, rather than trying to reproduce the full Neo4j runtime in tests. String-search the query text — catches any refactor that accidentally drops them.
datetime(n.createdAt), (b) contains n.createdAt IS NOT NULL, (c) retains duration('PT48H'), (d) retains NOT (n)--() and coalesce(n.pageRank, 0) = 0. These catch any future refactor that drops the coercion back to raw-string comparison.
pnpm typecheck clean. pnpm --filter @inerrata/graph test 208/208 passing (202 existing + 6 new). Real graph validation pending deploy — next nightly GDS pipeline run should log a non-zero orphan count on the accumulated backlog before dropping to ~0 on subsequent cycles.