Registry-driven privacy backfill misses pre-instrumentation rows when cursor strategy is per-table
23dedadf-ea25-4020-8cc0-4e8bb3668e4a
A registry-driven Phase-3 privacy backfill driver picks the WHERE-clause cursor from each table's registry entry (defaultBackfillStatus). When the per-table default is review_status_pending ("scan rows whose privacy_review_status = 'pending'"), the driver silently misses two cohorts of historical rows:
- Rows pre-dating the migration that added the redaction_version column. They were stamped 'approved'/'flagged' BEFORE the version column existed, so a review_status_pending cursor skips them, but a redaction_version IS NULL cursor would catch them.
- Shadow tables that the registry doesn't know about (audit-log mirrors of the parent table). The migration adds the version columns and write-path instrumentation populates them on new rows, but the shadow's instrumented:false (or missing) registry entry leaves every historical row unreachable forever.
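Both cohorts can be reproduced on a toy schema. The sketch below is illustrative only: the table names (notes, notes_audit), the registry shape, and the helper names are hypothetical, and SQLite stands in for whatever database the real driver targets. Only defaultBackfillStatus, instrumented, and the two cursor names come from the note above.

```python
import sqlite3

# Hypothetical registry shape; only the key names defaultBackfillStatus and
# instrumented are taken from the note. The shadow table has no entry at all.
REGISTRY = {
    "notes": {"instrumented": True, "defaultBackfillStatus": "review_status_pending"},
}

# WHERE-clause fragment for each cursor strategy named in the note.
CURSOR_SQL = {
    "review_status_pending": "privacy_review_status = 'pending'",
    "redaction_version_is_null": "redaction_version IS NULL",
}


def cursor_count(conn, table, strategy):
    """Count the rows a given cursor strategy would scan.

    Table names are interpolated directly, so they must come from a trusted
    source (the registry or the catalog), never from user input.
    """
    sql = f"SELECT COUNT(*) FROM {table} WHERE {CURSOR_SQL[strategy]}"
    return conn.execute(sql).fetchone()[0]


def unreachable_tables(conn, registry):
    """List tables that carry redaction_version but lack an instrumented entry."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    missing = []
    for name in names:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = {row[1] for row in conn.execute(f"PRAGMA table_info({name})")}
        entry = registry.get(name, {})
        if "redaction_version" in columns and not entry.get("instrumented", False):
            missing.append(name)
    return missing


def build_demo():
    """Create a parent table and its audit-log shadow with pre-migration rows."""
    conn = sqlite3.connect(":memory:")
    for table in ("notes", "notes_audit"):
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, "
            "privacy_review_status TEXT, redaction_version TEXT)"
        )
    # Two rows reviewed before the version column existed, one still pending.
    rows = [("approved", None), ("flagged", None), ("pending", None)]
    for table in ("notes", "notes_audit"):
        conn.executemany(
            f"INSERT INTO {table} (privacy_review_status, redaction_version) "
            "VALUES (?, ?)",
            rows,
        )
    return conn
```

On this toy data the review_status_pending cursor sees one row of notes while the redaction_version IS NULL cursor sees all three, and unreachable_tables reports notes_audit because nothing in the registry points the driver at it.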
Symptom: an assertion like every_row_has_redaction_version_v2 keeps failing post-deploy (observed=213 vs expected=636) even after the driver successfully drains its review_status_pending cursor. The cursor reports 'done', but the remaining 423 rows (636 - 213) still carry redaction_version IS NULL.
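The gap between "cursor drained" and "assertion passes" can be shown with a small simulation. This is a sketch under stated assumptions: a single hypothetical notes table is sized to mirror the numbers above (the real split across tables and statuses is not stated), the version value 'v2' is inferred from the assertion name, and drain_pending and version_gap are invented helper names.

```python
import sqlite3


def build_table():
    """Toy table: 213 pending rows plus 423 rows stamped before the migration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes (id INTEGER PRIMARY KEY, "
        "privacy_review_status TEXT, redaction_version TEXT)"
    )
    rows = [("pending", None)] * 213 + [("approved", None)] * 423
    conn.executemany(
        "INSERT INTO notes (privacy_review_status, redaction_version) VALUES (?, ?)",
        rows,
    )
    return conn


def drain_pending(conn, table, version="v2"):
    """Simulated driver pass over the review_status_pending cursor.

    Returns how many rows were processed; 0 means the cursor reports 'done'.
    """
    cur = conn.execute(
        f"UPDATE {table} SET privacy_review_status = 'approved', "
        "redaction_version = ? WHERE privacy_review_status = 'pending'",
        (version,),
    )
    return cur.rowcount


def version_gap(conn, table):
    """Count versioned vs. unversioned rows, independent of any cursor state."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    missing = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE redaction_version IS NULL"
    ).fetchone()[0]
    return {"expected": total, "observed": total - missing, "missing": missing}
```

A second call to drain_pending returns 0, which is the 'done' signal, yet version_gap still reports 423 missing rows. Checking the column directly rather than trusting the cursor's completion state is what exposes the miss.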
A naive "just change the registry default to redaction_version_is_null" fix works for fresh tables but breaks the messages-Tier-1.1 backfill semantics: the messages backfill cursors on review_status_pending because the migration bulk-stamped every row 'pending' and the row-processor flips that column to 'approved'/'flagged' as its terminal stamp. The two cursors are NOT equivalent once any rows have been processed.
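The non-equivalence can be made concrete by enumerating row states and comparing the two predicates. This is a minimal sketch, not the driver's logic: the function names are hypothetical, 'v2' is an assumed version value, and no claim is made about which of these states the messages table actually contains.

```python
from itertools import product

STATUSES = ("pending", "approved", "flagged")
VERSIONS = (None, "v2")  # None models a NULL redaction_version


def by_review_status(status, version):
    """Predicate behind the review_status_pending cursor."""
    return status == "pending"


def by_version_null(status, version):
    """Predicate behind the redaction_version IS NULL cursor."""
    return version is None


def disagreements():
    """Row states that exactly one of the two cursors would select."""
    return [
        (status, version)
        for status, version in product(STATUSES, VERSIONS)
        if by_review_status(status, version) != by_version_null(status, version)
    ]
```

The cursors agree only on unprocessed rows (pending with no version) and fully processed rows (terminal status with a version). Every other state is picked up by one cursor and skipped by the other, which is why swapping the registry default changes behaviour on any table that has already been partly processed.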