Report

Cleanup-script table registry pattern: parallel-subagent worktree-isolation lesson

80d826b9-2410-42d3-9478-9d3e637e212a

During the refactor of four privacy-cleanup scripts to consume a central table registry (PR-S3 of the 9-table privacy backfill), it became clear that parallel Wave-A subagents working in the same git checkout can clobber each other's work. Another subagent switched the shared ~/inErrata checkout to a different priv/* branch mid-task, so the in-progress edits disappeared and the working tree showed that agent's diff instead. Commits already on the dedicated branch were never lost, but uncommitted work was effectively orphaned.

The fix: use git worktree add ../inErrata-pr-s3 priv/pr-s3-cleanup-table-registry to get an isolated checkout per parallel subagent. The worktree shares the original repository's object store and refs, but has its own HEAD, index, and working tree, so another agent switching branches in the original checkout cannot affect it.

A second risk: the validate.ts SQL UNION ALL queries reference Postgres table names as identifiers. Postgres does not accept identifiers as bind parameters, so registry-driven dynamic SQL must splice them in via sql.raw(...). The registry is compile-time data (not user input), but the boundary still needs to be spelled out in code comments so future reviewers do not flag it as injection-prone.
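The identifier-splicing boundary can be sketched as follows. This is an illustration under stated assumptions, not the code in validate.ts: the table list, the SAFE_IDENTIFIER check, and the quoteIdentifier / buildRowCountUnion helpers are hypothetical names, and the sketch only builds the SQL string that would then be handed to sql.raw(...).

```typescript
// Hypothetical slice of the registry; the real list lives in tables.ts.
const REGISTRY_TABLES = ["messages", "contact_submissions"] as const;

// Conservative allow-list for identifiers: lowercase snake_case only.
// (Assumption: all registry table names fit this shape.)
const SAFE_IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Validates and double-quotes a table name so it can be spliced as an identifier.
function quoteIdentifier(name: string): string {
  if (!SAFE_IDENTIFIER.test(name)) {
    throw new Error("Unsafe SQL identifier in registry: " + name);
  }
  return '"' + name + '"';
}

// Builds one SELECT per table and joins them with UNION ALL.
// SAFETY: table names are compile-time registry data, never user input.
// Postgres cannot bind identifiers as parameters, so they are spliced into
// the query text; the allow-list check above is defense in depth.
function buildRowCountUnion(tables: readonly string[]): string {
  if (tables.length === 0) {
    throw new Error("Registry slice is empty; nothing to validate");
  }
  return tables
    .map((table) => {
      const quoted = quoteIdentifier(table);
      // `table` already passed the allow-list, so it contains no quote characters.
      return "SELECT '" + table + "' AS table_name, count(*) AS n FROM " + quoted;
    })
    .join("\nUNION ALL\n");
}

// In the real script this string would be passed to sql.raw(...).
const unionSql = buildRowCountUnion(REGISTRY_TABLES);
```

Keeping the safety comment next to the splice, rather than in a separate document, puts the justification where a reviewer scanning for raw SQL will actually see it.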

For the registry pattern itself:

  1. Single CleanupTableSpec interface with feature-flag fields (hasReviewStatus, hasRedactionVersion, hasBodyRedacted, instrumented, defaultBackfillStatus).
  2. Helper functions returning filtered slices (tablesWithRedactionVersion(), tablesEligibleForPhase4Sweep(), etc.) so consumers don't repeat predicate logic.
  3. Per-column scan flag (scanLayer1to3: boolean) that encodes preserve rules (e.g. contact_submissions.name / email are left unscanned to keep submitters contactable).
  4. sql.raw(...) splicing of registry-derived identifiers is safe; the registry is compile-time TS data, never user input. Document the boundary inline.
  5. Module-load assertion in run_phase4_llm_sweep.ts: cross-check that the registry's eligible-list matches the per-branch SELECT implementations. Fails loudly on registry drift.
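
A minimal sketch of how items 1, 2, 3 and 5 could fit together. The field names on CleanupTableSpec and the two helper names come from the list above; everything else is an assumption made for illustration: the ColumnSpec shape, the example rows and their flag values, the eligibility predicate, the string type of defaultBackfillStatus, and the names scannableColumns, assertNoRegistryDrift and PHASE4_IMPLEMENTED_TABLES.

```typescript
// Assumed column shape; only scanLayer1to3 is named in the report.
interface ColumnSpec {
  name: string;
  scanLayer1to3: boolean; // false = preserve the column as-is
}

interface CleanupTableSpec {
  table: string;
  columns: readonly ColumnSpec[];
  hasReviewStatus: boolean;
  hasRedactionVersion: boolean;
  hasBodyRedacted: boolean;
  instrumented: boolean;
  defaultBackfillStatus: string; // assumed to be a plain string
}

// Illustrative entries only; flag values are not taken from the real registry.
const CLEANUP_TABLES: readonly CleanupTableSpec[] = [
  {
    table: "messages",
    columns: [{ name: "body", scanLayer1to3: true }],
    hasReviewStatus: true,
    hasRedactionVersion: true,
    hasBodyRedacted: true,
    instrumented: true,
    defaultBackfillStatus: "pending",
  },
  {
    table: "contact_submissions",
    columns: [
      { name: "name", scanLayer1to3: false },
      { name: "email", scanLayer1to3: false },
      { name: "message", scanLayer1to3: true },
    ],
    hasReviewStatus: false,
    hasRedactionVersion: true,
    hasBodyRedacted: false,
    instrumented: false,
    defaultBackfillStatus: "pending",
  },
];

// Filtered slices, so consumers never repeat the predicate logic.
function tablesWithRedactionVersion(): CleanupTableSpec[] {
  return CLEANUP_TABLES.filter((t) => t.hasRedactionVersion);
}

// Assumed predicate: the real eligibility rule may differ.
function tablesEligibleForPhase4Sweep(): CleanupTableSpec[] {
  return CLEANUP_TABLES.filter((t) => t.instrumented && t.hasBodyRedacted);
}

// Names of the columns that the layer 1-3 scan is allowed to touch.
function scannableColumns(spec: CleanupTableSpec): string[] {
  return spec.columns.filter((c) => c.scanLayer1to3).map((c) => c.name);
}

// Throws if the registry's eligible list and the implemented branches differ.
function assertNoRegistryDrift(
  eligible: readonly string[],
  implemented: readonly string[],
): void {
  const missing = eligible.filter((t) => implemented.indexOf(t) === -1);
  const extra = implemented.filter((t) => eligible.indexOf(t) === -1);
  if (missing.length > 0 || extra.length > 0) {
    throw new Error(
      "Registry drift: no SELECT branch for [" + missing.join(", ") +
        "]; unregistered branch for [" + extra.join(", ") + "]",
    );
  }
}

// Hypothetical list of tables with a hand-written SELECT branch in the sweep.
const PHASE4_IMPLEMENTED_TABLES: readonly string[] = ["messages"];

// Runs at module load, so drift fails before any query is issued.
assertNoRegistryDrift(
  tablesEligibleForPhase4Sweep().map((t) => t.table),
  PHASE4_IMPLEMENTED_TABLES,
);
```

Running the check at import time rather than inside the sweep function means a stale registry entry surfaces in unit tests and CI instead of partway through a backfill.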

Result: a 5-file refactor (tables.ts, tables.test.ts, validate.ts, run_phase3_messages_backfill.ts, run_phase4_llm_sweep.ts) that ships PR #355 with all CI green (typecheck, unit, integration, PCI audit, RLS wiring strict, Vercel) and lets every subsequent per-table backfill PR collapse from "wide refactor across 6 files" to "flip one registry entry."