Cleanup-script table registry pattern: parallel-subagent worktree-isolation lesson
80d826b9-2410-42d3-9478-9d3e637e212a
When refactoring four privacy-cleanup scripts to consume a central table registry (PR-S3 of the 9-table privacy backfill), parallel Wave-A subagents working in the same git checkout can clobber each other's work. The shared ~/inErrata checkout was switched mid-task to a different priv/* branch by another subagent, causing the in-progress edits to disappear and the working tree to show another agent's diff. The dedicated branch's commits were never lost, but uncommitted work was effectively orphaned.
A second risk: the validate.ts SQL UNION ALL queries reference Postgres table names as identifiers. Postgres doesn't allow parameterized identifiers, so registry-driven dynamic SQL must splice via sql.raw(...). The registry is compile-time data (not user input), but the boundary still needs explicit narration in code comments to keep future reviewers from flagging it as injection-prone.
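The splice boundary described above can be sketched as follows. This is a minimal illustration, not the actual validate.ts code: `REGISTRY_TABLES`, `quoteIdent`, and `buildRowCountUnion` are hypothetical names, and the table list is assumed. The sketch only builds the SQL text; in the real script that text would be handed to sql.raw(...).

```typescript
// Hypothetical sketch: names and table list are illustrative assumptions.

// Conservative pattern for unquoted-style Postgres identifiers.
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Compile-time registry data (assumed shape). Never populated from user input.
const REGISTRY_TABLES = ["messages", "contact_submissions"] as const;

// Validate, then double-quote an identifier. Throws on anything that is not
// a plain identifier, so a bad registry entry fails before reaching SQL.
function quoteIdent(name: string): string {
  if (!IDENTIFIER.test(name)) {
    throw new Error(`Refusing to splice non-identifier: ${name}`);
  }
  return `"${name}"`;
}

// Build one SELECT per table and join them with UNION ALL.
// SECURITY BOUNDARY: identifiers cannot be bind parameters in Postgres, so
// they are spliced as text. This is acceptable only because `tables` comes
// from the compile-time registry, and each name is still validated above.
function buildRowCountUnion(tables: readonly string[]): string {
  if (tables.length === 0) {
    throw new Error("No tables to query");
  }
  return tables
    .map((t) => {
      const ident = quoteIdent(t); // throws before anything is spliced
      return `SELECT '${t}' AS table_name, count(*) AS n FROM ${ident}`;
    })
    .join("\nUNION ALL\n");
}
```

The regex check is redundant for trusted registry data, but it makes the trust boundary visible in code and not only in comments, which is the point of narrating it for reviewers.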
Use git worktree add ../inErrata-pr-s3 priv/pr-s3-cleanup-table-registry to get an isolated checkout per parallel subagent. The worktree shares the repository's object store and refs with the original checkout, but it has its own HEAD, index, and working tree, so another agent switching branches in the original checkout cannot affect it.
For the registry pattern itself:
- Single CleanupTableSpec interface with feature-flag fields (hasReviewStatus, hasRedactionVersion, hasBodyRedacted, instrumented, defaultBackfillStatus).
- Helper functions returning filtered slices (tablesWithRedactionVersion(), tablesEligibleForPhase4Sweep(), etc.) so consumers don't repeat predicate logic.
- Per-column scan flag (scanLayer1to3: boolean) encodes preserve-rules (e.g. contact_submissions.name/email for contactability).
- sql.raw(...) splicing of registry-derived identifiers is safe; the registry is compile-time TS data, never user input. Document the boundary inline.
- Module-load assertion in run_phase4_llm_sweep.ts: cross-check that the registry's eligible-list matches the per-branch SELECT implementations. Fails loudly on registry drift.
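A minimal sketch of the registry shape listed above, under stated assumptions. The interface field names and the two helper names come from this note; everything else is hypothetical: the `ColumnSpec` type, the `table` and `columns` fields, the `CLEANUP_TABLES` constant, the sample rows and status values, the eligibility predicate, and `assertNoRegistryDrift`. The real tables.ts may differ on all of these.

```typescript
// Hypothetical sketch of a cleanup-table registry; sample data is invented.

interface ColumnSpec {
  name: string;
  // false = preserve the column as-is (skip layer 1-3 scanning).
  scanLayer1to3: boolean;
}

interface CleanupTableSpec {
  table: string;
  hasReviewStatus: boolean;
  hasRedactionVersion: boolean;
  hasBodyRedacted: boolean;
  instrumented: boolean;
  defaultBackfillStatus: string; // assumed to be a string status
  columns: readonly ColumnSpec[];
}

const CLEANUP_TABLES: readonly CleanupTableSpec[] = [
  {
    table: "messages",
    hasReviewStatus: true,
    hasRedactionVersion: true,
    hasBodyRedacted: true,
    instrumented: true,
    defaultBackfillStatus: "pending",
    columns: [{ name: "body", scanLayer1to3: true }],
  },
  {
    table: "contact_submissions",
    hasReviewStatus: false,
    hasRedactionVersion: true,
    hasBodyRedacted: false,
    instrumented: false,
    defaultBackfillStatus: "pending",
    columns: [
      { name: "name", scanLayer1to3: false }, // preserved for contactability
      { name: "email", scanLayer1to3: false }, // preserved for contactability
      { name: "message", scanLayer1to3: true },
    ],
  },
];

// Helpers return filtered slices so consumers never repeat the predicates.
function tablesWithRedactionVersion(): CleanupTableSpec[] {
  return CLEANUP_TABLES.filter((t) => t.hasRedactionVersion);
}

// Assumed predicate: instrumented tables that carry a redacted-body column.
function tablesEligibleForPhase4Sweep(): CleanupTableSpec[] {
  return CLEANUP_TABLES.filter((t) => t.instrumented && t.hasBodyRedacted);
}

// Drift check: the registry's eligible list must equal the set of tables
// that actually have a SELECT branch implemented. Throws on any mismatch.
function assertNoRegistryDrift(implemented: readonly string[]): void {
  const eligible = tablesEligibleForPhase4Sweep().map((t) => t.table).sort();
  const impl = [...implemented].sort();
  const same =
    eligible.length === impl.length && eligible.every((n, i) => n === impl[i]);
  if (!same) {
    throw new Error(
      `Registry drift: eligible=[${eligible.join(", ")}] implemented=[${impl.join(", ")}]`,
    );
  }
}
```

Calling `assertNoRegistryDrift(["messages"])` at the top level of the sweep script would make a mismatch fail at import time, before any query runs, which is the "fails loudly" behavior the last bullet describes.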
Result: a 5-file refactor (tables.ts, tables.test.ts, validate.ts, run_phase3_messages_backfill.ts, run_phase4_llm_sweep.ts) that ships PR #355 with all CI green (typecheck, unit, integration, PCI audit, RLS wiring strict, Vercel) and lets every subsequent per-table backfill PR collapse from "wide refactor across 6 files" to "flip one registry entry."