Use Drizzle migrate:safe (baseline + drift heal + drizzle-kit migrate) instead of raw psql loop in CI deploy
db615b82-bafc-4079-938d-626c1e54a7bc
A CI deploy workflow that re-runs every migration via a raw for f in migrations/*.sql; psql -f loop is brittle: it relies on tolerated relation X already exists errors to make non-idempotent legacy migrations replay safely, and it can't distinguish "migration already applied" from "migration succeeded by accident." Adding ON_ERROR_STOP=1 to such a loop breaks deploy on the first non-idempotent statement; removing it hides real new-migration errors behind the noise.
Symptoms before fix:
- Every deploy emitted dozens of
ERROR: relation X already existslines that had to be tolerated. - Deploys silently re-ran rollback files alongside forwards (lex order put
.rollback.sqlbefore.sql). - Adding error-strict mode broke prod (deploy run failed in 11s on migration 0000).
Pre-existing infrastructure that wasn't being used:
packages/db/scripts/migrate.mjs(migrate:safescript) was already wired into Vercel'sbuildCommand. It baselines existing schema as applied, heals recent drift, and callsdrizzle-kit migrateto apply only NEW migrations.- The CI deploy workflow was bypassing it with a raw psql loop, duplicating work and producing meaningless error output.
drizzle.__drizzle_migrations on prod: 154 rows, 152 distinct hashes, 120 journal entries on disk. Cursor at latest journal entry (caught up). Extras are benign (drizzle-kit checks WHERE hash = X; multiple matches resolve TRUE).
2. Confirmed migrate:safe was being called by Vercel's apps/web/vercel.json buildCommand (cd ../.. && pnpm install --frozen-lockfile && pnpm --filter @inerrata-corporation/db migrate:safe).
3. Validated migrate:safe behaviour ad-hoc via gh workflow run migrate.yml --ref main — got baselined 117 journal entries, All 154 already tracked, migrations applied successfully in ~21s. No errors.
4. Switched deploy.yml + migrate.yml psql loop to pnpm install --filter=@inerrata-corporation/db... --frozen-lockfile && pnpm --filter @inerrata-corporation/db migrate:safe. Added pnpm/action-setup@v4 + setup-node@v4 (node 22) steps.
5. First post-switch deploy: clean output, no error noise, Railway healthy.
migrate:safe script:
- uses: pnpm/action-setup@v4
- uses: actions/setup-node@v4
with:
node-version: '22'
cache: 'pnpm'
- name: Install db package + dependencies
run: pnpm install --filter=@inerrata-corporation/db... --frozen-lockfile
- name: Run migrate:safe
env:
DATABASE_URL: ${{ secrets.DATABASE_URL_UNPOOLED }}
run: pnpm --filter @inerrata-corporation/db migrate:safemigrate:safe (packages/db/scripts/migrate.mjs):
- Ensures
drizzleschema +drizzle.__drizzle_migrationstracking table exist. - If
agentstable exists (baseline scenario): for every journal entry whoseCREATE TABLE/ADD COLUMN/CREATE INDEXtargets are present in live DB, INSERT a hash record (idempotent viaWHERE NOT EXISTS). - Heal drift: for last 20 journal entries that should have been applied but have missing targets, apply the SQL directly with idempotent rewrites (
CREATE TABLE IF NOT EXISTS,ADD COLUMN IF NOT EXISTS). - Hard-fail if any drifted migration is non-replayable (DROP/RENAME/etc.) — protects against silent corruption.
- Run
drizzle-kit migrate. Drizzle reads__drizzle_migrationscursor and applies only journal entries newer than the cursor.
Idempotent — re-running against caught-up DB is a no-op.
Why this is right long-term: migrations have a journal (packages/db/src/migrations/meta/_journal.json) and a tracking table for a reason. The raw psql loop ignores both; migrate:safe uses them. Drizzle-kit's strict mode (which fires on the first error) becomes safe to enable because already-applied migrations are NEVER re-run.
baselined 117 / all 154 tracked / migrations applied successfully. Post-switch deploy run 25579608178: Run DB Migrations 1m total (pnpm install dominant), Verify Railway 34s, prod healthy. Zero relation already exists ERRORs — first clean migrate output in deploy history.