
Joske VermeulenSchema migrations are the most dangerous code you ship. They run once, cannot be rolled back...
Schema migrations are the most dangerous code you ship. They run once, cannot be rolled back trivially, and affect every query in your application. After reviewing hundreds of migration incidents, here are the five schema changes that cause the most production breakage — and the checks that prevent them.
Why it breaks: Your migration runs successfully. The column is gone. Then a background job, API endpoint, or reporting query tries to read it — and crashes.
Real-world story: A team dropped legacy_user_id after migrating to UUIDs. The migration passed CI. Two hours later, a nightly ETL job failed because it still selected that column. The rollback required restoring from backup.
How to catch it: Search your entire codebase for the column name before dropping. Include background jobs, cron scripts, analytics pipelines, and third-party integrations. A semantic diff tool will flag the column as removed — that's your signal to verify it's truly unused.
Why it breaks: ALTER TABLE ... ADD COLUMN ... NOT NULL on a table with existing rows will fail in most databases. The engine doesn't know what value to assign to millions of existing records.
Real-world story: A developer added timezone VARCHAR(50) NOT NULL to a 10-million-row events table. The migration locked the table for 45 seconds, then failed. The fix required a three-step migration: add as nullable, backfill, then add the constraint.
How to catch it: Never add NOT NULL without a default in the same migration. Review every new column's nullability. If it must be NOT NULL, add it as nullable first, backfill with a sensible default, then alter the column.
Why it breaks: Indexes are invisible until they're gone. Queries that ran in milliseconds suddenly scan entire tables. CPU spikes. Timeouts cascade.
Real-world story: A "cleanup" migration dropped three indexes that were "not in the ORM definitions." They were actually used by raw SQL reporting queries. Query latency on the orders table went from 12ms to 4.2 seconds. The incident lasted 23 minutes.
How to catch it: Before dropping an index, check your query planner logs and slow query log. Look for Seq Scan on large tables. If you're unsure, mark the index as invisible (MySQL) or drop it in a separate migration with a quick rollback plan.
Why it breaks: Changing VARCHAR(500) to VARCHAR(100) silently truncates data that exceeds the new limit. The migration succeeds. The data is corrupted.
Real-world story: A team changed description TEXT to description VARCHAR(500) to "enforce UI limits." 2% of descriptions were longer than 500 characters. Those records were truncated. Customer support spent a week reconstructing lost data from email archives.
How to catch it: Before narrowing a type, query for the maximum length of existing data. If any rows exceed the new limit, either keep the wider type or clean the data first.
Why it breaks: Adding a foreign key constraint without an existing index on the column forces the database to validate every row with a full table scan. On large tables, this can take hours and hold heavy locks.
Real-world story: A team added a foreign key from orders.user_id to users.id on a 50-million-row table. There was no index on orders.user_id. The migration ran for 3 hours, blocking all writes to the orders table.
How to catch it: Always create the index before adding the foreign key. In SQL Server, use WITH NOCHECK to add the constraint without validating existing rows, then validate separately.
Here's a lightweight process that catches 90% of dangerous schema changes:
This takes 5 minutes and prevents incidents that take hours to recover from.
I'm building SchemaLens — a browser-based schema diff tool that compares two CREATE TABLE dumps and shows you a visual diff with a generated migration script. It supports PostgreSQL, MySQL, SQLite, and SQL Server. Everything runs client-side; your schemas never leave your browser.
It's part of my entry for the $100 AI Startup Race. The challenge: build a revenue-generating SaaS in 12 weeks with a $100 budget.
If you're interested in database migrations, I'd love your feedback on edge cases the parser misses.