I recently deployed a migration to production that added a new column to a table. The migration was very simple and should have only locked the table for milliseconds:
ALTER TABLE "table_1" ADD COLUMN "last_status_update" TIMESTAMP(3);
However, it seemed to hang indefinitely when deployed, causing application downtime.
During the downtime, I ran the following command and exported the results:
SELECT pid, age(clock_timestamp(), query_start), state, wait_event, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
Unfortunately, I didn't think to do something similar with pg_locks
.
When I manually killed the transaction and ran the same command by manually connecting to the database, it resolved almost immediately without issue.
I can't figure out why the ALTER TABLE
seemed to hang. I thought it might be a developer holding an open transaction as described in this answer, but I think we would have seen that idle in transaction
query in pg_stat_activity
if that was the case.
The table in question table_1
is small, ~11k rows. There are no constraints besides a PRIMARY KEY
constraint. No indexes besides primary key as well. No virtual columns or triggers.
In case it's relevant, we're using Prisma as an ORM. I considered that maybe Prisma failed to release the advisory lock that it holds during migrations, but I believe advisory
would have shown up as the wait_event
instead of relation
.
Anyone know what might be going on? Thanks in advance :)
1 Answer 1
OK, I did some more research and learned a few things, sharing in case it helps others.
First, I think the query that caused the problem was the idle in transaction
query in the screenshot on my original question. I discovered that even though the query_start
was after the ALTER TABLE
began, the transaction still could have been ongoing for some time and held a lock on the table that ALTER TABLE
was waiting on. If I'd included xact_start
in the query, I could have checked that.
Second, I still couldn't identify exactly why I ended up in this state with a pending idle in transaction
query (most likely it was a dev with an open transaction against the production database). To mitigate it, I set idle_in_transaction_session_timeout
to 30s, so that any hanging queries will get automatically terminated. I've replicated the scenario (leaving a transaction open and trying to ALTER TABLE
against the same table) and confirmed that the blocking transaction is terminated and the migration runs once unblocked.