1

I recently deployed a migration to production that added a new column to a table. The migration was very simple and should have only locked the table for milliseconds:

ALTER TABLE "table_1" ADD COLUMN "last_status_update" TIMESTAMP(3);

However, it seemed to hang indefinitely when deployed, causing application downtime.

During the downtime, I ran the following command and exported the results:

SELECT pid, age(clock_timestamp(), query_start), state, wait_event, query 
FROM pg_stat_activity 
WHERE state != 'idle' 
ORDER BY query_start;

Output of pg_stat_activity

Unfortunately, I didn't think to do something similar with pg_locks.

When I manually killed the transaction and ran the same command by manually connecting to the database, it resolved almost immediately without issue.

I can't figure out why the ALTER TABLE seemed to hang. I thought it might be a developer holding an open transaction as described in this answer, but I think we would have seen that idle in transaction query in pg_stat_activity if that was the case.

The table in question table_1 is small, ~11k rows. There are no constraints besides a PRIMARY KEY constraint. No indexes besides primary key as well. No virtual columns or triggers.

In case it's relevant, we're using Prisma as an ORM. I considered that maybe Prisma failed to release the advisory lock that it holds during migrations, but I believe advisory would have shown up as the wait_event instead of relation.

Anyone know what might be going on? Thanks in advance :)

asked Jan 27 at 10:26

1 Answer 1

0

OK, I did some more research and learned a few things, sharing in case it helps others.

First, I think the query that caused the problem was the idle in transaction query in the screenshot on my original question. I discovered that even though the query_start was after the ALTER TABLE began, the transaction still could have been ongoing for some time and held a lock on the table that ALTER TABLE was waiting on. If I'd included xact_start in the query, I could have checked that.

Second, I still couldn't identify exactly why I ended up in this state with a pending idle in transaction query (most likely it was a dev with an open transaction against the production database). To mitigate it, I set idle_in_transaction_session_timeout to 30s, so that any hanging queries will get automatically terminated. I've replicated the scenario (leaving a transaction open and trying to ALTER TABLE against the same table) and confirmed that the blocking transaction is terminated and the migration runs once unblocked.

answered Jan 28 at 14:00

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.