I have a problem: a table with over 20 million rows.
When I add a new column with a default, PostgreSQL locks the table for over 40 minutes, so my application stops working for that time.
So instead of
ALTER TABLE "test" ADD COLUMN "field" boolean DEFAULT True NOT NULL;
I do
ALTER TABLE "test" ADD COLUMN "field" boolean NULL;
ALTER TABLE "test" ALTER COLUMN "field" SET DEFAULT true;
after which every new row will get true by default, so now I need to update the 20 million existing rows. I update them in batches:
WITH cte AS (
    SELECT id AS pk
    FROM "test"
    WHERE "field" IS NULL
    LIMIT 10000
)
UPDATE "test" table_
SET "field" = true
FROM cte
WHERE table_.id = cte.pk;
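Each batch touches at most 10,000 rows, so the statement is rerun until no NULL rows are left; to see how much is still to do, a simple count works, for example:

SELECT count(*) FROM "test" WHERE "field" IS NULL;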
after which I run
ALTER TABLE "test" ALTER COLUMN "field" SET NOT NULL;
And everything works, but the step where I update the rows is very slow. Can you give me some advice about improving the speed of the update?
Currently it updates 10,000 rows in about 2 minutes. I tried decreasing the batch size to 1,000; it was better (3.5 minutes per 10,000) but it is still too slow. I also tried creating an index before the update, but it didn't give better results (as I understand it, it would only start to help once a large part of the table has already been updated).
2 Answers
This problem is solved in PostgreSQL 11, which was released on October 18, 2018.
There is a more detailed explanation in this blog post, but the short version is that in version 11 the table no longer has to be rewritten to add a column with a (non-volatile) default value. So adding a column with a default should be pretty much instant.
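On 11 or later, the original statement from the question should therefore complete almost immediately, since the default is stored in the catalog rather than written into every existing row (a minimal sketch, reusing the table and column names from the question):

-- On PostgreSQL 11+ this is a metadata-only change; existing rows are not rewritten
ALTER TABLE "test" ADD COLUMN "field" boolean DEFAULT true NOT NULL;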
All you can do now is:
- Create a column that is NULLABLE
- ALTER the column to give it the DEFAULT
- UPDATE in batches to rewrite the rows (row-level locking); see the sketch after this list
- When the rows are updated, add the NOT NULL constraint (which should be pretty quick, just a sequential scan)
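A minimal sketch of that sequence, reusing the table and column names from the question; the partial index is optional and its name (idx_test_field_null) is illustrative, but it keeps the lookup of the remaining NULL rows cheap for each batch:

ALTER TABLE "test" ADD COLUMN "field" boolean NULL;
ALTER TABLE "test" ALTER COLUMN "field" SET DEFAULT true;

-- Optional: partial index over the rows that still need updating
-- (CONCURRENTLY avoids blocking writes while the index builds)
CREATE INDEX CONCURRENTLY idx_test_field_null ON "test" (id) WHERE "field" IS NULL;

-- Repeat this statement until it reports UPDATE 0
WITH cte AS (
    SELECT id AS pk
    FROM "test"
    WHERE "field" IS NULL
    LIMIT 10000
)
UPDATE "test" t
SET "field" = true
FROM cte
WHERE t.id = cte.pk;

-- Clean up and add the constraint once no NULL rows remain
DROP INDEX CONCURRENTLY idx_test_field_null;
ALTER TABLE "test" ALTER COLUMN "field" SET NOT NULL;

Batching this way keeps each transaction short, so row locks are held only briefly and autovacuum can clean up dead tuples between batches.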
For more information, see this answer.