I added a boolean column to a table with roughly 100 million rows.
Afterwards I manually executed UPDATE statements to set the new column to true/false depending on other column values.
The updates ran overnight (they took a few hours).
Then I set the column to NOT NULL.
After all these steps the table had grown to nearly twice its size: ~40 GB -> 68 GB.
I used the following query to update in chunks (so as not to lock the whole table):
WITH cte AS (
    SELECT id FROM results_storage
    WHERE condition < 100 AND newboolean IS NULL
    LIMIT 10000
    FOR UPDATE SKIP LOCKED
)
UPDATE results_storage r SET newboolean = FALSE
FROM cte WHERE r.id = cte.id RETURNING r.id;
What could be the cause?
Is there any way to reduce the space taken?
I expected a single boolean column to take very little extra space
(roughly +100 MB per 100M rows, since a boolean takes 1 byte of storage).
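To see where the space actually went, it can help to split the total into table, index, and dead-row figures. The sketch below uses PostgreSQL's built-in size functions and the pg_stat_user_tables view against the results_storage table from the query above; it only reads statistics and changes nothing.

```sql
-- Table (heap plus TOAST) vs. index vs. total on-disk size.
SELECT pg_size_pretty(pg_table_size('results_storage'))          AS table_size,
       pg_size_pretty(pg_indexes_size('results_storage'))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size('results_storage')) AS total_size;

-- Estimated live vs. dead row versions and the last (auto)vacuum times.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'results_storage';
```

The tuple counts in pg_stat_user_tables are estimates maintained by the statistics collector, so they indicate the order of magnitude rather than exact numbers.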
1 Answer
I suppose the new tuples did not fit into the free space on their pages, so rows had to be relocated. Assuming most, if not all, of the new row versions only grew in size, around half of the rows needed a new location, so the table takes about 50% more space on disk. Why half of the rows? Because the space released by two old tuples is enough for only one new tuple. I think tuple tetris did not help here, because the rows were not moved into adjacent slots but scattered randomly, so the new tuples could not make effective use of the released space.
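One way to check this hypothesis, assuming the pgstattuple contrib extension is available on the server, is to measure how much of the table consists of live tuples, dead tuples, and free space. This is only a sketch; pgstattuple scans the whole table, so on a table of this size it will take a while.

```sql
-- Assumption: the pgstattuple contrib module is installed and you are allowed to create extensions.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Full scan of the table; reports live, dead, and free space as percentages of table_len.
SELECT pg_size_pretty(table_len) AS table_len,
       tuple_percent,
       dead_tuple_percent,
       free_percent
FROM pgstattuple('results_storage');
```

A large free_percent or dead_tuple_percent would support the idea that much of the extra size is reusable or reclaimable space rather than live data.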
If this assumption is correct, the space your table takes is half empty. Alas, it is reserved for future operations. You need vacuum full table_name to basically rebuild the table and return the space. Or just work with the current size: day after day (or even year after year) the size will drop a bit if you keep on updating and deleting.
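A minimal sketch of that rebuild, using the table name from the question. Keep in mind that VACUUM FULL holds an ACCESS EXCLUSIVE lock for the whole run (no reads or writes) and writes a complete new copy of the table, so it needs free disk space roughly equal to the compacted table plus its indexes.

```sql
-- Size before the rebuild.
SELECT pg_size_pretty(pg_total_relation_size('results_storage'));

-- Rewrites the table and its indexes into new files; blocks all access while running.
VACUUM (FULL, VERBOSE, ANALYZE) results_storage;

-- Size after the rebuild.
SELECT pg_size_pretty(pg_total_relation_size('results_storage'));
```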
Decided to dump / truncate / restore due to a short maintenance window. Table size was as expected (32 GB) after doing so. Thank you. – stkxchng, Apr 29, 2017 at 9:57
Would vacuum full be faster than dump + truncate + import? Is there any way to estimate vacuum full time? What about create table as select * and adding the PK later?..
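The create table as select idea from this comment could look roughly like the sketch below. The name results_storage_new and the primary key on id are assumptions for illustration. CREATE TABLE AS copies only the data, so indexes, constraints, defaults, and privileges of the original table would have to be recreated by hand.

```sql
BEGIN;

-- Block concurrent writes so no rows are lost between the copy and the swap.
LOCK TABLE results_storage IN SHARE MODE;

-- Compact copy containing only the live rows (hypothetical table name).
CREATE TABLE results_storage_new AS SELECT * FROM results_storage;

-- Assumption: id is the primary key; other indexes and constraints must be re-added as well.
ALTER TABLE results_storage_new ADD PRIMARY KEY (id);
ALTER TABLE results_storage_new ALTER COLUMN newboolean SET NOT NULL;

-- Swap the tables; drop results_storage_old once the new table is verified.
ALTER TABLE results_storage RENAME TO results_storage_old;
ALTER TABLE results_storage_new RENAME TO results_storage;

COMMIT;
```

Until the old table is dropped, both copies exist on disk, so this approach temporarily needs more space, not less.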