I have created an empty UNLOGGED table to copy a large amount of data (over 1 billion rows) faster. Loading the data took around 4 hours. Now I want to set the table to LOGGED to make it safe against unexpected shutdowns and crashes. This process takes a long time; in fact, it takes longer than loading the data did. Is this normal, or is there a way to speed it up?
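For reference, the workflow looks roughly like this (table and file names are placeholders for my actual schema):

CREATE UNLOGGED TABLE big_table (id bigint, payload text);

-- the bulk load; took around 4 hours for over 1 billion rows
COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);

-- this is the slow step, taking even longer than the load itself
ALTER TABLE big_table SET LOGGED;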
Problem
I believe SET LOGGED currently rewrites the table through the WAL (essentially redoing the entire operation) and rewrites the indexes as well.
I found a thread about this on the mailing lists:
A new relfilenode is filled with the data - the old one, including the init fork, gets removed by the normal mechanics of rewriting rels.
There was a long thread about it on -hackers. Doing it without a rewrite and without losing transactional semantics is really rather hard. And having the capability of doing it with a rewrite is better than not having it at all.
You can see the patch that added SET (LOGGED|UNLOGGED). The implementation hasn't changed much since, though there was a plan to fix it that acknowledged the problems:
this design led us to performance problems with large relations because we need to rewrite the entire content of the relation twice, once into a new heap and again into the WAL. So this project will change the current design of the mechanism that changes an unlogged table to logged, without the need to rewrite the entire heap, but just by removing the init forks; and if wal_level != minimal we'll write the contents to the WAL too.
But it seems no more work was done on that. Looking at the code, you can see it:
case AT_SetLogged:      /* SET LOGGED */
    ATSimplePermissions(rel, ATT_TABLE);
    tab->chgPersistence = ATPrepChangePersistence(rel, true);
    /* force rewrite if necessary; see comment in ATRewriteTables */
    if (tab->chgPersistence)
    {
        tab->rewrite |= AT_REWRITE_ALTER_PERSISTENCE;
        tab->newrelpersistence = RELPERSISTENCE_PERMANENT;
    }
Checking that comment:

 * There are two reasons for requiring a rewrite when changing
 * persistence: on one hand, we need to ensure that the buffers
 * belonging to each of the two relations are marked with or without
 * BM_PERMANENT properly. On the other hand, since rewriting creates
 * and assigns a new relfilenode, we automatically create or drop an
 * init fork for the relation as appropriate.
So, as far as I can tell, a rewrite is still required.
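If you want to verify the rewrite empirically, you can compare the relfilenode before and after on a test table (pg_relation_filenode is a built-in function; big_table is a placeholder name):

SELECT pg_relation_filenode('big_table');  -- note the value

ALTER TABLE big_table SET LOGGED;

SELECT pg_relation_filenode('big_table');  -- a different value means the heap was rewritten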
Potential solution
You may be better off copying all of the data into the table in the same transaction that creates the table. That disables the rewrite, and it is mentioned in the docs:
In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include:
COPY into tables that were created or truncated in the same transaction
This would skip the WAL write and heap rewrite.
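A minimal sketch of that approach, assuming wal_level = minimal and placeholder table/file names:

SHOW wal_level;  -- must be 'minimal' for the WAL-skip optimization to apply

BEGIN;
CREATE TABLE big_table (id bigint, payload text);  -- created in the same transaction
COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);
COMMIT;

-- big_table is now an ordinary logged table; no ALTER TABLE ... SET LOGGED step needed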
- Thanks for the detailed answer. I somehow expected behavior like this. I am loading a huge geospatial dataset with over 1 billion rows, which is split up into files of ~10 million rows each. The tool that I am using does not create an unlogged table, so I create the unlogged table first and append to it. BTW: the table will only be used to select rows (spatial queries); no new rows will be created, and nothing will be updated or deleted. The table will only be joined to other tables. As for WAL, I have to read more about it; I am not so familiar with it. – Michael (Jan 20, 2018)
- What kind of geospatial data? You may want to check out shp2pgsql or ogr2ogr, which can output pgsql, so you can process it as a stream. – Evan Carroll (Jan 20, 2018)
This is normal. The advantage of loading as unlogged and then altering to logged would come if you were doing some kind of large-scale manipulation of the table (e.g. UPDATE ... FROM ...) after loading it but before setting it to logged, or if for some reason you couldn't load it with COPY but had to use individual INSERT statements. Neither of those applies to you, so I wouldn't expect this 2-step method to be of any benefit.
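For illustration, the kind of post-load manipulation meant above would look something like this (hypothetical table and column names); keeping the table unlogged while such a statement runs is what saves the extra WAL traffic:

UPDATE big_table b
SET    category = l.category
FROM   lookup l
WHERE  b.lookup_id = l.id;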
- SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED; returns 0 rows. Do you expect that it takes so much time?