I have created an empty UNLOGGED table to copy a large amount of data (over 1 billion rows) faster. Loading the data took around 4 hours. Now I want to set the table to LOGGED to make it safe against unexpected shutdowns and crashes. This process takes a long time; in fact, it takes longer than loading the data did. Is this normal, or is there a way to speed it up?
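For reference, the workflow looks roughly like this (table and file names are placeholders for my actual schema):

CREATE UNLOGGED TABLE big_table (id bigint, payload text);

-- the bulk load; took around 4 hours for over 1 billion rows
COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);

-- this is the slow step, taking even longer than the load itself
ALTER TABLE big_table SET LOGGED;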
Problem
I believe SET LOGGED currently rewrites the table through the WAL (essentially redoing the entire operation) and rewrites the indexes as well.
I found a thread about this on the mailing lists:
A new relfilenode is filled with the data - the old one, including the init fork, gets removed by the normal mechanics of rewriting rels.
There was a long thread about it on -hackers. Doing it without a rewrite and without losing transactional semantics is really rather hard. And having the capability of doing it with a rewrite is better than not having it at all.
You can see the patch that added SET (LOGGED|UNLOGGED). The implementation hasn't changed much since, though there was a plan to fix it that acknowledged the problems:
this design led us to performance problems with large relations because we need to rewrite the entire content of the relation twice, once into a new heap and again into the WAL. So this project will change the current design of the mechanism that changes an unlogged table to logged, without the need to rewrite the entire heap, but just by removing the init forks; and if wal_level != minimal we'll write the contents to the WAL too.
But it seems no more work was done on that. Looking at the code, you can see it:
case AT_SetLogged:      /* SET LOGGED */
    ATSimplePermissions(rel, ATT_TABLE);
    tab->chgPersistence = ATPrepChangePersistence(rel, true);
    /* force rewrite if necessary; see comment in ATRewriteTables */
    if (tab->chgPersistence)
    {
        tab->rewrite |= AT_REWRITE_ALTER_PERSISTENCE;
        tab->newrelpersistence = RELPERSISTENCE_PERMANENT;
    }
Checking that comment:

 * There are two reasons for requiring a rewrite when changing
 * persistence: on one hand, we need to ensure that the buffers
 * belonging to each of the two relations are marked with or without
 * BM_PERMANENT properly. On the other hand, since rewriting creates
 * and assigns a new relfilenode, we automatically create or drop an
 * init fork for the relation as appropriate.
So, as far as I can tell, a rewrite is still required.
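If you want to verify the rewrite empirically, you can compare the relfilenode before and after on a test table (pg_relation_filenode is a built-in function; big_table is a placeholder name):

SELECT pg_relation_filenode('big_table');  -- note the value

ALTER TABLE big_table SET LOGGED;

SELECT pg_relation_filenode('big_table');  -- a different value means the heap was rewritten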
Potential solution
You may be better off copying all of the data into the table in the same transaction that creates the table. That disables the rewrite, and it is mentioned in the docs:
In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include:
COPY into tables that were created or truncated in the same transaction
This would skip the WAL write and heap rewrite.
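A minimal sketch of that approach, assuming wal_level = minimal and placeholder table/file names:

SHOW wal_level;  -- must be 'minimal' for the WAL-skip optimization to apply

BEGIN;
CREATE TABLE big_table (id bigint, payload text);  -- created in the same transaction
COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);
COMMIT;

-- big_table is now an ordinary logged table; no ALTER TABLE ... SET LOGGED step needed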
- Thanks for the detailed answer. I somehow expected behavior like this. I am loading a huge geospatial dataset with over 1 billion rows, which is split up into files of ~10 million rows each. The tool that I am using does not create an unlogged table, so I create the unlogged table first and append to it. BTW: the table will only be used to select rows (spatial queries); no new rows will be created, and nothing will be updated or deleted. The table will only be joined to other tables. As for WAL, I have to read more about it; I am not so familiar with it. – Michael (Jan 20, 2018)
- What kind of geospatial data? You may want to check out shp2pgsql or ogr2ogr, which can output pgsql, so you can process it as a stream. – Evan Carroll (Jan 20, 2018)
This is normal. The advantage of loading as unlogged and then altering to logged would come if you were doing some kind of large-scale manipulation of the table (e.g. UPDATE ... FROM ...) after loading it but before setting it to logged, or if for some reason you couldn't load it with COPY but had to use individual INSERT statements. Neither of those applies to you, so I wouldn't expect this 2-step method to be of any benefit.
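For illustration, the kind of post-load manipulation meant above would look something like this (hypothetical table and column names); keeping the table unlogged while such a statement runs is what saves the extra WAL traffic:

UPDATE big_table b
SET    category = l.category
FROM   lookup l
WHERE  b.lookup_id = l.id;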
- SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED; returns 0 rows. Do you expect that it takes so much time?