
I'm using pg_dump to dump a large (396 GB) PostgreSQL database:

pg_dump --clean --create mydatabase

After running for a day, it fails with ERROR: compressed data is corrupt and pg_dump exits.

100% data integrity is not my top priority here. It's probably just one broken row that's preventing me from moving or backing up a database that took many weeks to create. I'm fine with losing that row.

Is there any way to create a dump that ignores such data errors? I found nothing in the pg_dump docs.

If not, how do I find and delete all corrupted rows in my table? Rows in other tables reference them and will need to be deleted as well.

asked Jan 31, 2016 at 13:49

1 Answer


According to a grep through the sources, this error

ERROR: compressed data is corrupt

happens when decompression of an LZ-compressed TOASTed value fails.

See http://doxygen.postgresql.org/tuptoaster_8c.html#abcb4cc32d19cd5f89e27aeb7e7369fa8

At the row level, large values are stored as pointers into tables in the pg_toast schema that contain the actual data.
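As an aside (a sketch of mine, not from the original answer), the catalog records which pg_toast table backs a given table, which can help when inspecting the damage; mytable is a placeholder name:

-- Look up the TOAST table that holds a table's out-of-line values
-- ("mytable" is hypothetical; reltoastrelid is 0 if there is none).
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'mytable';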

How do I find and delete all corrupted rows in my table?

Assuming you know which table it is (if not, iterate over the tables, starting with those with the largest columns; a query to shortlist candidates by size is sketched below), something that might work is:

COPY (SELECT ctid, * FROM tablename ORDER BY ctid) TO '/path/to/file';

ctid is a pseudo-column indicating the physical location of a row in the form (page, index within page), so this dumps the rows in their physical order in the data files.

This COPY command should error out when it reaches the offending row, but by that point it should already have streamed some output (to be confirmed in practice). The end of that output indicates the ctid up to which the data is not corrupted. (Note that COPY ... TO '/path/to/file' writes to a file on the server and requires superuser; psql's \copy is the client-side equivalent.)
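As promised above, a rough way to shortlist candidate tables (my sketch, not part of the original answer) is to order them by total on-disk size, which includes their TOAST data:

-- List ordinary tables by total size, TOAST included; adjust or add a
-- schema filter as needed for your database.
SELECT c.oid::regclass AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;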

Starting from the last uncorrupted ctid, it should be possible to pinpoint the offending row with a dichotomic (binary-search) approach, running queries such as:

SELECT ctid, length(t.*::text) FROM tablename t WHERE ctid >= '(?,?)' ORDER BY ctid LIMIT 10;

The point of length(t.*::text) is that it forces the whole row, including its TOASTed values, to be decompressed, as opposed to, e.g., count(*).
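For instance (ctid values purely hypothetical), if the COPY stopped somewhere after page 41000 and the table ends near page 100000, one bisection step could check whether the first half of the remaining range decompresses cleanly:

-- Hypothetical bisection step: an error here means the corrupted row is in
-- this half of the range; success means it is in the other half.
SELECT count(length(t.*::text))
FROM tablename t
WHERE ctid >= '(41000,1)' AND ctid < '(70000,1)';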

Once the corrupted row(s) are found, DELETE FROM tablename WHERE ctid = '(?,?)'; can be used to delete them.

Note that if VACUUM works on this table, it might change the ctid of certain rows. Turning off autovacuum, or doing all the work inside a single transaction, should avoid that potential problem.
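For example, autovacuum can be switched off for just that table via its storage parameters (tablename is a placeholder):

-- Disable autovacuum on this table while hunting the corrupted row...
ALTER TABLE tablename SET (autovacuum_enabled = false);
-- ...and re-enable it when done.
ALTER TABLE tablename RESET (autovacuum_enabled);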


EDIT: a web search on the error message turns up a blog post by Robert Berry suggesting a method that is less tedious than the above. Essentially, create this function:

create function chk(anyelement)
  returns bool
  language plpgsql as $f$
declare
  t text;
begin
  t := 1ドル;        -- casting the row to text forces decompression of its TOASTed values
  return false;    -- conversion succeeded: the row is readable
exception when others then
  return true;     -- conversion failed: flag the row as corrupted
end;
$f$;

and run every row through it to find the ctid of the corrupted ones. The EXCEPTION handler swallows the decompression error, so the scan continues past bad rows instead of aborting:

SELECT ctid FROM tablename t WHERE chk(t);
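Untested follow-up sketch: since chk returns true exactly for the rows whose conversion to text fails, the corrupted rows could then be removed in one statement (run it inside a transaction and check the reported row count before committing):

-- Delete every row that fails to decompress (untested sketch).
DELETE FROM tablename t WHERE chk(t);

Note that a plpgsql block with an EXCEPTION clause runs each call in its own subtransaction, so this scan is slower than a plain one, but it keeps going past the error instead of aborting.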
answered Feb 1, 2016 at 20:49
  • This seems like the way to stamp out corruption. :) I did not have a chance to try this; I ended up abandoning the database after multiple issues. Commented Feb 10, 2016 at 10:22
