
I'm using pg_dump to dump a large (396 GB) PostgreSQL database:

pg_dump --clean --create mydatabase

After running for a day, it fails with ERROR: compressed data is corrupt and pg_dump exits.

100% data integrity is not my top priority here. It's probably just one broken row that's preventing me from moving or backing up a database that took many weeks to create. I'm fine with losing that row.

Is there any way to create a dump that ignores such data errors? I found nothing in the pg_dump docs.

If not, how do I find and delete all corrupted rows in my table? Rows in other tables reference them and will need to be deleted as well.

asked Jan 31, 2016 at 13:49

1 Answer


According to a grep through the sources, this error

ERROR: compressed data is corrupt

happens when decompression of an LZ-compressed TOASTed value fails.

See http://doxygen.postgresql.org/tuptoaster_8c.html#abcb4cc32d19cd5f89e27aeb7e7369fa8

At the row level, large values are stored as pointers into tables in the pg_toast schema that contain the actual data.
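As an aside (a sketch of mine, not from the original answer), the catalog records which pg_toast table backs a given table, which can help when inspecting the damage; mytable is a placeholder name:

-- Look up the TOAST table that holds a table's out-of-line values
-- ("mytable" is hypothetical; reltoastrelid is 0 if there is none).
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'mytable';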

How do I find and delete all corrupted rows in my table?

Assuming you know which table it is (if not, iterate over the tables, starting with those with the largest columns; a query to shortlist candidates by size is sketched below), something that might work is:

COPY (SELECT ctid, * FROM tablename ORDER BY ctid) TO '/path/to/file';

ctid is a pseudo-column indicating the physical location of a row in the form (page, index within page), so this dumps the rows in their physical order in the data files.

This COPY command should error out when it reaches the offending row, but by that point it should already have streamed some output (to be confirmed in practice). The end of that output indicates the ctid up to which the data is not corrupted. (Note that COPY ... TO '/path/to/file' writes to a file on the server and requires superuser; psql's \copy is the client-side equivalent.)
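As promised above, a rough way to shortlist candidate tables (my sketch, not part of the original answer) is to order them by total on-disk size, which includes their TOAST data:

-- List ordinary tables by total size, TOAST included; adjust or add a
-- schema filter as needed for your database.
SELECT c.oid::regclass AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;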

Starting from the last uncorrupted ctid, it should be possible to pinpoint the offending row with a dichotomic (binary-search) approach, running queries such as:

SELECT ctid, length(t.*::text) FROM tablename t WHERE ctid >= '(?,?)' ORDER BY ctid LIMIT 10;

The point of length(t.*::text) is that it forces the whole row, including its TOASTed values, to be decompressed, as opposed to, e.g., count(*).
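For instance (ctid values purely hypothetical), if the COPY stopped somewhere after page 41000 and the table ends near page 100000, one bisection step could check whether the first half of the remaining range decompresses cleanly:

-- Hypothetical bisection step: an error here means the corrupted row is in
-- this half of the range; success means it is in the other half.
SELECT count(length(t.*::text))
FROM tablename t
WHERE ctid >= '(41000,1)' AND ctid < '(70000,1)';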

Once the corrupted row(s) are found, DELETE FROM tablename WHERE ctid = '(?,?)'; can be used to delete them.

Note that if VACUUM works on this table, it might change the ctid of certain rows. Turning off autovacuum, or doing all the work inside a single transaction, should avoid that potential problem.
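For example, autovacuum can be switched off for just that table via its storage parameters (tablename is a placeholder):

-- Disable autovacuum on this table while hunting the corrupted row...
ALTER TABLE tablename SET (autovacuum_enabled = false);
-- ...and re-enable it when done.
ALTER TABLE tablename RESET (autovacuum_enabled);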


EDIT: a web search on the error message turns up a blog post by Robert Berry suggesting a method that is less tedious than the above. Essentially, create this function:

create function chk(anyelement)
  returns bool
  language plpgsql as $f$
declare
  t text;
begin
  t := 1ドル;        -- casting the row to text forces decompression of its TOASTed values
  return false;    -- conversion succeeded: the row is readable
exception when others then
  return true;     -- conversion failed: flag the row as corrupted
end;
$f$;

and run every row through it to find the ctid of the corrupted ones. The EXCEPTION handler swallows the decompression error, so the scan continues past bad rows instead of aborting:

SELECT ctid FROM tablename t WHERE chk(t);
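Untested follow-up sketch: since chk returns true exactly for the rows whose conversion to text fails, the corrupted rows could then be removed in one statement (run it inside a transaction and check the reported row count before committing):

-- Delete every row that fails to decompress (untested sketch).
DELETE FROM tablename t WHERE chk(t);

Note that a plpgsql block with an EXCEPTION clause runs each call in its own subtransaction, so this scan is slower than a plain one, but it keeps going past the error instead of aborting.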
answered Feb 1, 2016 at 20:49
  • This seems like the way to stamp out corruption. :) I did not have a chance to try this; I ended up abandoning the database after multiple issues. Commented Feb 10, 2016 at 10:22
