I'm using pg_dump to dump a large (396 GB) PostgreSQL database:

pg_dump --clean --create mydatabase

After running for a day, a statement fails with

ERROR: compressed data is corrupt

and pg_dump exits.
100% data integrity is not my top priority here. It's probably just one broken row that's preventing me from moving or backing up a db that took many weeks to create. I'm fine with losing that row.
Is there any way to create a dump that ignores such data errors? I found nothing in the pg_dump docs.
If not, how do I find and delete all corrupted rows in my table? Rows in other tables are pointing to them and will need to be deleted as well.
1 Answer
According to a grep of the sources, this error

ERROR: compressed data is corrupt

happens when decompression of an LZ-compressed TOAST'ed value fails. See http://doxygen.postgresql.org/tuptoaster_8c.html#abcb4cc32d19cd5f89e27aeb7e7369fa8

At the row level, large values are stored as pointers into tables of the pg_toast schema that contain the actual data.
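If you have to guess which table is affected, the pg_toast side can be inspected directly. The queries below are a quick illustrative sketch (tablename is a placeholder): the first shows which TOAST table backs a given table, the second ranks tables by the size of their TOAST'ed data.

-- Which TOAST table stores the out-of-line values of tablename?
SELECT c.relname, t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'tablename';

-- Tables ranked by the size of their TOAST data
SELECT c.relname,
 pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relkind = 'r' AND c.reltoastrelid <> 0
ORDER BY pg_relation_size(c.reltoastrelid) DESC
LIMIT 10;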
How do I find and delete all corrupted rows in my table?
Assuming you know which table it is (if not, iterate over the tables, starting with those with the largest columns, as in the sizing query above), something that might work is:
COPY (SELECT ctid, * FROM tablename ORDER BY ctid) TO '/path/to/file';
ctid is a pseudo-column indicating the physical location of a row in the form (page, index within page), so this dumps the contents in their physical order in the data files.

This COPY command should error out when it reaches the offending row, but by that point it should have streamed some output (to be confirmed in practice). The end of that output should indicate the ctid up to which the data is not corrupted.
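Note that COPY ... TO '/path/to/file' writes on the server and typically requires superuser rights; from psql, the client-side \copy variant avoids that, and the tail of the output file shows where the stream stopped. A sketch of such a session (tablename and the path are placeholders):

\copy (SELECT ctid, * FROM tablename ORDER BY ctid) TO '/path/to/file'
-- after the error, check the last ctid that made it out:
\! tail -n 1 /path/to/file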
Starting from this ctid, it should be possible to pinpoint the offending row with a dichotomic (binary-search) approach, running queries such as:

SELECT ctid, length(t.*::text) FROM tablename t WHERE ctid >= '(?,?)' ORDER BY ctid LIMIT 10;
The point of length(t.*::text) is that it must pull all the decompressed data, as opposed to, e.g., count(*).
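As a concrete sketch (the ctid value and tablename are made up for illustration), if the COPY stopped around (1234,5):

-- probe 10 rows past the last good ctid; if this errors,
-- shrink the window (LIMIT 5, LIMIT 2, ...) or advance the
-- starting ctid until exactly one row triggers the error
SELECT ctid, length(t.*::text)
FROM tablename t
WHERE ctid >= '(1234,5)'
ORDER BY ctid
LIMIT 10;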
Once the corrupted row(s) are found, DELETE FROM tablename WHERE ctid = '(?,?)'; can be used to delete them.
Note that if VACUUM is working on this table, it might change the ctid of certain rows. Turning off autovacuum, or doing all the work inside a transaction, should avoid that potential problem.
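For instance, autovacuum can be switched off for just this table through its storage parameters while you hunt for ctids (tablename is a placeholder):

ALTER TABLE tablename SET (autovacuum_enabled = false);
-- ... find and delete the corrupted rows ...
ALTER TABLE tablename RESET (autovacuum_enabled);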
EDIT: a web search on the error message reveals a blog post from Robert Berry suggesting a method that seems less tedious than the above. Essentially, create this function:
create function chk(anyelement)
returns bool
language plpgsql as $f$
declare t text;
begin
 t := 1ドル; -- the text conversion forces 1ドル to be fully detoasted
 return false;
exception when others then
 return true; -- decompression failed: the row is corrupted
end;
$f$;
and run all rows through it to find the ctid of the corrupted ones:

SELECT ctid FROM tablename WHERE chk(tablename);
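Assuming the scan completes, the same predicate can drive the delete directly; wrapping it in a transaction lets you check the reported row count before committing (a sketch, with tablename as a placeholder):

BEGIN;
DELETE FROM tablename t WHERE chk(t);
-- inspect the row count reported by DELETE, then
COMMIT; -- or ROLLBACK if it removed more than expected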
This seems like the way to stamp out corruption. :) I did not have a chance to try this; I ended up just abandoning the database after multiple issues. – kontextify, Feb 10, 2016 at 10:22