16

I've spent the last 8 hours trying to import the output of 'mysqldump --compatible=postgresql' into PostgreSQL 8.4.9, and I've read at least 20 different threads here and elesewhere already about this specific problem, but found no real usable answer that works.

MySQL 5.1.52 data dumped:

mysqldump -u root -p --compatible=postgresql --no-create-info --no-create-db --default-character-set=utf8 --skip-lock-tables rt3 > foo

PostgreSQL 8.4.9 server as destination

Loading the data with 'psql -U rt_user -f foo' is reporting (many of these, here's one example):

psql:foo:29: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

According the following, there are no NULL (0x00) characters in the input file.

database-dumps:rcf-temp1# sed 's/\x0/ /g' < foo > nonulls
database-dumps:rcf-temp1# sum foo nonulls
04730 2545610 foo
04730 2545610 nonulls
database-dumps:rcf-temp1# rm nonulls

Likewise, another check with Perl shows no NULLs:

database-dumps:rcf-temp1# perl -ne '/000円/ and print;' foo
database-dumps:rcf-temp1#

As the "HINT" in the error mentions, I have tried every possible way to set 'client_encoding' to 'UTF8', and I succeed but it has no effect toward solving my problem.

database-dumps:rcf-temp1# psql -U rt_user --variable=client_encoding=utf-8 -c "SHOW client_encoding;" rt3
 client_encoding
-----------------
 UTF8
(1 row)
database-dumps:rcf-temp1#

Perfect, yet:

database-dumps:rcf-temp1# psql -U rt_user -f foo --variable=client_encoding=utf-8 rt3
...
psql:foo:29: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
...

Barring the "According to Hoyle" correct answer, which would be fantastic to hear, and knowing that I really don't care about preserving any non-ASCII characters for this seldom-referenced data, what suggestions do you have?

Update: I get the same error with an ASCII-only version of the same dump file at import time. Truly mind-boggling:

database-dumps:rcf-temp1# # convert any non-ASCII character to a space
database-dumps:rcf-temp1# perl -i.bk -pe 's/[^[:ascii:]]/ /g;' mysql5-dump.sql
database-dumps:rcf-temp1# sum mysql5-dump.sql mysql5-dump.sql.bk
41053 2545611 mysql5-dump.sql
50145 2545611 mysql5-dump.sql.bk
database-dumps:rcf-temp1# cmp mysql5-dump.sql mysql5-dump.sql.bk
mysql5-dump.sql mysql5-dump.sql.bk differ: byte 1304850, line 30
database-dumps:rcf-temp1# # GOOD!
database-dumps:rcf-temp1# psql -U postgres -f mysql5-dump.sql --variable=client_encoding=utf-8 rt3
...
INSERT 0 416
psql:mysql5-dump.sql:30: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 455
INSERT 0 424
INSERT 0 483
INSERT 0 447
INSERT 0 503
psql:mysql5-dump.sql:36: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 502
INSERT 0 507
INSERT 0 318
INSERT 0 284
psql:mysql5-dump.sql:41: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 382
INSERT 0 419
INSERT 0 247
psql:mysql5-dump.sql:45: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 267
INSERT 0 348
^C

One of the tables in question is defined as:

 Table "public.attachments"
 Column | Type | Modifie
-----------------+-----------------------------+--------------------------------
 id | integer | not null default nextval('atta)
 transactionid | integer | not null
 parent | integer | not null default 0
 messageid | character varying(160) |
 subject | character varying(255) |
 filename | character varying(255) |
 contenttype | character varying(80) |
 contentencoding | character varying(80) |
 content | text |
 headers | text |
 creator | integer | not null default 0
 created | timestamp without time zone |
Indexes:
 "attachments_pkey" PRIMARY KEY, btree (id)
 "attachments1" btree (parent)
 "attachments2" btree (transactionid)
 "attachments3" btree (parent, transactionid)

I do not have the liberty to change the type for any part of the DB schema. Doing so would likely break future upgrades of the software, etc.

The likely problem column is 'content' of type 'text' (perhaps others in other tables as well). As I already know from previous research, PostgreSQL will not allow NULL in 'text' values. However, please see above where both sed and Perl show no NULL characters, and then further down where I strip all non-ASCII characters from the entire dump file but it still barfs.

asked Dec 27, 2011 at 22:57
4
  • 2
    What does line 29 of your dump file look like? Something like head -29 foo | tail -1 | cat -v might be of use. Commented Dec 27, 2011 at 23:29
  • What's the definition of the affected table and what does the offending line look like? Commented Dec 28, 2011 at 1:33
  • It's ~1MB of company data. I understand where you're headed of course, though. Here's the end of that line of thought (please pardon my french at the end of the gist/paste): gist.github.com/1525788 Commented Dec 28, 2011 at 2:02
  • tscho: As indicated, that example error line is one of hundreds of these errors. Commented Dec 28, 2011 at 2:02

4 Answers 4

4

One or more of those character/text fields MAY have 0x00 for its content.

Try the following:

SELECT * FROM rt3 where some_text_field = 0x00 LIMIT 1;

If this returns any single row then try updating those character/text fields with:

UPDATE rt3 SET some_text_field = '' WHERE some_text_field = 0x00;

Afterwards, try another MYSQLDUMP ... ( and PostgreSQL import method ).

answered Oct 10, 2013 at 21:31
1
  • 2
    This helped me find my stray null characters, though I needed to use colname LIKE concat('%', 0x00, '%'). Found them in fields containing serialized PHP arrays. Commented Oct 18, 2014 at 4:21
7

I had the same problem using MySQL version 5.0.51 and Postgres version 9.3.4.0. I solved the "invalid byte sequence for encoding "UTF8": 0x00" issue after seeing Daniel Vérité's comment that "mysqldump in postgresql mode will dump null bytes as 0円 in strings, so you probably want to search for that sequence of characters."

Sure enough a grep finally revealed the NULL chars.

grep \\\0円 dump.sql

I replaced the NULL chars using the following command

sed -i BAK 's/\0円//g' dump.sql

Postgres was then able to successfully load dump.sql

answered May 16, 2014 at 20:00
1
  • Won't this insert empty strings instead of actual NULL values? Commented Aug 19, 2022 at 19:10
4

You can get this error without any NULL byte or any non-ascii character in the file. Example in an utf8 database:

select E'ab0円cd';

will yield:

ERROR: invalid byte sequence for encoding "UTF8": 0x00 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

mysqldump in postgresql mode will dump null bytes as 0円 in strings, so you probably want to search for that sequence of characters.

answered Mar 14, 2012 at 1:33
0

I half remember a problem like this. I think I ended up migrating the schema then dumping the data as csv and loading the data from the csv file. I remember having to update the csv file (using unix tools like sed or unixtodos) or using open office calc (excell) to fix up some items that were errors on the import step - it could be as simple as opening and re-saving the file.

answered Jan 25, 2012 at 0:24

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.