I've spent the last 8 hours trying to import the output of 'mysqldump --compatible=postgresql' into PostgreSQL 8.4.9, and I've read at least 20 different threads here and elesewhere already about this specific problem, but found no real usable answer that works.
MySQL 5.1.52 data dumped:
mysqldump -u root -p --compatible=postgresql --no-create-info --no-create-db --default-character-set=utf8 --skip-lock-tables rt3 > foo
PostgreSQL 8.4.9 server as destination
Loading the data with 'psql -U rt_user -f foo' is reporting (many of these, here's one example):
psql:foo:29: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
According the following, there are no NULL (0x00) characters in the input file.
database-dumps:rcf-temp1# sed 's/\x0/ /g' < foo > nonulls
database-dumps:rcf-temp1# sum foo nonulls
04730 2545610 foo
04730 2545610 nonulls
database-dumps:rcf-temp1# rm nonulls
Likewise, another check with Perl shows no NULLs:
database-dumps:rcf-temp1# perl -ne '/000円/ and print;' foo
database-dumps:rcf-temp1#
As the "HINT" in the error mentions, I have tried every possible way to set 'client_encoding' to 'UTF8', and I succeed but it has no effect toward solving my problem.
database-dumps:rcf-temp1# psql -U rt_user --variable=client_encoding=utf-8 -c "SHOW client_encoding;" rt3
client_encoding
-----------------
UTF8
(1 row)
database-dumps:rcf-temp1#
Perfect, yet:
database-dumps:rcf-temp1# psql -U rt_user -f foo --variable=client_encoding=utf-8 rt3
...
psql:foo:29: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
...
Barring the "According to Hoyle" correct answer, which would be fantastic to hear, and knowing that I really don't care about preserving any non-ASCII characters for this seldom-referenced data, what suggestions do you have?
Update: I get the same error with an ASCII-only version of the same dump file at import time. Truly mind-boggling:
database-dumps:rcf-temp1# # convert any non-ASCII character to a space
database-dumps:rcf-temp1# perl -i.bk -pe 's/[^[:ascii:]]/ /g;' mysql5-dump.sql
database-dumps:rcf-temp1# sum mysql5-dump.sql mysql5-dump.sql.bk
41053 2545611 mysql5-dump.sql
50145 2545611 mysql5-dump.sql.bk
database-dumps:rcf-temp1# cmp mysql5-dump.sql mysql5-dump.sql.bk
mysql5-dump.sql mysql5-dump.sql.bk differ: byte 1304850, line 30
database-dumps:rcf-temp1# # GOOD!
database-dumps:rcf-temp1# psql -U postgres -f mysql5-dump.sql --variable=client_encoding=utf-8 rt3
...
INSERT 0 416
psql:mysql5-dump.sql:30: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 455
INSERT 0 424
INSERT 0 483
INSERT 0 447
INSERT 0 503
psql:mysql5-dump.sql:36: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 502
INSERT 0 507
INSERT 0 318
INSERT 0 284
psql:mysql5-dump.sql:41: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 382
INSERT 0 419
INSERT 0 247
psql:mysql5-dump.sql:45: ERROR: invalid byte sequence for encoding "UTF8": 0x00
HINT: This error can also happen if the byte sequence does not match the encod.
INSERT 0 267
INSERT 0 348
^C
One of the tables in question is defined as:
Table "public.attachments"
Column | Type | Modifie
-----------------+-----------------------------+--------------------------------
id | integer | not null default nextval('atta)
transactionid | integer | not null
parent | integer | not null default 0
messageid | character varying(160) |
subject | character varying(255) |
filename | character varying(255) |
contenttype | character varying(80) |
contentencoding | character varying(80) |
content | text |
headers | text |
creator | integer | not null default 0
created | timestamp without time zone |
Indexes:
"attachments_pkey" PRIMARY KEY, btree (id)
"attachments1" btree (parent)
"attachments2" btree (transactionid)
"attachments3" btree (parent, transactionid)
I do not have the liberty to change the type for any part of the DB schema. Doing so would likely break future upgrades of the software, etc.
The likely problem column is 'content' of type 'text' (perhaps others in other tables as well). As I already know from previous research, PostgreSQL will not allow NULL in 'text' values. However, please see above where both sed and Perl show no NULL characters, and then further down where I strip all non-ASCII characters from the entire dump file but it still barfs.
4 Answers 4
One or more of those character/text fields MAY have 0x00 for its content.
Try the following:
SELECT * FROM rt3 where some_text_field = 0x00 LIMIT 1;
If this returns any single row then try updating those character/text fields with:
UPDATE rt3 SET some_text_field = '' WHERE some_text_field = 0x00;
Afterwards, try another MYSQLDUMP ... ( and PostgreSQL import method ).
-
2This helped me find my stray null characters, though I needed to use
colname LIKE concat('%', 0x00, '%')
. Found them in fields containing serialized PHP arrays.cimmanon– cimmanon2014年10月18日 04:21:31 +00:00Commented Oct 18, 2014 at 4:21
I had the same problem using MySQL version 5.0.51 and Postgres version 9.3.4.0. I solved the "invalid byte sequence for encoding "UTF8": 0x00" issue after seeing Daniel Vérité's comment that "mysqldump in postgresql mode will dump null bytes as 0円 in strings, so you probably want to search for that sequence of characters."
Sure enough a grep finally revealed the NULL chars.
grep \\\0円 dump.sql
I replaced the NULL chars using the following command
sed -i BAK 's/\0円//g' dump.sql
Postgres was then able to successfully load dump.sql
-
Won't this insert empty strings instead of actual NULL values?Jeff– Jeff2022年08月19日 19:10:44 +00:00Commented Aug 19, 2022 at 19:10
You can get this error without any NULL byte or any non-ascii character in the file. Example in an utf8 database:
select E'ab0円cd';
will yield:
ERROR: invalid byte sequence for encoding "UTF8": 0x00 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
mysqldump in postgresql mode will dump null bytes as 0円 in strings, so you probably want to search for that sequence of characters.
I half remember a problem like this. I think I ended up migrating the schema then dumping the data as csv and loading the data from the csv file. I remember having to update the csv file (using unix tools like sed or unixtodos) or using open office calc (excell) to fix up some items that were errors on the import step - it could be as simple as opening and re-saving the file.
head -29 foo | tail -1 | cat -v
might be of use.