Postgres ERROR: could not compare Unicode strings

Question 1

I have a database on my development machine where everything runs just fine.

I copied the database to the server using a file-system-level backup. Most queries run fine and fast, but I have a problem with one table.

If I query the data and sort by the primary key "id", all rows are returned without any issues. But if I sort by the column "label", which is of type text and has no empty cells, I get an error:

SQL Error [XX000]: ERROR: could not compare Unicode strings: Invalid argument

I compared LC_COLLATE, LC_CTYPE, LC_MESSAGES and LC_NUMERIC on both machines. LC_COLLATE and LC_CTYPE are the same (en_US.utf8), but LC_MESSAGES and LC_NUMERIC are en_US.utf8 on the working instance and C on the instance with the issue. I then set the last two to en_US.utf8 (SET lc_messages TO 'en_US.utf8';), but this didn't help.

There are no special symbols in that table, just plain ASCII text. I tried deleting all rows and adding them again, with no change.

I do not understand the source of the problem. I suspect a localization issue.

Postgres version where I experience the issue: 13.5 (Windows x64). The locale settings in postgresql.conf are:
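The collation and ctype are fixed per database when it is created, while lc_messages and lc_numeric are ordinary settings, so the two groups are read in different ways. A minimal sketch of how the values could be compared on both machines from psql, assuming PostgreSQL 13 as in this question:

```sql
-- Per-database collation and ctype, fixed when each database was created.
SELECT datname, datcollate, datctype
FROM pg_database;

-- Values in effect for the current session (read-only for the first two).
SHOW lc_collate;
SHOW lc_ctype;
SHOW lc_messages;
SHOW lc_numeric;
```

Only lc_collate and lc_ctype influence how text is compared and sorted; lc_messages and lc_numeric affect message language and number formatting.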

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'C' # locale for system error message
 # strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting

If I change 'C' to 'en_US.utf8', the database fails to start.

EDIT:

Select query which returns without errors: SELECT * FROM my_schema.my_table ORDER BY id ASC;

Query returning an error: SELECT * FROM my_schema.my_table ORDER BY label ASC;

label is a non-reserved keyword in PostgreSQL according to the docs. Column "id" is the primary key, of type serial4; column "label" is of type text. There are no null values in either column.

Question 2

Please add your SELECT command.

Question 3

This is the only place with such an error message, in a Windows-only code path: github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/… I don't work with this exotic platform. But I would like to clarify two points. First, PostgreSQL version 13.5.1 does not exist; are you using PostgreSQL or some fork? Second, how exactly did you copy the database?

Question 4

Sorry, it's 13.5, I just checked in the CLI. I took 13.5.1 from the installation package name "postgresql-13.5-1-windows-x64.exe", but I got this package from the official page, not a fork. I copied the data files from my PC to the server (packed and unpacked as a .zip archive), then installed Postgres and pointed the installer at this data folder. I also had to change lc_messages, lc_monetary, lc_numeric and lc_time from 'en_US.utf8' to 'C' in the conf file, otherwise Postgres fails to start. I was thinking of doing a dump and restore instead of copying files, but the database is quite big (500 GB) and it would take days just to try.

Question 5

By the way, the host OS is Windows Server 2019 (I'm unhappy with this and would prefer Linux, but I can't influence it).

Question 6

If the development host does not run exactly the same OS as the target host, transferring the Postgres data files leads to a corrupt database on the target. Since you mention en_US.utf8, which is a POSIX locale name and not a Windows one, that is probably your case.

Ostap Ostap 415 bronze badges · Answer 1 · 2021-11-15 17:53:40Z

I'm answering my own question.

I was not acquainted with the concept of collation. It's described here.

When I specify the collation in my query (by adding COLLATE "C"), it works.

Example of the query:

SELECT * FROM my_table ORDER BY label COLLATE "C" asc;

Apparently the default collations differ between my two installations, so the strings cannot be compared and the Postgres engine doesn't know how to sort them.

So one way is to specify the collation when creating the table, or to specify it in the query when retrieving the data later on.
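A sketch of both options, assuming the table from the question (my_schema.my_table with a text column label). The table name my_table_demo is hypothetical and used only for illustration:

```sql
-- Option 1: override the collation for a single query.
SELECT * FROM my_schema.my_table ORDER BY label COLLATE "C" ASC;

-- Option 2: fix the collation on the column when creating a table.
-- (my_table_demo is a made-up name, not part of the original schema.)
CREATE TABLE my_schema.my_table_demo (
    id    serial PRIMARY KEY,
    label text COLLATE "C" NOT NULL
);

-- The same can be applied to an existing column; this may be slow on a
-- large table.
ALTER TABLE my_schema.my_table
    ALTER COLUMN label TYPE text COLLATE "C";
```

Keep in mind that "C" compares strings by byte value, so uppercase ASCII letters sort before lowercase ones, which can differ from the ordering an en_US locale would give.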

EDIT

I tried backing up and restoring the database with pg_dumpall, pg_dump and pg_restore. After that, my queries work without specifying a collation (so they run the same on both machines).

But there is another strange behavior. User authentication succeeds from the client running on the old machine but fails when the client runs on the new machine. The master login (user postgres) is accepted on both instances. The difference between postgres and the other users is that I created the postgres user with the same password when installing the database, while the other users were restored with pg_restore.

If I reset the password (using pgAdmin), it is accepted from clients running on both machines.
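Resetting a password from a GUI tool comes down to an ALTER ROLE statement. A minimal sketch, where app_user and the password are placeholders. The query on pg_authid is only a guess at something worth comparing between the two servers (it requires superuser rights), not a confirmed cause of the login failure:

```sql
-- Show which hashing scheme each stored password uses (superuser only).
SELECT rolname,
       CASE
           WHEN rolpassword LIKE 'SCRAM-SHA-256$%' THEN 'scram-sha-256'
           WHEN rolpassword LIKE 'md5%' THEN 'md5'
           ELSE 'none/other'
       END AS password_format
FROM pg_authid
WHERE rolcanlogin;

-- Reset the password; it is hashed again using the server's current
-- password_encryption setting. app_user is a placeholder role name.
ALTER ROLE app_user WITH PASSWORD 'new-password-here';
```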

Still, it would be interesting to know the cause. Could it be that these were different Windows versions with somehow incompatible ideas of Unicode? You should probably use pg_dumpall/psql to transfer the data.
