I have a database on my development machine where everything runs just fine.
I copied the database to the server using a file-system-level backup. Most queries run fine and fast, but I have problems with one table.
If I query the data and sort by the primary key "id", all rows are returned without any issues. But if I sort by the column "label", which is of type text and has no empty cells, I get an error:
SQL Error [XX000]: ERROR: could not compare Unicode strings: Invalid argument
I compared LC_COLLATE, LC_CTYPE, LC_MESSAGES and LC_NUMERIC on both machines. LC_COLLATE and LC_CTYPE are the same (en_US.utf8), but LC_MESSAGES and LC_NUMERIC are en_US.utf8 on the working instance and C on the instance with issues. I then set the last two to en_US.utf8 (SET lc_messages TO 'en_US.utf8';), but this didn't help.
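For a comparison like the one above, it may help to read the settings from inside each instance rather than from postgresql.conf, since the database-level collation is fixed when the database is created. A minimal sketch for PostgreSQL 13 (the version in question); the queries only read catalog data and session settings:

```sql
-- Database-level collation and character classification, fixed at creation time.
SELECT datname,
       datcollate,
       datctype,
       pg_encoding_to_char(encoding) AS encoding
FROM pg_database;

-- Locale settings as seen by the current session.
-- lc_collate and lc_ctype are read-only; lc_messages and lc_numeric can be SET.
SHOW lc_collate;
SHOW lc_ctype;
SHOW lc_messages;
SHOW lc_numeric;
```

Running this on both machines shows whether the two instances really agree on the collation used for sorting, independently of the message and number formatting locales.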
There are no special symbols in that table, just plain ASCII text. I tried deleting all rows and adding them again, with no change.
I do not understand the source of the problem. I suspect localization issues.
Postgres version where I experience the issue: 13.5 (Windows x64). The locale settings in postgresql.conf are:
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'C' # locale for system error message
# strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
If I change "C" to "en_US.utf8" then the database fails to start.
EDIT:
Select query which returns without errors:
SELECT * FROM my_schema.my_table ORDER BY id ASC;
Query returning an error:
SELECT * FROM my_schema.my_table ORDER BY label ASC;
label is a non-reserved keyword in PostgreSQL according to the docs.
The column id is the primary key, of type serial4; the "label" column is of type text. Neither column contains null values.
1 Answer
I am answering my own question.
I was not acquainted with the concept of collation. It's described here.
When I specify the collation in my query (by adding COLLATE "C"), it works.
Example of the query:
SELECT * FROM my_table ORDER BY label COLLATE "C" ASC;
Apparently the default collations behave differently on my two installations, so on the new machine the Postgres engine cannot compare the strings and therefore cannot sort them.
So one way is to specify the collation when creating the table, or to specify it in the query when retrieving the data later on.
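The table-level variant could look roughly like the sketch below. The table name my_table_example is hypothetical and used only for illustration; my_schema.my_table and label are the names from the question. Note that the "C" collation orders strings by byte value, so for ASCII text all uppercase letters sort before lowercase ones.

```sql
-- Option 1: set the collation in the column definition of a new table
-- (my_table_example is a hypothetical name).
CREATE TABLE my_table_example (
    id    serial PRIMARY KEY,
    label text COLLATE "C" NOT NULL
);

-- Option 2: change the collation of the existing column.
-- ALTER TABLE takes an ACCESS EXCLUSIVE lock while it runs.
ALTER TABLE my_schema.my_table
    ALTER COLUMN label TYPE text COLLATE "C";

-- Check which collation each column currently uses
-- (collation_name is NULL when the column uses the database default).
SELECT column_name, data_type, collation_name
FROM information_schema.columns
WHERE table_schema = 'my_schema'
  AND table_name   = 'my_table';
```

With the collation attached to the column, a plain ORDER BY label no longer needs an explicit COLLATE clause in each query.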
EDIT
I tried backing up and restoring the database with pg_dumpall, pg_dump and pg_restore. After that, my queries work without specifying a collation (so they run the same on both machines).
But there is another strange behavior. User authentication succeeds from a client running on the old machine but fails when the client runs on the new machine. The master login (for user postgres) is accepted on both instances. The difference between postgres and the other users is that I created the postgres user with the same password when installing the database, while the other users were restored with pg_restore.
If I reset the password (using pgAdmin), it is accepted from clients running on both machines.
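The same reset can be done in SQL instead of pgAdmin. In the sketch below, app_user and the password are placeholders. The second part is only a diagnostic idea: it shows which hash format each role's stored password uses. Whether the hash format has anything to do with the failed logins is an assumption; the cause was not established here.

```sql
-- Reset the password of a restored role
-- (app_user and the password are placeholders).
ALTER ROLE app_user WITH PASSWORD 'new-password';

-- Optional diagnostics, superuser only.
-- Which hash method is used for newly set passwords:
SHOW password_encryption;

-- Which hash format each stored password uses:
-- the prefix is 'md5...' or 'SCRAM-SHA-256'.
SELECT rolname, left(rolpassword, 13) AS hash_prefix
FROM pg_authid
WHERE rolpassword IS NOT NULL;
```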
Still, it would be interesting to know the cause. Could it be that these were different Windows versions with somehow incompatible ideas of Unicode? You should probably use pg_dumpall/psql to transfer the data.
– Laurenz Albe, Nov 16, 2021 at 8:33
en_US.utf8 which is a POSIX locale, not a Windows locale, you're probably in that case.