- OS: Windows 10
- Docker OS: Alpine Linux 3.18.3
- DB: PostgreSQL 15
I created a new PostgreSQL database named test1 with zh-x-icu as the default collation using the following command:
postgres=> create database test1 with encoding 'UTF8' owner db_owner lc_collate 'zh-x-icu' lc_ctype 'zh-x-icu' template template0;
When I list the databases, the datcollate and datctype columns show the expected values:
datname | datdba | encoding | datcollate | datctype |
---|---|---|---|---|
template1 | 10 | 6 | en_US.utf8 | en_US.utf8 |
template0 | 10 | 6 | en_US.utf8 | en_US.utf8 |
postgres | 10 | 6 | en_US.utf8 | en_US.utf8 |
test1 | 43170 | 6 | zh-x-icu | zh-x-icu |
However, when I run queries in this database, the sorting behavior doesn't align with the default collation zh-x-icu. Here are some examples:
query | result |
---|---|
SELECT * FROM (VALUES ('张'), ('李'), ('王'), ('赵'), ('刘')) AS names(name) ORDER BY name; | 刘张李王赵 |
SELECT * FROM (VALUES ('张'), ('李'), ('王'), ('赵'), ('刘')) AS names(name) ORDER BY name collate "en-x-icu"; | 刘张李王赵 |
SELECT * FROM (VALUES ('张'), ('李'), ('王'), ('赵'), ('刘')) AS names(name) ORDER BY name collate "zh-x-icu"; | 李刘王张赵 |
As you can see, the query without explicit collation returns the same result as the one with 'en-x-icu', which is not what I expected. I expected it to behave like the 'zh-x-icu' query since that's the default collation for the database.
Why is this happening, and how can I make the default collation apply to queries in this database?
Here is the docker-compose:
version: '3'
services:
  timescaledb:
    image: timescale/timescaledb:latest-pg15
    hostname: d16f59bcb903
    mac_address: 02:42:ac:11:00:02
    environment:
      - PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
      - LANG=en_US.utf8
      - PG_MAJOR=15
      - PG_VERSION=15.4
      - PG_SHA256=baec5a4bdc4437336653b6cb5d9ed89be5bd5c0c58b94e0becee0a999e63c8f9
      - DOCKER_PG_LLVM_DEPS=llvm15-dev clang15
      - PGDATA=/var/lib/postgresql/data
    volumes:
      - H:\DB_PG15:/var/lib/postgresql/data
      - /var/lib/postgresql/data
    ports:
      - "5432:5432"
    restart: "no"
    labels:
      maintainer: "Timescale https://www.timescale.com"
    runtime: runc
- Can you share the docker-compose? With a bare-bones one, I don't get that locale existing. – jjanes, Oct 16, 2023 at 13:41
- I've updated the question to include the docker-compose. – David H. J., Oct 16, 2023 at 18:57
2 Answers
LC_COLLATE and LC_CTYPE (or, for short, LOCALE) are for C library locales only.
Your statement probably does not cause an error because Alpine Linux uses musl as its C library, whose locales have ICU-like names. But the last time I looked, musl's collations didn't work as they should, which probably explains what you see.
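To confirm which provider the database is really using, you can look at the catalog. This is only a sketch and assumes PostgreSQL 15, where pg_database has the datlocprovider and daticulocale columns:

```sql
-- datlocprovider: 'c' = C library (libc), 'i' = ICU
SELECT datname, datlocprovider, datcollate, datctype, daticulocale
FROM pg_database
WHERE datname = 'test1';
```

If datlocprovider is 'c' for test1, the name zh-x-icu in datcollate is being treated as a C library locale name, not as the ICU collation of the same name.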
Use a Linux distribution with a working C library, or use PostgreSQL's support for the ICU library (if that is available on Alpine Linux, and your PostgreSQL is built with ICU support):
CREATE DATABASE test1
    OWNER db_owner
    ENCODING UTF8
    LOCALE_PROVIDER icu
    ICU_LOCALE zh
    TEMPLATE template0;
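A quick way to check the result is to rerun the query from the question without a COLLATE clause. This is a sketch that assumes you are connected to the newly created test1 database:

```sql
-- Run while connected to the newly created test1 database.
SELECT name
FROM (VALUES ('张'), ('李'), ('王'), ('赵'), ('刘')) AS names(name)
ORDER BY name;  -- no explicit COLLATE, so the database default applies
```

If the ICU locale is in effect, the order should match the explicit "zh-x-icu" query from the question (李, 刘, 王, 张, 赵) rather than the "en-x-icu" one.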
- If it were just a matter of musl being broken, wouldn't it be broken in the same way between a db's default collation and an explicitly specified collation? – jjanes, Oct 17, 2023 at 21:22
- @jjanes I don't think so. The names of musl locales just are the same as the names of ICU collations. If you build PostgreSQL with ICU support, it will link with libicu.so, which is a different library that will provide working collations, even if the collation names are the same. – Laurenz Albe, Oct 18, 2023 at 6:33
Thanks to Laurenz Albe's answer and Jezk's answer, I managed to solve the issue with collation for Chinese characters in PostgreSQL running in a Docker container. Here's how:
Step 1: Create a docker-compose.yml File
version: '3'
services:
  timescaledb:
    container_name: timescaledb-pg15-zh
    image: timescale/timescaledb:latest-pg15
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_INITDB_ARGS: "--locale-provider=icu --icu-locale=zh_CN"
    ports:
      - "5432:5432"
    volumes:
      - "<your_data_directory>:/var/lib/postgresql/data"
Step 2: Run the Docker Compose Command
Execute the following command in the terminal:
docker-compose -f "<path_to_your_docker-compose_file>" up -d
After doing this, the collation settings for Chinese characters behaved as expected.
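As far as I know, POSTGRES_INITDB_ARGS is only used when the container initializes an empty data directory, so an already initialized cluster keeps its old locale settings. To check that the ICU default is inherited, you could create a database without any locale options and inspect it. This is a sketch assuming PostgreSQL 15; test2 is a hypothetical database name:

```sql
-- test2 is a hypothetical name; no locale options are given,
-- so the settings are copied from the template database.
CREATE DATABASE test2;

-- datlocprovider: 'c' = C library (libc), 'i' = ICU
SELECT datname, datlocprovider, daticulocale
FROM pg_database
WHERE datname IN ('template1', 'test2');
```

Both rows should show 'i' as the provider if the cluster was initialized with the arguments above.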