I am running a PostgreSQL database with docker-compose, using the image postgres:10-alpine. For some reason, the database is unable to select certain rows based on string comparison. For example, this query works as expected:
peertube=# SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername"='user1';
id | preferredUsername
-------+-------------------
38793 | user1
(1 row)
However, the same query is not working for another user:
peertube=# SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername"='user2';
id | preferredUsername
----+-------------------
(0 rows)
The user definitely exists, and the query works if I use ILIKE instead of =:
peertube=# SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername" ILIKE 'user2';
id | preferredUsername
-------+-------------------
41576 | user2
(1 row)
What could be the reason for this incorrect behaviour? I suspected it might have something to do with the encoding, but SHOW SERVER_ENCODING and SHOW CLIENT_ENCODING both show UTF8. I also tried exporting the data and importing it into a fresh database. That fixed the problem for a few days, but then it came back.
I would appreciate any possible solutions or debugging ideas.
Edit: Some more queries. It works with trim() in the WHERE clause:
peertube=# SELECT "preferredUsername", trim("preferredUsername"), md5("preferredUsername") FROM actor WHERE trim("preferredUsername")='mailab';
preferredUsername | btrim | md5
-------------------+--------+----------------------------------
mailab | mailab | 3d83ba6a9e5391c0c4d0253fbb2b01aa
However, this still seems weird, because if there were whitespace, the hash of the field would be different. In fact, the md5 is the same:
peertube=# SELECT "preferredUsername", md5("preferredUsername"), md5('mailab') FROM actor WHERE trim("preferredUsername")='mailab';
preferredUsername | md5 | md5
-------------------+----------------------------------+----------------------------------
mailab | 3d83ba6a9e5391c0c4d0253fbb2b01aa | 3d83ba6a9e5391c0c4d0253fbb2b01aa
Edit 2: Length is the same with or without trim, so that seems to rule out the whitespace theory:
peertube=# SELECT "preferredUsername", length("preferredUsername"), length(trim("preferredUsername")) FROM actor WHERE trim("preferredUsername")='mailab';
preferredUsername | length | length
-------------------+--------+--------
mailab | 6 | 6
Edit 3: Details of the affected table: https://gitlab.com/snippets/1840320
Edit 4: Someone suggested the get_byte() function to check that there are no unexpected UTF-8 characters. And in fact, every single character corresponds to an ASCII value in the range 97-122.
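A sketch of one way to run that kind of byte-level check in a single query, assuming the same actor table and column as above: convert_to() turns the text into bytea and encode(..., 'hex') prints every byte, so a stray space (20), tab (09), or non-ASCII byte would be visible. ILIKE is used in the WHERE clause only because it is the lookup that found the row earlier.

```sql
-- Dump the raw bytes of the stored value next to the bytes of the literal.
-- literal_hex for 'user2' is 7573657232; stored_hex matches it only if
-- the stored value is byte-for-byte identical to the literal.
SELECT id,
       "preferredUsername",
       encode(convert_to("preferredUsername", 'UTF8'), 'hex') AS stored_hex,
       encode(convert_to('user2', 'UTF8'), 'hex')             AS literal_hex,
       octet_length("preferredUsername")                      AS bytes
FROM actor
WHERE "preferredUsername" ILIKE 'user2';
```

If the two hex strings are identical and = still returns no rows, the stored data itself is probably not the cause.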
1 Answer
I would check whether there is any whitespace after the value 'user2' in the database.
I would also look at the application inserting the data and, if possible, profile the data stream before insertion to determine whether the application is incorrectly adding whitespace.
Alternatively, I would look at the source of 'user2' and make sure that the source does not contain whitespace.
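The whitespace check suggested here could be sketched as a scan over the whole table, assuming the actor table and "preferredUsername" column from the question. The regular expression uses POSIX character classes; the column aliases are only illustrative.

```sql
-- Rows whose value contains any whitespace or control character.
-- Note: trim() without a character list strips spaces only, so tabs,
-- newlines, and similar characters have to be matched explicitly.
SELECT id,
       "preferredUsername",
       length("preferredUsername")       AS chars,
       octet_length("preferredUsername") AS bytes
FROM actor
WHERE "preferredUsername" ~ '[[:space:][:cntrl:]]';
```

An empty result would suggest that stray whitespace is not what makes the = comparison fail.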
- Do you know any command that I can use to show whitespace in the database field? I am already talking with the application developer to debug this. – Nutomic, Mar 29, 2019 at 14:48
- I'm not a PostgreSQL expert, but could you get the length of the data in the field and compare it to the length of the trimmed data in the field? If there is a difference, you have whitespace. I've also seen this with hidden control characters. – armitage, Mar 29, 2019 at 14:56
- Turns out length is exactly the number of visible characters, so that seems to rule out the whitespace theory (see my last edit). – Nutomic, Mar 29, 2019 at 15:06
- With SET enable_indexscan to off, does it give the same result? What is the collation of the column?
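A minimal sketch of the check suggested in this last comment, assuming the actor table from the question. The planner settings are changed with SET LOCAL inside a transaction that is rolled back, so nothing persists in the session. Disabling bitmap and index-only scans as well is an addition here, so that the planner cannot fall back to another index-based plan.

```sql
-- Re-run the failing lookup without any index-based plan.
BEGIN;
SET LOCAL enable_indexscan     TO off;
SET LOCAL enable_bitmapscan    TO off;
SET LOCAL enable_indexonlyscan TO off;
SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername" = 'user2';
ROLLBACK;

-- Collation of the column (NULL means the database default is used).
SELECT column_name, collation_name
FROM information_schema.columns
WHERE table_name = 'actor' AND column_name = 'preferredUsername';

-- Default collation and character classification of the current database.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();
```

If the row shows up with index scans disabled but not with the default plan, that would suggest the problem lies with the index on the column rather than with the stored data.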