SQL select statements not matching on certain rows

Question 1

I am running a PostgreSQL database with docker-compose, using the image postgres:10-alpine. For some reason, the database fails to select certain rows based on string comparison. For example, this query works as expected:

peertube=# SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername"='user1';
  id   | preferredUsername
-------+-------------------
 38793 | user1
(1 row)

However, the same query is not working for another user:

peertube=# SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername"='user2';
 id | preferredUsername 
----+-------------------
(0 rows)

The user definitely exists, and the query works if I use ILIKE instead of =:

peertube=# SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername" ILIKE 'user2'; 
  id   | preferredUsername
-------+-------------------
 41576 | user2
(1 row)

What could be causing this behaviour? I suspected it might have something to do with the encoding, but SHOW SERVER_ENCODING and SHOW CLIENT_ENCODING both return UTF8. I also tried exporting the data and importing it into a fresh database. That fixed the problem for a few days, but then it came back.

I'd appreciate any possible solutions or debugging ideas.

Edit: Some more queries. The comparison works with trim() in the WHERE clause:

peertube=# SELECT "preferredUsername", trim("preferredUsername"), md5("preferredUsername") FROM actor WHERE trim("preferredUsername")='mailab';
 preferredUsername | btrim  |               md5
-------------------+--------+----------------------------------
 mailab            | mailab | 3d83ba6a9e5391c0c4d0253fbb2b01aa

However, this still seems odd: if there were whitespace, the hash of the field would differ from the hash of 'mailab'. In fact, the two MD5 hashes are identical:

peertube=# SELECT "preferredUsername", md5("preferredUsername"), md5('mailab') FROM actor WHERE trim("preferredUsername")='mailab';
 preferredUsername |               md5                |               md5
-------------------+----------------------------------+----------------------------------
 mailab            | 3d83ba6a9e5391c0c4d0253fbb2b01aa | 3d83ba6a9e5391c0c4d0253fbb2b01aa
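
To see why matching hashes argue against hidden whitespace: MD5 is computed over the exact bytes of the value, so even a single extra space produces a completely different digest. The Python sketch below is only an illustration, not PostgreSQL's own code; it assumes the value is plain ASCII/UTF-8 text (as Edit 4 reports), and `md5_hex` is a hypothetical helper.

```python
import hashlib

def md5_hex(s: str) -> str:
    """Hex MD5 of the UTF-8 bytes of s (analogous to md5(text) on a UTF8 database)."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

stored = "mailab"    # hypothetical stored value, byte-identical to the literal
padded = "mailab "   # the same value with one trailing space

print(md5_hex(stored) == md5_hex("mailab"))  # True
print(md5_hex(padded) == md5_hex("mailab"))  # False: one extra byte changes the digest
```

If the stored bytes really are identical to 'mailab', a plain = comparison on the data itself should match, which points away from the stored value and toward how the lookup is performed (for example, through an index).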

Edit 2: The length is the same with or without trim, so that seems to rule out the whitespace theory:

peertube=# SELECT "preferredUsername", length("preferredUsername"), length(trim("preferredUsername")) FROM actor WHERE trim("preferredUsername")='mailab';
 preferredUsername | length | length
-------------------+--------+--------
 mailab            |      6 |      6

Edit 3: Details of the affected table: https://gitlab.com/snippets/1840320

Edit 4: Someone suggested using the get_byte() function to check for unexpected UTF-8 characters. In fact, every single character corresponds to an ASCII value in the range 97-122 (lowercase a-z).
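
The get_byte() check can also be mirrored outside the database, for example on a value copied out of psql. This is a minimal Python sketch assuming the value is UTF-8 text; `unexpected_bytes` is a hypothetical helper, not a PostgreSQL function. It lists every byte outside 97-122, so an invisible character such as a zero-width space would show up as extra bytes.

```python
from typing import List, Tuple

def unexpected_bytes(value: str, low: int = 97, high: int = 122) -> List[Tuple[int, int]]:
    """Return (offset, byte) for every UTF-8 byte of `value` outside [low, high].

    The default range 97-122 is lowercase ASCII a-z, the range reported in Edit 4.
    """
    data = value.encode("utf-8")
    return [(offset, byte) for offset, byte in enumerate(data) if not low <= byte <= high]

print(unexpected_bytes("mailab"))        # [] : every byte is in a-z
print(unexpected_bytes("mailab\u200b"))  # [(6, 226), (7, 128), (8, 139)] : U+200B ZERO WIDTH SPACE
```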

Question 2

Is there any whitespace after the user2 value in the table? Have you tried trimming the value?

Question 3

This could be a corrupted index, caused by binary-migrating the index across systems with incompatible libc collations. The weird thing is: you fix it with a dump and reload, and the problem comes back after a few days?
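
The corrupted-index theory would also explain why = fails while ILIKE and trim() succeed: assuming the column has a btree index, the = comparison can be answered through that index, whereas ILIKE and a trim() in the WHERE clause typically cannot use an ordinary btree index on the column and fall back to reading every row. The Python sketch below is a toy model, not PostgreSQL internals: the "index" is a list sorted under one ordering (a stand-in for the collation it was built with) and is binary-searched under a different ordering (a stand-in for the collation in effect later). The rows and both orderings are hypothetical.

```python
import bisect
from typing import Callable, List

# Hypothetical usernames; mixed case makes the two orderings below disagree.
rows = ["User2", "alice", "bob", "user1", "zed"]

# "Index" built under ordering A, a stand-in for the C/POSIX collation:
# plain code-point order, where uppercase letters sort before lowercase ones.
index: List[str] = sorted(rows)

def ordering_b(s: str) -> str:
    """Ordering B, a stand-in for a different (linguistic) collation: case-folded."""
    return s.lower()

def index_scan(index: List[str], target: str, key: Callable[[str], str]) -> bool:
    """Binary search that trusts `index` to be sorted by `key` (here it is not)."""
    keys = [key(k) for k in index]
    pos = bisect.bisect_left(keys, key(target))
    return pos < len(index) and index[pos] == target

def seq_scan(rows: List[str], target: str) -> bool:
    """Check every row with plain equality; the ordering is irrelevant."""
    return any(r == target for r in rows)

for name in ("user1", "User2"):
    print(name, index_scan(index, name, ordering_b), seq_scan(rows, name))
# user1 True True
# User2 False True
```

With a matching ordering (`key=lambda s: s`) the same lookup does find "User2", which is the sense in which rebuilding the index under the current collation (REINDEX, or a dump and reload) would repair it. The enable_indexscan test asked about in the comments corresponds to `seq_scan` here.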

Question 4

@armitage Good idea, the query works with trim(). But it is still weird, because the hashes are the same (see my edit).

Question 5

@DanielVérité Exactly, I dumped to plain text, and after that everything was fine. It is possible that this is some kind of application bug.

Question 6

Questions about the corrupted index theory: once it happens, is it always reproducible? Is the column indexed? If the failing query is run with SET enable_indexscan to off, does it give the same result? What is the collation of the column?


Answer 1 · armitage · 2019-03-29 14:34:40Z

I would check whether there is any whitespace after the value 'User2' in the database.

I would also look at the application inserting the data and, if possible, profile the data stream before insertion to determine whether the application is incorrectly adding whitespace.

Alternatively, I would look at the source of 'User2' and make sure it does not contain whitespace.
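
One way to follow the suggestion of checking the data before insertion is to scan each username for characters that are hard to see on screen. The Python sketch below is a hypothetical helper (`suspicious_chars` is not part of PeerTube or PostgreSQL); it flags leading or trailing whitespace and any character in a Unicode control, format or separator category.

```python
import unicodedata
from typing import List

def suspicious_chars(value: str) -> List[str]:
    """Describe characters that are easy to miss on screen.

    Flags leading/trailing whitespace, plus every character whose Unicode
    category starts with 'C' (control/format) or 'Z' (separator).
    """
    problems: List[str] = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    for i, ch in enumerate(value):
        cat = unicodedata.category(ch)
        if cat.startswith("C") or cat.startswith("Z"):
            name = unicodedata.name(ch, "UNNAMED")  # control characters have no name
            problems.append(f"position {i}: U+{ord(ch):04X} {name} ({cat})")
    return problems

print(suspicious_chars("User2"))   # []
print(suspicious_chars("User2 "))  # ['leading or trailing whitespace', 'position 5: U+0020 SPACE (Zs)']
```

An empty list means nothing invisible was found; anything else points at the exact position to inspect in the incoming data.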


answered Mar 29, 2019 at 14:34


Nutomic (2019-03-29 14:48:05Z): Do you know any command that I can use to show whitespace in a database field? I am already talking with the application developer to debug this.

armitage (2019-03-29 14:56:14Z): I'm not a PostgreSQL expert, but could you get the length of the data in the field and compare it to the length of the trimmed data? If there is a difference, you have whitespace. I've also seen this with hidden control characters.

Nutomic (2019-03-29 15:06:28Z): It turns out the length is exactly the number of visible characters, so that seems to rule out the whitespace theory (see my last edit).
