The impetus for my question is that I had hoped that PostgreSQL would behave consistently when selecting from citext
columns, regardless of whether or not the string to be matched is wrapped in one or more instances of lower()
(any such wrapping is beyond my control). That appears not to be the case. (Of course, it is entirely possible that my tests are invalid or I am misunderstanding fundamental concepts.)
Steps to Reproduce Testing Scenario
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users (id int, email citext);
INSERT INTO users(id, email) VALUES
(1, '[email protected]');
Tests
As expected when using the citext type, the lowercase variant yields a result:
# select * from users where email = '[email protected]';
id | email
----+------------------
1 | [email protected]
(1 row)
Changing the = operator to like yields a result:
# select * from users where email like lower('[email protected]');
id | email
----+------------------
1 | [email protected]
(1 row)
As does the "inverse":
# select * from users where lower(email) = '[email protected]';
id | email
----+------------------
1 | [email protected]
(1 row)
As does wrapping both values in lower():
# select * from users where lower(email) = lower('[email protected]');
id | email
----+------------------
1 | [email protected]
(1 row)
My Question
Why then does the following query not return a result in this instance?
# select * from users where email = lower('[email protected]');
id | email
----+-------
(0 rows)
The manual says of the citext type:
Essentially, it internally calls lower when comparing values.
The operative word seems to be "essentially"; this statement implies the following, which does yield a result:
# select * from users where lower(email) = lower(lower('[email protected]'));
id | email
----+------------------
1 | [email protected]
(1 row)
Might this be related to the following caveat in the Limitations section of the above-cited document?
citext's case-folding behavior depends on the LC_CTYPE setting of your database.
# SHOW LC_CTYPE;
lc_ctype
-------------
en_US.UTF-8
(1 row)
Any explanation in this regard is much appreciated.
Answer
tl;dr: when comparing case-insensitive and case-sensitive values for equality, you have to be explicit. text is explicitly case-sensitive; citext is explicitly case-insensitive. Make sure both sides of the comparison have the type you want, adding a cast where needed.
A few things about lower()
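lower() is typed: when its argument is text, it always returns text. This holds even for a citext argument, because the citext extension does not define its own lower(); the value is implicitly cast to text first.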
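A quick way to check this is pg_typeof(). This sketch assumes the citext extension can be installed; 'Foo@Example.com' is a hypothetical mixed-case address, since the one in the question is not shown. Both columns should report text, including the one whose input is citext.

```sql
-- Assumes the citext extension is available (same statement as in the question).
CREATE EXTENSION IF NOT EXISTS citext;

-- 'Foo@Example.com' is a HYPOTHETICAL address used only for illustration.
SELECT pg_typeof(lower('Foo@Example.com'))         AS lower_of_literal,  -- text
       pg_typeof(lower('Foo@Example.com'::citext)) AS lower_of_citext;   -- text
```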
A few other points
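- When you compare against a literal, its type isn't known yet (internally it is explicitly of type unknown) and is resolved from context; for a binary operator, that is usually the type of the other operand.
- Operators are functions.
- PostgreSQL decides which function to call, coercing argument types with implicit casts where needed, when the query is parsed, not at run time.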
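Since operators are just functions registered in the catalog, you can ask pg_operator which = and LIKE (~~) operators exist for a citext left operand. This is a sketch against a standard citext installation; the exact rows may differ between extension versions. You should find = only for (citext, citext), while ~~ also has a (citext, text) variant backed by texticlike, which is consistent with the LIKE query in the question matching and the = query not matching.

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- List the equality (=) and LIKE (~~) operators whose left operand is citext,
-- together with the function that implements each one (oprcode).
SELECT oprname,
       oprleft::regtype  AS left_type,
       oprright::regtype AS right_type,
       oprcode           AS implementing_function
FROM pg_operator
WHERE oprleft = 'citext'::regtype
  AND oprname IN ('=', '~~')
ORDER BY oprname, oprright::regtype::text;
```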
In this case the types are as follows, with a description of each:
-- text = unknown
-- the unknown literal is resolved as text; citext plays no part
select * from users where lower(email) = '[email protected]';
-- text = text
-- citext plays no part
select * from users where lower(email) = lower('[email protected]');
-- text = text
-- citext plays no part
select * from users where lower(email) = lower(lower('[email protected]'));
-- citext LIKE text
-- citext defines `operator ~~(citext,text)`, implemented by the case-insensitive `texticlike`
-- WORKS
select * from users where email like lower('[email protected]');
-- citext = unknown
-- the unknown literal is resolved as citext, and there is an `operator =(citext,citext)`
-- WORKS
select * from users where email = '[email protected]';
-- citext = text
-- there is no `operator =(citext,text)`, so citext is implicitly cast to text and the case-sensitive text = text is used
-- FAILS (0 rows)
select * from users where email = lower('[email protected]');
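The same behaviour can be reproduced without a table, again assuming the citext extension is installed and using a hypothetical mixed-case address. Each column mirrors one case above: the unknown literal is resolved as citext, lower() returns text (which pulls the comparison down to the case-sensitive text = text), and an explicit ::citext cast restores the case-insensitive operator.

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- 'Foo@Example.com' is a HYPOTHETICAL address used only for illustration.
SELECT 'Foo@Example.com'::citext = 'foo@example.com'                AS citext_vs_unknown, -- true
       'Foo@Example.com'::citext = lower('foo@example.com')         AS citext_vs_text,    -- false
       'Foo@Example.com'::citext = lower('foo@example.com')::citext AS citext_vs_citext;  -- true
```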
In summary, there is an operator =(citext,citext), so you can cast the right-hand side to citext explicitly:
select * from users where email = lower('[email protected]')::citext;
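Why does the parser pick text = text instead of citext = citext on its own? A likely explanation is the cast definitions: only implicit casts are considered when resolving an operator, and in a standard citext installation (as far as I can tell) citext to text is implicit while text to citext is assignment-only. The pg_cast catalog shows this directly; castcontext is i for implicit, a for assignment and e for explicit.

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- castcontext: 'i' = implicit, 'a' = assignment only, 'e' = explicit only.
-- Expected in a standard install: citext -> text is 'i', text -> citext is 'a'.
SELECT castsource::regtype AS source_type,
       casttarget::regtype AS target_type,
       castcontext
FROM pg_cast
WHERE castsource IN ('citext'::regtype, 'text'::regtype)
  AND casttarget IN ('citext'::regtype, 'text'::regtype);
```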
Alternatively, if you want, you can define your own =(citext,text) operator that takes the case-insensitive route rather than the case-sensitive one. I find that to be horrible practice, though; I'll always cast.
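For completeness, here is roughly what that custom-operator approach could look like; it is the approach advised against above. This is a hypothetical sketch: the function name citext_eq_text is made up, and the FUNCTION keyword in CREATE OPERATOR assumes PostgreSQL 11 or later (older versions spell it PROCEDURE). Once an exact =(citext, text) operator exists, the parser picks it instead of falling back to text = text.

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- HYPOTHETICAL helper: compares a citext value with a text value by casting
-- the text side to citext, so the built-in =(citext, citext) does the work.
CREATE FUNCTION citext_eq_text(citext, text) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS $$ SELECT $1 = $2::citext $$;

-- Registers =(citext, text). The FUNCTION keyword requires PostgreSQL 11+.
CREATE OPERATOR = (
    LEFTARG  = citext,
    RIGHTARG = text,
    FUNCTION = citext_eq_text
);

-- The exact (citext, text) match now wins over the implicit cast to text,
-- so this comparison becomes case-insensitive and returns true.
SELECT 'Foo@Example.com'::citext = lower('foo@example.com');

-- To undo:
-- DROP OPERATOR = (citext, text);
-- DROP FUNCTION citext_eq_text(citext, text);
```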
Comment by Ben Johnson, Dec 2, 2017 at 16:54:31 +00:00: An excellent and thorough response! Thank you, Evan! I really appreciate your time and expertise.