If I have a text column name
with a collation, for example tr-TR
, and I run a query
SELECT * FROM t WHERE name LIKE 'a%'
then this will do a seq scan.
If I now create an index
CREATE INDEX ON t(name text_pattern_ops)
the above query will become a bitmap scan. But there is another technique that will achieve the same result:
CREATE INDEX ON t(name COLLATE "C")
Are those approaches completely equivalent or are there differences?
1 Answer 1
Note that you should also create an index with the default operator class if you want queries involving ordinary
<
,<=
,>
, or>=
comparisons to use an index. Such queries cannot use thexxx_pattern_ops
operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes. If you do use the C locale, you do not need thexxx_pattern_ops
operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.
So, the index with COLLATE "C"
does everything a text_pattern_ops
would do, plus more which the latter cannot. Like support this query:
SELECT * FROM t
WHERE name >= 'z' COLLATE "C";
COLLATE "C"
must be specified explicitly in the query to make the COLLATE "C"
index applicable (unless you operate with "C" collation to begin with, but then why the index?)
The above query requires the feature of "per-column collation support" (as does your index). The feature was added with Postgres 9.1. The operator class text_pattern_ops
is much older and made a lot of sense before Postgres 9.1.
Since Postgres 9.1 the COLLATE "C" variant is superior as it is more versatile. The only reason I could think of would be to intentionally not support additional use cases for some odd reason. The manual might actually add a deprecation notice for the operator classes xxx_pattern_ops
.