Steps to reproduce
Create database
CREATE DATABASE citiesdb
WITH OWNER = citiesowner
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'C'
LC_CTYPE = 'C'
CONNECTION LIMIT = -1;
After creating the database, you can run the code from the SQL Fiddle in Erwin's answer, https://dba.stackexchange.com/a/63202/37108 (http://sqlfiddle.com/#!12/270e2/1), or read the additional info at the end of the question.
Run LIKE query with only ASCII characters
EXPLAIN ANALYZE SELECT * FROM city WHERE other_names_lower like '%ele%';
"Bitmap Heap Scan on city (cost=16.10..64.02 rows=13 width=147) (actual time=0.642..3.303 rows=513 loops=1)"
" Recheck Cond: (other_names_lower ~~ '%ele%'::text)"
" -> Bitmap Index Scan on other_names_lower_trgm_gin (cost=0.00..16.10 rows=13 width=0) (actual time=0.486..0.486 rows=513 loops=1)"
" Index Cond: (other_names_lower ~~ '%ele%'::text)"
"Total runtime: 3.439 ms"
Run LIKE query with non-ASCII characters
EXPLAIN ANALYZE SELECT * FROM city WHERE other_names_lower LIKE '%желез%';
"Seq Scan on city (cost=0.00..1693.53 rows=5 width=134) (actual time=33.498..58.688 rows=9 loops=1)"
" Filter: (other_names_lower ~~ '%желез%'::text)"
" Rows Removed by Filter: 46673"
"Total runtime: 58.753 ms"
Question
When searching for non-ASCII text, the engine uses a sequential scan instead of the GIN trigram index. Why does it do that, and what are alternative ways to construct the index, query, or database to speed up the lookup?
Additional info
PostgreSQL 9.2; Windows 8 64-bit.
Part of the table definition ([...] are other columns).
CREATE TABLE city ([...] other_names_lower text [...]) WITH ( OIDS=FALSE );
Column other_names_lower contains different names for cities. Rows contain Chinese, Polish, Russian and other character ranges.
Index creation code
CREATE EXTENSION pg_trgm;
CREATE INDEX other_names_lower_trgm_gin
ON city
USING gin
(other_names_lower gin_trgm_ops);
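To double-check the definition the server actually stored for this index (client tools sometimes add clauses implicitly), the catalog view pg_indexes can be queried. This is only a minimal sketch; it assumes the table is named city as above and is visible to the current user.

```sql
-- Show the stored definition of every index on the city table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'city';
```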
Other settings - query suggested by Daniel Vérité in the comment
select name, source, setting from pg_settings where source <> 'default' and source <> 'override';
"application_name";"client";"pgAdmin III - Narz??dzie Zapytania"
"bytea_output";"session";"escape"
"client_encoding";"session";"UNICODE"
"client_min_messages";"session";"notice"
"DateStyle";"session";"ISO, YMD"
"default_text_search_config";"configuration file";"pg_catalog.simple"
"enable_seqscan";"session";"on"
"lc_messages";"configuration file";"en_US.UTF-8"
"lc_monetary";"configuration file";"Polish_Poland.1250"
"lc_numeric";"configuration file";"Polish_Poland.1250"
"lc_time";"configuration file";"Polish_Poland.1250"
"listen_addresses";"configuration file";"*"
"log_destination";"configuration file";"stderr"
"log_line_prefix";"configuration file";"%t "
"log_statement";"configuration file";"all"
"log_timezone";"configuration file";"Europe/Sarajevo"
"logging_collector";"configuration file";"on"
"max_connections";"configuration file";"100"
"max_stack_depth";"environment variable";"2048"
"port";"configuration file";"5432"
"shared_buffers";"configuration file";"4096"
"TimeZone";"configuration file";"Europe/Sarajevo"
1 Answer
I created your scenario with COLLATE = "C", and both queries use a bitmap index scan on other_names_lower_trgm_gin as expected.
SQL Fiddle with a table of ~10k rows, Postgres 9.2.4, COLLATE = "C".
There is probably something wrong in your setup that is not in your question.
Run (takes some time for big tables and takes an exclusive lock!):
VACUUM FULL ANALYZE city;
And try again ...
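If the sequential scan persists, a few checks may help narrow down how the setup differs from the fiddle. The sketch below is a set of diagnostics, not a confirmed fix. It lists the locale settings of the current database, shows which trigrams pg_trgm extracts from each search term (if the non-ASCII term yields few or no trigrams, the index likely cannot help with that pattern), and temporarily discourages sequential scans to see whether the planner can use the index at all. It assumes pg_trgm is installed and uses the table and column names from the question.

```sql
-- 1. Locale settings of the current database (fixed at CREATE DATABASE time).
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- 2. Trigrams that pg_trgm extracts from each search term.
SELECT show_trgm('ele')   AS ascii_trigrams,
       show_trgm('желез') AS non_ascii_trigrams;

-- 3. Discourage sequential scans for this session only, then re-check the plan.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM city WHERE other_names_lower LIKE '%желез%';
RESET enable_seqscan;
```

If the plan in step 3 is still a sequential scan, the planner cannot use the index for that pattern at all, which points at the database setup and away from cost estimates.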
- VACUUM took 7250 ms (only ~50k rows); no problem with the lock, it's a dev machine; still seq scan though with the non-ascii version; it's nice to see that the non-ascii index search can work (because the fiddle works) – user44, Apr 15, 2014 at 18:43
- I executed all DDL from the fiddle (only changed table and index name) on my database; result: still index scan with '%ele%' (2.4 ms) and seq scan with '%желез%' (67.9 ms) - so the problem seems to be with the database, not with table or index - so I'll add database creation code to the question – user44, Apr 15, 2014 at 19:02
- See if you have non-standard cost settings with select name,source,setting from pg_settings where source<>'default' and source<>'override'; – Daniel Vérité, Apr 16, 2014 at 13:55
- Added @DanielVérité's query result at the end of the question – user44, Apr 16, 2014 at 20:58
- @user44 I'm confused, you've marked this as the answer but in the comment above you've said it didn't solve the non-ascii use-case. Can you clarify how you solved this? – wilsonpage, Mar 20, 2020 at 10:35
[...] \d city in psql. And the setting for LC_COLLATE is also the default, i.e. the COLLATION for the index?
[...] edit above. COLLATE pg_catalog."default" when creating index, I guess it is just a thing added implicitly by pgAdmin or the engine...