I'm trying to create an index that will support queries that use my custom operator. This is on PostgreSQL 10.4.
The custom operator
I followed the tips in this SO answer to create an operator that performs "LIKE"-style matching on the elements of a text array.
CREATE FUNCTION reverse_like (text, text) returns boolean language sql
as $$ select $2 like $1 $$;
CREATE OPERATOR <~~ ( procedure = reverse_like, leftarg = text, rightarg = text );
The above operator allows me to do things like
SELECT 'ab%' <~~ ANY('{"abc","def"}');
The schema, index and query
I have a table of web traffic visits called sessions, which includes an array column.
CREATE TABLE sessions
(
    session_id varchar(24) NOT NULL,
    first_seen timestamp,
    domains    varchar[]
);
To check whether a given domain (or a partial/wildcarded domain name) was visited, I can query the domains column like this:
SELECT count(*)
FROM session_4070ba14_f081_41cb_9ef7_9dd385934da7
WHERE 'www.foo%' <~~ ANY(domains);
I want to speed up queries like the one above with a GIN index, so I created the index as follows:
CREATE INDEX idx_domains ON session USING GIN(domains);
The Question
After running ANALYZE on the table and setting enable_seqscan = false, I have no luck getting Postgres to use this index. It always does a sequential scan. It uses the above index for array operators like @>, but not for my custom <~~ operator.
I think it's because the GIN index doesn't know how to handle my custom operator. Do I need to create an operator class and then create my index using that? Or do I create a functional index?
Are you looking for trigram support, or just prefix matching? – jjanes, Sep 6, 2019 at 1:27
3 Answers
For trigram support, you can try the parray_gin extension:
WHERE domains @@> ARRAY['www.foo%'];
If you just want to do prefix matching (more efficiently than that provided by trigram), I don't think there is any way you can do that without writing some C code to glue the pieces together. I think you would then work on the array type directly, so wouldn't need the ANY, and so wouldn't benefit from the reverse_like operator at all.
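If you go the parray_gin route, the setup might look roughly like the sketch below. It has not been tested against your data. It assumes the extension is installed on the server, that its operator class is named parray_gin_ops (as in the extension's README), and that it only handles text[], hence the cast on your varchar[] column. The index name idx_domains_parray is made up.

```sql
-- Assumes the parray_gin extension is available on the server.
CREATE EXTENSION IF NOT EXISTS parray_gin;

-- Assumption: parray_gin_ops is defined for text[], so the varchar[] column
-- is indexed through a cast (an expression index).
CREATE INDEX idx_domains_parray ON session
    USING gin ((domains::text[]) parray_gin_ops);

-- The query should use the same expression as the index definition,
-- otherwise the planner cannot match it to the index.
SELECT count(*)
FROM session
WHERE domains::text[] @@> ARRAY['www.foo%'];
```

Check the plan with EXPLAIN afterwards to confirm a bitmap scan on the new index actually shows up.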
You won't be able to index an expression like this at all:
<constant> <operator> ANY(<array column>)
Your only chance would be to define an operator such that your expression looks like:
<array column> <operator> <constant>
But writing a GIN operator class means writing an extension in C, and I don't think you want to go that far.
The easy solution would be to change your data model so that you don't use arrays for things like that.
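A normalized layout could look something like the following sketch. The session_domains table, its column names, and the index name are hypothetical, and it assumes your main table is called session as in your CREATE INDEX statement. With one row per visited domain, a plain b-tree index can serve left-anchored LIKE patterns, and no custom operator is needed.

```sql
-- Hypothetical child table: one row per (session, domain) pair.
CREATE TABLE session_domains
(
    session_id  varchar(24) NOT NULL,
    domain_name text        NOT NULL
);

-- Populate it from the existing array column.
INSERT INTO session_domains (session_id, domain_name)
SELECT s.session_id, d.domain_name
FROM session AS s
CROSS JOIN LATERAL unnest(s.domains) AS d(domain_name)
WHERE d.domain_name IS NOT NULL;

-- text_pattern_ops lets the b-tree serve prefix LIKE patterns
-- even when the database does not use the C collation.
CREATE INDEX idx_session_domains_name
    ON session_domains (domain_name text_pattern_ops);

-- Count sessions that visited any domain starting with 'www.foo'.
SELECT count(DISTINCT session_id)
FROM session_domains
WHERE domain_name LIKE 'www.foo%';
```

This only helps for patterns anchored at the start. For patterns with a leading wildcard you would be back to something like a trigram index on domain_name.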
Turns out I overcomplicated this by thinking about a GIN index. A b-tree index on the whole array works fine and supports the custom <~~ operator.
CREATE INDEX IF NOT EXISTS idx_domains2 ON session (domains);
select count(*)
from session
where 'www.foo%' <~~ ANY(domains);
Finalize Aggregate  (cost=331523.11..331523.12 rows=1 width=8)
  ->  Gather  (cost=331522.90..331523.11 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=330522.90..330522.91 rows=1 width=8)
              ->  Parallel Index Only Scan using idx_domains2 on session  (cost=0.42..330200.52 rows=128952 width=0)
                    Filter: ('www.foo%'::text <~~ ANY ((domains)::text[]))
This is remarkable, but also misleading. The btree index is used because it's smaller than the table, so the index-only scan is faster than a sequential scan on the table. The performance gain is limited, though, and diminished if index-only scans are not possible. And the index is not used in its capacity as an index. A non-selective condition like
WHERE domains IS NOT NULL
will be faster (still "using" the index) than a very selective condition like
WHERE 'www.foo12345%' <~~ ANY(domains)
In other words, the index is barely supporting your custom operator as requested. – Erwin Brandstetter, Feb 25, 2021 at 14:38