PostgreSQL Operator Class for index support of custom operator

Question

I'm trying to create an index that will support queries that use my custom operator. This is on PostgreSQL 10.4.

The custom operator

I followed the tips in this SO answer to create an operator that performs LIKE-style matching on the elements of a text array.

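CREATE FUNCTION reverse_like (text, text) RETURNS boolean LANGUAGE sql IMMUTABLE
AS $$ SELECT 2ドル LIKE 1ドル $$;  -- note the swap: the pattern is the first argument

-- PostgreSQL 10 requires PROCEDURE here; FUNCTION is accepted only from version 11 on
CREATE OPERATOR <~~ (PROCEDURE = reverse_like, LEFTARG = text, RIGHTARG = text);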

This operator lets me write queries such as:

SELECT 'ab%' <~~ ANY('{"abc","def"}');
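For reference, `pattern <~~ ANY(arr)` is true when at least one element of `arr` matches the LIKE pattern, so the same test can be written with `unnest()` and plain LIKE. A small sketch using only built-ins (the sample patterns and arrays are made up). In a WHERE clause the two forms behave the same; they differ only in that ANY can return NULL instead of false when the array contains NULL elements.

```sql
-- What "pattern <~~ ANY(arr)" computes, expressed with built-ins only:
-- true if at least one array element matches the LIKE pattern.
-- The patterns and arrays below are illustrative sample data.
SELECT t.pattern,
       t.arr,
       EXISTS (SELECT 1 FROM unnest(t.arr) AS elem WHERE elem LIKE t.pattern) AS any_match
FROM (VALUES ('ab%',      ARRAY['abc', 'def']),
             ('x%',       ARRAY['abc', 'def']),
             ('www.foo%', ARRAY['www.bar.com', 'www.foo.com'])) AS t(pattern, arr);
```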

The schema, index and query

I have a table of web traffic visits called session, which includes an array column, domains:

CREATE TABLE session
(
 session_id varchar(24) NOT NULL,
 first_seen timestamp,
 domains varchar[]
);

To check whether a given domain (or a partial, wildcarded domain name) was visited, I query the domains column like this:

SELECT count(*)
FROM session
WHERE 'www.foo%' <~~ ANY(domains);

I want to speed up these queries with a GIN index, so I created one:

CREATE INDEX idx_domains ON session USING GIN(domains);

The Question

After running ANALYZE on the table and setting enable_seqscan = false, I still can't get Postgres to use this index; it always does a sequential scan. The index is used for array operators such as @>, but not for my custom <~~ operator.

I think it's because the GIN index doesn't know how to handle my custom operator. Do I need to create an operator class and then build my index with it? Or should I create a functional (expression) index?
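One way to check this diagnosis is to ask the system catalogs which operators the built-in GIN operator class for arrays (array_ops, the default for array columns) can serve. A sketch of such a catalog query; on a stock installation it lists only &&, @>, <@ and = on anyarray, so a condition built on <~~ cannot use idx_domains no matter how the planner is tuned.

```sql
-- List the operators supported by the built-in GIN operator class for arrays.
-- Only conditions using one of these operators on the indexed column
-- can be answered by a GIN index such as idx_domains.
SELECT amop.amopstrategy,
       amop.amopopr::regoperator AS indexable_operator
FROM pg_opclass AS opc
JOIN pg_am      AS am   ON am.oid = opc.opcmethod
JOIN pg_amop    AS amop ON amop.amopfamily = opc.opcfamily
WHERE am.amname = 'gin'
  AND opc.opcname = 'array_ops'
ORDER BY amop.amopstrategy;
```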

Comment

Are you looking for trigram support, or just prefix matching?


Answer 1, by jjanes (2019-09-06 01:58:46Z)

For trigram support, you can try the parray_gin extension, with a condition such as:

WHERE domains @@> ARRAY['www.foo%'];

If you just want prefix matching (which can be done more efficiently than trigram matching allows), I don't think there is any way to do it without writing some C code to glue the pieces together. You would then work on the array type directly, so you wouldn't need ANY, and therefore wouldn't benefit from the reverse_like operator at all.

Answer 2, by Laurenz Albe (2019-09-06 01:33:32Z)

You won't be able to index an expression like this at all:

<constant> <operator> ANY(<array column>)

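PostgreSQL can use an index only for a condition in which the indexed column is compared with a value by an operator that the index's operator class supports. So your only chance would be to define an operator such that your expression looks like: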

<array column> <operator> <constant>
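The difference can be seen with built-in operators alone: a GIN index on an array column serves domains @> ARRAY[...] (array column, indexable operator, constant), but not the logically similar '...' = ANY(domains). A minimal sketch on a hypothetical throwaway table, demo_session, with generated sample rows; sequential scans are disabled, as in the question, so the index is chosen whenever it can be.

```sql
-- Hypothetical throwaway table; text[] avoids varchar[]-to-text[] casts.
CREATE TEMP TABLE demo_session (session_id text, domains text[]);

INSERT INTO demo_session
SELECT 's' || g,
       ARRAY['www.site' || g || '.com',
             CASE WHEN g % 10 = 0 THEN 'www.foo.com' ELSE 'www.bar.com' END]
FROM generate_series(1, 1000) AS g;

CREATE INDEX demo_session_domains_gin ON demo_session USING gin (domains);
ANALYZE demo_session;

SET enable_seqscan = off;  -- make index use as attractive as possible

-- <array column> <operator> <constant>: plan uses demo_session_domains_gin
EXPLAIN SELECT count(*) FROM demo_session WHERE domains @> ARRAY['www.foo.com'];

-- <constant> <operator> ANY(<array column>): the GIN index is not used
EXPLAIN SELECT count(*) FROM demo_session WHERE 'www.foo.com' = ANY (domains);

RESET enable_seqscan;
```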

But writing a GIN operator class means writing an extension in C, and I don't think you want to go that far.

The easy solution would be to change your data model so that you don't use arrays for things like that.
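A sketch of what such a data model change could look like, assuming a hypothetical child table session_domain with one row per visited domain instead of the varchar[] column. A B-tree index with the text_pattern_ops operator class lets a left-anchored pattern such as 'www.foo%' be answered by an index range scan regardless of the database collation. The sample rows are illustrative.

```sql
-- Hypothetical normalized layout: one row per (session, domain) pair.
CREATE TABLE session_domain (
    session_id varchar(24) NOT NULL,
    domain     text        NOT NULL
);

-- text_pattern_ops lets LIKE 'prefix%' use the B-tree even in non-C locales.
CREATE INDEX session_domain_domain_idx ON session_domain (domain text_pattern_ops);

INSERT INTO session_domain VALUES
    ('s1', 'www.foo.com'),
    ('s1', 'www.bar.com'),
    ('s2', 'www.foobar.org'),
    ('s3', 'example.com');
ANALYZE session_domain;

-- Counterpart of: SELECT count(*) FROM session WHERE 'www.foo%' <~~ ANY(domains);
SELECT count(DISTINCT session_id)
FROM session_domain
WHERE domain LIKE 'www.foo%';
```

Counting DISTINCT session_id keeps a session with several matching domains from being counted more than once, which mirrors the one-row-per-session semantics of the array query.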

Answer 3, by maxTrialfire (2019-09-09 17:40:34Z)

It turns out I overcomplicated this by thinking about a GIN index. A B-tree index on the whole array works fine and supports the custom <~~ operator:

CREATE INDEX IF NOT EXISTS idx_domains2 ON session (domains);

EXPLAIN
SELECT count(*)
FROM session
WHERE 'www.foo%' <~~ ANY(domains);

Finalize Aggregate  (cost=331523.11..331523.12 rows=1 width=8)
  ->  Gather  (cost=331522.90..331523.11 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=330522.90..330522.91 rows=1 width=8)
              ->  Parallel Index Only Scan using idx_domains2 on session  (cost=0.42..330200.52 rows=128952 width=0)
                    Filter: ('www.foo%'::text <~~ ANY ((domains)::text[]))

Comment on Answer 3

This is remarkable, but also misleading. The B-tree index is used because it is smaller than the table, so an index-only scan over all of it is faster than a sequential scan of the table. The performance gain is limited, though, and shrinks further if index-only scans are not possible. The index is also not used in its capacity as an index: the plan shows the <~~ condition as a Filter, not an Index Cond, so every index entry is still read and tested. A non-selective condition like WHERE domains IS NOT NULL will be faster (still "using" the index) than a very selective condition like WHERE 'www.foo12345%' <~~ ANY(domains)! In other words, the index barely supports your custom operator in the way you asked for.
