How to get index scan for OR'ed time-range predicates?

Question 1

I have table events with fields:

id
user_id
time_start
time_end
...

And have b-tree index on (time_start, time_end).

SELECT user_id
FROM events
WHERE ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00') OR
 (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id);

Group (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
 Group Key: user_id
 Buffers: shared hit=224848
 -> Gather Merge (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 Buffers: shared hit=224848
 -> Sort (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
 Sort Key: user_id
 Sort Method: quicksort Memory: 27kB
 Worker 0: Sort Method: quicksort Memory: 27kB
 Worker 1: Sort Method: quicksort Memory: 28kB
 Buffers: shared hit=224848
 -> Partial HashAggregate (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
 Group Key: user_id
 Buffers: shared hit=224834
 -> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
 Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
 Rows Removed by Filter: 708728
 Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 ms

Postgres uses Seq Scan with filter:

Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))

But when I leave one condition for time_start and time_end it starts to use Index Scan.

How can I change condition to make Postgres use Index Scan over Seq Scan?

I don't want to use UNION like:

SELECT user_id
FROM events
WHERE (
 (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id)
UNION (SELECT user_id
 FROM events
 WHERE (
 (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
 GROUP BY user_id

Question 2

Please edit your question and add the complete execution plan generated using explain (analyze, buffers) - not just a "simple" explain.

Question 3

Please start any performance question by disclosing your version of Postgres.

Question 4

Expression index

A GiST or (even better) SP-GiST expression index on an inclusive timestamp range should work wonders.

CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));

Rewrite your query with the "range contains" operator @> and match the indexed expression (exactly equivalent to your original):

SELECT user_id
FROM events
WHERE tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
 OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP BY user_id;

You'll get a query plan like this:

HashAggregate (cost=9.90..10.00 rows=10 width=4)
 Group Key: user_id
 -> Bitmap Heap Scan on events (cost=2.57..9.88 rows=10 width=4)
 Recheck Cond: ((tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone) OR (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone))
 -> BitmapOr (cost=2.57..2.57 rows=10 width=0)
 -> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
 Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone)
 -> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
 Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone)

Should be much faster.

Range types assume inclusive lower and exclusive upper bound unless instructed otherwise. tsrange(time_start, time_end) is the same as tsrange(time_start, time_end), '[)').
Since you operate with >= and <=, include both bounds with tsrange(time_start, time_end, '[]').

Should be a bit faster, yet, as plain (not expression) index.
You can add a timestamp range column to your table, like:

ALTER TABLE event ADD COLUMN ts_range tsrange GENERATED ALWAYE AS (tsrange(time_start, time_end, '[]')) STORED;

See:

Computed / calculated / virtual / derived columns in PostgreSQL

Or, more radically, replace time_start and time_end with the range column. Then index and query are a bit simpler:

CREATE INDEX events_right_idx ON events USING spgist (ts_range);
SELECT user_id
FROM events
WHERE ts_range @> timestamp '2021-08-24T15:30:00'
 OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP BY user_id;

But the tsrange column occupies more space than two timestamp columns. Weigh cost and benefits.

Asides

Postgres 14 (currently beta) even allows a covering SP-GiST index. The release notes:

Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)

But I don't think you can get index-only scans for your particular query.

If you have to make do with your B-tree index for some reason, this fixed UNION query shouldn't be too bad:

SELECT user_id
FROM events
WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM events
WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_end

Notably, no GROUP BY. UNION already does all the work.
And simplify with BETWEEN (no effect on performance).

Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And name it "time" to add to the confusion. Typically timestamptz is the better choice. See:

Ignoring time zones altogether in Rails and PostgreSQL

Last, but not least, this indicates inaccurate column statistics, leading to a sub-optimal query plan:

 -> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4)
 (actual time=104.085..183.994 rows=64 loops=3)

Run

ANALYZE events;

And retry. Your original query can use a plain B-tree index. It's just not as efficient as the suggested SP-GiST index.
And then maybe tune your autovacuum and statistics settings to avoid bad statistics in the future. See:

Question 5

thank you for your response. I have an offtopic question how can I find out what indexes on the table are not used by postgresql?

Question 6

If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context.

score 6 · Accepted Answer · 2021-08-27 00:03:53Z

Expression index

A GiST or (even better) SP-GiST expression index on an inclusive timestamp range should work wonders.

CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));

Rewrite your query with the "range contains" operator @> and match the indexed expression (exactly equivalent to your original):

SELECT user_id
FROM events
WHERE tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
 OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP BY user_id;

You'll get a query plan like this:

HashAggregate (cost=9.90..10.00 rows=10 width=4)
 Group Key: user_id
 -> Bitmap Heap Scan on events (cost=2.57..9.88 rows=10 width=4)
 Recheck Cond: ((tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone) OR (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone))
 -> BitmapOr (cost=2.57..2.57 rows=10 width=0)
 -> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
 Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone)
 -> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
 Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone)

Should be much faster.

Range types assume inclusive lower and exclusive upper bound unless instructed otherwise. tsrange(time_start, time_end) is the same as tsrange(time_start, time_end), '[)').
Since you operate with >= and <=, include both bounds with tsrange(time_start, time_end, '[]').

Should be a bit faster, yet, as plain (not expression) index.
You can add a timestamp range column to your table, like:

ALTER TABLE event ADD COLUMN ts_range tsrange GENERATED ALWAYE AS (tsrange(time_start, time_end, '[]')) STORED;

See:

Computed / calculated / virtual / derived columns in PostgreSQL

Or, more radically, replace time_start and time_end with the range column. Then index and query are a bit simpler:

CREATE INDEX events_right_idx ON events USING spgist (ts_range);
SELECT user_id
FROM events
WHERE ts_range @> timestamp '2021-08-24T15:30:00'
 OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP BY user_id;

But the tsrange column occupies more space than two timestamp columns. Weigh cost and benefits.

Asides

Postgres 14 (currently beta) even allows a covering SP-GiST index. The release notes:

Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)

But I don't think you can get index-only scans for your particular query.

If you have to make do with your B-tree index for some reason, this fixed UNION query shouldn't be too bad:

SELECT user_id
FROM events
WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM events
WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_end

Notably, no GROUP BY. UNION already does all the work.
And simplify with BETWEEN (no effect on performance).

Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And name it "time" to add to the confusion. Typically timestamptz is the better choice. See:

Ignoring time zones altogether in Rails and PostgreSQL

Last, but not least, this indicates inaccurate column statistics, leading to a sub-optimal query plan:

 -> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4)
 (actual time=104.085..183.994 rows=64 loops=3)

Run

ANALYZE events;

And retry. Your original query can use a plain B-tree index. It's just not as efficient as the suggested SP-GiST index.
And then maybe tune your autovacuum and statistics settings to avoid bad statistics in the future. See:

thank you for your response. I have an offtopic question how can I find out what indexes on the table are not used by postgresql?
If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context.

Stack Exchange Network

How to get index scan for OR'ed time-range predicates?

1 Answer 1

Expression index

Alternatively, store range column in table

Asides

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

How to get index scan for OR'ed time-range predicates?

1 Answer 1

Expression index

Alternatively, store range column in table

Asides

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions