I have table events
with fields:
id
user_id
time_start
time_end
...
And have b-tree index on (time_start, time_end)
.
SELECT user_id
FROM events
WHERE ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00') OR
(time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id);
Group (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
Group Key: user_id
Buffers: shared hit=224848
-> Gather Merge (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=224848
-> Sort (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
Sort Key: user_id
Sort Method: quicksort Memory: 27kB
Worker 0: Sort Method: quicksort Memory: 27kB
Worker 1: Sort Method: quicksort Memory: 28kB
Buffers: shared hit=224848
-> Partial HashAggregate (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
Group Key: user_id
Buffers: shared hit=224834
-> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
Rows Removed by Filter: 708728
Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 ms
Postgres uses Seq Scan
with filter:
Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
But when I leave one condition for time_start
and time_end
it starts to use Index Scan.
How can I change condition to make Postgres use Index Scan over Seq Scan?
I don't want to use UNION
like:
SELECT user_id
FROM events
WHERE (
(time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id)
UNION (SELECT user_id
FROM events
WHERE (
(time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id
1 Answer 1
Expression index
A GiST or (even better) SP-GiST expression index on an inclusive timestamp range should work wonders.
CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));
Rewrite your query with the "range contains" operator @>
and match the indexed expression (exactly equivalent to your original):
SELECT user_id
FROM events
WHERE tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP BY user_id;
You'll get a query plan like this:
HashAggregate (cost=9.90..10.00 rows=10 width=4)
Group Key: user_id
-> Bitmap Heap Scan on events (cost=2.57..9.88 rows=10 width=4)
Recheck Cond: ((tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone) OR (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone))
-> BitmapOr (cost=2.57..2.57 rows=10 width=0)
-> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:30:00''::timestamp without time zone)
-> Bitmap Index Scan on events_right_expr_idx (cost=0.00..1.28 rows=5 width=0)
Index Cond: (tsrange(time_start, time_end, ''[]''::text) @> ''2021-08-24 15:59:00''::timestamp without time zone)
Should be much faster.
Range types assume inclusive lower and exclusive upper bound unless instructed otherwise. tsrange(time_start, time_end)
is the same as tsrange(time_start, time_end), '[)'
).
Since you operate with >=
and <=
, include both bounds with tsrange(time_start, time_end, '[]')
.
Related:
- Perform this hours of operation query in PostgreSQL
- Optimizing queries on a range of timestamps (two columns)
Alternatively, store range column in table
Should be a bit faster, yet, as plain (not expression) index.
You can add a timestamp range column to your table, like:
ALTER TABLE event ADD COLUMN ts_range tsrange GENERATED ALWAYE AS (tsrange(time_start, time_end, '[]')) STORED;
See:
Or, more radically, replace time_start
and time_end
with the range column. Then index and query are a bit simpler:
CREATE INDEX events_right_idx ON events USING spgist (ts_range);
SELECT user_id
FROM events
WHERE ts_range @> timestamp '2021-08-24T15:30:00'
OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP BY user_id;
But the tsrange
column occupies more space than two timestamp
columns. Weigh cost and benefits.
Asides
Postgres 14 (currently beta) even allows a covering SP-GiST index. The release notes:
Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)
But I don't think you can get index-only scans for your particular query.
If you have to make do with your B-tree index for some reason, this fixed UNION
query shouldn't be too bad:
SELECT user_id
FROM events
WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM events
WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_end
Notably, no GROUP BY
. UNION
already does all the work.
And simplify with BETWEEN
(no effect on performance).
Also, you seem to have a wild mix of timestamp without time zone
and timestamp with time zone
. And name it "time" to add to the confusion. Typically timestamptz
is the better choice. See:
Last, but not least, this indicates inaccurate column statistics, leading to a sub-optimal query plan:
-> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
Run
ANALYZE events;
And retry. Your original query can use a plain B-tree index. It's just not as efficient as the suggested SP-GiST index.
And then maybe tune your autovacuum
and statistics settings to avoid bad statistics in the future. See:
-
thank you for your response. I have an offtopic question how can I find out what indexes on the table are not used by postgresql?Dmitro– Dmitro2021年08月27日 11:28:18 +00:00Commented Aug 27, 2021 at 11:28
-
1If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context.2021年08月27日 12:29:48 +00:00Commented Aug 27, 2021 at 12:29
Explore related questions
See similar questions with these tags.
explain (analyze, buffers)
- not just a "simple" explain.