Creating index on TIMESTAMP column to use with range operators

Question 1

TLDR: Can I create an index that's used by the following WHERE clause:

WHERE foo_date <@ tsrange('2018-01-01', '2018-02-01')

Let's say I have a table like this:

CREATE TABLE foo
(
 foo_id INTEGER GENERATED BY DEFAULT AS IDENTITY,
 foo_date timestamp without time zone NOT NULL,
 CONSTRAINT foo_pkey PRIMARY KEY (foo_id)
);

This table contains 100,000 records with dates from 2009年01月01日 to 2018年12月29日. I'd like to be able to query for rows in a given date range (e.g. for rows in January 2018).

Option 1

One approach is to use the BETWEEN operator:

SELECT * FROM foo WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31';

The problem of this approach is that if foo_date occurred on 2018年01月31日 after midnight, they wouldn't be included in this query. So I could change the query to BETWEEN '2018-01-01' AND '2018-02-01'. The problem then, however, is records that occur on 2018年02月01日 00:00:00. These would be included, which I don't want.

Option 2

Another option, put forth by Aaron Bertrand, is to use this construct:

foo_date >= '2018-01-01' AND foo_date < '2018-02-01'

(Yes, this blog is for SQL Server, but seems to be applicable here).

While this form unequivocally gives me the results I want, it's cumbersome: I have to repeat the column name twice.

Option 3

Since Postgres gives us the range data type, I thought a clearer form might be:

foo_date <@ tsrange('2018-01-01', '2018-02-01')

So my next question is, if I use this form, can I use an index to speed up operation?

With Options 1 and 2 above, a normal b-tree index can be used:

CREATE INDEX idx_foo ON foo(foo_date);

A query using Options 1 or 2 will use the index:

EXPLAIN SELECT * FROM foo 
WHERE 
 foo_date >= '2018-01-01' 
 AND foo_date < '2018-02-01';

gives me this query plan:

Bitmap Heap Scan on foo (cost=21.95..592.70 rows=942 width=12)
 Recheck Cond: ((foo_date >= '2018-01-01 00:00:00'::timestamp without time zone) AND (foo_date < '2018-02-01 00:00:00'::timestamp without time zone))
 -> Bitmap Index Scan on idx_foo (cost=0.00..21.71 rows=942 width=0)
 Index Cond: ((foo_date >= '2018-01-01 00:00:00'::timestamp without time zone) AND (foo_date < '2018-02-01 00:00:00'::timestamp without time zone))

However, if I use Option 3, the index isn't used:

EXPLAIN SELECT * FROM foo 
WHERE foo_date <@ tsrange('2018-01-01', '2018-02-01');

gives me:

Seq Scan on foo (cost=0.00..1791.00 rows=500 width=12)
 Filter: (foo_date <@ '["2018-01-01 00:00:00","2018-02-01 00:00:00")'::tsrange)

If I try to create a gist index, I initially get an error message.

ERROR: data type timestamp without time zone has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
SQL state: 42704

After adding the btree_gist extension, I can create the index:

CREATE INDEX idx_foo ON foo USING gist (foo_date)

However, using the @> or <@ still doesn't make use of the index.

Is there something I'm missing? Or is it just not feasible to create an index that can be used by this construct?

Question 2

The name "foo_date" indicates a date and is a bad choice for a timestamp column. Option 1 would work just fine with actual dates.

A plain btree index in combination with Option 2 is the unequivocally best solution. Look no further. Except maybe for the special case of a BRIN index for large tables with physically sorted data. See:

Speed up creation of Postgres partial index

Proof of concept

That said, to make a GiST or SP-GiST index work, you could create an expression index on fake ranges. You do not need the module btree_gist for this. Demonstrating with SP-GiST since that is typically a bit faster here. See:

Perform this hours of operation query in PostgreSQL

CREATE INDEX foo_date_spgist_idx ON foo USING spgist(tsrange(foo_date, foo_date, '[]'));
SELECT * FROM foo
WHERE tsrange(foo_date, foo_date, '[]') <@ tsrange('2018-01-01', '2018-02-01')

Or with range literal:

...
WHERE tsrange(foo_date, foo_date, '[]') <@ '[2018年01月01日,2018年02月01日)'

But: bigger, more expensive to maintain, slower than the btree index. Not even less cumbersome to write. Pointless for your case.

Aside: technically, you could:

... WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31 23:59.999999';

The Postgres timestamp type is (currently) implemented with μs resolution, i.e. max. 6 fractional digits. Hence the expression does exactly what you want. But I strongly advice against building on this implementation detail. Option 2 is the way to go. Related:

Question 3

Excellent! I hadn't thought of indexing a "dummy" range (although I agree this isn't the way to go, not least because you end up still having to type the column name twice). It seems a shame that there's not a better way to use an index for Postgres's range operators this way, but I suppose it's best to stick with the SQL standard in these cases. I agree about the name foo_date. This was a quick mock-up, and it was the name that came to me. Thank you!'

score 8 · Accepted Answer · 2019-04-30 03:19:45Z

The name "foo_date" indicates a date and is a bad choice for a timestamp column. Option 1 would work just fine with actual dates.

A plain btree index in combination with Option 2 is the unequivocally best solution. Look no further. Except maybe for the special case of a BRIN index for large tables with physically sorted data. See:

Speed up creation of Postgres partial index

Proof of concept

That said, to make a GiST or SP-GiST index work, you could create an expression index on fake ranges. You do not need the module btree_gist for this. Demonstrating with SP-GiST since that is typically a bit faster here. See:

Perform this hours of operation query in PostgreSQL

CREATE INDEX foo_date_spgist_idx ON foo USING spgist(tsrange(foo_date, foo_date, '[]'));
SELECT * FROM foo
WHERE tsrange(foo_date, foo_date, '[]') <@ tsrange('2018-01-01', '2018-02-01')

Or with range literal:

...
WHERE tsrange(foo_date, foo_date, '[]') <@ '[2018年01月01日,2018年02月01日)'

But: bigger, more expensive to maintain, slower than the btree index. Not even less cumbersome to write. Pointless for your case.

Aside: technically, you could:

... WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31 23:59.999999';

The Postgres timestamp type is (currently) implemented with μs resolution, i.e. max. 6 fractional digits. Hence the expression does exactly what you want. But I strongly advice against building on this implementation detail. Option 2 is the way to go. Related:

Excellent! I hadn't thought of indexing a "dummy" range (although I agree this isn't the way to go, not least because you end up still having to type the column name twice). It seems a shame that there's not a better way to use an index for Postgres's range operators this way, but I suppose it's best to stick with the SQL standard in these cases. I agree about the name foo_date. This was a quick mock-up, and it was the name that came to me. Thank you!'

Stack Exchange Network

Creating index on TIMESTAMP column to use with range operators

Option 1

Option 2

Option 3

1 Answer 1

Proof of concept

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions