TLDR: Can I create an index that's used by the following WHERE
clause:
WHERE foo_date <@ tsrange('2018-01-01', '2018-02-01')
Let's say I have a table like this:
CREATE TABLE foo ( foo_id INTEGER GENERATED BY DEFAULT AS IDENTITY, foo_date timestamp without time zone NOT NULL, CONSTRAINT foo_pkey PRIMARY KEY (foo_id) );
This table contains 100,000 records with dates from 2009年01月01日
to 2018年12月29日
. I'd like to be able to query for rows in a given date range (e.g. for rows in January 2018).
Option 1
One approach is to use the BETWEEN
operator:
SELECT * FROM foo WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31';
The problem of this approach is that if foo_date
occurred on 2018年01月31日
after midnight, they wouldn't be included in this query. So I could change the query to BETWEEN '2018-01-01' AND '2018-02-01'
. The problem then, however, is records that occur on 2018年02月01日 00:00:00
. These would be included, which I don't want.
Option 2
Another option, put forth by Aaron Bertrand, is to use this construct:
foo_date >= '2018-01-01' AND foo_date < '2018-02-01'
(Yes, this blog is for SQL Server, but seems to be applicable here).
While this form unequivocally gives me the results I want, it's cumbersome: I have to repeat the column name twice.
Option 3
Since Postgres gives us the range data type, I thought a clearer form might be:
foo_date <@ tsrange('2018-01-01', '2018-02-01')
So my next question is, if I use this form, can I use an index to speed up operation?
With Options 1 and 2 above, a normal b-tree index can be used:
CREATE INDEX idx_foo ON foo(foo_date);
A query using Options 1 or 2 will use the index:
EXPLAIN SELECT * FROM foo
WHERE
foo_date >= '2018-01-01'
AND foo_date < '2018-02-01';
gives me this query plan:
Bitmap Heap Scan on foo (cost=21.95..592.70 rows=942 width=12)
Recheck Cond: ((foo_date >= '2018-01-01 00:00:00'::timestamp without time zone) AND (foo_date < '2018-02-01 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on idx_foo (cost=0.00..21.71 rows=942 width=0)
Index Cond: ((foo_date >= '2018-01-01 00:00:00'::timestamp without time zone) AND (foo_date < '2018-02-01 00:00:00'::timestamp without time zone))
However, if I use Option 3, the index isn't used:
EXPLAIN SELECT * FROM foo
WHERE foo_date <@ tsrange('2018-01-01', '2018-02-01');
gives me:
Seq Scan on foo (cost=0.00..1791.00 rows=500 width=12)
Filter: (foo_date <@ '["2018-01-01 00:00:00","2018-02-01 00:00:00")'::tsrange)
If I try to create a gist index, I initially get an error message.
ERROR: data type timestamp without time zone has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
SQL state: 42704
After adding the btree_gist
extension, I can create the index:
CREATE INDEX idx_foo ON foo USING gist (foo_date)
However, using the @>
or <@
still doesn't make use of the index.
Is there something I'm missing? Or is it just not feasible to create an index that can be used by this construct?
1 Answer 1
The name "foo_date" indicates a date
and is a bad choice for a timestamp
column. Option 1 would work just fine with actual dates.
A plain btree index in combination with Option 2 is the unequivocally best solution. Look no further. Except maybe for the special case of a BRIN index for large tables with physically sorted data. See:
Proof of concept
That said, to make a GiST or SP-GiST index work, you could create an expression index on fake ranges. You do not need the module btree_gist
for this. Demonstrating with SP-GiST since that is typically a bit faster here. See:
CREATE INDEX foo_date_spgist_idx ON foo USING spgist(tsrange(foo_date, foo_date, '[]'));
SELECT * FROM foo
WHERE tsrange(foo_date, foo_date, '[]') <@ tsrange('2018-01-01', '2018-02-01')
Or with range literal:
...
WHERE tsrange(foo_date, foo_date, '[]') <@ '[2018年01月01日,2018年02月01日)'
But: bigger, more expensive to maintain, slower than the btree index. Not even less cumbersome to write. Pointless for your case.
Aside: technically, you could:
... WHERE foo_date BETWEEN '2018-01-01' AND '2018-01-31 23:59.999999';
The Postgres timestamp type is (currently) implemented with μs resolution, i.e. max. 6 fractional digits. Hence the expression does exactly what you want. But I strongly advice against building on this implementation detail. Option 2 is the way to go. Related:
-
Excellent! I hadn't thought of indexing a "dummy" range (although I agree this isn't the way to go, not least because you end up still having to type the column name twice). It seems a shame that there's not a better way to use an index for Postgres's range operators this way, but I suppose it's best to stick with the SQL standard in these cases. I agree about the name
foo_date
. This was a quick mock-up, and it was the name that came to me. Thank you!'Zack– Zack2019年04月30日 19:07:03 +00:00Commented Apr 30, 2019 at 19:07
Explore related questions
See similar questions with these tags.