Given a table vp
with column timestamp
type bigint
, and a btree
index on timestamp
, why would Postgres ignore the index and run a seq scan on comparison of timestamp
with a floating point value, when an index scan would produce identical results?
Integer comparison:
SELECT * FROM vp WHERE vp.timestamp > 1470752584
takes 48 ms:
Index Scan using vp_ts_idx on vp (cost=0.57..257.87 rows=2381 width=57) (actual time=0.014..38.669 rows=80323 loops=1) Index Cond: ("timestamp"> 1470752584) Total runtime: 48.322 ms
Numeric comparison:
SELECT * FROM vp WHERE vp.timestamp > 1470752584.1
takes 103 seconds because it ignores vp_ts_idx
and performs a seq scan of the entire table:
Seq Scan on vp (cost=0.00..7378353.16 rows=95403915 width=57) (actual time=62625.420..103122.701 rows=98240 loops=1) Filter: (("timestamp")::numeric> 1470752584.1) Rows Removed by Filter: 285945491 Total runtime: 103134.333 ms
Context: A query for recent vehicle positions compared timestamp
with EXTRACT(EPOCH FROM NOW()) - %s
, where %s
was the desired number of seconds, without explicitly casting to a bigint
. The workaround is to use CAST(EXTRACT(EPOCH FROM NOW()) - %s AS bigint)
.
Why wouldn't the query planner doesn't do this automatically when the column type is bigint
? Is this a bug, or am I not considering some edge case where this behavior would be useful?
2 Answers 2
The key is that you don't compare the same types. When comparing a bigint
to a numeric
, the easier way is to 'expand' the bigint
with the decimal places being 0 (like 1 -> 1.0), while the other way around it would mean rounding/truncation. (In this specific case it is easy to see that the both lead to the same result, but what if the values are negative?)
So, what you get in your comparison, is a numeric
to numeric
comparison, which is not something a bigint
index could serve.
It's worth a look which casts are possible and which is performed in these cases. For this, here are the two rows from pg_cast
:
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_cast
WHERE castsource::regtype = 'bigint'::regtype
AND casttarget::regtype = 'numeric'::regtype;
castsource │ casttarget │ castcontext
────────────┼────────────┼─────────────
bigint │ numeric │ i
SELECT castsource::regtype, casttarget::regtype, castcontext
FROM pg_cast
WHERE castsource::regtype = 'numeric'::regtype
AND casttarget::regtype = 'bigint'::regtype;
castsource │ casttarget │ castcontext
────────────┼────────────┼─────────────
numeric │ bigint │ a
According to the linked documentation page, castcontext
Indicates what contexts the cast can be invoked in.
e
means only as an explicit cast (usingCAST
or::
syntax).a
means implicitly in assignment to a target column, as well as explicitly.i
means implicitly in expressions, as well as the other cases.
So this means that the numeric
-> bigint
direction is happening 'by itself' (i. e. you not calling one of the casting operators explicitly) only if you assign the former to the latter. In expressions like your comparison, this is not the case, so the parser will only consider the other way (marked with an i
above). That means, you get a numeric
to numeric
comparison, unless you force it otherwise.
Notes:
psql
is a command line client to PostgreSQL, it does not do any sort of these things itself (I've edited the title accordingly)- using keywords as column names (like
timestamp
) is never a good idea - you'll possibly get unexpected parsing errors here and there, unless you are careful enough to double-quote them everywhere - using the Unix epoch as a timestamp might be cumbersome. There is a rich functionality for 'real' timestamps in PostgreSQL - in most cases it is much easier to use it.
-
Great advice, thanks. The table with a
bigint
column namedtimestamp
was generated by a third-party Python library that consumes a GTFS-RT feed into PostgreSQL using an ORM. I agree that it would make a lot more sense to use 'real' timestamps.Miles Erickson– Miles Erickson2016年08月09日 16:27:37 +00:00Commented Aug 9, 2016 at 16:27
Make sure to either
- bind parameter
some_id
asbigint
in prepared statement - cast any number literal to
bigint
manually likesome_id = 1234::bigint
Note even with prepared statement managed by ORM, if some_id
is mapping to ulong
type from the view of ORM, it will still cause implicitly ::numeric
casting which will bypass index on the bigint
column since there's no unsigned types in Postgres: https://stackoverflow.com/questions/20810134/why-unsigned-integer-is-not-available-in-postgresql
FYI:
- https://code.jeremyevans.net/2022-11-01-forcing-sequential-scans-on-postgresql.html
- https://lobste.rs/s/jgl9w2/forcing_sequential_scans_on_postgresql
- https://pganalyze.com/blog/5mins-postgres-large-integers-causing-sequential-scan-instead-of-using-index
- https://www.postgresql.org/message-id/flat/001101c404a3%24fb34d320%24fd08a8c0%40steve
CREATE TABLE test(num int not null, PRIMARY KEY (num));
INSERT INTO test SELECT * FROM generate_series(0, 32672);
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
SELECT * FROM test WHERE num = 9223372036854775807;
Index Only Scan using test_pkey on tbm.test (cost=0.29..1.41 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=1)
Output: num
Index Cond: (test.num = '9223372036854775807'::bigint)
Heap Fetches: 0
Buffers: shared hit=2
Query Identifier: 3505370066652848307
Planning Time: 0.052 ms
Execution Time: 0.024 ms
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
SELECT * FROM test WHERE num = 9223372036854775808;
Seq Scan on tbm.test (cost=0.00..635.10 rows=163 width=4) (actual time=3.427..3.427 rows=0 loops=1)
Output: num
Filter: ((test.num)::numeric = '9223372036854775808'::numeric)
Rows Removed by Filter: 32673
Buffers: shared hit=145
Query Identifier: 7606985776533665305
Planning Time: 0.055 ms
Execution Time: 3.439 ms
CREATE TABLE test(num smallint not null, PRIMARY KEY (num));
INSERT INTO test SELECT * FROM generate_series(0, 32672);
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
SELECT * FROM test WHERE num = 32672;
Index Only Scan using test_pkey on tbm.test (cost=0.29..2.51 rows=1 width=2) (actual time=0.018..0.018 rows=1 loops=1)
Output: num
Index Cond: (test.num = 32672)
Heap Fetches: 1
Buffers: shared hit=4
Query Identifier: -4575445827832434577
Planning:
Buffers: shared hit=15
Planning Time: 0.091 ms
Execution Time: 0.029 ms
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
SELECT * FROM test WHERE num = 2147483647;
Index Only Scan using test_pkey on tbm.test (cost=0.29..2.51 rows=1 width=2) (actual time=0.014..0.014 rows=0 loops=1)
Output: num
Index Cond: (test.num = 2147483647)
Heap Fetches: 0
Buffers: shared hit=2
Query Identifier: -4575445827832434577
Planning Time: 0.063 ms
Execution Time: 0.032 ms
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
SELECT * FROM test WHERE num = 4294967296;
Index Only Scan using test_pkey on tbm.test (cost=0.29..2.51 rows=1 width=2) (actual time=0.007..0.007 rows=0 loops=1)
Output: num
Index Cond: (test.num = '4294967296'::bigint)
Heap Fetches: 0
Buffers: shared hit=2
Query Identifier: 4896689328011101216
Planning Time: 0.029 ms
Execution Time: 0.016 ms
Explore related questions
See similar questions with these tags.