Why does PostgreSQL perform a seq scan when comparing a numeric value with a bigint column?

Question 1

Given a table vp with column timestamp type bigint, and a btree index on timestamp, why would Postgres ignore the index and run a seq scan on comparison of timestamp with a floating point value, when an index scan would produce identical results?

Integer comparison:

SELECT * FROM vp WHERE vp.timestamp > 1470752584 takes 48 ms:

 Index Scan using vp_ts_idx on vp (cost=0.57..257.87 rows=2381 width=57) (actual time=0.014..38.669 rows=80323 loops=1)
 Index Cond: ("timestamp"> 1470752584)
 Total runtime: 48.322 ms

Numeric comparison:

SELECT * FROM vp WHERE vp.timestamp > 1470752584.1 takes 103 seconds because it ignores vp_ts_idx and performs a seq scan of the entire table:

 Seq Scan on vp (cost=0.00..7378353.16 rows=95403915 width=57) (actual time=62625.420..103122.701 rows=98240 loops=1)
 Filter: (("timestamp")::numeric> 1470752584.1)
 Rows Removed by Filter: 285945491
 Total runtime: 103134.333 ms

Context: A query for recent vehicle positions compared timestamp with EXTRACT(EPOCH FROM NOW()) - %s, where %s was the desired number of seconds, without explicitly casting to a bigint. The workaround is to use CAST(EXTRACT(EPOCH FROM NOW()) - %s AS bigint).

Why wouldn't the query planner doesn't do this automatically when the column type is bigint? Is this a bug, or am I not considering some edge case where this behavior would be useful?

Question 2

The key is that you don't compare the same types. When comparing a bigint to a numeric, the easier way is to 'expand' the bigint with the decimal places being 0 (like 1 -> 1.0), while the other way around it would mean rounding/truncation. (In this specific case it is easy to see that the both lead to the same result, but what if the values are negative?)

So, what you get in your comparison, is a numeric to numeric comparison, which is not something a bigint index could serve.

It's worth a look which casts are possible and which is performed in these cases. For this, here are the two rows from pg_cast:

SELECT castsource::regtype, casttarget::regtype, castcontext 
 FROM pg_cast 
 WHERE castsource::regtype = 'bigint'::regtype
 AND casttarget::regtype = 'numeric'::regtype;
 castsource │ casttarget │ castcontext 
────────────┼────────────┼─────────────
 bigint │ numeric │ i
SELECT castsource::regtype, casttarget::regtype, castcontext
 FROM pg_cast
 WHERE castsource::regtype = 'numeric'::regtype
 AND casttarget::regtype = 'bigint'::regtype;
 castsource │ casttarget │ castcontext 
────────────┼────────────┼─────────────
 numeric │ bigint │ a

According to the linked documentation page, castcontext

Indicates what contexts the cast can be invoked in. e means only as an explicit cast (using CAST or :: syntax). a means implicitly in assignment to a target column, as well as explicitly. i means implicitly in expressions, as well as the other cases.

So this means that the numeric -> bigint direction is happening 'by itself' (i. e. you not calling one of the casting operators explicitly) only if you assign the former to the latter. In expressions like your comparison, this is not the case, so the parser will only consider the other way (marked with an i above). That means, you get a numeric to numeric comparison, unless you force it otherwise.

Notes:

psql is a command line client to PostgreSQL, it does not do any sort of these things itself (I've edited the title accordingly)
using keywords as column names (like timestamp) is never a good idea - you'll possibly get unexpected parsing errors here and there, unless you are careful enough to double-quote them everywhere
using the Unix epoch as a timestamp might be cumbersome. There is a rich functionality for 'real' timestamps in PostgreSQL - in most cases it is much easier to use it.

Question 3

Great advice, thanks. The table with a bigint column named timestamp was generated by a third-party Python library that consumes a GTFS-RT feed into PostgreSQL using an ORM. I agree that it would make a lot more sense to use 'real' timestamps.

Question 4

https://stackoverflow.com/questions/59941708/does-postgres-use-index-on-bigint-columns/78499596#78499596

Make sure to either

bind parameter some_id as bigint in prepared statement
cast any number literal to bigint manually like some_id = 1234::bigint

Note even with prepared statement managed by ORM, if some_id is mapping to ulong type from the view of ORM, it will still cause implicitly ::numeric casting which will bypass index on the bigint column since there's no unsigned types in Postgres: https://stackoverflow.com/questions/20810134/why-unsigned-integer-is-not-available-in-postgresql

FYI:

CREATE TABLE test(num int not null, PRIMARY KEY (num));
INSERT INTO test SELECT * FROM generate_series(0, 32672);
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
 SELECT * FROM test WHERE num = 9223372036854775807;

Index Only Scan using test_pkey on tbm.test (cost=0.29..1.41 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=1)
 Output: num
 Index Cond: (test.num = '9223372036854775807'::bigint)
 Heap Fetches: 0
 Buffers: shared hit=2
Query Identifier: 3505370066652848307
Planning Time: 0.052 ms
Execution Time: 0.024 ms

EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
 SELECT * FROM test WHERE num = 9223372036854775808;

Seq Scan on tbm.test (cost=0.00..635.10 rows=163 width=4) (actual time=3.427..3.427 rows=0 loops=1)
 Output: num
 Filter: ((test.num)::numeric = '9223372036854775808'::numeric)
 Rows Removed by Filter: 32673
 Buffers: shared hit=145
Query Identifier: 7606985776533665305
Planning Time: 0.055 ms
Execution Time: 3.439 ms

CREATE TABLE test(num smallint not null, PRIMARY KEY (num));
INSERT INTO test SELECT * FROM generate_series(0, 32672);
EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
 SELECT * FROM test WHERE num = 32672;

Index Only Scan using test_pkey on tbm.test (cost=0.29..2.51 rows=1 width=2) (actual time=0.018..0.018 rows=1 loops=1)
 Output: num
 Index Cond: (test.num = 32672)
 Heap Fetches: 1
 Buffers: shared hit=4
Query Identifier: -4575445827832434577
Planning:
 Buffers: shared hit=15
Planning Time: 0.091 ms
Execution Time: 0.029 ms

EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
 SELECT * FROM test WHERE num = 2147483647;

Index Only Scan using test_pkey on tbm.test (cost=0.29..2.51 rows=1 width=2) (actual time=0.014..0.014 rows=0 loops=1)
 Output: num
 Index Cond: (test.num = 2147483647)
 Heap Fetches: 0
 Buffers: shared hit=2
Query Identifier: -4575445827832434577
Planning Time: 0.063 ms
Execution Time: 0.032 ms

EXPLAIN (VERBOSE, ANALYZE, BUFFERS)
 SELECT * FROM test WHERE num = 4294967296;

Index Only Scan using test_pkey on tbm.test (cost=0.29..2.51 rows=1 width=2) (actual time=0.007..0.007 rows=0 loops=1)
 Output: num
 Index Cond: (test.num = '4294967296'::bigint)
 Heap Fetches: 0
 Buffers: shared hit=2
Query Identifier: 4896689328011101216
Planning Time: 0.029 ms
Execution Time: 0.016 ms

András Váczi 31.8k13 gold badges103 silver badges152 bronze badges · Accepted Answer · 2016-08-09 15:56:58Z

The key is that you don't compare the same types. When comparing a bigint to a numeric, the easier way is to 'expand' the bigint with the decimal places being 0 (like 1 -> 1.0), while the other way around it would mean rounding/truncation. (In this specific case it is easy to see that the both lead to the same result, but what if the values are negative?)

So, what you get in your comparison, is a numeric to numeric comparison, which is not something a bigint index could serve.

It's worth a look which casts are possible and which is performed in these cases. For this, here are the two rows from pg_cast:

SELECT castsource::regtype, casttarget::regtype, castcontext 
 FROM pg_cast 
 WHERE castsource::regtype = 'bigint'::regtype
 AND casttarget::regtype = 'numeric'::regtype;
 castsource │ casttarget │ castcontext 
────────────┼────────────┼─────────────
 bigint │ numeric │ i
SELECT castsource::regtype, casttarget::regtype, castcontext
 FROM pg_cast
 WHERE castsource::regtype = 'numeric'::regtype
 AND casttarget::regtype = 'bigint'::regtype;
 castsource │ casttarget │ castcontext 
────────────┼────────────┼─────────────
 numeric │ bigint │ a

According to the linked documentation page, castcontext

Indicates what contexts the cast can be invoked in. e means only as an explicit cast (using CAST or :: syntax). a means implicitly in assignment to a target column, as well as explicitly. i means implicitly in expressions, as well as the other cases.

So this means that the numeric -> bigint direction is happening 'by itself' (i. e. you not calling one of the casting operators explicitly) only if you assign the former to the latter. In expressions like your comparison, this is not the case, so the parser will only consider the other way (marked with an i above). That means, you get a numeric to numeric comparison, unless you force it otherwise.

Notes:

psql is a command line client to PostgreSQL, it does not do any sort of these things itself (I've edited the title accordingly)
using keywords as column names (like timestamp) is never a good idea - you'll possibly get unexpected parsing errors here and there, unless you are careful enough to double-quote them everywhere
using the Unix epoch as a timestamp might be cumbersome. There is a rich functionality for 'real' timestamps in PostgreSQL - in most cases it is much easier to use it.

Great advice, thanks. The table with a bigint column named timestamp was generated by a third-party Python library that consumes a GTFS-RT feed into PostgreSQL using an ORM. I agree that it would make a lot more sense to use 'real' timestamps.

Stack Exchange Network

Why does PostgreSQL perform a seq scan when comparing a numeric value with a bigint column?

Integer comparison:

Numeric comparison:

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Why does PostgreSQL perform a seq scan when comparing a numeric value with a bigint column?

Integer comparison:

Numeric comparison:

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions