I have a table data:
- with ~400 million rows
- with data.id as int4, not null, and set as primary key
- it's an AWS RDS server, with ~128 GB of RAM
- there are no rows with id > 1e9
This query:
select count(*) from data where id > 1e9;
returns 0 and consistently takes ~25s to run.
It used to take more than 2 minutes (I may have run analyze data during my investigations, after which the time dropped to 25s).
On another server, an AWS Aurora Postgres instance with the exact same table, it takes ~3m30s.
In any case, the same query takes a few ms on a MySQL DB, as it should. I am very new to Postgres, so I am probably missing something obvious.
explain analyze select count(*) from data where id > 1e9;
QUERY PLAN
 Finalize Aggregate  (cost=11944927.24..11944927.25 rows=1 width=8) (actual time=23986.851..23988.183 rows=1 loops=1)
   ->  Gather  (cost=11944927.03..11944927.24 rows=2 width=8) (actual time=23986.779..23988.177 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=11943927.03..11943927.04 rows=1 width=8) (actual time=23984.078..23984.079 rows=1 loops=3)
               ->  Parallel Index Only Scan using data_pkey on data  (cost=0.57..11804489.05 rows=55775191 width=0) (actual time=23984.074..23984.075 rows=0 loops=3)
                     Filter: ((id)::numeric > '1000000000'::numeric)
                     Rows Removed by Filter: 133840000
                     Heap Fetches: 100863313
 Planning Time: 0.078 ms
 Execution Time: 23988.219 ms
What am I doing wrong?
- Hi, and welcome to dba.se! You should always include full table definitions with any questions - the more detail you provide in your question, the better chance you have of obtaining good answers! (Vérace, commented Mar 2, 2024 at 21:05)
1 Answer
I assume that you have an index on id. If not, you need one.
Your problem is the 1e9. If you write a numeric constant in scientific notation, it is considered to be of type numeric:
SELECT pg_typeof(1e9);
pg_typeof
═══════════
numeric
(1 row)
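As a small illustration of the same check, the query below compares a few spellings of the same value. It is only a sketch to run in psql; the column aliases are made up for readability, and the types in the comments follow PostgreSQL's documented rules for numeric constants.

```sql
-- How the spelling of a constant determines its type.
SELECT
    pg_typeof(1e9)          AS scientific,    -- numeric (exponent notation)
    pg_typeof(1000000000)   AS plain_digits,  -- integer (fits in int4)
    pg_typeof(10000000000)  AS too_big,       -- bigint (does not fit in int4)
    pg_typeof(1e9::integer) AS with_cast;     -- integer (explicit cast)
```

Of these, only the scientific-notation literal is numeric, which is the case that matters for the query in the question.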
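So PostgreSQL has to cast id to type numeric to perform the comparison (the (id)::numeric in your execution plan) and cannot use the index to restrict the scan: the comparison is applied as a Filter to every index entry rather than as an index condition, which is why the plan reports about 134 million rows removed by the filter in each of the three processes.
Using a constant of type integer should speed up processing: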
select count(*) from data where id > 1000000000;
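If you would rather keep the 1e9 spelling, an explicit cast of the constant should have the same effect. The sketch below uses a small, hypothetical stand-in table (data_demo is not the table from the question) so that it can be run on its own; the exact plan shape you get will depend on statistics and settings, so take the comments as what to look for rather than guaranteed output.

```sql
-- Hypothetical, much smaller stand-in for the table in the question.
CREATE TEMP TABLE data_demo (id int4 PRIMARY KEY);
INSERT INTO data_demo SELECT generate_series(1, 100000);
ANALYZE data_demo;

-- numeric constant: id is cast to numeric, so expect a line like
--   Filter: ((id)::numeric > '1000000000'::numeric)
EXPLAIN SELECT count(*) FROM data_demo WHERE id > 1e9;

-- integer constant, written directly or via a cast: the comparison
-- stays on int4, so it can appear as an Index Cond on the primary key
EXPLAIN SELECT count(*) FROM data_demo WHERE id > 1000000000;
EXPLAIN SELECT count(*) FROM data_demo WHERE id > 1e9::integer;
```

The thing to check in the output is whether the comparison shows up as an Index Cond on id or as a Filter on (id)::numeric.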
- Thanks a lot! Now it takes ~150ms. I knew there had to be some reason. And that may also explain another issue I have with a timestamp without time zone field. (Karl Forner, commented Mar 1, 2024 at 13:47)