I have a table data:

  • with ~ 400 million rows
  • with data.id as int4, not null, and set as primary key
  • it's an aws RDS server, with ~128G of RAM
  • there are no rows with id > 1e9

This query:

select count(*) from data where id > 1e9;

which returns 0, consistently takes ~25s to run. It used to take more than 2 minutes (I may have run analyze data during my investigations, after which the time dropped to 25s).

On another server, an aws aurora Postgres with the exact same table, it takes ~ 3m30s.

Anyway the same query takes a few ms on a MySQL DB, as it should. I am very new to Postgres, so I am probably missing something obvious.

explain analyze select count(*) from data where id > 1e9;

QUERY PLAN
Finalize Aggregate (cost=11944927.24..11944927.25 rows=1 width=8) (actual time=23986.851..23988.183 rows=1 loops=1)
  -> Gather (cost=11944927.03..11944927.24 rows=2 width=8) (actual time=23986.779..23988.177 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        -> Partial Aggregate (cost=11943927.03..11943927.04 rows=1 width=8) (actual time=23984.078..23984.079 rows=1 loops=3)
              -> Parallel Index Only Scan using data_pkey on data (cost=0.57..11804489.05 rows=55775191 width=0) (actual time=23984.074..23984.075 rows=0 loops=3)
                    Filter: ((id)::numeric > '1000000000'::numeric)
                    Rows Removed by Filter: 133840000
                    Heap Fetches: 100863313
Planning Time: 0.078 ms
Execution Time: 23988.219 ms

What am I doing wrong?

asked Mar 1, 2024 at 13:33
  • Hi, and welcome to dba.se! You should always include full table definitions with any questions - the more detail you provide in your question, the better chance you have of obtaining good answers! Commented Mar 2, 2024 at 21:05

1 Answer

I assume that you have an index on id. If not, you need one.

Your problem is the 1e9. If you write a numeric constant with the scientific notation, it is considered to be of type numeric:

SELECT pg_typeof(1e9);
 pg_typeof 
═══════════
 numeric
(1 row)

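So PostgreSQL has to cast id to type numeric to perform the comparison (the (id)::numeric in your execution plan). The index stores integer values, so the comparison cannot be used as an index condition: the plan scans the entire index and applies the comparison as a filter to every row.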
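
One way to see the difference for yourself is on a small scratch table. This is only a sketch: demo_data is a hypothetical stand-in for your table, and the exact plan depends on table size and statistics, so the comments say what to look for rather than what the output will be.

```sql
-- Hypothetical scratch table standing in for "data" (not the table from the question)
CREATE TEMP TABLE demo_data (id integer PRIMARY KEY);
INSERT INTO demo_data SELECT generate_series(1, 1000000);
ANALYZE demo_data;

-- Numeric constant: look for a line like "Filter: ((id)::numeric > '1000000000'::numeric)"
EXPLAIN SELECT count(*) FROM demo_data WHERE id > 1e9;

-- Integer constant: look for a line like "Index Cond: (id > 1000000000)"
EXPLAIN SELECT count(*) FROM demo_data WHERE id > 1000000000;
```

A Filter line means every scanned row is cast and compared; an Index Cond line means the index is used to jump straight to the matching range.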

Using a constant of type integer should speed up processing:

select count(*) from data where id > 1000000000;
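
If you would rather keep the scientific notation, casting the constant (not the column) to integer should work as well, since the comparison then stays between two integers. A short sketch against the table from the question:

```sql
-- Check which type the parser assigns to each form of the constant
SELECT pg_typeof(1000000000);    -- integer
SELECT pg_typeof(1e9::integer);  -- integer

-- Explicit cast on the constant, so id itself is not cast to numeric
select count(*) from data where id > 1e9::integer;
```

Running explain on the last query is a quick way to confirm that the Filter line with (id)::numeric is gone.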
answered Mar 1, 2024 at 13:41
  • Thanks a lot! Now it takes ~150ms. I knew there had to be some reason. And that may also explain another issue I have with a timestamp without time zone field. Commented Mar 1, 2024 at 13:47
