PostgreSQL 10 Optimizing slow query performance with aggregate function

Question 1

I have two tables, person and tag. person has a bigint[] column, tags, containing the ids of tags. I want to write a query that fetches all tags together with the count of persons tagged with each one. So far I have the following query:

SELECT t.id AS id, t.name AS name,
       (SELECT count(id) FROM person WHERE person.tags @> ARRAY[t.id]) AS instances_count
FROM tag AS t

I know the sub-query is bad but I couldn't think of a way of doing it better.

Any suggestions are welcome. Thank you

Edit: PostgreSQL 10.8

Here is the 'explain analyze':

Seq Scan on tag t  (cost=0.00..4309761.27 rows=79 width=41) (actual time=131.109..11329.945 rows=79 loops=1)
  Buffers: shared hit=560506 read=3505230
  SubPlan 1
    ->  Aggregate  (cost=54553.91..54553.92 rows=1 width=8) (actual time=143.411..143.412 rows=1 loops=79)
          Buffers: shared hit=560505 read=3505230
          ->  Seq Scan on person i  (cost=0.00..54550.82 rows=1233 width=8) (actual time=41.367..143.393 rows=148 loops=79)
                Filter: ((NOT is_deleted) AND (tags @> ARRAY[t.id]))
                Rows Removed by Filter: 245452
                Buffers: shared hit=560505 read=3505230
Planning time: 0.098 ms
Execution time: 11330.014 ms

Comment

Please edit to include an EXPLAIN (ANALYZE, BUFFERS), and see dba.stackexchange.com/tags/postgresql-performance/info for how to ask effective performance questions.
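For reference, a plan with buffer statistics can be captured by prefixing the statement with EXPLAIN (ANALYZE, BUFFERS). A minimal sketch using the query from the question; note that ANALYZE actually executes the statement, so it takes as long as the query itself:

```sql
-- Runs the query and reports actual timings plus buffer hits/reads per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.id AS id, t.name AS name,
       (SELECT count(id) FROM person WHERE person.tags @> ARRAY[t.id]) AS instances_count
FROM tag AS t;
```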

Comment

I've updated the question.

Answer

You really need an index on the array:

create index on person using gin (tags);

It is not obvious to me that the subselect is worse than any other way of accomplishing the same thing.

Your query plan doesn't match your query text: "is_deleted" occurs in the plan but not in the text. If you want to optimize for that, you could create a partial index:

create index on person using gin (tags) where not is_deleted;

How important that is depends on what fraction of the rows in your table have is_deleted set.
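For the planner to consider a partial index, the query's WHERE clause has to imply the index predicate. A sketch of the subquery with that filter written out, assuming is_deleted is a boolean column on person (it appears in the plan's Filter line but not in the posted query text):

```sql
-- Same query as in the question, with the NOT is_deleted condition made explicit
-- so that it matches the predicate of the partial GIN index.
SELECT t.id AS id, t.name AS name,
       (SELECT count(*)
        FROM person AS p
        WHERE NOT p.is_deleted            -- assumption: boolean column, as shown in the plan
          AND p.tags @> ARRAY[t.id]) AS instances_count
FROM tag AS t;
```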

An alternative method that does not need the index would be:

select *
from (select unnest(tags) as id, count(id) from person group by 1) agg
join tag using (id);

But this would not return any tags that have a count of zero. You could make the join a right join, but then the reported count would be null rather than zero.
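One way to get zeros instead of nulls is to wrap the count in COALESCE and outer-join from tag. This is a sketch only, using the table and column names from the question; the aliases agg, u and tag_id are made up for the example:

```sql
-- Count persons per tag in a single pass over person, then outer-join to tag
-- so that tags without any persons are still listed, with a count of 0.
SELECT t.id,
       t.name,
       COALESCE(agg.instances_count, 0) AS instances_count
FROM tag AS t
LEFT JOIN (
    SELECT u.tag_id, count(*) AS instances_count
    FROM person AS p
    CROSS JOIN LATERAL unnest(p.tags) AS u(tag_id)
    -- WHERE NOT p.is_deleted   -- assumption: enable if deleted rows must be excluded
    GROUP BY u.tag_id
) AS agg ON agg.tag_id = t.id;
```

If a person's tags array can contain the same id more than once, count(*) counts that person once per occurrence; count(DISTINCT p.id) would match the behaviour of the @> subquery more closely.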

Comment

Creating the index lowered the execution time to 13.437 ms. Thank you.

jjanes · Accepted Answer · 2019-06-03 16:26:39Z

