I have two tables, person and tag. person has a bigint[] column, tags, containing the ids of tags. I want to write a query that fetches all tags together with the number of persons carrying each tag. So far I have the following query:

SELECT t.id AS id, t.name AS name,
(select count(id) from person where person.tags @> array[t.id]) as instances_count
 FROM tag AS t

I know the sub-query is bad but I couldn't think of a way of doing it better.

Any suggestions are welcome. Thank you

Edit: PostgreSQL 10.8

Here is the 'explain analyze':

Seq Scan on tag t (cost=0.00..4309761.27 rows=79 width=41) (actual time=131.109..11329.945 rows=79 loops=1)
  Buffers: shared hit=560506 read=3505230
  SubPlan 1
    -> Aggregate (cost=54553.91..54553.92 rows=1 width=8) (actual time=143.411..143.412 rows=1 loops=79)
         Buffers: shared hit=560505 read=3505230
         -> Seq Scan on person i (cost=0.00..54550.82 rows=1233 width=8) (actual time=41.367..143.393 rows=148 loops=79)
              Filter: ((NOT is_deleted) AND (tags @> ARRAY[t.id]))
              Rows Removed by Filter: 245452
              Buffers: shared hit=560505 read=3505230
Planning time: 0.098 ms
Execution time: 11330.014 ms
MDCCL
asked May 31, 2019 at 11:27
  • Please edit to include an EXPLAIN (ANALYZE, BUFFERS), and see dba.stackexchange.com/tags/postgresql-performance/info for how to ask effective performance questions. Commented May 31, 2019 at 15:15
  • I've updated the question Commented Jun 3, 2019 at 8:42

1 Answer

You really need an index on the array:

create index on person using gin (tags);

It is not obvious to me that the subselect is worse than any other way of accomplishing the same thing.

Your query plan doesn't match your query text: "is_deleted" occurs in the plan but not in the text. If you want to optimize for that filter, you could create a partial index instead:

create index on person using gin (tags) where not is_deleted;

How important that is depends on what fraction of the rows in your table have is_deleted set.
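
A partial index is only considered when the query's own filter implies the index predicate, so the condition has to appear in the query as well. A minimal sketch, assuming the real query filters on is_deleted the way the plan suggests (the posted query text omits it) and that is_deleted is a boolean column:

```sql
-- Assumption: person.is_deleted is a boolean column, as the plan's filter implies.
-- Repeating "not is_deleted" in the subquery lets the planner consider
-- the partial GIN index on person (tags).
select t.id, t.name,
       (select count(*)
          from person p
         where not p.is_deleted
           and p.tags @> array[t.id]) as instances_count
  from tag as t;
```

Running this under EXPLAIN (ANALYZE, BUFFERS) should show whether the index is actually chosen.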


An alternative method not needing the index would be

select * from 
 (select unnest(tags) as id, count(id) from person group by 1) agg 
join 
 tag 
using (id);

But this would not return any tags that have a count of zero. You could make the join a right join, but then the reported count would be null rather than zero.
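
One way to keep the zero-count tags and still report 0 is to outer-join from tag and wrap the count in coalesce. This is only a sketch: the aliases agg, u and tag_id are made up for illustration, and it does not filter on is_deleted.

```sql
-- Expand each person's tags array into one row per tag id, count per tag id,
-- then left join from tag so tags with no persons are kept.
-- Assumption: a tag nobody carries should report 0 rather than null.
select t.id, t.name,
       coalesce(agg.instances_count, 0) as instances_count
  from tag as t
  left join (
        select u.tag_id, count(*) as instances_count
          from person p
         cross join lateral unnest(p.tags) as u(tag_id)
         group by u.tag_id
       ) agg on agg.tag_id = t.id;
```

Unlike the @> version, this counts a person twice if the same tag id appears twice in that person's array.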

answered Jun 3, 2019 at 16:26
  • Creating the index lowered the execution time to 13.437 ms. Thank you. Commented Jun 4, 2019 at 8:00
