I have two tables, person and tag. person has a bigint[] column containing the ids of tags. I want to write a query that fetches all tags along with the count of tagged persons. So far I have the following query:
SELECT t.id AS id, t.name AS name,
(select count(id) from person where person.tags @> array[t.id]) as instances_count
FROM tag AS t
I know the sub-query is bad, but I couldn't think of a better way to do it.
Any suggestions are welcome. Thank you.
Edit: PostgreSQL 10.8
Here is the 'explain analyze':
Seq Scan on tag t (cost=0.00..4309761.27 rows=79 width=41) (actual time=131.109..11329.945 rows=79 loops=1)
  Buffers: shared hit=560506 read=3505230
  SubPlan 1
    -> Aggregate (cost=54553.91..54553.92 rows=1 width=8) (actual time=143.411..143.412 rows=1 loops=79)
         Buffers: shared hit=560505 read=3505230
         -> Seq Scan on person i (cost=0.00..54550.82 rows=1233 width=8) (actual time=41.367..143.393 rows=148 loops=79)
              Filter: ((NOT is_deleted) AND (tags @> ARRAY[t.id]))
              Rows Removed by Filter: 245452
              Buffers: shared hit=560505 read=3505230
Planning time: 0.098 ms
Execution time: 11330.014 ms
1 Answer
You really need an index on the array:
create index on person using gin (tags);
It is not obvious to me that the subselect is worse than any other way of accomplishing the same thing.
Your query plan doesn't match your query text: "is_deleted" occurs in the plan but not in the text. If you want to optimize for that, you could create a partial index instead:
create index on person using gin (tags) where not is_deleted;
How much that matters depends on what fraction of the rows in your table have is_deleted set.
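One thing to keep in mind: the planner only considers a partial index when the query's WHERE clause implies the index predicate, so the filter has to appear in the query itself. A sketch of what I assume your real query looks like, given that the plan filters on is_deleted (the alias p is mine, and I am assuming tag.id is a bigint so that it matches the bigint[] column):

```sql
-- Sketch only: assumes person.is_deleted is a boolean and tag.id is a bigint.
SELECT t.id   AS id,
       t.name AS name,
       (SELECT count(*)
          FROM person AS p
         WHERE NOT p.is_deleted           -- matches the partial index predicate
           AND p.tags @> ARRAY[t.id]      -- containment test the GIN index can serve
       ) AS instances_count
  FROM tag AS t;
```

Running this under EXPLAIN (ANALYZE, BUFFERS) should show whether the seq scan on person has been replaced by a bitmap scan on the new index.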
An alternative method that does not need the index would be:
select * from
(select unnest(tags) as id, count(id) from person group by 1) agg
join
tag
using (id);
But this would not return any tags which had a count of zero. You could make the join a right join, but then reported count would be null rather than zero.
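If you do want the zero counts, wrapping the count in coalesce takes care of the null. A rough sketch, written as a left join from tag (equivalent to the right join with the two sides swapped); the names agg, u, tag_id and instances_count are only illustrative:

```sql
-- Sketch only: assumes each tag id appears at most once in a person's tags array;
-- a duplicated id would be counted twice here, unlike with the @> subselect.
SELECT t.id,
       t.name,
       coalesce(agg.instances_count, 0) AS instances_count  -- null becomes 0 for unused tags
  FROM tag AS t
  LEFT JOIN (
        SELECT u.tag_id,
               count(*) AS instances_count
          FROM person AS p
         CROSS JOIN LATERAL unnest(p.tags) AS u(tag_id)      -- one row per (person, tag id)
         GROUP BY u.tag_id
       ) AS agg
    ON agg.tag_id = t.id;
```

Like the version above, this does not filter on is_deleted; add WHERE NOT p.is_deleted inside the subquery if deleted persons should be excluded.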
Creating the index lowered the execution time to 13.437 ms. Thank you. – K.Kostadinov, Jun 4, 2019 at 8:00
EXPLAIN (ANALYZE, BUFFERS), and see dba.stackexchange.com/tags/postgresql-performance/info for how to ask effective performance questions.