I am trying to get the count per group for a given user and wrote the query below. The execution plan looks fine. However, in some cases a user has around 10k saved articles, and under a bit of load the queries become very slow and CPU usage climbs.
How can I improve the performance of the query below?
I am using PostgreSQL 11.
SELECT COUNT(*) AS occurences, seller_nickname AS value, seller_id AS id
FROM data.saved_articles
JOIN aggregated.offers USING (offer_id)
WHERE user_id = 406943491
GROUP BY seller_nickname, seller_id;
Example result set:
occurences value id
1 "nick1" id-1
8 "nick2" id-2
Query plan:
Finalize GroupAggregate (cost=50262.89..51401.41 rows=9427 width=23) (actual time=58.418..68.209 rows=3527 loops=1)
Output: count(*), offers.seller_nickname, offers.seller_id
Group Key: offers.seller_nickname, offers.seller_id
Buffers: shared hit=17448 dirtied=75
-> Gather Merge (cost=50262.89..51248.22 rows=7856 width=23) (actual time=58.413..66.732 rows=4549 loops=1)
Output: offers.seller_nickname, offers.seller_id, (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=37116 dirtied=150
-> Partial GroupAggregate (cost=49262.86..49341.42 rows=3928 width=23) (actual time=45.467..46.616 rows=1516 loops=3)
Output: offers.seller_nickname, offers.seller_id, PARTIAL count(*)
Group Key: offers.seller_nickname, offers.seller_id
Buffers: shared hit=37116 dirtied=150
Worker 0: actual time=31.676..32.206 rows=726 loops=1
Buffers: shared hit=4337 dirtied=16
Worker 1: actual time=46.740..48.138 rows=1841 loops=1
Buffers: shared hit=15331 dirtied=59
-> Sort (cost=49262.86..49272.68 rows=3928 width=15) (actual time=45.458..45.738 rows=3073 loops=3)
Output: offers.seller_nickname, offers.seller_id
Sort Key: offers.seller_nickname, offers.seller_id
Sort Method: quicksort Memory: 436kB
Worker 0: Sort Method: quicksort Memory: 110kB
Worker 1: Sort Method: quicksort Memory: 314kB
Buffers: shared hit=37116 dirtied=150
Worker 0: actual time=31.666..31.759 rows=1077 loops=1
Buffers: shared hit=4337 dirtied=16
Worker 1: actual time=46.732..47.088 rows=3811 loops=1
Buffers: shared hit=15331 dirtied=59
-> Nested Loop (cost=229.92..49028.37 rows=3928 width=15) (actual time=0.407..35.412 rows=3073 loops=3)
Output: offers.seller_nickname, offers.seller_id
Inner Unique: true
Buffers: shared hit=37086 dirtied=150
Worker 0: actual time=0.096..28.787 rows=1077 loops=1
Buffers: shared hit=4322 dirtied=16
Worker 1: actual time=0.103..36.065 rows=3811 loops=1
Buffers: shared hit=15316 dirtied=59
-> Parallel Bitmap Heap Scan on data.saved_articles (cost=229.49..21594.86 rows=3928 width=4) (actual time=0.368..1.219 rows=3073 loops=3)
Output: saved_articles.offer_id
Recheck Cond: (saved_articles.user_id = 406943491)
Heap Blocks: exact=67
Buffers: shared hit=178
Worker 0: actual time=0.056..0.384 rows=1077 loops=1
Buffers: shared hit=12
Worker 1: actual time=0.061..1.154 rows=3811 loops=1
Buffers: shared hit=59
-> Bitmap Index Scan on pk_saved_articles_user_id_offer_id (cost=0.00..227.13 rows=9427 width=0) (actual time=0.941..0.941 rows=9220 loops=1)
Index Cond: (saved_articles.user_id = 406943491)
Buffers: shared hit=40
-> Index Scan using pk_offers_offer_id on aggregated.offers (cost=0.43..6.98 rows=1 width=19) (actual time=0.011..0.011 rows=1 loops=9220)
Output: offers.offer_id, offers.seller_id, offers.seller_nickname, offers.title, offers.condition, offers.ends_at, offers.current_price, offers.buynow_price, offers.category_id, offers.category_name_fr, offers.category_name_de, offers.category_name_it, offers.created_at, offers.updated_at, offers.bid_count, offers.quantity, offers.increment, offers.offer_type
Index Cond: (offers.offer_id = saved_articles.offer_id)
Buffers: shared hit=36908 dirtied=150
Worker 0: actual time=0.026..0.026 rows=1 loops=1077
Buffers: shared hit=4310 dirtied=16
Worker 1: actual time=0.009..0.009 rows=1 loops=3811
Buffers: shared hit=15257 dirtied=59
Planning Time: 0.274 ms
Execution Time: 68.714 ms
To improve performance, I could maybe limit the result set to 100 per group.
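If "limit to 100" means returning only the 100 largest groups (my reading; an assumption), that could be sketched by ordering on the aggregate and capping the result. Note this shrinks the result set but not the aggregation work:

```sql
-- Sketch: keep only the 100 largest groups (assumption about the intended limit).
-- The full aggregation still runs; only the returned rows are capped.
SELECT COUNT(*) AS occurences, seller_nickname AS value, seller_id AS id
FROM data.saved_articles
JOIN aggregated.offers USING (offer_id)
WHERE user_id = 406943491
GROUP BY seller_nickname, seller_id
ORDER BY occurences DESC
LIMIT 100;
```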
Tables and indexes:
CREATE TABLE aggregated.offers (
offer_id integer NOT NULL,
seller_id integer NOT NULL,
seller_nickname character varying(30) NOT NULL,
title character varying(60) NOT NULL
);
ALTER TABLE aggregated.offers
ADD CONSTRAINT pk_offers_offer_id PRIMARY KEY (offer_id);
CREATE INDEX idx_offers_seller_id ON aggregated.offers (seller_id);
CREATE TABLE data.saved_articles (
user_id integer NOT NULL,
offer_id integer NOT NULL,
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone,
CONSTRAINT pk_saved_articles_user_id_offer_id PRIMARY KEY (user_id, offer_id),
CONSTRAINT fk_saved_articles_offer_id FOREIGN KEY (offer_id)
REFERENCES aggregated.offers (offer_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE INDEX idx_saved_articles_offer_id ON data.saved_articles (offer_id);
EDIT:
Starting with the easiest approach, I ran VACUUM and created an index:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT COUNT(*) AS occurences, seller_nickname AS value, seller_id AS id
FROM data.saved_articles
JOIN aggregated.offers USING (offer_id)
WHERE user_id = 406943491
GROUP BY seller_nickname, seller_id;
However, it did not seem to have any major impact.
Plan:
Finalize GroupAggregate (cost=54772.66..56138.73 rows=11312 width=23) (actual time=54.118..61.137 rows=3527 loops=1)
Output: count(*), offers.seller_nickname, offers.seller_id
Group Key: offers.seller_nickname, offers.seller_id
Buffers: shared hit=17925
-> Gather Merge (cost=54772.66..55954.92 rows=9426 width=23) (actual time=54.112..61.691 rows=4482 loops=1)
Output: offers.seller_nickname, offers.seller_id, (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=38070
-> Partial GroupAggregate (cost=53772.64..53866.90 rows=4713 width=23) (actual time=40.581..41.995 rows=1494 loops=3)
Output: offers.seller_nickname, offers.seller_id, PARTIAL count(*)
Group Key: offers.seller_nickname, offers.seller_id
Buffers: shared hit=38070
Worker 0: actual time=43.304..45.268 rows=1882 loops=1
Buffers: shared hit=16389
Worker 1: actual time=24.626..25.086 rows=603 loops=1
Buffers: shared hit=3756
-> Sort (cost=53772.64..53784.42 rows=4713 width=15) (actual time=40.571..40.921 rows=3073 loops=3)
Output: offers.seller_nickname, offers.seller_id
Sort Key: offers.seller_nickname, offers.seller_id
Sort Method: quicksort Memory: 437kB
Worker 0: Sort Method: quicksort Memory: 323kB
Worker 1: Sort Method: quicksort Memory: 76kB
Buffers: shared hit=38070
Worker 0: actual time=43.293..43.759 rows=3969 loops=1
Buffers: shared hit=16389
Worker 1: actual time=24.616..24.709 rows=913 loops=1
Buffers: shared hit=3756
-> Nested Loop (cost=216.53..53485.09 rows=4713 width=15) (actual time=0.457..30.793 rows=3073 loops=3)
Output: offers.seller_nickname, offers.seller_id
Inner Unique: true
Buffers: shared hit=38040
Worker 0: actual time=0.191..32.516 rows=3969 loops=1
Buffers: shared hit=16374
Worker 1: actual time=0.171..22.287 rows=913 loops=1
Buffers: shared hit=3741
-> Parallel Bitmap Heap Scan on data.saved_articles (cost=216.10..24104.73 rows=4713 width=4) (actual time=0.371..1.275 rows=3073 loops=3)
Output: saved_articles.offer_id
Recheck Cond: (saved_articles.user_id = 406943491)
Heap Blocks: exact=60
Buffers: shared hit=166
Worker 0: actual time=0.075..1.420 rows=3969 loops=1
Buffers: shared hit=66
Worker 1: actual time=0.059..0.344 rows=913 loops=1
Buffers: shared hit=12
-> Bitmap Index Scan on pk_saved_articles_user_id_offer_id (cost=0.00..213.27 rows=11312 width=0) (actual time=0.929..0.929 rows=9220 loops=1)
Index Cond: (saved_articles.user_id = 406943491)
Buffers: shared hit=28
-> Index Only Scan using tmp_id on aggregated.offers (cost=0.43..6.23 rows=1 width=19) (actual time=0.009..0.009 rows=1 loops=9220)
Output: offers.offer_id, offers.seller_id, offers.seller_nickname
Index Cond: (offers.offer_id = saved_articles.offer_id)
Heap Fetches: 9129
Buffers: shared hit=37874
Worker 0: actual time=0.007..0.007 rows=1 loops=3969
Buffers: shared hit=16308
Worker 1: actual time=0.023..0.023 rows=1 loops=913
Buffers: shared hit=3729
Planning Time: 0.275 ms
Execution Time: 63.720 ms
Since I need to run the same query on another column, I will not touch the PK. Instead, I will try to move seller_nickname to a different table so I can add it after the aggregation.
- Is 69 ms an accurate example of the "very slow" query which needs to be improved? How much faster than 69 ms does it need to be? – jjanes, Jun 13, 2020
1 Answer
Your query seems to perform ok already. Some ideas to squeeze out faster times:
Index-only scans for aggregated.offers
seller_nickname seems to be functionally dependent on seller_id. It is more expensive to read and group by a varchar(30) in addition than to base that on just an integer. Cutting seller_nickname out of the base query should make it faster. Then you can add seller_id to the PK with an INCLUDE clause to get a covering index and very fast index-only scans on aggregated.offers:
ALTER TABLE aggregated.offers
DROP CONSTRAINT pk_offers_offer_id
, ADD CONSTRAINT pk_offers_offer_id PRIMARY KEY (offer_id) INCLUDE (seller_id);
Since offer_id and seller_id are both int4, we hit a local optimum: the two integers pack into the index tuple without alignment padding.
Faster query:
SELECT count(*) AS occurrences, o.seller_id AS id -- note occurrences with "rr"
FROM data.saved_articles a
JOIN aggregated.offers o USING (offer_id)
WHERE a.user_id = 406943491
GROUP BY o.seller_id;
If you really need to include seller_nickname in the result, add it after the aggregation from a lookup table. Assuming a table seller (which you would typically have), ideally with an index on (seller_id, seller_nickname) for another index-only scan:
SELECT ao.*, s.seller_nickname
FROM ( -- query from above
SELECT count(*) AS occurrences, o.seller_id AS id
FROM data.saved_articles a
JOIN aggregated.offers o USING (offer_id)
WHERE a.user_id = 406943491
GROUP BY o.seller_id
) ao
LEFT JOIN data.seller s USING (seller_id);
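The lookup table referenced above is an assumption; a minimal sketch of what data.seller and its covering index could look like (names and types are hypothetical, modeled on aggregated.offers):

```sql
-- Hypothetical lookup table; column names and types are assumptions.
CREATE TABLE data.seller (
    seller_id       integer PRIMARY KEY,
    seller_nickname character varying(30) NOT NULL
);

-- Covering index so the final join can use an index-only scan.
CREATE INDEX idx_seller_id_nickname ON data.seller (seller_id, seller_nickname);
```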
Should still be faster, especially with more than a few rows per seller in the result.
Alternatively, you could just add seller_nickname to the PK index in the INCLUDE clause to speed up your original query. Or leave the PK untouched and add another index on (offer_id, seller_id, seller_nickname). But while including the integer column seller_id is very lightweight on the PK index and typically has hardly any downsides, adding another varchar(30) is more costly. And you'll want to keep the PK index snappy for other purposes.
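For reference, the two alternatives just mentioned could be spelled out as follows (sketches, not recommendations; the second index name is made up):

```sql
-- Option A: also cover seller_nickname in the PK (heavier index):
ALTER TABLE aggregated.offers
  DROP CONSTRAINT pk_offers_offer_id
, ADD  CONSTRAINT pk_offers_offer_id
       PRIMARY KEY (offer_id) INCLUDE (seller_id, seller_nickname);

-- Option B: leave the PK alone and add a separate covering index
-- (hypothetical name):
CREATE INDEX idx_offers_covering
    ON aggregated.offers (offer_id, seller_id, seller_nickname);
```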
autovacuum for data.saved_articles
Your PK index on (user_id, offer_id) seems perfect for the task. However, I see a Bitmap Index Scan on pk_saved_articles_user_id_offer_id with rows=9220, followed by a Parallel Bitmap Heap Scan on data.saved_articles with rows=3073, swallowing ~40 % of the query time.
No index-only scan. And only a third of the index tuples produce a heap tuple. That indicates problems with autovacuum: outdated visibility map and lots of dead index tuples. If possible (takes an exclusive lock on the table, does concurrent workload allow it?), try:
VACUUM FULL ANALYZE data.saved_articles;
Or use CLUSTER to cluster rows with the same user_id in physical storage:
CLUSTER data.saved_articles USING pk_saved_articles_user_id_offer_id;
ANALYZE data.saved_articles;
Then, even if write load on saved_articles prevents index-only scans, fewer blocks have to be read from the heap. But it's a one-time effect that deteriorates over time.
Either way, if that improves performance, set up more aggressive autovacuum settings for the table data.saved_articles, and/or consider pg_repack or pg_squeeze, which can replace VACUUM FULL / CLUSTER and work under concurrent load.
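Per-table autovacuum tuning is done with storage parameters on the table itself; the values below are illustrative assumptions, not recommendations, and should be tuned to the actual write load:

```sql
-- Example: vacuum and analyze data.saved_articles much more eagerly
-- than the defaults. Concrete thresholds are assumptions.
ALTER TABLE data.saved_articles SET (
    autovacuum_vacuum_scale_factor  = 0.01,  -- default 0.2
    autovacuum_analyze_scale_factor = 0.05,  -- default 0.1
    autovacuum_vacuum_cost_delay    = 2      -- ms; default 20 in PostgreSQL 11
);
```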
- Thanks for your detailed answer. A few points I did not get: what do you mean by "after the aggregation", in another query? Do I need to add seller_nickname to the covering index? If I need to run the same query for other columns (same table) to get their count, should I just add them to the INCLUDE clause? – nss, Jun 13, 2020
- I ran VACUUM FULL ANALYZE data.saved_articles; and re-ran the EXPLAIN, but without any noticeable differences. – nss, Jun 13, 2020
- @nss: I added some more above about indices and dealing with seller_nickname. This answer is tailored to the question and didn't take other columns into account. Either way, no: don't INCLUDE more columns in the PK index. – Erwin Brandstetter, Jun 13, 2020
- @nss Did the query plan change after VACUUM FULL ANALYZE data.saved_articles;? Did you see index-only scans? Did you run the query a couple of times to saturate the cache again? – Erwin Brandstetter, Jun 13, 2020
- I updated my question with (some) of your suggestions. – nss, Jun 14, 2020