
Query:

SELECT
 "places_place"."id"
FROM "places_place"
 LEFT OUTER JOIN "units_unit" ON ("places_place"."id" = "units_unit"."place_id")
GROUP BY "places_place"."id"
HAVING SUM("units_unit"."quantity") >= 123

Index attempt:

CREATE INDEX units_quantity_sum ON units_unit (SUM("units_unit"."quantity"));
-- ERROR: aggregate functions are not allowed in index expressions

Essentially, I want to index the result of SUM() without storing it in a separate column of the table. How can I create an index that does this (or is there a better way to optimize this query)?

EXPLAIN ANALYZE of query with 10,000 rows in places_place and 25,000 in units_unit:

HashAggregate  (cost=2057.31..2157.33 rows=10002 width=4) (actual time=38.121..41.174 rows=7727 loops=1)
  Group Key: places_place.id
  Filter: (sum(units_unit.quantity) >= 5)
  Rows Removed by Filter: 2275
  ->  Hash Right Join  (cost=594.04..1932.22 rows=25018 width=6) (actual time=6.383..28.578 rows=26727 loops=1)
        Hash Cond: (units_unit.place_id = places_place.id)
        ->  Seq Scan on units_unit  (cost=0.00..994.18 rows=25018 width=6) (actual time=0.003..7.279 rows=25018 loops=1)
        ->  Hash  (cost=469.02..469.02 rows=10002 width=4) (actual time=6.311..6.311 rows=10002 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 480kB
              ->  Seq Scan on places_place  (cost=0.00..469.02 rows=10002 width=4) (actual time=0.007..3.560 rows=10002 loops=1)
Planning time: 0.584 ms
Execution time: 42.643 ms
asked Feb 21, 2017 at 7:00
  • I don't think you need the join at all. An index on (place_id, quantity) and simplifying the query to SELECT place_id FROM units_unit GROUP BY place_id HAVING SUM(quantity) >= 123; would improve efficiency (a sketch follows these comments). Commented Feb 21, 2017 at 11:07
  • @ypercubeTM good point, removing the join reduced execution time from 42 ms to 20 ms. However, adding the index CREATE INDEX unit_quantity_index ON units_unit (place_id, quantity) had no effect. Commented Feb 21, 2017 at 21:02
  • Still very small tables. And 20 ms is quite good. Now try with 10M and 25M rows! Commented Feb 21, 2017 at 23:02
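
A sketch of the join-free form from the comments, using the index name from the second comment. This should return the same places as the original query, assuming every units_unit.place_id references an existing place (a place with no units yields a NULL sum and fails the HAVING clause either way):

CREATE INDEX unit_quantity_index ON units_unit (place_id, quantity);

SELECT place_id
FROM units_unit
GROUP BY place_id
HAVING SUM(quantity) >= 123;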

1 Answer


You have two easy options:

  1. You can use a MATERIALIZED VIEW.
  2. You can use a TRIGGER that maintains the sum in another table.

Both of these let you cache the SUM(). I would go with the MATERIALIZED VIEW unless you need the totals to be up to date at all times.
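
A minimal sketch of the materialized-view approach, assuming the schema from the question (the view name is invented):

CREATE MATERIALIZED VIEW place_quantity_totals AS
SELECT place_id, SUM(quantity) AS total_quantity
FROM units_unit
GROUP BY place_id;

-- A plain index on the cached sum, which the CREATE INDEX attempt
-- in the question could not express:
CREATE INDEX ON place_quantity_totals (total_quantity);

-- The query becomes an indexable range condition:
SELECT place_id FROM place_quantity_totals WHERE total_quantity >= 123;

-- The view is a snapshot, so refresh it after the data changes:
REFRESH MATERIALIZED VIEW place_quantity_totals;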

Before you go down that route, though: PostgreSQL 9.6 adds parallel sequential scans and parallel aggregation, which will speed this query up. In fact, it's an ideal use case. If you just need this to be faster, try setting max_parallel_workers_per_gather.
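
For example (a session-level sketch; the right worker count depends on your hardware):

-- Parallel query is off by default in 9.6:
SET max_parallel_workers_per_gather = 4;

EXPLAIN ANALYZE
SELECT place_id
FROM units_unit
GROUP BY place_id
HAVING SUM(quantity) >= 123;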

answered Feb 21, 2017 at 7:14
  • Up-to-date changes would be needed; it's user-generated data, and the cached sum would have to update as the quantities change. Commented Feb 21, 2017 at 7:17
  • If that leaves TRIGGER as the only option, then I'm guessing denormalizing it into the places_place table would be better than creating another table (see the sketch after these comments)...? Commented Feb 21, 2017 at 7:19
  • The database would still be doing the same amount of work, just faster, right? If I want to reduce the amount of work it has to do, I guess it would require the denormalized column (depending on the ratio of queries to updates)? Commented Feb 21, 2017 at 21:12
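
A hypothetical sketch of the denormalized, trigger-maintained approach discussed above (the column, function, and trigger names are invented):

ALTER TABLE places_place ADD COLUMN quantity_total integer NOT NULL DEFAULT 0;
CREATE INDEX ON places_place (quantity_total);
-- (Backfill existing totals once before relying on the column.)

CREATE FUNCTION update_place_quantity_total() RETURNS trigger AS $$
BEGIN
    -- Subtract the old quantity from its place (UPDATE/DELETE)...
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE places_place
           SET quantity_total = quantity_total - OLD.quantity
         WHERE id = OLD.place_id;
    END IF;
    -- ...and add the new quantity to its place (INSERT/UPDATE).
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE places_place
           SET quantity_total = quantity_total + NEW.quantity
         WHERE id = NEW.place_id;
    END IF;
    RETURN NULL; -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER units_unit_maintain_total
AFTER INSERT OR UPDATE OF quantity, place_id OR DELETE ON units_unit
FOR EACH ROW EXECUTE PROCEDURE update_place_quantity_total();

-- The query then avoids aggregation entirely:
SELECT id FROM places_place WHERE quantity_total >= 123;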
