
Query:

SELECT
 "places_place"."id"
FROM "places_place"
 LEFT OUTER JOIN "units_unit" ON ("places_place"."id" = "units_unit"."place_id")
GROUP BY "places_place"."id"
HAVING SUM("units_unit"."quantity") >= 123

Index attempt:

CREATE INDEX units_quantity_sum ON units_unit (SUM("units_unit"."quantity"));
-- ERROR: aggregate functions are not allowed in index expressions

Essentially, I want to index the result of SUM() without storing it in a separate column of the table. How can I create an index that does this (or is there a better way to optimize this query)?

EXPLAIN ANALYZE of query with 10,000 rows in places_place and 25,000 in units_unit:

HashAggregate  (cost=2057.31..2157.33 rows=10002 width=4) (actual time=38.121..41.174 rows=7727 loops=1)
  Group Key: places_place.id
  Filter: (sum(units_unit.quantity) >= 5)
  Rows Removed by Filter: 2275
  ->  Hash Right Join  (cost=594.04..1932.22 rows=25018 width=6) (actual time=6.383..28.578 rows=26727 loops=1)
        Hash Cond: (units_unit.place_id = places_place.id)
        ->  Seq Scan on units_unit  (cost=0.00..994.18 rows=25018 width=6) (actual time=0.003..7.279 rows=25018 loops=1)
        ->  Hash  (cost=469.02..469.02 rows=10002 width=4) (actual time=6.311..6.311 rows=10002 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 480kB
              ->  Seq Scan on places_place  (cost=0.00..469.02 rows=10002 width=4) (actual time=0.007..3.560 rows=10002 loops=1)
Planning time: 0.584 ms
Execution time: 42.643 ms
asked Feb 21, 2017 at 7:00
  • I don't think you need the join at all. An index on (place_id, quantity) and simplifying the query to SELECT place_id FROM units_unit GROUP BY place_id HAVING SUM(quantity) >= 123; would improve efficiency (a sketch follows these comments). Commented Feb 21, 2017 at 11:07
  • @ypercubeTM good point, removing the join reduced execution time from 42 ms to 20 ms. However, adding the index CREATE INDEX unit_quantity_index ON units_unit (place_id, quantity) had no effect. Commented Feb 21, 2017 at 21:02
  • Still very small tables. And 20 ms is quite good. Now try with 10M and 25M rows! Commented Feb 21, 2017 at 23:02
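
A sketch of the join-free form from the comments, using the index name from the second comment. This should return the same places as the original query, assuming every units_unit.place_id references an existing place (a place with no units yields a NULL sum and fails the HAVING clause either way):

CREATE INDEX unit_quantity_index ON units_unit (place_id, quantity);

SELECT place_id
FROM units_unit
GROUP BY place_id
HAVING SUM(quantity) >= 123;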

1 Answer


You have two easy options:

  1. You can use a MATERIALIZED VIEW.
  2. You can use a TRIGGER that maintains the sum in another table.

Both of these let you cache the SUM(). I would go with the MATERIALIZED VIEW unless you need the totals to be up to date at all times.
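
A minimal sketch of the materialized-view approach, assuming the schema from the question (the view name is invented):

CREATE MATERIALIZED VIEW place_quantity_totals AS
SELECT place_id, SUM(quantity) AS total_quantity
FROM units_unit
GROUP BY place_id;

-- A plain index on the cached sum, which the CREATE INDEX attempt
-- in the question could not express:
CREATE INDEX ON place_quantity_totals (total_quantity);

-- The query becomes an indexable range condition:
SELECT place_id FROM place_quantity_totals WHERE total_quantity >= 123;

-- The view is a snapshot, so refresh it after the data changes:
REFRESH MATERIALIZED VIEW place_quantity_totals;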

Before you go down that route, though: PostgreSQL 9.6 adds parallel sequential scans and parallel aggregation, which will speed this query up. In fact, it's an ideal use case. If you just need this to be faster, try setting max_parallel_workers_per_gather.
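
For example (a session-level sketch; the right worker count depends on your hardware):

-- Parallel query is off by default in 9.6:
SET max_parallel_workers_per_gather = 4;

EXPLAIN ANALYZE
SELECT place_id
FROM units_unit
GROUP BY place_id
HAVING SUM(quantity) >= 123;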

answered Feb 21, 2017 at 7:14
  • Up-to-date changes would be needed; it's user-generated data, and the cached sum would have to update as the quantities change. Commented Feb 21, 2017 at 7:17
  • If that leaves TRIGGER as the only option, then I'm guessing denormalizing it into the places_place table would be better than creating another table (see the sketch after these comments)...? Commented Feb 21, 2017 at 7:19
  • The database would still be doing the same amount of work, just faster, right? If I want to reduce the amount of work it has to do, I guess it would require the denormalized column (depending on the ratio of queries to updates)? Commented Feb 21, 2017 at 21:12
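
A hypothetical sketch of the denormalized, trigger-maintained approach discussed above (the column, function, and trigger names are invented):

ALTER TABLE places_place ADD COLUMN quantity_total integer NOT NULL DEFAULT 0;
CREATE INDEX ON places_place (quantity_total);
-- (Backfill existing totals once before relying on the column.)

CREATE FUNCTION update_place_quantity_total() RETURNS trigger AS $$
BEGIN
    -- Subtract the old quantity from its place (UPDATE/DELETE)...
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE places_place
           SET quantity_total = quantity_total - OLD.quantity
         WHERE id = OLD.place_id;
    END IF;
    -- ...and add the new quantity to its place (INSERT/UPDATE).
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE places_place
           SET quantity_total = quantity_total + NEW.quantity
         WHERE id = NEW.place_id;
    END IF;
    RETURN NULL; -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER units_unit_maintain_total
AFTER INSERT OR UPDATE OF quantity, place_id OR DELETE ON units_unit
FOR EACH ROW EXECUTE PROCEDURE update_place_quantity_total();

-- The query then avoids aggregation entirely:
SELECT id FROM places_place WHERE quantity_total >= 123;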
