Query:
SELECT
"places_place"."id"
FROM "places_place"
LEFT OUTER JOIN "units_unit" ON ("places_place"."id" = "units_unit"."place_id")
GROUP BY "places_place"."id"
HAVING SUM("units_unit"."quantity") >= 123
Index attempt:
CREATE INDEX units_quantity_sum ON units_unit (SUM("units_unit"."quantity"));
-- ERROR: aggregate functions are not allowed in index expressions
Essentially, I want it to index the result of SUM
without storing the result in a separate column of the table. How can I create an index to do this (or is there a better way to optimize this query)?
EXPLAIN ANALYZE
of query with 10,000 rows in places_place
and 25,000 in units_unit
:
HashAggregate (cost=2057.31..2157.33 rows=10002 width=4) (actual time=38.121..41.174 rows=7727 loops=1)
Group Key: places_place.id
Filter: (sum(units_unit.quantity) >= 5)
Rows Removed by Filter: 2275
-> Hash Right Join (cost=594.04..1932.22 rows=25018 width=6) (actual time=6.383..28.578 rows=26727 loops=1)
Hash Cond: (units_unit.place_id = places_place.id)
-> Seq Scan on units_unit (cost=0.00..994.18 rows=25018 width=6) (actual time=0.003..7.279 rows=25018 loops=1)
-> Hash (cost=469.02..469.02 rows=10002 width=4) (actual time=6.311..6.311 rows=10002 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 480kB
-> Seq Scan on places_place (cost=0.00..469.02 rows=10002 width=4) (actual time=0.007..3.560 rows=10002 loops=1)
Planning time: 0.584 ms
Execution time: 42.643 ms
1 Answer 1
You have two easy options
- You can use a
MATERIALIZED VIEW
- You can also use a
TRIGGER
that inserts another table.
Both of these will allow you to cache the SUM()
. I would go with the MATERIALIZED VIEW
unless you need up to date changes all the time.
Before you go down that route though PostgreSQL 9.6 enables parallel seq scans and aggregation which will increase performance. In fact, it's an ideal use case. If you just need this to be faster try setting max_parallel_workers_per_gather
-
Up to date changes would be needed, it's user-generated data and it would be necessary to have the index update as the
quantity
is changeddavidtgq– davidtgq2017年02月21日 07:17:48 +00:00Commented Feb 21, 2017 at 7:17 -
If that leaves
TRIGGER
as the only option, then I'm guessing denormalizing it into theplaces_place
table would be better than creating another table...?davidtgq– davidtgq2017年02月21日 07:19:48 +00:00Commented Feb 21, 2017 at 7:19 -
The database would still be doing the same amount of work faster, right? If I want to reduce the amount of work it has to do, I guess it would require the denormalized column (depending on ratio of queries to updates)?davidtgq– davidtgq2017年02月21日 21:12:42 +00:00Commented Feb 21, 2017 at 21:12
Explore related questions
See similar questions with these tags.
(place_id, quantity)
and simplifying the query toSELECT place_id FROM units_unit GROUP BY place_id HAVING SUM(quantity) >= 123 ;
would improve efficiency.CREATE INDEX unit_quantity_index ON units_unit (place_id, quantity)
had no effect.