I have a query with a group by that either performs a sort followed by group aggregate or a hash aggregate. I noticed that when adding an array_agg function, the query never tries to perform a hash aggregate increasing the query time. I have set my work_mem for the transaction to a really high number so there is no memory constraint.
Does postgres never use hashaggregate if there is an array_agg function in the aggregation?
This is the format of the query in question:
SELECT
"test"."db"."a",
"test"."db"."b",
"test"."db"."c",
"test"."db"."d",
"test"."db"."e",
"test"."db"."f",
"test"."db"."g",
array_agg(ARRAY["test"."db"."resource_type", "test"."db"."resource_name", "test"."db"."pattern_type"]) FILTER (WHERE "test"."db"."deleted" is FALSE) AS "resources"
FROM
"test"."db"
WHERE ("test"."db"."last_change_id" BETWEEN 0 AND val3
AND "test"."db"."c" = 'val1'
AND "test"."db"."d" = 'val2')
GROUP BY
"test"."db"."a",
"test"."db"."b",
"test"."db"."c",
"test"."db"."d",
"test"."db"."e",
"test"."db"."f",
"test"."db"."g"
The plan executed is the following:
GroupAggregate (cost=180461.63..183740.70 rows=84788 width=147) (actual time=3310.049..3764.216 rows=119173 loops=1)
Group Key: a, b, c, d, e, f, g
-> Sort (cost=180461.63..180708.21 rows=98632 width=151) (actual time=3310.030..3394.925 rows=513793 loops=1)
Sort Key: a, b, e, f, g
Sort Method: quicksort Memory: 148765kB
-> Gather (cost=1000.00..172280.22 rows=98632 width=151) (actual time=2.590..478.026 rows=513793 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on role_binding (cost=0.00..161417.02 rows=41097 width=151) (actual time=0.897..414.638 rows=171264 loops=3)
Filter: ((last_change_id >= 0) AND (last_change_id <= val3) AND (c = 'val1'::text) AND (d = 'val2'::text))
Rows Removed by Filter: 826426
Planning Time: 0.181 ms
Execution Time: 3782.427 ms
1 Answer 1
In my hands in 12.9 it uses the hash agg as long as the work_mem is large enough that it is confident the data will fit in memory. Without having your actual data, I don't know what large enough would mean for you, but a few Gig would probably do it.
In later versions, hash agg was taught to spill to disk, which makes the planner more eager to use it.
ORDER BY
clause in the aggregate function?