Suppose I have the following merging function, which merges two jsonb values, overwriting duplicate keys with the values from the second:
create or replace function jsonb_concat(a jsonb, b jsonb) returns jsonb
as
'select $1 || $2'
language sql
immutable
parallel safe
;
Since I want to use that function in aggregates, I need to define an aggregate function like so:
create or replace aggregate jsonb_merge_agg(jsonb)
(
sfunc = jsonb_concat,
stype = jsonb,
initcond = '{}'
);
And suppose I have the following table
id (bigint) | username (text) | event (jsonb) |
---|---|---|
1 | foo | { "it": 1, "key": "bla" } |
2 | foo | { "it": 2, "key" : "dah" } |
3 | bar | {} |
4 | zar | {} |
When I want to aggregate on the username column and merge the event column, I would use the following query:
select username, jsonb_merge_agg(event) from tbl group by username
And I expect to get the following results, where records with a greater id overwrite the key-value pairs of earlier records in the aggregate:
username | event |
---|---|
foo | { "it": 2, "key": "dah"} |
bar | {} |
zar | {} |
The problem is that I sometimes see the foo aggregate contain the combination { "it": 1, "key": "bla" } instead. I am aware that Postgres does not have a natural row order in a table, so the order might differ between transactions. How do I control the order of the aggregation merge?
2 Answers
The "Aggregate Functions" page (chapter 9, section 20 of the PostgreSQL 12 documentation (1)) has a paragraph specifically addressing this:
The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7.
Section 4.2.7 (2), Value Expressions / Aggregate Expressions, explains the syntax as follows:
aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
As a result, to get a consistent merge order, the query must be modified to the following:
select username, jsonb_merge_agg(event order by id) from tbl group by username
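To see the effect of the in-aggregate ORDER BY, here is a minimal, self-contained sketch. The names demo_events, demo_jsonb_merge and demo_merge_agg are made up for this illustration (they are not from the question); the sample rows mirror the question's table.

```sql
-- Hypothetical demo objects; all names here are assumptions for illustration.
create table demo_events (
    id       bigint primary key,
    username text,
    event    jsonb
);

insert into demo_events (id, username, event) values
    (1, 'foo', '{"it": 1, "key": "bla"}'),
    (2, 'foo', '{"it": 2, "key": "dah"}'),
    (3, 'bar', '{}'),
    (4, 'zar', '{}');

-- Transition function: on duplicate keys, the right operand wins.
create function demo_jsonb_merge(a jsonb, b jsonb) returns jsonb
    language sql
    immutable
    parallel safe
as 'select $1 || $2';

create aggregate demo_merge_agg(jsonb) (
    sfunc    = demo_jsonb_merge,
    stype    = jsonb,
    initcond = '{}'
);

-- The ORDER BY inside the call fixes the order in which the rows of each
-- group are fed to the transition function.
select username,
       demo_merge_agg(event order by id)      as later_id_wins,
       demo_merge_agg(event order by id desc) as earlier_id_wins
from demo_events
group by username
order by username;
```

For foo, later_id_wins comes out as {"it": 2, "key": "dah"} and earlier_id_wins as {"it": 1, "key": "bla"}, which shows that the merge result depends only on the order given inside the aggregate call.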
1: https://www.postgresql.org/docs/12/functions-aggregate.html
2: https://www.postgresql.org/docs/12/sql-expressions.html#SYNTAX-AGGREGATES
- Same approach can be taken in versions newer than PostgreSQL 12. – Dragas, Jan 17, 2024 at 9:22
The manual goes on to say:
... Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
In fact, this is typically much faster:
SELECT username, jsonb_merge_agg(event) AS event
FROM (
SELECT username, event
FROM tbl
ORDER BY username, id
) sub
GROUP BY username;
Because a single sort operation is much cheaper than a separate sort per user.
As long as the outer query level does nothing to reorder rows before the aggregation, this is safe.
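To check the performance claim on your own data, one option is to compare both plans with EXPLAIN. This is only a rough sketch, assuming the table is named tbl and the jsonb_merge_agg aggregate from the question exists. Timings depend on data volume, indexes and server version, so treat the difference as something to measure rather than a given.

```sql
-- Note: EXPLAIN ANALYZE actually executes the query.

-- Variant 1: ORDER BY inside the aggregate call.
explain (analyze, buffers)
select username, jsonb_merge_agg(event order by id) as event
from tbl
group by username;

-- Variant 2: input rows supplied by a pre-sorted subquery.
explain (analyze, buffers)
select username, jsonb_merge_agg(event) as event
from (
    select username, event
    from tbl
    order by username, id
) sub
group by username;
```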
Related:
- How to apply ORDER BY and LIMIT in combination with an aggregate function?
- Multiple to_json(array_agg), separate joins
Aside
If the use case is as simple as your sample suggests - effectively taking the latest value per user - a plain DISTINCT ON query does it:
SELECT DISTINCT ON (username)
username, COALESCE(event, '{}')
FROM tbl
ORDER BY username, event IS NULL, id DESC;
(I presume your real case actually merges values. Then this is not applicable.)
The 2nd ORDER BY term event IS NULL makes sure null values are ignored like in your aggregate function - by sorting them last.
Now, why would I claim that your aggregate function ignores null values, when your custom function jsonb_concat(jsonb, jsonb) isn't defined STRICT?
Postgres already has a built-in function jsonb_concat(jsonb, jsonb), and that one is STRICT. Your CREATE AGGREGATE uses the unqualified function name, and in a sane setup the system schema pg_catalog takes precedence. So the aggregate actually uses the built-in (STRICT) version of the function. (Incidentally a good thing, as that one is much faster!) Your demo is quite a trap. (Maybe trapping you already?)
- That's correct. I failed to specify that the event jsonb column never contains null values, and that the merged result has more keys than just the latest available value for that particular username record. – Dragas, Jan 18, 2024 at 10:26
jsonb_concat() that hides your custom function, which goes unused. I added details to my answer.
comment on function jsonb_concat(unknown, unknown) is 'implementation of || operator';