4

Suppose I have the following merging function that merges two jsonb values overwriting duplicate keys with the values from the second

create or replace function jsonb_concat(a jsonb, b jsonb) returns jsonb
as
'select 1ドル || 2ドル'
 language sql
 immutable
 parallel safe
;

Since I want to use that function in aggregates I need to define an aggregate function like so

create or replace aggregate jsonb_merge_agg(jsonb)
(
 sfunc = jsonb_concat,
 stype = jsonb,
 initcond = '{}'
);

And suppose I have the following table

id (bigint) username (text) event (jsonb)
1 foo { "it": 1, "key": "bla" }
2 foo { "it": 2, "key" : "dah" }
3 bar {}
4 zar {}

When I want to aggregate on username column and merge the event column I would use the following query

select username, jsonb_merge_agg(event) from table group by username

And I expect to get the following results, where records with greater id overwrite keys value pairs of earlier records in the aggregate

username event
foo { "it": 2, "key": "dah"}
bar {}
zar {}

The problem is that I am sometimes seeing that the foo aggregate contains combination { "it": 1, "key": "bla" } instead. I am aware that Postgres does not have natural row order in the table, so it might be different between transactions. How do I manage the order of the aggregation merge?

Erwin Brandstetter
186k28 gold badges463 silver badges636 bronze badges
asked Jan 17, 2024 at 9:22
3
  • Be aware that there is a built-in jsonb_concat() that hides your custom function, which goes unused. I added details to my answer. Commented Jan 18, 2024 at 0:19
  • @ErwinBrandstetter that's a good point, but I'm not seeing it being referenced somewhere in the documentation (such as json functions page 9.5). I do see the comment on such function that comment on function jsonb_concat(unknown, unknown) is 'implementation of || operator'; Commented Jan 18, 2024 at 10:12
  • 1
    There are many undocumented internal funstions. Schema-qualify the function name to make sure that yours gets used. (Or don't, in this particular case, and use the superior built-in function.) Commented Jan 18, 2024 at 16:43

2 Answers 2

4

In the Aggregate functions page (chapter 9, section 20 in postgresql 12 documentation "Aggregate Functions" (1)) there's a paragraph specifically addressing this

The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7.

And in the section 4.2.7 (2) Value expressions/Aggregate expressions the syntax is explained as following

aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]

as a result, to get consistent merge order the query must be modified to the following

select username, jsonb_merge_agg(event order by id)

1: https://www.postgresql.org/docs/12/functions-aggregate.html

2: https://www.postgresql.org/docs/12/sql-expressions.html#SYNTAX-AGGREGATES

answered Jan 17, 2024 at 9:22
1
  • Same approach can be taken in newer versions than psql12. Commented Jan 17, 2024 at 9:22
1

The manual goes on to say:

... Alternatively, supplying the input values from a sorted subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

In fact, this is typically much faster:

SELECT username, jsonb_merge_agg(event) AS event
FROM (
 SELECT username, event
 FROM tbl
 ORDER BY username, id
 ) sub
GROUP BY username;

fiddle

Because a single sort operation is much cheaper than a separate sort per user.

As long as the outer query level does nothing to reorder rows before the aggregation, this is safe.

Related:

Aside

If the use case is as simple as your sample suggests - effectively taking the latest value per user - a plain DISTINCT ON query does it:

SELECT DISTINCT ON (username)
 username, COALESCE(event, '{}')
FROM tbl
ORDER BY username, event IS NULL, id;

fiddle

(I presume your real case actually merges values. Then this is not applicable.)

The 2nd ORDER BY term event IS NULL makes sure null values are ignored like in your aggregate function - by sorting them last. See:

Now, why would I claim that your aggregate function ignores null values, when your custom function jsonb_concat(jsonb, jsonb) isn't defined STRICT? Postgres already has a built-in function jsonb_concat(jsonb, jsonb), and that one is STRICT. Your CREATE AGGREGATE uses the unqualified function name, and in a sane setup the system schema pg_catalog takes precedence. So the aggregate actually uses the built-in (STRICT) version of the function. (Incidentally a good thing as that one is much faster!) Your demo is quite a trap. (Maybe trapping you already?)

answered Jan 17, 2024 at 23:05
1
  • That's correct. I missed a point to specify that the event jsonb column shall never contain null values, and that the merged result shall have more keys than just the latest available value for that particular username record. Commented Jan 18, 2024 at 10:26

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.