Requirements and situation
We're currently using a JSONB column to speed up somewhat arbitrary searches in our database and this is working fine so far. When updating the data the requirement is as follows:
- Any new value will overwrite any existing value. This includes nulls and arrays.
- Any (nested) objects will be merged.
- Null values as well as empty arrays and objects will be removed (should they exist at all).
To illustrate that, consider this example:
Existing (contains a null value for illustration purposes):
{
"a":null,
"b":1,
"c":1,
"f":1,
"g": {
"nested": 1
}
}
This should be merged into the existing object:
{
"b":2,
"d":null,
"e":2,
"f":null,
"g":{
"nested": 2
}
}
As you can see we're overriding a few fields and remove f
. So the expected output would be:
{
"b": 2, //overridden
"c": 1, //kept
"e": 2, //added
"g": {
"nested": 2 //overridden
}
}
To achieve this we're using the following function:
CREATE OR REPLACE FUNCTION jsonb_merge(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB LANGUAGE sql IMMUTABLE
AS $$
SELECT
CASE
WHEN jsonb_typeof(1ドル) = 'object' AND jsonb_typeof(2ドル) = 'object' THEN
(
SELECT jsonb_object_agg(merged.key, merged.value) FROM
(
SELECT
COALESCE( p1.key, p2.key ) as key,
CASE
WHEN p1.key IS NULL then p2.value
WHEN p2.key IS NULL THEN p1.value
ELSE jsonb_merge( p1.value, p2.value )
END AS value
FROM jsonb_each(1ドル) p1
FULL OUTER JOIN jsonb_each(2ドル) p2 ON p1.key = p2.key
) AS merged
-- Removing this condition reduces runtime by 70%
WHERE NOT (merged.value IS NULL OR merged.value in ( '[]', 'null', '{}') )
)
WHEN jsonb_typeof(2ドル) = 'null' OR (jsonb_typeof(2ドル) = 'array' AND jsonb_array_length(2ドル) < 1) OR 2ドル = '{}' THEN
NULL
ELSE
2ドル
END
$$;
Problem and question
As I said this works quite well from a functional point of view. However, that function is very slow.
One finding was that the condition on merged.value
slows the query down. Removing it results in about 70% lower execution time but obviously the result is not as required.
So how could we achieve a fast deep merge of jsonb objects?
Please note that the Postgres 9.5 ||
operator doesn't work as intended, i.e. it keeps the unwanted elements. While we could use the special remove operations that would complicate our queries and I'm not sure that would be faster.
Considered options
So far we considered the following (unsatisfying) options:
- Upgrade our 9.4 server to 9.5 or 9.6. Problem: the new operator doesn't work the way we need it to, so we'd still have to use a function or heavily refactor our queries. Additionally upgrading or even restarting our production servers is something we're meant to avoid as much as possible.
- Use some scripting language like Python etc. Again there's the problem of having to avoid server restarts. Additionally we'd have to do a full security review first.
That being said we'd like to solve the problem on Postgres 9.4 and using SQL or PL/pgSQL if possible.
Update
I experimented a little further and found the following function pass my tests and be way faster than my previous (10x). I'm pretty sure this works as intended but since I'm no database expert any second look is welcome:
CREATE OR REPLACE FUNCTION jsonb_update(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB LANGUAGE sql IMMUTABLE
AS $$
SELECT json_object_agg(merged.key, merged.value)::jsonb FROM
(
WITH existing_object AS (
SELECT key, value FROM jsonb_each(1ドル)
WHERE NOT (value IS NULL OR value in ('[]', 'null', '{}') )
),
new_object AS (
SELECT key, value FROM jsonb_each(2ドル)
),
deep_merge AS (
SELECT lft.key as key, jsonb_merge( lft.value, rgt.value ) as value
FROM existing_object lft
JOIN new_object rgt ON ( lft.key = rgt.key)
AND jsonb_typeof( lft.value ) = 'object'
AND jsonb_typeof( rgt.value ) = 'object'
)
-- Any non-empty element of jsonb1 that's not in jsonb2 (the keepers)
SELECT key, value FROM existing_object
WHERE key NOT IN (SELECT key FROM new_object )
UNION
-- Any non-empty element from jsonb2 that's not to be deep merged (the simple updates and new elements)
SELECT key, value FROM new_object
WHERE key NOT IN (SELECT key FROM deep_merge )
AND NOT (value IS NULL OR value in ('[]', 'null', '{}') )
UNION
-- All elements that need a deep merge
SELECT key, value FROM deep_merge
) AS merged
$$;
5 Answers 5
I am with a_horse on this: Upgrade to Postgres 9.6 to have new options at your disposal (and for other reasons).
While stuck with 9.4, it might help to simplify like this:
CREATE OR REPLACE FUNCTION jsonb_merge2(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB LANGUAGE sql IMMUTABLE AS
$func$
SELECT
CASE
WHEN jsonb_typeof(1ドル) = 'object' AND jsonb_typeof(2ドル) = 'object' THEN
(
SELECT jsonb_object_agg(merged.key, merged.value)
FROM (
SELECT key
, CASE WHEN p1.value <> p2.value -- implies both are NOT NULL
THEN jsonb_merge2(p1.value, p2.value)
ELSE COALESCE(p2.value, p1.value) -- p2 trumps p1
END AS value
FROM jsonb_each(1ドル) p1
FULL JOIN jsonb_each(2ドル) p2 USING (key) -- USING helps to simplify
) AS merged
WHERE merged.value IS NOT NULL -- simpler, might help query planner
AND merged.value NOT IN ( '[]', 'null', '{}' )
)
WHEN 2ドル IN ( '[]', 'null', '{}' ) THEN -- just as simple as above
NULL
ELSE
2ドル
END
$func$;
-
Thx, I'll try to convince our infrastructure people :) In the meantime I'll try to improve the situation we have - and I might have found a faster solution (see my update). Does that look reasonable?Thomas– Thomas2017年03月03日 15:36:51 +00:00Commented Mar 3, 2017 at 15:36
-
2nice. how would this function look in Pg >= 9.5 ?LiorH– LiorH2017年07月13日 14:31:15 +00:00Commented Jul 13, 2017 at 14:31
A solution:
create or replace function jsonb_merge_recurse(orig jsonb, delta jsonb)
returns jsonb language sql as $$
select
jsonb_object_agg(
coalesce(keyOrig, keyDelta),
case
when valOrig isnull then valDelta
when valDelta isnull then valOrig
when (jsonb_typeof(valOrig) <> 'object' or jsonb_typeof(valDelta) <> 'object') then valDelta
else jsonb_merge_recurse(valOrig, valDelta)
end
)
from jsonb_each(orig) e1(keyOrig, valOrig)
full join jsonb_each(delta) e2(keyDelta, valDelta) on keyOrig = keyDelta
$$;
Found here
Usage Example: merging two input tables into one json documents column:
Given table A, a standard sql table with columns; and table B, containing a single JSONB column named 'doc, like the following target table and its jdoc column.
create unlogged table target(jdoc jsonb);
CREATE INDEX idx_gin_target ON target USING GIN ((jdoc->'keyA'),(jdoc->'keyB'),(jdoc->'jkeyC'),(doc->'jkeyD'));
insert into target select jsonb_merge_recurse(t2.doc,to_jsonb(t1)) from A t1 join B t2 on t1."keyA" = t2.doc->>'keyA' and t1."keyB"::TEXT = t2.doc->>'keyB';
the result table contains a json column containing json documents made of the merge of json documents fields included in table B with column-values of table A.
To use the recursive power of the jsonb_merge_recurse function, better merge two json column each containing multi-leveled json documents.
-
For non-performance sensitive needs, this is very simple (and easy to understand).Josh Mc– Josh Mc2020年01月07日 22:05:01 +00:00Commented Jan 7, 2020 at 22:05
-
Thanks for the feed back @JoshMc , did you compare the speed with other methods ? Which one was the fastest and or efficient ?user1767316– user17673162020年01月11日 20:11:38 +00:00Commented Jan 11, 2020 at 20:11
Upgrade our 9.4 server to 9.5 or 9.6. Problem: the new operator doesn't work the way we need it to, so we'd still have to use a function or heavily refactor our queries. Additionally upgrading or even restarting our production servers is something we're meant to avoid as much as possible.
I think it does.
Using PostgreSQL 9.5+
First we need to create an aggregate function that does what you want, I copied this from here
CREATE AGGREGATE jsonb_object_agg(jsonb) (
SFUNC = 'jsonb_concat',
STYPE = jsonb,
INITCOND = '{}'
);
Now we use it with jsonb_strip_nulls
SELECT jsonb_pretty(
jsonb_strip_nulls(jsonb_object_agg(d ORDER BY id))
)
FROM ( VALUES
( 1, '{ "a":null, "b":1, "c":1, "f":1, "g": { "nested": 1 } }'::jsonb ),
( 2, '{ "b":2, "d":null, "e":2, "f":null, "g":{ "nested": 2 } }' )
) AS t(id,d);
{
"b": 2,
"c": 1,
"e": 2,
"g": {
"nested": 2
}
}
What you wanted (so you don't have to scroll).
{
"b": 2, //overridden
"c": 1, //kept
"e": 2, //added
"g": {
"nested": 2 //overridden
}
}
-
1Looks interesting, I'll check this out once I have my local Postgres 9.6 instance up and running again. Thanks for sharing.Thomas– Thomas2017年10月04日 13:48:36 +00:00Commented Oct 4, 2017 at 13:48
-
2This does not do a deep merge. The
g
object in the second row completely replaces the first.Knyri– Knyri2018年11月05日 19:52:10 +00:00Commented Nov 5, 2018 at 19:52
Here's a heavily rewritten yet optimized version (about 20 times faster than the original, about as fast as the updated version in my question but more correct in terms of requirements - the update had some flaws when it comes to "essentially empty" objects, see below for what those are):
It is rewritten in PlPgSQL and thus very "pretty". ;)
First, we need a new type for the merge function which basically represents the records that are returned by jsonb_each(...)
.
CREATE TYPE jsonEachRecord AS (KEY text, value JSONB);
Then there's a function to check whether a json object is "essentially empty", which means it only contains null values, empty objects, empty arrays or nested "essentially empty" objects.
CREATE OR REPLACE FUNCTION jsonb_is_essentially_empty(jsonb1 jsonb )
RETURNS BOOLEAN LANGUAGE plpgsql IMMUTABLE
AS $func$
DECLARE
result BOOLEAN = TRUE;
r RECORD;
BEGIN
IF jsonb_typeof( jsonb1 ) <> 'object' THEN
IF jsonb1 IS NOT NULL AND jsonb1 NOT in ('[]', 'null') THEN
result = FALSE;
END IF;
ELSE
for r in SELECT key, value FROM jsonb_each(jsonb1) loop
if jsonb_typeof(r.value) = 'object' then
IF NOT jsonb_is_essentially_empty(r.value) THEN
result = FALSE;
exit;
end if;
ELSE
IF r.value IS NOT NULL AND r.value NOT IN ('[]', 'null', '{}') THEN
result = FALSE;
exit;
END IF;
END IF;
end loop;
END IF;
return result;
END;
$func$;
Finally here's the actual merge function:
CREATE OR REPLACE FUNCTION jsonb_merge(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB LANGUAGE plpgsql IMMUTABLE
AS $func$
DECLARE
result jsonEachRecord[];
json_property jsonEachRecord;
idx int;
origArrayLength INT;
mergedValue JSONB;
mergedRecord jsonEachRecord;
BEGIN
FOR json_property IN (SELECT key, value FROM jsonb_each(jsonb1) ORDER BY key) LOOP
IF json_property.value IS NOT NULL AND json_property.value NOT IN ('[]', 'null', '{}') THEN
result = array_append(result, json_property);
END IF;
END LOOP;
idx = 1;
origArrayLength = array_length( result, 1);
FOR json_property IN (SELECT key, value FROM jsonb_each(jsonb2) ORDER BY key) LOOP
WHILE result[idx].key < json_property.key AND idx <= origArrayLength LOOP
idx = idx + 1;
END LOOP;
IF idx > origArrayLength THEN
IF NOT jsonb_is_essentially_empty( json_property.value ) THEN
result = array_append(result, json_property);
END IF;
ELSIF result[idx].key = json_property.key THEN
if jsonb_typeof(result[idx].value) = 'object' AND jsonb_typeof(json_property.value) = 'object' THEN
mergedValue = jsonb_merge( result[idx].value, json_property.value );
mergedRecord.key = json_property.key;
mergedRecord.value = mergedValue;
result[idx] = mergedRecord;
ELSE
result[idx] = json_property;
END IF;
idx = idx + 1;
ELSE
IF NOT jsonb_is_essentially_empty( json_property.value ) THEN
result = array_append(result, json_property);
END IF;
END IF;
END LOOP;
-- remove any remaining potentially empty elements
IF result IS NOT NULL THEN
FOR i IN REVERSE array_length( result, 1)..1 LOOP
IF jsonb_is_essentially_empty( result[i].value ) THEN
result = array_remove(result, result[i] );
END IF;
END LOOP;
END IF;
return (select json_object_agg(key, value) from unnest(result));
END;
$func$;
As you can see it's quite ugly and probably still as a bunch of flaws, e.g. the constant unnesting of arrays and aggregating of json objects. From what I read elsewhere accessing arrays via index requires the engine to start from the front element again each time you access an element so that could probably be optimized as well.
Probably the best solution is to write it in C, it is more straightforward than it sounds.
I had a similar problem deep adding JSONBs, the native solution was 4x faster than SQL with just one layer of depth.
This is the repo for the deep sum. It should be easy to adapt it to deep merge.
I've ran the code in postgres 9.6, but it might work in 9.4
-
Thanks for the link, I'll look into it. Unfortunately, as stated in my question, having to restart our database server to add a new native library (or would it work without restarting?) is something that's not easily accepted by the team leads.Thomas– Thomas2017年10月06日 10:00:58 +00:00Commented Oct 6, 2017 at 10:00
-
@Thomas, there's no need to restart the database to add a native extension.Gabriel Furstenheim– Gabriel Furstenheim2017年10月06日 11:58:02 +00:00Commented Oct 6, 2017 at 11:58
jsonb_strip_nulls()