I have data like this:
{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}
{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}
{"name": "c", "scope": "3", "items": [{"code": "x", "description": "xd"}]}
{"name": "d", "scope": "4", "items": [{"code": "x", "description": "xd"}]}
I want to filter out some fields from the JSON objects in my SELECT result, so that the result looks something like this:
{"name": "a","items": [{"code": "x"}, {"code": "x2"}]}
{"name": "b","items": [{"code": "x"}]}
{"name": "c","items": [{"code": "x"}]}
{"name": "d","items": [{"code": "x"}]}
Well, it's not pretty (see fiddle here):
CREATE TABLE test
(
j_str TEXT NOT NULL
);
Populate it:
INSERT INTO test VALUES
('{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}'),
('{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}'),
('{"name": "c", "scope": "3", "items": [{"code": "x", "description": "xd"}]}'),
('{"name": "d", "scope": "4", "items": [{"code": "x", "description": "xd"}]}'),
('{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}, {"code": "x3", "description": "xd3"}]}');
Notice that I've added a record with 3 codes!
And the first step (see fiddle):
SELECT
REGEXP_REPLACE
(
j_str,
'(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")',
'\1 \3'
) FROM test;
which gives:
regexp_replace
{"name": "a", "items": [{"code": "x"}, {"code": "x2", "description": "xd2"}]}
{"name": "b", "items": [{"code": "x"}]}
{"name": "c", "items": [{"code": "x"}]}
{"name": "d", "items": [{"code": "x"}]}
{"name": "a", "items": [{"code": "x"}, {"code": "x2", "description": "xd2"}, {"code": "x3", "description": "xd3"}]}
Explanation of the regexp:
'(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")',
'\1 \3'
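Start from the beginning of the string (the ^ anchor) and match everything up to a comma (.*,) that is followed by the word "scope" in double quotes (note the preceding space), a colon, then 1 to 3 digits (\d{1,3}) in double quotes, then a comma and another space. Next comes "\w{5}" (five word characters, here "items"), the opening [{ and the first "code": "..." pair, and finally the first item's , "description": "..." pair. Everything after that is the rest of the string, which is left alone.
The round brackets (...) are "capturing groups", so in the replacement string \1 means the first capturing group. I keep the first group, a space and the third group, so the second group (the scope pair) and the fourth group (the first item's description) are deleted.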
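To check what each group actually captures, one option (a sketch; regexp_match needs PostgreSQL 10 or later) is to run the same pattern through regexp_match, which returns a text array with one element per capturing group. It is applied here to the first sample row as a literal, so it runs without the test table:

```sql
-- Inspect the four capturing groups of the first-step pattern on one row.
-- regexp_match returns text[]: element N holds capturing group N.
SELECT g[1] AS grp1,  -- kept:    {"name": "a",
       g[2] AS grp2,  -- dropped: the "scope" pair with its surrounding spaces
       g[3] AS grp3,  -- kept:    "items": [{"code": "x"
       g[4] AS grp4   -- dropped: , "description": "xd"  (first item only)
FROM regexp_match(
       '{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}',
       '(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")'
     ) AS g;
```

Joining group 1, a space and group 3 gives exactly the start of the first output row above; the second item's description is outside the match, which is why it survives the first step.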
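And then, to remove the remaining "description" pairs from the other array elements, wrap that in a second REGEXP_REPLACE with the 'g' (global) flag: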
SELECT
REGEXP_REPLACE
(
REGEXP_REPLACE
(
j_str,
'(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")',
'\1 \3'
),
'\{("code": "\w{1,10}")(, "\w{10,20}": "\w{1,9}")(\})',
'{\1\3', 'g'
)
FROM test;
Result:
regexp_replace
{"name": "a", "items": [{"code": "x"}, {"code": "x2"}]}
{"name": "b", "items": [{"code": "x"}]}
{"name": "c", "items": [{"code": "x"}]}
{"name": "d", "items": [{"code": "x"}]}
{"name": "a", "items": [{"code": "x"}, {"code": "x2"}, {"code": "x3"}]}
So, you can see that the data is now in your desired format! With a bit of work, it should be possible to use a single regexp for all of this: I can see a way of doing it that shouldn't require the specific words "scope" or "code" to be present, i.e. by formulating the capturing groups so that the first regex is combined with the second one. Might be a good exercise?
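A simpler single-pass variant (not the group-combining approach described above, since it still names the keys to drop) is to delete each unwanted "key": "value" pair by name, using an alternation and the 'g' flag. This sketch assumes every pair to be removed is preceded by ", " (so it is never the first key of its object) and that its value is a plain string without escaped double quotes; sample is a hypothetical inline copy of two rows so the query runs on its own:

```sql
-- Sketch: one regexp_replace that drops every ', "scope": "..."' and
-- ', "description": "..."' pair, wherever it appears in the text.
-- Assumption: unwanted pairs are never first in their object and their
-- values are simple strings (no \" inside).
WITH sample(j_str) AS (
  VALUES
    ('{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}'),
    ('{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}')
)
SELECT regexp_replace(
         j_str,
         ', "(scope|description)": "[^"]*"',  -- comma, key name, quoted value
         '',                                   -- replace the whole pair with nothing
         'g'                                   -- every occurrence, not just the first
       ) AS filtered
FROM sample;
```

For these two rows it returns {"name": "a", "items": [{"code": "x"}, {"code": "x2"}]} and {"name": "b", "items": [{"code": "x"}]}. Like the two-step version, it edits the JSON as text, so it only holds while the layout stays this regular.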
"scope" followed by a number in double quotes, and always "description" followed by a string in double quotes? Or are the patterns more difficult than that?
Do you require a space between the "a", and "items":?
your_jsonb_column - 'scope' would be the first step. But for the nested array elements, you will need to iterate over them, remove the keys you don't want, and then put them together using jsonb_agg().
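The approach from that comment can be sketched as follows, assuming every stored string is valid JSON so it can be cast to jsonb (sample is a hypothetical inline copy of two rows so the query runs on its own). The top-level scope key is removed with the jsonb - text operator; the items array is unnested with jsonb_array_elements, each element loses its description key, and jsonb_agg rebuilds the array in its original order:

```sql
-- Sketch of the jsonb approach (jsonb - text and jsonb || jsonb need
-- PostgreSQL 9.5+). Assumes every j_str is valid JSON with an "items" array.
WITH sample(j_str) AS (
  VALUES
    ('{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}'),
    ('{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}')
)
SELECT (s.j_str::jsonb - 'scope')          -- drop the top-level "scope" key
       || jsonb_build_object(               -- replace "items" with the filtered array
            'items',
            COALESCE(
              (SELECT jsonb_agg(e.elem - 'description' ORDER BY e.ord)
                 FROM jsonb_array_elements(s.j_str::jsonb -> 'items')
                      WITH ORDINALITY AS e(elem, ord)),  -- keep original order
              '[]'::jsonb)                  -- empty array instead of NULL
          ) AS filtered
FROM sample AS s;
```

This produces the same content as the desired output and does not depend on the text layout. Note that jsonb does not preserve the original key order or whitespace, so if the exact text form matters, that has to be handled separately.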