Read specific fields from Postgres jsonb

Question 1

I have data like

{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}
{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}
{"name": "c", "scope": "3", "items": [{"code": "x", "description": "xd"}]}
{"name": "d", "scope": "4", "items": [{"code": "x", "description": "xd"}]}

I want to filter out some fields in the json objects in my SELECT result, and the result could be something like:

{"name": "a","items": [{"code": "x"}, {"code": "x2"}]}
{"name": "b","items": [{"code": "x"}]}
{"name": "c","items": [{"code": "x"}]}
{"name": "d","items": [{"code": "x"}]}

Question 2

Hi and welcome to the forum! Will the text to be removed always be "scope" followed by a number in double quotes and always "description" followed by a string in double quotes? Or are the patterns more difficult than that?

Question 3

Also, your expected output has "a","items" with no space after the comma: do you require the space between "a", and "items": to be removed?

Question 4

Welcome. To help us help you, can you follow the steps here: dba.stackexchange.com/help/minimal-reproducible-example

Question 5

your_jsonb_column - 'scope' would be the first step. But for the nested array elements, you will need to iterate over them, remove the keys you don't want and then put them back together using jsonb_agg().
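A minimal sketch of the approach in that comment, assuming every document has an "items" array and that "scope" and "description" are the only keys to drop. The function name strip_scope_and_description is hypothetical; jsonb_array_elements ... WITH ORDINALITY is used so the rebuilt array keeps its original order.

```sql
-- Hypothetical helper (the name is illustrative): drop the top-level
-- "scope" key, then rebuild "items" with "description" removed from
-- every element. Assumes "items" is always a JSON array.
CREATE OR REPLACE FUNCTION strip_scope_and_description(doc jsonb)
RETURNS jsonb
LANGUAGE sql
IMMUTABLE
AS $$
  SELECT jsonb_set(
           doc - 'scope',                -- step 1: remove the top-level key
           '{items}',                    -- path of the array to replace
           COALESCE(
             (SELECT jsonb_agg(t.elem - 'description' ORDER BY t.ord)  -- step 2: trim each element, re-aggregate
                FROM jsonb_array_elements(doc -> 'items')
                     WITH ORDINALITY AS t(elem, ord)),
             '[]'::jsonb)                -- jsonb_agg over zero rows returns NULL
         )
$$;
```

Usage, with placeholder table and column names: SELECT strip_scope_and_description(your_jsonb_column) FROM your_table; The COALESCE matters because jsonb_set returns NULL when its new value is NULL, which would otherwise wipe out any row with an empty "items" array.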

Question 6

I want to filter out some fields from the JSON object; spaces don't matter.
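Since spacing doesn't matter, another option is to cast the text to jsonb and build a new object from only the fields to keep, rather than deleting the unwanted ones. This is a sketch under the assumption that "name" and "code" are the only fields wanted; the function name pick_name_and_codes is hypothetical. Note that jsonb output does not preserve the original whitespace or key order.

```sql
-- Hypothetical helper (the name is illustrative): build a new object
-- holding only "name" and, for each element of "items", only "code".
CREATE OR REPLACE FUNCTION pick_name_and_codes(doc jsonb)
RETURNS jsonb
LANGUAGE sql
IMMUTABLE
AS $$
  SELECT jsonb_build_object(
           'name',  doc -> 'name',
           'items', COALESCE(
                      (SELECT jsonb_agg(jsonb_build_object('code', t.elem -> 'code')
                                        ORDER BY t.ord)    -- keep the original array order
                         FROM jsonb_array_elements(doc -> 'items')
                              WITH ORDINALITY AS t(elem, ord)),
                      '[]'::jsonb)                         -- jsonb_agg over zero rows returns NULL
         )
$$;
```

For a text column like j_str in the test table from the answer below, the call would be SELECT pick_name_and_codes(j_str::jsonb) FROM test; the cast also rejects any row that is not valid JSON.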

Question 7

Well, it's not pretty (see fiddle here):

CREATE TABLE test
(
 j_str TEXT NOT NULL
);

Populate it:

INSERT INTO test VALUES 
('{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}'),
('{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}'), 
('{"name": "c", "scope": "3", "items": [{"code": "x", "description": "xd"}]}'),
('{"name": "d", "scope": "4", "items": [{"code": "x", "description": "xd"}]}'),
('{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}, {"code": "x3", "description": "xd3"}]}');

Notice that I've added a record with 3 codes!

And the first step (see fiddle):

SELECT
REGEXP_REPLACE
(
 j_str, 
 '(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")',
 '\1 \3'  -- keep groups 1 and 3; groups 2 and 4 are dropped
) FROM test;

which gives:

regexp_replace
{"name": "a", "items": [{"code": "x"}, {"code": "x2", "description": "xd2"}]}
{"name": "b", "items": [{"code": "x"}]}
{"name": "c", "items": [{"code": "x"}]}
{"name": "d", "items": [{"code": "x"}]}
{"name": "a", "items": [{"code": "x"}, {"code": "x2", "description": "xd2"}, {"code": "x3", "description": "xd3"}]}

Explanation of the regexp:

'(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")',
'\1 \3'

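Start at the beginning of the string (the ^ anchor) and match everything up to the comma just before the "scope" key. Group 2 is the "scope" pair itself, including its preceding space: "scope", a colon and a space, a double quote, 1 to 3 digits (\d{1,3}), a closing double quote, then a comma and another space. Group 3 matches the "items" key (\w{5}), the opening [{ and the first "code" pair, and group 4 matches the , "description": "xd" pair of the first element. The round brackets (...) are "capturing groups", so in the replacement string I have \1 (the first capturing group), a space and \3 (the third), which deletes the second and fourth. Everything after the match is left unchanged, which is why only the first element loses its "description" in this step.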
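To see the capturing groups directly, the first-step pattern can be run through regexp_match, which returns the captured substrings as a text array (this function needs PostgreSQL 10 or later). The sample row is taken from the question's data.

```sql
-- Show what each capturing group of the first-step pattern matches
-- for one sample row.
SELECT g[1] AS group1,  -- kept
       g[2] AS group2,  -- dropped: the "scope" pair
       g[3] AS group3,  -- kept
       g[4] AS group4   -- dropped: the first "description" pair
FROM regexp_match(
       '{"name": "b", "scope": "2", "items": [{"code": "x", "description": "xd"}]}',
       '(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")'
     ) AS m(g);
```

Groups 2 and 4 come back as the "scope" pair and the first "description" pair, which are exactly the parts that the replacement \1 \3 leaves out.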

and then:

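SELECT
REGEXP_REPLACE
(
 REGEXP_REPLACE  -- inner call: the first step, unchanged
 (
 j_str, 
 '(^.*,)( "scope": "\d{1,3}", )("\w{5}": \[\{"\w{2,10}": "\w{1,5}")(, "\w{10,15}": "\w{1,10}")', 
 '\1 \3'
 ), 
 -- group 1: the "code" pair, group 2: the "description" pair, group 3: the closing brace
 '\{("code": "\w{1,10}")(, "\w{10,20}": "\w{1,9}")(\})', 
 '{\1\3', 'g'  -- rebuild each element without group 2; 'g' replaces every match, not just the first
)
FROM test;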

Result:

regexp_replace
{"name": "a", "items": [{"code": "x"}, {"code": "x2"}]}
{"name": "b", "items": [{"code": "x"}]}
{"name": "c", "items": [{"code": "x"}]}
{"name": "d", "items": [{"code": "x"}]}
{"name": "a", "items": [{"code": "x"}, {"code": "x2"}, {"code": "x3"}]}

So, you can see that the data is in your desired format! With a bit of work, it should be possible to do all of this with one single regexp. I can see a way of doing it that shouldn't require the specific words "scope" or "code" to be present: formulate the capturing groups so that the first regex and the second one are combined. Might be a good exercise?
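Along the lines of the single-regexp idea above, here is a minimal sketch that takes a different shortcut: instead of describing the structure with capturing groups, it deletes every , "scope": "..." and , "description": "..." pair in one regexp_replace call with the 'g' flag. It assumes neither key is ever the first member of its object (so it is always preceded by a comma and a space) and that values never contain an escaped double quote. Unlike the name-independent version suggested above, it does depend on the key names.

```sql
-- Sketch: one regexp_replace with the 'g' flag removes every
-- ', "scope": "..."' and ', "description": "..."' pair.
-- Assumptions: each key is preceded by ', ' (never first in its object)
-- and values never contain an escaped double quote.
SELECT regexp_replace(
         '{"name": "a", "scope": "1", "items": [{"code": "x", "description": "xd"}, {"code": "x2", "description": "xd2"}]}',
         ', "(scope|description)": "[^"]*"',  -- the pair to delete
         '',                                  -- replace it with nothing
         'g'                                  -- every match, not just the first
       ) AS trimmed;
```

Against the answer's table this would be SELECT regexp_replace(j_str, ', "(scope|description)": "[^"]*"', '', 'g') FROM test; for the sample literal it yields {"name": "a", "items": [{"code": "x"}, {"code": "x2"}]}, the same text as the two-step version.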

Vérace · Answer 1 · 2021-02-02 14:45:05Z