
I am having trouble grasping how to use JSON in PG. I have a fairly large table (~4M rows) with a JSONB column containing an array of anywhere from zero to a couple hundred "rows", each with several attributes. I believe this is a reasonable approximation of the data:

drop table if exists temp_jd;
create table temp_jd (id serial, d jsonb);
insert into temp_jd (d) values 
 (
 '[
 {"thing":"v1","a1":"bla"},
 {"thing":"v2","a1":"blaugh"},
 {"otherthing":"v1","a1":"something"}
 ]'
 ),
 (
 '[
 {"thing":"v12","a12":"bla"},
 {"thing":"v2","a1":"blaugh"},
 {"morething":"v1","a1":"whatever"}
 ]'
 )
;

I'd ultimately like to query by various bits-n-pieces, and extract aggregated text so I can pretend I have "columns", e.g.

  • thing with values (v1; v2) and (v12; v2), or
  • thing_a1 with values (bla; blaugh) and (bla; blaugh)

I can extract values

select id,jsonb_array_elements(d)->'thing' as thingval from temp_jd;
id | thingval 
----+----------
 1 | "v1"
 1 | "v2"
 1 | 
 2 | "v12"
 2 | "v2"
 2 | 

but I can't figure out how to aggregate them as strings.

I can perform basic query operations

select id, jsonb_array_elements(d)->>'morething' from temp_jd where d @> '[{"morething":"v1"}]';
 id | ?column? 
----+----------
 2 | 
 2 | 
 2 | v1

but I'm not sure how to filter on key = 'thing' together with a1 = 'blaugh', or key = 'thing' together with a1 LIKE 'blaugh%'.
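As an aside, the exact-match case can already be expressed with containment, since `@>` matches all key/value pairs of the pattern against one and the same array element; the `LIKE` case needs the array expanded first. A sketch against the sample table above, not a definitive approach:

```sql
-- exact match: both pairs must occur in the same array element
select id
  from temp_jd
 where d @> '[{"thing":"v2","a1":"blaugh"}]';

-- prefix match: expand elements, then filter on key presence and text value
select distinct t.id
  from temp_jd t
 cross join lateral jsonb_array_elements(t.d) as e(elem)
 where e.elem ? 'thing'
   and e.elem->>'a1' like 'blaugh%';
```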

Any help in better understanding this would be greatly appreciated.

asked Aug 11, 2020 at 14:58
  • What exactly is the output you are looking for? Commented Aug 11, 2020 at 15:01
  • What version of Postgres? Commented Aug 11, 2020 at 15:01
  • version is 12.2 Commented Aug 11, 2020 at 20:29
  • See above for desired output; I'd like an aggregation of various bits-n-pieces eg v1; v2 in a "column" thing Commented Aug 11, 2020 at 20:30

1 Answer


See if this fiddle helps shed some light on what you can do.

This query blows out your jsonb object to where you can reach the keys and values of your objects as columns in separate rows:

select t.id, a.element, a.ind - 1 as array_index, o.key, o.value
 from temp_jd t
 cross join lateral jsonb_array_elements(t.d) 
 with ordinality as a(element, ind)
 cross join lateral jsonb_each(a.element) as o(key, value)
 order by t.id, a.ind, o.key; 

To perform an aggregation on this, similar to your example:

with blowup as (
 select t.id, a.element, a.ind - 1 as array_index, o.key, o.value
 from temp_jd t
 cross join lateral jsonb_array_elements(t.d) 
 with ordinality as a(element, ind)
 cross join lateral jsonb_each(a.element) as o(key, value)
 order by t.id, a.ind, o.key
)
select id, 
 array_agg(value order by array_index) as things
 from blowup 
 where key = 'thing'
 group by id; 
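If the goal is the semicolon-separated strings from the question rather than arrays, `string_agg` works once the jsonb scalar is unwrapped to text; `value #>> '{}'` does that without keeping the surrounding quotes. A self-contained sketch of that variant:

```sql
with blowup as (
  select t.id, a.ind - 1 as array_index, o.key, o.value
    from temp_jd t
   cross join lateral jsonb_array_elements(t.d)
         with ordinality as a(element, ind)
   cross join lateral jsonb_each(a.element) as o(key, value)
)
select id,
       string_agg(value #>> '{}', '; ' order by array_index) as thing
  from blowup
 where key = 'thing'
 group by id;
-- with the sample data this should yield 'v1; v2' for id 1
-- and 'v12; v2' for id 2
```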
answered Aug 11, 2020 at 15:09

4 Comments

Thanks, that's kludgy! I will experiment, but I wonder if that can possibly perform as hundreds of potential "columns" across a few million rows?
@user2797314 I will almost guarantee you that the performance will be terrible with your rowcounts. If performance is important to you, then you really cannot beat a normalized model as there has been nearly forty years of work behind making that go real fast. Just reading the jsonb objects and looking at what's inside them is very costly compared to dealing with a bunch of integers. I did some similar work recently, and we pulled stuff in as jsonb for storage and loaded relevant parts into a normalized schema for operational use. Speedup can be hundredfold vs. querying the json directly.
Thanks again. I'm coming from normalized tables, hoping a "smart" cache will simplify some things for the ~90% of the traffic that basically just needs a summary of various aspects, maybe it just won't work out that way! (yet?!)
@DLM You are very welcome. I wish you luck on what you are doing. One thing hurting you here is that arrays pretty much defeat the ability to index using jsonb_ops or jsonb_path_ops. I suspect there will never be any enhancement in this direction since normalization covers this use case. In case your objects are ugly in terms of keys, you may be able to derive benefit from a child table holding one object in each row with an FK back to temp_jd and indexed with a jsonb_ops gin index.
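That last suggestion might look roughly like the following sketch; the table and column names (`temp_jd_elem`, `parent_id`) are made up for illustration:

```sql
-- one array element per row, linked back to the parent row
create table temp_jd_elem (
    elem_id   bigserial primary key,
    parent_id integer not null,  -- logical FK back to temp_jd.id
    elem      jsonb not null
);

insert into temp_jd_elem (parent_id, elem)
select t.id, e.elem
  from temp_jd t
 cross join lateral jsonb_array_elements(t.d) as e(elem);

-- a jsonb_ops gin index is usable now that each row holds one object
create index temp_jd_elem_gin on temp_jd_elem using gin (elem);

-- element-level containment, e.g. both keys in the same object:
select distinct parent_id
  from temp_jd_elem
 where elem @> '{"thing":"v2","a1":"blaugh"}';
```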
