
I am having trouble grasping how to use JSON in PG. I have a fairly large table (~4M rows) with a JSONB column containing an array of anywhere from zero to a couple hundred "rows", each with several attributes. I believe this is a reasonable approximation of the data:

drop table if exists temp_jd;
create table temp_jd (id serial, d jsonb);
insert into temp_jd (d) values 
 (
 '[
 {"thing":"v1","a1":"bla"},
 {"thing":"v2","a1":"blaugh"},
 {"otherthing":"v1","a1":"something"}
 ]'
 ),
 (
 '[
 {"thing":"v12","a12":"bla"},
 {"thing":"v2","a1":"blaugh"},
 {"morething":"v1","a1":"whatever"}
 ]'
 )
;

I'd ultimately like to query by various bits-n-pieces, and extract aggregated text so I can pretend I have "columns", e.g.

  • thing with values (v1; v2) and (v12; v2), or
  • thing_a1 with values (bla; blaugh) and (bla; blaugh)

I can extract values

select id,jsonb_array_elements(d)->'thing' as thingval from temp_jd;
id | thingval 
----+----------
 1 | "v1"
 1 | "v2"
 1 | 
 2 | "v12"
 2 | "v2"
 2 | 

but I can't figure out how to aggregate them as strings.

I can perform basic query operations

select id, jsonb_array_elements(d)->>'morething' from temp_jd where d @> '[{"morething":"v1"}]';
 id | ?column? 
----+----------
 2 | 
 2 | 
 2 | v1

but I'm not sure how to filter on key = 'thing' together with a1 = 'blaugh', or key = 'thing' together with a1 LIKE 'blaugh%'.
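As an aside, the exact-match case can already be expressed with containment, since `@>` matches all key/value pairs of the pattern against one and the same array element; the `LIKE` case needs the array expanded first. A sketch against the sample table above, not a definitive approach:

```sql
-- exact match: both pairs must occur in the same array element
select id
  from temp_jd
 where d @> '[{"thing":"v2","a1":"blaugh"}]';

-- prefix match: expand elements, then filter on key presence and text value
select distinct t.id
  from temp_jd t
 cross join lateral jsonb_array_elements(t.d) as e(elem)
 where e.elem ? 'thing'
   and e.elem->>'a1' like 'blaugh%';
```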

Any help in better understanding this would be greatly appreciated.

asked Aug 11, 2020 at 14:58
  • What exactly is the output you are looking for? Commented Aug 11, 2020 at 15:01
  • What version of Postgres? Commented Aug 11, 2020 at 15:01
  • version is 12.2 Commented Aug 11, 2020 at 20:29
  • See above for desired output; I'd like an aggregation of various bits-n-pieces eg v1; v2 in a "column" thing Commented Aug 11, 2020 at 20:30

1 Answer


See if this fiddle helps shed some light on what you can do.

This query blows out your jsonb object to where you can reach the keys and values of your objects as columns in separate rows:

select t.id, a.element, a.ind - 1 as array_index, o.key, o.value
 from temp_jd t
 cross join lateral jsonb_array_elements(t.d) 
 with ordinality as a(element, ind)
 cross join lateral jsonb_each(a.element) as o(key, value)
 order by t.id, a.ind, o.key; 

To perform an aggregation on this, similar to your example:

with blowup as (
 select t.id, a.element, a.ind - 1 as array_index, o.key, o.value
 from temp_jd t
 cross join lateral jsonb_array_elements(t.d) 
 with ordinality as a(element, ind)
 cross join lateral jsonb_each(a.element) as o(key, value)
 order by t.id, a.ind, o.key
)
select id, 
 array_agg(value order by array_index) as things
 from blowup 
 where key = 'thing'
 group by id; 
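If the goal is the semicolon-separated strings from the question rather than arrays, `string_agg` works once the jsonb scalar is unwrapped to text; `value #>> '{}'` does that without keeping the surrounding quotes. A self-contained sketch of that variant:

```sql
with blowup as (
  select t.id, a.ind - 1 as array_index, o.key, o.value
    from temp_jd t
   cross join lateral jsonb_array_elements(t.d)
         with ordinality as a(element, ind)
   cross join lateral jsonb_each(a.element) as o(key, value)
)
select id,
       string_agg(value #>> '{}', '; ' order by array_index) as thing
  from blowup
 where key = 'thing'
 group by id;
-- with the sample data this should yield 'v1; v2' for id 1
-- and 'v12; v2' for id 2
```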
answered Aug 11, 2020 at 15:09

4 Comments

Thanks, that's kludgy! I will experiment, but I wonder if that can possibly perform as hundreds of potential "columns" across a few million rows?
@user2797314 I will almost guarantee you that the performance will be terrible with your rowcounts. If performance is important to you, then you really cannot beat a normalized model as there has been nearly forty years of work behind making that go real fast. Just reading the jsonb objects and looking at what's inside them is very costly compared to dealing with a bunch of integers. I did some similar work recently, and we pulled stuff in as jsonb for storage and loaded relevant parts into a normalized schema for operational use. Speedup can be hundredfold vs. querying the json directly.
Thanks again. I'm coming from normalized tables, hoping a "smart" cache will simplify some things for the ~90% of the traffic that basically just needs a summary of various aspects, maybe it just won't work out that way! (yet?!)
@DLM You are very welcome. I wish you luck on what you are doing. One thing hurting you here is that arrays pretty much defeat the ability to index using jsonb_ops or jsonb_path_ops. I suspect there will never be any enhancement in this direction since normalization covers this use case. In case your objects are ugly in terms of keys, you may be able to derive benefit from a child table holding one object in each row with an FK back to temp_jd and indexed with a jsonb_ops gin index.
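That last suggestion might look roughly like the following sketch; the table and column names (`temp_jd_elem`, `parent_id`) are made up for illustration:

```sql
-- one array element per row, linked back to the parent row
create table temp_jd_elem (
    elem_id   bigserial primary key,
    parent_id integer not null,  -- logical FK back to temp_jd.id
    elem      jsonb not null
);

insert into temp_jd_elem (parent_id, elem)
select t.id, e.elem
  from temp_jd t
 cross join lateral jsonb_array_elements(t.d) as e(elem);

-- a jsonb_ops gin index is usable now that each row holds one object
create index temp_jd_elem_gin on temp_jd_elem using gin (elem);

-- element-level containment, e.g. both keys in the same object:
select distinct parent_id
  from temp_jd_elem
 where elem @> '{"thing":"v2","a1":"blaugh"}';
```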
