0

I have a JSON with the following structure:

uid | item
1 |[{"id":1, "m":"123", "s":1},{"id":2, "msg":"on","s":2},{"id":3, "msg":"of","s":1}]
2 |[{"id":1, "m":"yes", "s":2},{"id":3, "msg":"gh","s":0}]
3 |[{"id":1, "m":"qa", "s":1},{"id":4, "msg":"ks"},{"id":5, "m":"test"}]

I want to query this table based on id and get rows based on matched object values. For e.g. for id=3 I want:

uid | id | m | s
 1 | 3 | of | 1
 2 | 3 | gh | 0

This table has over 500M rows so I would need an index. Using GIN(item jsonb_path_ops) with item @> '[{"id": 3}]' works, but I don't know how to get the exact json object from array that matched.

I can turn the JSON structure to:

{
 "1": { "m":"123", "s": 1 },
 "2": { "m":"on", "s":2 },
 "3": { "m":"of", "s": 1 }
}

and use GIN(item) and check for key exists with ? and then try to fetch the key value. But the index size might increase which might be an overkill given that I only want to search based on id.

Maybe I can use a B-Tree index but not sure how. What should be my JSON structure for most efficient querying? and what index should be used so that I get only the matched object from the array?

Some facts:

  1. Cardinality of id <<< uid. (id = ~100,000, uid = ~500,000,000)
  2. Normalizing this turns this table to 500M*10,000 rows which is extremely slow.
  3. Each JSON array will have upto 50 objects only.
asked May 14, 2020 at 11:16
0

1 Answer 1

1

There are 2 questions here;


First one is the index structure.

  • You can use gin(item) here (without jsonb_path_ops). It will be just fine. You are querying for key (id) only not the full path.
  • You can create an expression index (or an gin index on a generated column) for extracting id field values to an integer[] value. Most probably this index will be smaller than jsonb index (~%10 smaller). But I am not sure whether it's faster for your case.

Here is an example for second usage.

Create an immutable function.

CREATE FUNCTION to_my_int_array (jsonb) RETURNS integer[] AS 
$$
 SELECT
 ARRAY(
 SELECT (jsonb_array_elements(1ドル) ->> 'id')::integer
 );
$$
LANGUAGE sql
IMMUTABLE
RETURNS NULL ON NULL INPUT;

Create an expression index with the function above:

CREATE INDEX ON my_table USING gin (to_my_int_array (item));

Filter the query with function;

SELECT
 mt.uid,
 my_item.*
 FROM my_table mt,
 LATERAL jsonb_array_elements(mt.item) ae,
 LATERAL jsonb_to_record(ae) AS my_item (id int, m text, s int)
 WHERE
 to_my_int_array(mt.item) @> '{1}'::integer[];

Second question is the output. The query above also shows output for version 11. After version 12, you can simply use the query below;

SELECT
 uid,
 my_item.*
 FROM my_table mt,
 LATERAL jsonb_to_record(jsonb_path_query_first(mt.item, '$[*] ? (@.id == 1) ')) 
 AS my_item(id int, m text, s int)
 WHERE
 mt.item @> '[{"id": 1}]';
| uid | id | m | s |
| --- | --- | --- | --- |
| 1 | 1 | 123 | 1 |
| 2 | 1 | yes | 2 |
| 3 | 1 | qa | 1 |
answered May 14, 2020 at 17:59
3
  • Thank you for your answer, I'm using PG 11. I already have something similar for the second part. It's the first part I'm worried about. Could you please point me to a resource that does something like your 2nd bullet point of first answer. That looks like a good strategy. Commented May 14, 2020 at 19:24
  • I could extract a postgres array for id but I don't want to add this column physically to my table because it's size is too big. Is it possible to use this array directly in a GIN index? I tried but didn't succeed. I used SELECT ARRAY(select jsonb_extract_path_text(jsonb_array_elements(p.item), '1')) from test p Commented May 14, 2020 at 20:19
  • 1
    To test the index performance only, I just selected * without any lateral joins. I've only tried on 1M entries but gin creation on the generated array(a) took 80 secs and plain gin(b) took 38 sec and with jsonb_path_ops(c) took 21 sec. Search results were best for a=300ms, b=3.1s, c=3.9s. b's size was almost 2 times than the other 2. Now my only problem is, this table gets lots of inserts, which might become slow with a. I gained a little time by switching to 2nd JSON in OP structure and using jsonb_object_keys. It returns a set and I wrote a fn to convert it to the array. Commented May 15, 2020 at 0:13

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.