I have a JSON with the following structure:
uid | item
1 |[{"id":1, "m":"123", "s":1},{"id":2, "msg":"on","s":2},{"id":3, "msg":"of","s":1}]
2 |[{"id":1, "m":"yes", "s":2},{"id":3, "msg":"gh","s":0}]
3 |[{"id":1, "m":"qa", "s":1},{"id":4, "msg":"ks"},{"id":5, "m":"test"}]
I want to query this table based on id
and get rows based on matched object values. For e.g. for id=3
I want:
uid | id | m | s
1 | 3 | of | 1
2 | 3 | gh | 0
This table has over 500M rows so I would need an index. Using GIN(item jsonb_path_ops)
with item @> '[{"id": 3}]'
works, but I don't know how to get the exact json object from array that matched.
I can turn the JSON structure to:
{
"1": { "m":"123", "s": 1 },
"2": { "m":"on", "s":2 },
"3": { "m":"of", "s": 1 }
}
and use GIN(item)
and check for key exists with ?
and then try to fetch the key value. But the index size might increase which might be an overkill given that I only want to search based on id.
Maybe I can use a B-Tree index but not sure how. What should be my JSON structure for most efficient querying? and what index should be used so that I get only the matched object from the array?
Some facts:
- Cardinality of
id
<<<uid
. (id = ~100,000, uid = ~500,000,000) - Normalizing this turns this table to 500M*10,000 rows which is extremely slow.
- Each JSON array will have upto 50 objects only.
1 Answer 1
There are 2 questions here;
First one is the index structure.
- You can use gin(item) here (without jsonb_path_ops). It will be just fine. You are querying for key (id) only not the full path.
- You can create an expression index (or an gin index on a generated column) for extracting id field values to an integer[] value. Most probably this index will be smaller than jsonb index (~%10 smaller). But I am not sure whether it's faster for your case.
Here is an example for second usage.
Create an immutable function.
CREATE FUNCTION to_my_int_array (jsonb) RETURNS integer[] AS
$$
SELECT
ARRAY(
SELECT (jsonb_array_elements(1ドル) ->> 'id')::integer
);
$$
LANGUAGE sql
IMMUTABLE
RETURNS NULL ON NULL INPUT;
Create an expression index with the function above:
CREATE INDEX ON my_table USING gin (to_my_int_array (item));
Filter the query with function;
SELECT
mt.uid,
my_item.*
FROM my_table mt,
LATERAL jsonb_array_elements(mt.item) ae,
LATERAL jsonb_to_record(ae) AS my_item (id int, m text, s int)
WHERE
to_my_int_array(mt.item) @> '{1}'::integer[];
Second question is the output. The query above also shows output for version 11. After version 12, you can simply use the query below;
SELECT
uid,
my_item.*
FROM my_table mt,
LATERAL jsonb_to_record(jsonb_path_query_first(mt.item, '$[*] ? (@.id == 1) '))
AS my_item(id int, m text, s int)
WHERE
mt.item @> '[{"id": 1}]';
| uid | id | m | s |
| --- | --- | --- | --- |
| 1 | 1 | 123 | 1 |
| 2 | 1 | yes | 2 |
| 3 | 1 | qa | 1 |
-
Thank you for your answer, I'm using PG 11. I already have something similar for the second part. It's the first part I'm worried about. Could you please point me to a resource that does something like your 2nd bullet point of first answer. That looks like a good strategy.Heisenberg– Heisenberg2020年05月14日 19:24:10 +00:00Commented May 14, 2020 at 19:24
-
I could extract a postgres array for
id
but I don't want to add this column physically to my table because it's size is too big. Is it possible to use this array directly in a GIN index? I tried but didn't succeed. I usedSELECT ARRAY(select jsonb_extract_path_text(jsonb_array_elements(p.item), '1')) from test p
Heisenberg– Heisenberg2020年05月14日 20:19:10 +00:00Commented May 14, 2020 at 20:19 -
1To test the index performance only, I just selected * without any lateral joins. I've only tried on 1M entries but gin creation on the generated array(
a
) took 80 secs and plain gin(b
) took 38 sec and with jsonb_path_ops(c
) took 21 sec. Search results were best fora
=300ms,b
=3.1s,c
=3.9s.b
's size was almost 2 times than the other 2. Now my only problem is, this table gets lots of inserts, which might become slow witha
. I gained a little time by switching to 2nd JSON in OP structure and usingjsonb_object_keys
. It returns a set and I wrote a fn to convert it to the array.Heisenberg– Heisenberg2020年05月15日 00:13:29 +00:00Commented May 15, 2020 at 0:13