In my Postgres database I have a lot of data (about a million rows), and the schema looks like this:
create table test_tb (
id int,
tags jsonb
);
These are some of the queries I need to run on this dataset:
1.
select id, tags ->> 'tag_1' as tag_1
from test_tb
where tags ->> 'tag_1' = 'val_1';
2.
select id, tags ->> 'tag_2' as tag_2
from test_tb;
3.
select id, tags ->> 'tag_1'
from test_tb
WHERE (tags ->> 'tag_1' = 'dv_0'
AND tags ->> 'tag_3' = 'dv_15')
AND (tags ->> 'tag_15' = 'dv_22' OR tags ->> 'tag_5' = 'dv_6')
OR (tags ->> 'tag_12' = 'dv_9');
The tags in each tuple are completely arbitrary: one tag might appear in only one tuple, while another appears in hundreds of them. Each tuple has around 20-30 tags.
I tried storing the tags in a jsonb column and putting a GIN index on that column, but it didn't speed up my queries. Please suggest an alternative schema.
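For anyone who wants to reproduce the situation described in the question, here is a sketch that fills test_tb with random tags and creates the kind of GIN index the question mentions. Everything specific here is an assumption, not taken from the question: the tag names (tag_1 to tag_100), the values (dv_0 to dv_29), the row count (100,000 instead of a million, to keep it quick), the index name test_tb_tags_idx, and the PRIMARY KEY on id.

```sql
-- Hypothetical reproduction of the question's data: 100,000 rows (change to 1000000
-- to match the question), each built from 20-30 random draws of tag_1..tag_100 with
-- values dv_0..dv_29. Repeated keys collapse, so a row ends up with at most 30 tags.
DROP TABLE IF EXISTS test_tb CASCADE;
CREATE TABLE test_tb (
    id   int PRIMARY KEY,   -- assumed unique; the question's schema declares no key
    tags jsonb
);

INSERT INTO test_tb (id, tags)
SELECT g.id, t.tags
FROM generate_series(1, 100000) AS g(id)
CROSS JOIN LATERAL (
    -- Referencing g.id makes the subquery correlated, so it is evaluated once per row
    -- and every row gets its own random tags.
    SELECT jsonb_object_agg('tag_' || (1 + floor(random() * 100))::int,
                            'dv_' || floor(random() * 30)::int) AS tags
    FROM generate_series(1, 20 + g.id % 11)
) AS t;

-- A GIN index on the whole column, using the default jsonb_ops operator class.
CREATE INDEX test_tb_tags_idx ON test_tb USING gin (tags);
ANALYZE test_tb;
```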
Answer
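You'll have to rewrite the queries so that they can use the GIN index. A GIN index on a jsonb column can be used by operators such as @> (containment), but not by a comparison on an extracted value like tags ->> 'tag_1' = 'val_1'. For the first query, the rewrite would be: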
SELECT id, tags ->> 'tag_1' as tag_1
FROM test_tb
WHERE tags @> '{ "tag_1": "val_1" }';
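One way to see the difference the rewrite makes is to compare the two plans. This sketch uses three hypothetical rows and an assumed index name (test_tb_tags_idx); with so little data the planner would normally pick a sequential scan anyway, so enable_seqscan is switched off to show whether the index is usable at all.

```sql
-- Hypothetical sample rows and index name, only for inspecting plans.
DROP TABLE IF EXISTS test_tb CASCADE;
CREATE TABLE test_tb (id int, tags jsonb);
INSERT INTO test_tb VALUES
    (1, '{"tag_1": "val_1", "tag_2": "x"}'),
    (2, '{"tag_1": "other"}'),
    (3, '{"tag_2": "y"}');

-- Default operator class jsonb_ops: supports @>, ?, ?|, ?&, @? and @@.
-- USING gin (tags jsonb_path_ops) is usually smaller but supports only @>, @? and @@.
CREATE INDEX test_tb_tags_idx ON test_tb USING gin (tags);
ANALYZE test_tb;

-- Discourage sequential scans for this demo only.
SET enable_seqscan = off;

-- ->> comparison: the plan does not use test_tb_tags_idx.
EXPLAIN SELECT id, tags ->> 'tag_1' AS tag_1
FROM test_tb WHERE tags ->> 'tag_1' = 'val_1';

-- Containment: the plan uses a Bitmap Index Scan on test_tb_tags_idx.
EXPLAIN SELECT id, tags ->> 'tag_1' AS tag_1
FROM test_tb WHERE tags @> '{"tag_1": "val_1"}';

RESET enable_seqscan;
```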
For the second query, no index can help, since there is no WHERE or ORDER BY clause.
The third query is tricky because it contains OR, but since the result set contains the primary key (this assumes id is unique, which the schema above does not enforce), you can rewrite it to use the index:
SELECT id, tags ->> 'tag_1'
FROM test_tb
WHERE tags @> '{ "tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22" }'
UNION
SELECT id, tags ->> 'tag_1'
FROM test_tb
WHERE tags @> '{ "tag_1": "dv_0", "tag_3": "dv_15", "tag_5": "dv_6" }'
UNION
SELECT id, tags ->> 'tag_1'
FROM test_tb
WHERE tags @> '{ "tag_12": "dv_9" }';
Here, UNION is used instead of OR. The query becomes longer, but each of the subqueries can use the index.
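The rewrite relies on AND binding more tightly than OR, so the original WHERE clause means (tag_1 AND tag_3 AND tag_15) OR (tag_1 AND tag_3 AND tag_5) OR tag_12, with one UNION branch per disjunct. The sketch below checks the equivalence on five hypothetical rows; the PRIMARY KEY on id and the view names or_version and union_version are assumptions made for the check.

```sql
-- Hypothetical rows covering each branch, no branch, and all branches at once.
DROP TABLE IF EXISTS test_tb CASCADE;
CREATE TABLE test_tb (id int PRIMARY KEY, tags jsonb);
INSERT INTO test_tb VALUES
    (1, '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22"}'),  -- first branch
    (2, '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_5": "dv_6"}'),    -- second branch
    (3, '{"tag_12": "dv_9"}'),                                       -- third branch
    (4, '{"tag_1": "dv_0", "tag_3": "dv_15"}'),                      -- matches nothing
    (5, '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22",
          "tag_5": "dv_6", "tag_12": "dv_9"}');                       -- all three branches

-- The original query from the question.
CREATE TEMP VIEW or_version AS
SELECT id, tags ->> 'tag_1' AS tag_1
FROM test_tb
WHERE (tags ->> 'tag_1' = 'dv_0' AND tags ->> 'tag_3' = 'dv_15')
  AND (tags ->> 'tag_15' = 'dv_22' OR tags ->> 'tag_5' = 'dv_6')
   OR (tags ->> 'tag_12' = 'dv_9');

-- The index-friendly rewrite from the answer.
CREATE TEMP VIEW union_version AS
SELECT id, tags ->> 'tag_1' AS tag_1 FROM test_tb
WHERE tags @> '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22"}'
UNION
SELECT id, tags ->> 'tag_1' FROM test_tb
WHERE tags @> '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_5": "dv_6"}'
UNION
SELECT id, tags ->> 'tag_1' FROM test_tb
WHERE tags @> '{"tag_12": "dv_9"}';

-- Rows present in one result but not in the other; expect no rows.
(SELECT * FROM or_version EXCEPT SELECT * FROM union_version)
UNION ALL
(SELECT * FROM union_version EXCEPT SELECT * FROM or_version);
```

Both views return rows 1, 2, 3 and 5. Plain UNION matters here: with UNION ALL, row 5 would come back three times, once per branch it matches.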
Comments:
Taking into account the WHERE ... condition, you can rewrite
SELECT id, tags ->> 'tag_1' as tag_1 ...
as
SELECT id, "val_1" as tag_1 ...
– Akina, Mar 1, 2022 at 07:25 UTC
@Akina I don't understand. Where does column val_1 come from? – Laurenz Albe, Mar 1, 2022 at 07:33 UTC
The WHERE condition lets only this value through to the output, and every returned row will contain it, so what is the reason to extract it again? – Akina, Mar 1, 2022 at 07:45 UTC
@Akina How does a WHERE condition output something? Sorry for being dense... – Laurenz Albe, Mar 1, 2022 at 08:40 UTC
There are a lot of rows in the table. Some may not contain the tag 'tag_1', some may contain it with the value "val_1", and some may contain it with another value. You apply WHERE tags @> '{ "tag_1": "val_1" }' (the OP applies where tags ->> 'tag_1' = 'val_1'). Only rows that contain it with the value "val_1" will be selected, so all returned rows will contain "val_1". Replacing the expression with the literal gives exactly the same output with less work for the server. – Akina, Mar 1, 2022 at 08:45 UTC
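A sketch of Akina's suggestion, with one correction: in PostgreSQL a double-quoted "val_1" is an identifier (a column name), which explains the question about where column val_1 comes from, so the string literal needs single quotes. The sample rows are hypothetical.

```sql
-- Hypothetical sample rows.
DROP TABLE IF EXISTS test_tb CASCADE;
CREATE TABLE test_tb (id int PRIMARY KEY, tags jsonb);
INSERT INTO test_tb VALUES
    (1, '{"tag_1": "val_1"}'),
    (2, '{"tag_1": "other"}'),
    (3, '{"tag_2": "val_1"}');

-- Extracts tag_1 again from every matching row.
SELECT id, tags ->> 'tag_1' AS tag_1
FROM test_tb
WHERE tags @> '{"tag_1": "val_1"}';

-- Same result: the WHERE clause already guarantees tag_1 = 'val_1'.
SELECT id, 'val_1'::text AS tag_1
FROM test_tb
WHERE tags @> '{"tag_1": "val_1"}';
```

Both queries return only row 1 with tag_1 = 'val_1'; the second one simply skips the per-row extraction.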