My Postgres database holds a lot of data (a million rows), and the schema looks like this:

create table test_tb (
 id int,
 tags jsonb
);

A few queries to be performed on this dataset are:

1.

select id, tags ->> 'tag_1' as tag_1
from test_tb
where tags ->> 'tag_1'= 'val_1';

2.

select id, tags ->> 'tag_2' as tag_2
from test_tb;

3.

select id, tags ->> 'tag_1'
from test_tb
WHERE (tags ->> 'tag_1'='dv_0'
 AND tags ->> 'tag_3'='dv_15')
 AND (tags ->> 'tag_15'='dv_22' OR tags ->> 'tag_5'='dv_6')
 OR (tags ->> 'tag_12'='dv_9');

The tags in each tuple are completely arbitrary: one tag might appear in only one tuple, while another appears in hundreds of them. Each tuple has around 20 to 30 tags.

I tried storing the tags in a jsonb column and putting a GIN index on that column, but it didn't speed up my queries. Please suggest an alternative schema.

Laurenz Albe
asked Mar 1, 2022 at 3:07

1 Answer

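You'll have to rewrite the queries so that they can use the GIN index: a GIN index on a jsonb column supports the containment operator @>, but not an equality comparison on the result of ->>. For the first query, that would be: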

SELECT id, tags ->> 'tag_1' as tag_1
FROM test_tb
WHERE tags @> '{ "tag_1": "val_1" }';
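
For this rewrite to pay off, the index has to be one that supports @>. A minimal sketch, assuming a plain GIN index on the tags column (the index names here are hypothetical, not taken from the question):

```sql
-- Default operator class (jsonb_ops): supports @> as well as the
-- key-existence operators ?, ?| and ?&.
CREATE INDEX test_tb_tags_idx ON test_tb USING gin (tags);

-- Alternative: jsonb_path_ops supports @> but not the key-existence
-- operators; the resulting index is usually smaller.
-- CREATE INDEX test_tb_tags_path_idx ON test_tb USING gin (tags jsonb_path_ops);

-- Check whether the planner actually uses the index for the rewritten query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, tags ->> 'tag_1' AS tag_1
FROM test_tb
WHERE tags @> '{ "tag_1": "val_1" }';
```

Note that the two forms agree only when the stored value is a JSON string: ->> converts numbers and booleans to text before the comparison, whereas @> compares JSON values.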

For the second query, no index can help, since there is no WHERE or ORDER BY clause.

The third query is tricky, because it contains OR, but since the result set contains the primary key, you can rewrite it to use the index:

 SELECT id, tags ->> 'tag_1'
 FROM test_tb
 WHERE tags @> '{ "tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22" }'
UNION
 SELECT id, tags ->> 'tag_1'
 FROM test_tb
 WHERE tags @> '{ "tag_1": "dv_0", "tag_3": "dv_15", "tag_5": "dv_6" }'
UNION
 SELECT id, tags ->> 'tag_1'
 FROM test_tb
 WHERE tags @> '{ "tag_12": "dv_9" }';

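Here, UNION is used instead of the OR. UNION also removes duplicates, so a row that satisfies more than one branch is returned only once, just as with OR; this does not merge distinct rows as long as id is unique. The query becomes longer, but each of the subqueries can use the index.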
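
One way to convince yourself that the UNION form returns the same rows as the original OR form is to compare both on a small sample. The following is only a sketch: the table test_tb_demo and its four rows are made up for illustration, and all tag values are assumed to be JSON strings.

```sql
-- Hypothetical sample data, not from the question.
CREATE TEMP TABLE test_tb_demo (id int PRIMARY KEY, tags jsonb);

INSERT INTO test_tb_demo VALUES
  (1, '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22"}'),
  (2, '{"tag_1": "dv_0", "tag_3": "dv_15", "tag_5": "dv_6", "tag_12": "dv_9"}'),
  (3, '{"tag_12": "dv_9"}'),
  (4, '{"tag_1": "dv_0"}');

WITH or_form AS (
    -- The original predicate from the question.
    SELECT id FROM test_tb_demo
    WHERE (tags ->> 'tag_1' = 'dv_0' AND tags ->> 'tag_3' = 'dv_15')
      AND (tags ->> 'tag_15' = 'dv_22' OR tags ->> 'tag_5' = 'dv_6')
       OR (tags ->> 'tag_12' = 'dv_9')
), union_form AS (
    -- The rewritten predicate, one containment test per branch.
    SELECT id FROM test_tb_demo
    WHERE tags @> '{ "tag_1": "dv_0", "tag_3": "dv_15", "tag_15": "dv_22" }'
    UNION
    SELECT id FROM test_tb_demo
    WHERE tags @> '{ "tag_1": "dv_0", "tag_3": "dv_15", "tag_5": "dv_6" }'
    UNION
    SELECT id FROM test_tb_demo
    WHERE tags @> '{ "tag_12": "dv_9" }'
)
-- Rows found by only one of the two forms; an empty result means they agree.
SELECT 'only in OR form' AS side, id
FROM (SELECT id FROM or_form EXCEPT SELECT id FROM union_form) AS d1
UNION ALL
SELECT 'only in UNION form' AS side, id
FROM (SELECT id FROM union_form EXCEPT SELECT id FROM or_form) AS d2;
```

Row 2 satisfies two branches at once, so it also exercises the duplicate removal that UNION performs.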

answered Mar 1, 2022 at 3:50
  • Taking into account WHERE ..., you can rewrite SELECT id, tags ->> 'tag_1' as tag_1 ... as SELECT id, "val_1" as tag_1 .... Commented Mar 1, 2022 at 7:25
  • @Akina I don't understand. Where does column val_1 come from? Commented Mar 1, 2022 at 7:33
  • WHERE condition provides only this value in the output, and all returned rows will contain it, so what is the reason to extract it again? Commented Mar 1, 2022 at 7:45
  • @Akina How does a WHERE condition output something? Sorry for being dense... Commented Mar 1, 2022 at 8:40
  • There are a lot of rows in the table. Some may not contain the tag 'tag_1', some may contain it with the value "val_1", and some may contain it with another value. You apply WHERE tags @> '{ "tag_1": "val_1" }' (the OP applies where tags ->> 'tag_1'= 'val_1'). Only rows that contain it with the value "val_1" will be selected, so all returned rows will contain "val_1". Replacing the expression with the literal gives exactly the same output with less work for the server. Commented Mar 1, 2022 at 8:45
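
What the comments above seem to be getting at, as a sketch: every row that passes the containment test has the string "val_1" under tag_1, so the select list can return that constant directly instead of extracting it again. Note the single quotes, which make it a string literal rather than a column name:

```sql
-- Same rows as the first rewritten query, but tag_1 is a constant
-- instead of being extracted from the jsonb value for each row.
SELECT id, 'val_1'::text AS tag_1
FROM test_tb
WHERE tags @> '{ "tag_1": "val_1" }';
```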
