I have a column of type text[] and I want to search it using the array containment operator @>.

Note: the @> operator tests containment. For example, ['a','b','c'] @> ['a','b'] is true, because every element of ['a','b'] appears in ['a','b','c'].

The returned rows are those where the values passed are a subset of the column's array.
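
To make the containment semantics concrete, here is a minimal sketch using literal arrays rather than a real table. The values are illustrative only; the expected results in the comments follow the documented behaviour of @>.

```sql
-- The left array contains every element of the right array: returns true.
select array['a','b','c']::text[] @> array['a','b']::text[];

-- Element order and duplicates on the right side do not matter: returns true.
select array['a','b','c']::text[] @> array['b','a','a']::text[];

-- 'd' does not appear in the left array: returns false.
select array['a','b','c']::text[] @> array['a','d']::text[];
```

In a query against a table, the column goes on the left and the values being searched for go on the right.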

The problem is that comparing integers is cheaper than comparing strings.
I am wondering whether there is another way, for example if Postgres could hash the values first and compare the hashes instead.

Note, I cannot make use of indexes because there is not only one column, moreover, the query will first filter on some id and then it needs to filter those multi-valued columns.

My question: is there a feature in PostgreSQL that supports comparing integers instead of strings?

asked Oct 12, 2019 at 15:15
  • "Note, I cannot make use of indexes because there is not only one column, moreover, the query will first filter on some id and then it needs to filter those multi-valued columns." Have you tried it? Please show an example. Commented Oct 12, 2019 at 15:22
  • @jjanes The thing is, I have 4 million records and when I want to query the data, I will use some id that has an index, then I will end up with about 3 to 4k records, for those records I want to optimize the string search, thus any index related to them will not work because I already filtered the records using the aforementioned index. Commented Oct 12, 2019 at 15:36

1 Answer

If the strings are from a restricted set, you can define an ENUM datatype. This translates the strings to integers behind the scenes.
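
As a rough illustration of the "behind the scenes" part, the sketch below creates a small throwaway enum and looks it up in the pg_enum system catalog, where each label is stored once with its own OID and a sort position. The type name demo_alph is hypothetical and chosen only to avoid clashing with other types; per the PostgreSQL documentation, an enum value occupies four bytes on disk regardless of label length.

```sql
-- Hypothetical throwaway enum, used only for this illustration.
create type demo_alph as enum ('a', 'b', 'c');

-- Each label is a row in pg_enum; columns store a 4-byte reference to it,
-- not the label text itself.
select oid, enumlabel, enumsortorder
from pg_enum
where enumtypid = 'demo_alph'::regtype
order by enumsortorder;

-- Enum values compare by their declared order, not alphabetically.
select 'a'::demo_alph < 'b'::demo_alph;
```

The type can be removed afterwards with drop type demo_alph.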

create type alph as enum ( 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
create table j as select floor(random()*100)::int, array_agg(substring('abcdefghijklmnopqrstuvwxyz',floor(random()*26)::int+1,1)) from generate_series(1,10000000) f(x) group by x%1000000;
create table j2 as select floor, array_agg::alph[] from j;

I get about a 2-fold speed improvement by doing:

select * from j2 where array_agg @> '{a,b}';

rather than

select * from j where array_agg @> '{a,b}';

If I add the condition "and floor=7" to each query (after creating an index on "floor"), then both queries are so fast that any difference in speed cannot be reliably detected.
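
One way to reproduce that comparison is sketched below. It assumes the tables j and j2 from above already exist; the index names are illustrative, and the exact plans and timings will depend on the data and the server.

```sql
-- Index the integer column in both tables (index names are made up here).
create index j_floor_idx on j (floor);
create index j2_floor_idx on j2 (floor);

-- Refresh planner statistics after the bulk load.
analyze j;
analyze j2;

-- Compare plans and timings. When the planner uses the index on floor,
-- the array condition is evaluated only on the rows with floor = 7.
explain (analyze, buffers)
select * from j where floor = 7 and array_agg @> '{a,b}';

explain (analyze, buffers)
select * from j2 where floor = 7 and array_agg @> '{a,b}';
```

Running each statement several times and comparing the reported execution times gives a fairer picture than a single run, since the first run may be reading from disk.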

This seems like the essence of premature optimization to me.

answered Oct 12, 2019 at 16:15
