Index with ops for `LIKE` and `=` queries

Question 1

In my Rails 5 with PostgreSQL 9.5 app I have 2 frequent queries:

SELECT * FROM table WHERE lower(varchar_col) = 'some_string' AND int1_col = some_int AND int2_col = some_other_int
-- and
SELECT * FROM table WHERE lower(varchar_col) LIKE 'some_str%' AND int1_col = some_int

table is about 1.5m rows and quickly growing.

So, I've created 2 indices for both of these queries:

CREATE INDEX index_1 ON table USING btree (lower((varchar_col)::text), int1_col, int2_col)
-- and
CREATE INDEX index_2 ON table USING btree (lower((varchar_col)::text) varchar_pattern_ops, int1_col)

I have 2 questions, maybe they are a little noob, I'm not good in DB performance:

1.Can I use only one index for both of these queries? Something like:

CREATE INDEX index ON table USING btree (lower((varchar_col)::text) varchar_pattern_ops, int1_col, int2_col)

Does varchar_pattern_ops shows the same performance for LIKE and = queries? Will PostgreSQL planner use multicolumn index for 3 columns in query for 2 columns?

2.I noticed that Rails generate index with expression lower((varchar_col)::text, but my varchar_col has varchar type, not text. Does this matter and Rails made some mistake and I need to write CREATE INDEX query to specifically include column type in query?

Thanks for any help!

P.S. Of course, table's, columns' and indices' name are changed for better reading.

Question 2

Using an index which is somewhere between the one suggested by @ypercubeTM and your index_2 ... which would be:

CREATE INDEX index_1 ON t 
 USING btree 
 (int1_col, 
 lower((varchar_col)::text) varchar_pattern_ops, 
 int2_col) ;

... seems to be a good compromise.

You put first the two columns that you use always, and add the third one as the last. You add the varchar_pattern_ops so that queries with LIKE use the index.

From the two always used columns, put the one which is always tested for equality first, the one which is tested with other operators (>=, <=, like) second. If both were tested for equality, you normally would want to have the one which is more discriminant [which means, more or less, that has a bigger number of different values] first. If several columns are equally discriminant, you put first the ones with the datatypes easier/faster to compare (for instance: fixed-size values such as integers would go before varying length values such as texts).

These are the exeuction plans of the two queries you want to perform:

 EXPLAIN ANALYZE
 SELECT 
 * 
 FROM 
 t 
 WHERE 
 lower(varchar_col) = 'some_string' 
 AND int1_col = 5678 
 AND int2_col = 1234 ;

 | QUERY PLAN |
 | :------------------------------------------------------------------------------------------------------------- |
 | Index Scan using index_1 on t (cost=0.42..8.44 rows=1 width=40) (actual time=0.007..0.008 rows=1 loops=1) |
 | Index Cond: ((int1_col = 5678) AND (lower((varchar_col)::text) = 'some_string'::text) AND (int2_col = 1234)) |
 | Planning time: 0.274 ms |
 | Execution time: 0.023 ms |

 EXPLAIN ANALYZE
 SELECT * FROM t WHERE lower(varchar_col) LIKE 'some_str%' AND int1_col = 5678 ;

 | QUERY PLAN |
 | :------------------------------------------------------------------------------------------------------------------------------------------- |
 | Bitmap Heap Scan on t (cost=4.44..12.16 rows=1 width=40) (actual time=0.013..0.015 rows=3 loops=1) |
 | Recheck Cond: (int1_col = 5678) |
 | Filter: (lower((varchar_col)::text) ~~ 'some_str%'::text) |
 | Heap Blocks: exact=1 |
 | -> Bitmap Index Scan on index_1 (cost=0.00..4.44 rows=2 width=0) (actual time=0.005..0.005 rows=3 loops=1) |
 | Index Cond: ((int1_col = 5678) AND (lower((varchar_col)::text) ~>=~ 'some'::text) AND (lower((varchar_col)::text) ~<~ 'somf'::text)) |
 | Planning time: 0.177 ms |
 | Execution time: 0.027 ms |

Both query plans use your index and are fast (although, my simulated data might not be realistic at all).

dbfiddle here

joanolo joanolo 13.7k8 gold badges39 silver badges67 bronze badges · Accepted Answer · 2017-05-06 18:43:31Z

Using an index which is somewhere between the one suggested by @ypercubeTM and your index_2 ... which would be:

CREATE INDEX index_1 ON t 
 USING btree 
 (int1_col, 
 lower((varchar_col)::text) varchar_pattern_ops, 
 int2_col) ;

... seems to be a good compromise.

You put first the two columns that you use always, and add the third one as the last. You add the varchar_pattern_ops so that queries with LIKE use the index.

From the two always used columns, put the one which is always tested for equality first, the one which is tested with other operators (>=, <=, like) second. If both were tested for equality, you normally would want to have the one which is more discriminant [which means, more or less, that has a bigger number of different values] first. If several columns are equally discriminant, you put first the ones with the datatypes easier/faster to compare (for instance: fixed-size values such as integers would go before varying length values such as texts).

These are the exeuction plans of the two queries you want to perform:

 EXPLAIN ANALYZE
 SELECT 
 * 
 FROM 
 t 
 WHERE 
 lower(varchar_col) = 'some_string' 
 AND int1_col = 5678 
 AND int2_col = 1234 ;

 | QUERY PLAN |
 | :------------------------------------------------------------------------------------------------------------- |
 | Index Scan using index_1 on t (cost=0.42..8.44 rows=1 width=40) (actual time=0.007..0.008 rows=1 loops=1) |
 | Index Cond: ((int1_col = 5678) AND (lower((varchar_col)::text) = 'some_string'::text) AND (int2_col = 1234)) |
 | Planning time: 0.274 ms |
 | Execution time: 0.023 ms |

 EXPLAIN ANALYZE
 SELECT * FROM t WHERE lower(varchar_col) LIKE 'some_str%' AND int1_col = 5678 ;

 | QUERY PLAN |
 | :------------------------------------------------------------------------------------------------------------------------------------------- |
 | Bitmap Heap Scan on t (cost=4.44..12.16 rows=1 width=40) (actual time=0.013..0.015 rows=3 loops=1) |
 | Recheck Cond: (int1_col = 5678) |
 | Filter: (lower((varchar_col)::text) ~~ 'some_str%'::text) |
 | Heap Blocks: exact=1 |
 | -> Bitmap Index Scan on index_1 (cost=0.00..4.44 rows=2 width=0) (actual time=0.005..0.005 rows=3 loops=1) |
 | Index Cond: ((int1_col = 5678) AND (lower((varchar_col)::text) ~>=~ 'some'::text) AND (lower((varchar_col)::text) ~<~ 'somf'::text)) |
 | Planning time: 0.177 ms |
 | Execution time: 0.027 ms |

Both query plans use your index and are fast (although, my simulated data might not be realistic at all).

dbfiddle here

Stack Exchange Network

Index with ops for `LIKE` and `=` queries

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Index with ops for `LIKE` and `=` queries

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions