postgres: index to SELECT strings like '%foo%';

Question 1

I have an expensive, periodic PostgreSQL 9.3 query of the form:

SELECT * from mytable where name NOT LIKE '%foo%';

foo is actually a constant here that never changes. The query is expensive because it requires a table scan. I expect that only a few rows will actually match the query, so I want to use a partial index to speed up the query.

CREATE INDEX foo_idx ON mytable ((name NOT LIKE '%foo%')) WHERE name NOT LIKE '%foo%';

However, when I do an EXPLAIN on my query, it still is using a sequential scan instead of relying on the new index. What am I doing wrong?

I have tried several different expressions inside the parenthesized expression clause of the CREATE INDEX statement, but nothing helps:

CREATE INDEX foo_idx ON mytable (name) WHERE name NOT LIKE '%foo%';
CREATE INDEX foo_idx ON mytable ((name LIKE '%foo%')) WHERE name NOT LIKE '%foo%';

I have also tried removing the WHERE clause for the index, and that doesn't help either.

Question 2

IF you have wildcard BEFORE you can`t use an index.

Question 3

@Mihai Irrelevant, because he's trying to use an expression index. It is true that you cannot use a normal b-tree index with a prefix wildcard. (You can still use a trigram index if you created one).

Question 4

Mike, out of curiosity, if you wrap the NOT LIKE '%foo%' in a simple SQL function that you use in both the index and query, does that then use the index?

Question 5

Also, did you analyze the table after adding the index? Your first index worked like a charm for me once I analyzed the table.

Question 6

@Mihai: it is possible to create an index that supports like '%foo%', not sure about not like though: postgresonline.com/journal/archives/…

Question 7

The index should work as is. As pointed out by @jjanes, you just needed to run ANALYZE. However, I suggest you modify the index. The expression in your definition is not useful.

(削除) CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%'; (削除ここまで)

The expression is always FALSE and useless noise. Either simplify to the slightly cheaper equivalent index:

CREATE INDEX foo_idx ON intentions ((FALSE)) WHERE data NOT LIKE '%foo%';

Or, better yet, put something useful there. Since each index entry is going to allocate MAXALIGN (typically 8 bytes) for data anyway, and the overhead per index row is another 12 bytes (and the default fillfactor for btree indexes is 90%) you might as well put the index to good use and help other queries. If you have any other queries that might use the same partial index?

If you put something worth up to 8 bytes (like one or two int4 columns), the index retains exactly the same size and performance, but offers more possible use cases. If (one of) the column(s) is involved in updates, you have some additional maintenance cost. Still, if you have anything useful, use it. Any additional column that's used in a JOIN / WHERE or ORDER BY clause. Or column(s) that enable index-only scans:

CREATE INDEX foo_idx ON intentions (<useful_column(s)>)
WHERE data NOT LIKE '%foo%';

Question 8

You could try a slightly different approach.

CREATE INDEX foo_idx ON mytable (strpos(name,'foo'));
SELECT * from mytable where strpos(name,'foo') = 0;

Index on Expression

Question 9

The NOT LIKE will be problematic. Imagine this query without the NOT. The index could be walked to find the interval of values (indexes are sorted lists) which match. The NOT means you want everything else, both before and after the matching interval. It's a pretty special optimiser which will do that.

You may be able to invert the indexes to identify all the rows which do contain "foo", then use that result in a NOT EXISTS (or equivalent) to find your desired output. I think you'll still be seeing a table scan, however.

Alternatively add the column IsFoo to the table. Populate this on table writes (perhaps use a persisted calculated column?). Include IsFoo in your indexes and filters.

Question 10

I needed to run ANALYZE for the proper query plan to take effect. Once this was done, everything worked like a charm. My final index looks like this:

CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%';

score 2 · Answer 1 · 2014-09-10 01:20:28Z

The index should work as is. As pointed out by @jjanes, you just needed to run ANALYZE. However, I suggest you modify the index. The expression in your definition is not useful.

(削除) CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%'; (削除ここまで)

The expression is always FALSE and useless noise. Either simplify to the slightly cheaper equivalent index:

CREATE INDEX foo_idx ON intentions ((FALSE)) WHERE data NOT LIKE '%foo%';

Or, better yet, put something useful there. Since each index entry is going to allocate MAXALIGN (typically 8 bytes) for data anyway, and the overhead per index row is another 12 bytes (and the default fillfactor for btree indexes is 90%) you might as well put the index to good use and help other queries. If you have any other queries that might use the same partial index?

If you put something worth up to 8 bytes (like one or two int4 columns), the index retains exactly the same size and performance, but offers more possible use cases. If (one of) the column(s) is involved in updates, you have some additional maintenance cost. Still, if you have anything useful, use it. Any additional column that's used in a JOIN / WHERE or ORDER BY clause. Or column(s) that enable index-only scans:

CREATE INDEX foo_idx ON intentions (<useful_column(s)>)
WHERE data NOT LIKE '%foo%';

MickyT MickyT 1,1117 silver badges7 bronze badges · Answer 2 · 2014-09-09 00:22:36Z

You could try a slightly different approach.

CREATE INDEX foo_idx ON mytable (strpos(name,'foo'));
SELECT * from mytable where strpos(name,'foo') = 0;

Index on Expression

score 1 · Answer 3 · 2014-09-09 03:24:29Z

The NOT LIKE will be problematic. Imagine this query without the NOT. The index could be walked to find the interval of values (indexes are sorted lists) which match. The NOT means you want everything else, both before and after the matching interval. It's a pretty special optimiser which will do that.

You may be able to invert the indexes to identify all the rows which do contain "foo", then use that result in a NOT EXISTS (or equivalent) to find your desired output. I think you'll still be seeing a table scan, however.

Alternatively add the column IsFoo to the table. Populate this on table writes (perhaps use a persisted calculated column?). Include IsFoo in your indexes and filters.

Mike Curtiss Mike Curtiss 1711 silver badge4 bronze badges · Answer 4 · 2014-09-09 23:36:53Z

I needed to run ANALYZE for the proper query plan to take effect. Once this was done, everything worked like a charm. My final index looks like this:

CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%';

Stack Exchange Network

postgres: index to SELECT strings like '%foo%';

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

postgres: index to SELECT strings like '%foo%';

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions