I'm using Postgres' full text search to return results, but it seems any query with 3 characters or fewer returns no results. Is there a way to configure Postgres so that I can query with 2 and 3 character long strings? My data set is extremely small, so performance would not be an issue.
For reference, here's my table definition:
CREATE TABLE doctokens (
id serial primary key,
inner_text TEXT NOT NULL,
tokens tsvector
)
INSERT INTO doctokens (inner_text) VALUES ('Route builders give you a fluent syntax')
After an insert, I create the token tsvector
:
UPDATE doctokens SET tokens = setweight(to_tsvector('english', COALESCE(inner_text, '')), 'A')
And here's how I'm querying the table:
SELECT ts_headline('english', inner_text, query, 'StartSel = <em>, StopSel = </em>, ShortWord = 1') as inner_text_highlights
FROM (SELECT inner_text, ts_rank_cd(tokens, query) AS rank, query
FROM doctokens, plainto_tsquery('english', 'rou') AS query
WHERE tokens @@ query
ORDER BY rank DESC
LIMIT 5) AS query_results;
1 Answer 1
'route' and 'routing' stem to 'rout'. Neither 'rou' nor 'routi' do. So, they don't match.
You can use the prefix match to make 'rou' match, but plainto_tsquery doesn't understand that so you have to make the tsquery yourself:
SELECT ts_headline('english', inner_text, query, 'StartSel = <em>, StopSel = </em>, ShortWord = 1') as inner_text_highlights
FROM (SELECT inner_text, ts_rank_cd(tokens, query) AS rank, query
FROM doctokens, (select 'rou:*'::tsquery as query ) AS query
WHERE tokens @@ query
ORDER BY rank DESC
LIMIT 5) AS query_results;
inner_text_highlights
--------------------------------------------------
<em>Route</em> builders give you a fluent syntax
Does <em>Routing</em> rhyme with looting?
Prefix matching won't let you match 'routi:*', because the things it would match against get the 'i' stripped off when stemming.
-
If I wanted to get the benefits of both search methods, should I just have two separate queries in my
SELECT
- once withplainto_tsquery()
and one with prefix matching?Dave– Dave2019年09月02日 21:53:23 +00:00Commented Sep 2, 2019 at 21:53 -
@Dave You could do that, probably by applying a UNION ALL (rather than an OR). But I think that the ts_rank_cd function, executed on two different query strings, might not return comparable results, so you might not want to sort them together. I'd probably just execute the plainto_tsquery one first, and then if it returned no rows, try the other one. (Or give the user a checkbox, 'plain' or 'advanced')jjanes– jjanes2019年09月02日 22:05:26 +00:00Commented Sep 2, 2019 at 22:05
-
Is there any way to make
routi
also match, or is that not really possible withtsquery
? I know I could resort to usingWHERE inner_text LIKE '%routi%'
, but was curious if there is any way to make it work without usingLIKE
.Dave– Dave2019年09月02日 22:20:31 +00:00Commented Sep 2, 2019 at 22:20 -
You could use a configuration which doesn't do stemming, like 'simple' rather than 'english'. But at that point, I don't know if you are really doing full text search anymore. If is really important, you could keep two tsvector columns or FTS indexes, with different configurations to choose between them. Or one FTS and one pg_trgm: You can search with LIKE and rank with <->, but pg_trgm doesn't offer something like ts_headline.jjanes– jjanes2019年09月03日 01:11:16 +00:00Commented Sep 3, 2019 at 1:11
-
Thanks. That's the route I ultimately took - I created an English tsvector and a simple tsvector at insertion. Then, I query the English lexemes with
plainto_tsquery('english', :query)
and the simple lexemes using an approach similar to what you laid out, union the two data sets together, and grab the top 5 with the highest rank.Dave– Dave2019年09月03日 14:50:09 +00:00Commented Sep 3, 2019 at 14:50
Routing
(inserted the same way as above), and searching forrouti
does not yield any results, either. So, maybe what I'm running into is just a limitation of the capabilities of Postgres' full text search?