I have to search for hyphenated words like 'good-morning', 'good-evening', etc.
My query is:
select id, ts_headline(content,
to_tsquery('english','good-morning'),
'HighlightAll=true MaxFragments=100 FragmentDelimiter=$')
from tbl
where ts_content @@ to_tsquery('english','good-morning');
When I execute this query, I also get results that match 'good' and 'morning' separately. But I want only exact matches of the hyphenated word, in both the results and the highlighted fragments.
(For ts_content I used the same default config english to create the tsvector.)
How can I search such hyphenated words in PostgreSQL full text search?
Assuming you run at least Postgres 9.6? (Please always declare your version of Postgres.) – Erwin Brandstetter, commented Apr 21, 2018 at 13:23
1 Answer
The key word here is phrase search, introduced with Postgres 9.6.
Use the "FOLLOWED BY" operator <-> or one of the related <N> operators. Or better yet, use the function phraseto_tsquery() to generate your tsquery.
Quoting the manual, it ...
produces tsquery that searches for a phrase, ignoring punctuation
phraseto_tsquery behaves much like plainto_tsquery, except that it inserts the <-> (FOLLOWED BY) operator between surviving words instead of the & (AND) operator. Also, stop words are not simply discarded, but are accounted for by inserting <N> operators rather than <-> operators. This function is useful when searching for exact lexeme sequences, since the FOLLOWED BY operators check lexeme order not just the presence of all the lexemes.
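To make the difference concrete, here is a small sketch that can be run in psql. It assumes Postgres 9.6 or later and the stock 'english' configuration, and it uses only string literals, so no table is needed. The column aliases are made up for illustration.

```sql
-- plainto_tsquery() joins the surviving lexemes with & (AND): word order is ignored.
SELECT plainto_tsquery('english', 'good morning');

-- phraseto_tsquery() joins them with <-> (FOLLOWED BY): order and adjacency matter.
SELECT phraseto_tsquery('english', 'good morning');

-- Same document with the words reversed, checked against both query types.
SELECT to_tsvector('english', 'morning good')
         @@ plainto_tsquery('english', 'good morning')  AS plain_match,
       to_tsvector('english', 'morning good')
         @@ phraseto_tsquery('english', 'good morning') AS phrase_match;
```

With the default configuration, plain_match should come back true and phrase_match false, because only the phrase query checks that 'morn' directly follows 'good'.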
Your query would work like this:
select id
, ts_headline(content, phraseto_tsquery('english', 'good-morning')
, 'HighlightAll=true MaxFragments=100 FragmentDelimiter=$')
from tbl
where ts_content @@ phraseto_tsquery('english','good-morning');
phraseto_tsquery('english', 'good-morning') generates this tsquery:
'good-morn' <-> 'good' <-> 'morn'
Since "good-morning" is identified as asciihword (hyphenated ASCII word), the stemmed complete word is added before the components. The manual:
It is possible for the parser to produce overlapping tokens from the same piece of text. As an example, a hyphenated word will be reported both as the entire word and as each component: (followed by an example)
to_tsvector() basically does the same on the other end, so everything matches up. This allows for fine-grained options with hyphenated words. The above only finds "good-morning" with a hyphen (or variants stemming to the same). To find all strings with "good" followed by "morn" (or variants stemming to the same), use phraseto_tsquery('english','good morning'), generating this tsquery: 'good' <-> 'morn'
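The matching behavior described above can be checked without a table. This is a sketch under the same assumptions as the answer (Postgres 9.6 or later, default 'english' configuration); the column aliases are invented for readability.

```sql
-- The hyphenated word is indexed as the whole word plus its parts:
-- 'good-morn' at position 1, 'good' at position 2, 'morn' at position 3.
SELECT to_tsvector('english', 'good-morning');

SELECT to_tsvector('english', 'good-morning')
         @@ phraseto_tsquery('english', 'good-morning') AS hyphen_doc_hyphen_query,
       to_tsvector('english', 'good morning')
         @@ phraseto_tsquery('english', 'good-morning') AS space_doc_hyphen_query,
       to_tsvector('english', 'good-morning')
         @@ phraseto_tsquery('english', 'good morning')  AS hyphen_doc_space_query;
```

The second column should be false, because the text without a hyphen never produces the lexeme 'good-morn'. The other two should be true, which is why the query with a space is the broader of the two options.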
OTOH, you can enforce exact matches by adding another filter like:
...
AND content ~* 'good-morning' -- case insensitive regexp match
Or:
...
AND content ILIKE '%good-morning%'
Seems a bit redundant to the human eye, but this way you get fast full text index support and exact matches.
The latter is mostly equivalent, but different (fewer) characters have special meaning in the LIKE pattern and might need escaping. Related:
- PostgreSQL: Regular Expression escape function
- Pattern matching with LIKE, SIMILAR TO or regular expressions
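Put together, the combined filter might look like the sketch below. It reuses tbl, content and ts_content from the query above. The second statement shows one possible way to escape LIKE wildcards in an arbitrary search term; the term '50%-off' and the nested replace() calls are illustrative assumptions, not part of the original answer, and they rely on the default setting standard_conforming_strings = on.

```sql
-- Full text match first (can use an index on ts_content), then an exact recheck on the raw text.
SELECT id
FROM   tbl
WHERE  ts_content @@ phraseto_tsquery('english', 'good-morning')
AND    content ILIKE '%good-morning%';

-- Escaping \, % and _ so that a search term is matched literally by LIKE / ILIKE.
SELECT 'a 50%-off deal' ILIKE
       ('%' || replace(replace(replace('50%-off', '\', '\\'), '%', '\%'), '_', '\_') || '%')
       AS literal_match;
```

The backslash is replaced first so that the escapes added for % and _ are not escaped again.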
Example to demonstrate the operator <N>:
phraseto_tsquery('english', 'Juliet and the Licks')
generates this tsquery:
'juliet' <3> 'lick'
<3> meaning that lick must come exactly three positions after juliet. The stop words "and" and "the" are dropped as lexemes, but still count as positions.
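A quick way to see the distance operator at work, again as a sketch that assumes Postgres 9.6 or later and the default 'english' configuration (the column aliases are invented):

```sql
-- Stop words are dropped but keep their positions: 'juliet' at 1, 'lick' at 4.
SELECT to_tsvector('english', 'Juliet and the Licks');

SELECT to_tsvector('english', 'Juliet and the Licks')
         @@ phraseto_tsquery('english', 'Juliet and the Licks') AS same_gap,
       to_tsvector('english', 'Juliet Licks')
         @@ phraseto_tsquery('english', 'Juliet and the Licks') AS no_gap;
```

same_gap should be true and no_gap false, since in 'Juliet Licks' the two lexemes are only one position apart.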
Query:
select id, ts_headline(content, phraseto_tsquery('english', 'rhus-t'), 'HighlightAll=true MaxFragments=100 FragmentDelimiter=$') from vqbooks where ts_content @@ phraseto_tsquery('english','rhus-t');
Result: " Lyss..,, Puls., <b>Rhus</b>-t., Sabad., " and " 'infant may have a <b>Rhus</b> toxicodendron picture. (NB: <b>Rhus</b>-t desires milk)". I don't want to highlight "have a <b>Rhus</b> toxicodendron". I want only the first fragment to be highlighted. – user3098231, commented Apr 26, 2018 at 8:24
@user3098231: A small setting for the option MaxFragments might help some. But I am afraid that phrase search is not currently supported well in ts_headline(). A bug has been reported. See: dba.stackexchange.com/q/204856/3684 – Erwin Brandstetter, commented Apr 26, 2018 at 11:22
phraseto_tsquery('english', 'good-morning') produces 'good-morn' <-> 'good' <-> 'morn', not 'good' <-> 'morn'. How are you getting this result? (I'm on Postgres 10, Windows) – dtheodor, commented Feb 19, 2019 at 11:09
@dtheodor: Good catch. I rectified the error and added proper information. – Erwin Brandstetter, commented Feb 20, 2019 at 2:10