We're implementing Postgres full-text search. While the bulk of the text is normal English, there are a number of place names that include diacritics (ñ, á, ó, etc.).

More specifically, there are words like cañon -- I want to make sure that users who type canon also get matches for cañon.

Is there specific support for this? Or is the solution simply to manually manage the specific special cases using a thesaurus?

asked Aug 26, 2016 at 15:18

1 Answer

You can use the unaccent extension, either by pre-processing the text with the unaccent() function or by creating your own text search configuration that runs words through the unaccent dictionary. For instance (based on the example in the unaccent documentation):

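-- Install the extension once per database (it ships with contrib).
CREATE EXTENSION IF NOT EXISTS unaccent;
-- Copy the english configuration, then strip accents before stemming.
CREATE TEXT SEARCH CONFIGURATION my_conf ( COPY = english );
ALTER TEXT SEARCH CONFIGURATION my_conf
  ALTER MAPPING FOR hword, hword_part, word
  WITH unaccent, english_stem;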

So you can match it:

mydb=> SELECT to_tsvector('english','cañon'), to_tsvector('my_conf', 'cañon');
 to_tsvector | to_tsvector
-------------+-------------
 'cañon':1   | 'canon':1
(1 row)

mydb=> SELECT
mydb-> to_tsvector('english','cañon') @@ to_tsquery('english', 'cañon'),
mydb-> to_tsvector('english','canon') @@ to_tsquery('english', 'canon'),
mydb-> to_tsvector('english','cañon') @@ to_tsquery('english', 'canon'),
mydb-> to_tsvector('english','canon') @@ to_tsquery('english', 'cañon'),
mydb-> to_tsvector('my_conf','cañon') @@ to_tsquery('my_conf', 'cañon'),
mydb-> to_tsvector('my_conf','canon') @@ to_tsquery('my_conf', 'canon'),
mydb-> to_tsvector('my_conf', 'cañon') @@ to_tsquery('my_conf', 'canon'),
mydb-> to_tsvector('my_conf', 'canon') @@ to_tsquery('my_conf', 'cañon');
 ?column? | ?column? | ?column? | ?column? | ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------+----------+----------+----------+----------
 t        | t        | f        | f        | t        | t        | t        | t
(1 row)

The last query tries every combination of cañon and canon. With the english configuration (first four columns), a match occurs only when the document and the query are spelled identically (the first two columns). With my_conf (last four columns), all four combinations match.
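
For the other route mentioned at the start, pre-processing with unaccent(), a minimal sketch might look like the following. It assumes the unaccent extension is already installed in the database. The table places, its column name, and the index places_name_fts are hypothetical names used only for illustration.

```sql
-- Assumes the unaccent extension is installed (CREATE EXTENSION unaccent).

-- Pre-processing route: strip accents on both the document and the query,
-- then search with the stock english configuration.
SELECT to_tsvector('english', unaccent('cañon'))
       @@ to_tsquery('english', unaccent('canon')) AS matches;

-- Configuration route applied to a table. "places" and "name" are
-- hypothetical; my_conf is the configuration created earlier.
CREATE TABLE places (name text);
INSERT INTO places (name) VALUES ('Cañon City'), ('Canon Beach');

-- Expression index built with the custom configuration.
CREATE INDEX places_name_fts
    ON places USING gin (to_tsvector('my_conf', name));

-- The WHERE clause repeats the indexed expression so the planner can use the index.
SELECT name
FROM places
WHERE to_tsvector('my_conf', name) @@ plainto_tsquery('my_conf', 'canon');
```

One point in favour of the configuration route: unaccent() is declared STABLE rather than IMMUTABLE, so as far as I know it cannot be used directly in an index expression, whereas the two-argument form of to_tsvector with an explicit configuration can.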

answered Aug 26, 2016 at 17:00
