I am trying to find the best match for a name within a list of names.
I came up with the code below, which works, but I feel that creating the intermediate ratios list is not the best solution performance-wise.
Could it somehow be replaced? Would reduce be a good fit here? (I have tried it, but could not get the index of the best match and its value at the same time.)
from fuzzywuzzy import fuzz
name_to_match = 'john'
names = ['mike', 'james', 'jon', 'jhn', 'jimmy', 'john']
ratios = [fuzz.ratio(name_to_match, name) for name in names]
best_match = names[ratios.index(max(ratios))]
print(best_match) # output is 'john'
Comment: Someone else asked about this on Stack Overflow once before, and I suggested they try downloading python-levenshtein, since the GitHub page suggests it may speed up execution by 4-10x. They later confirmed that it did in fact speed up their solution, so you may want to try that as well. (Dillon Davis, Mar 11, 2019 at 8:43)
2 Answers
You should take advantage of the key argument to max():
"The key argument specifies a one-argument ordering function like that used for list.sort()."
best_match = max(names, key=lambda name: fuzz.ratio(name_to_match, name))
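If the index and the score of the best match are needed as well (the part the question could not get out of reduce), the same key idea can be applied to tuples built with enumerate(). The following is only a sketch: to stay runnable without third-party packages it uses a hypothetical similarity() helper based on difflib as a stand-in for fuzz.ratio, which is an assumption and not part of the original code. Swapping fuzz.ratio back in should work the same way.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Hypothetical stand-in for fuzz.ratio (an assumption for this sketch):
    # a 0-100 similarity score computed with the standard library.
    return round(100 * SequenceMatcher(None, a, b).ratio())


name_to_match = 'john'
names = ['mike', 'james', 'jon', 'jhn', 'jimmy', 'john']

# Lazily build (score, index, name) tuples; no intermediate list is created
# and each name is scored exactly once.
scored = (
    (similarity(name_to_match, name), index, name)
    for index, name in enumerate(names)
)

# Compare on the score only; on ties max() keeps the first maximal item.
best_score, best_index, best_name = max(scored, key=lambda item: item[0])

print(best_name, best_index, best_score)
```

Because the generator is consumed in a single pass, this keeps memory use constant while still giving access to the name, its position, and its score.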
fuzzywuzzy already includes functionality to return only the best match. Unless you need it sorted like 200_success's solution, this would be the easiest:
from fuzzywuzzy import process
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
process.extractOne("cowboys", choices)
# returns ("Dallas Cowboys", 90)