lua-users home
lua-l archive

Re: Fuzzy search

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


I recently needed the Levenshtein algorithm in Lua:
function levenshtein(s, t)
 local s, t = tostring(s), tostring(t)
 if type(s) == 'string' and type(t) == 'string' then
 local m, n, d = #s, #t, {}
 for i = 0, m do d[i] = { [0] = i } end
 for j = 1, n do d[0][j] = j end
 for i = 1, m do
 for j = 1, n do
 local cost = s:sub(i,i) == t:sub(j,j) and 0 or 1
 d[i][j] = math.min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
 end
 end
 return d[m][n]
 end
end
Very inefficient but it does the job:
 Lua 5.1.2 Copyright (C) 1994-2007 Lua.org, PUC-Rio
 > = levenshtein('referrer', 'referrer') -- zero distance
 0
 > = levenshtein('referrer', 'referer') -- distance of one character
 1
 > = levenshtein('random', 'strings') -- random big distance
 6
Petite Abeille wrote:
On Jan 4, 2008, at 5:36 PM, Aaron Brown wrote:
Jeff Wise wrote:
Is there a "fuzzy search" function/algorithm available
which would show these strings as "close to being equal"?
There's a technique called stemming that does at least some
of what you want:
http://en.wikipedia.org/wiki/Stemming
Another one of interest:
http://en.wikipedia.org/wiki/Levenshtein_distance

AltStyle によって変換されたページ (->オリジナル) /