Skip to main content
Code Review

Return to Revisions

3 of 4
replaced http://stackoverflow.com/ with https://stackoverflow.com/
function fuzzy_match(str,pattern){
 pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
 return (new RegExp(pattern)).test(str);
};
  1. String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:

     pattern = pattern.split("").join(".*")
    
  2. The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.

     pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
    
  3. Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.

     var cache = _.memoize(function(pattern){
     return new RegExp(pattern.split("").reduce(function(a,b){
     return a+'[^'+b+']*'+b;
     })
     })
     function fuzzy_match(str,pattern){
     return cache(pattern).test(str)
     };
    
  4. If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:

     var fuzzy_match = (function(){
     var cache = _.memoize(function(str){
     return new RexEgp("^"+str.replace(/./g, function(x){
     return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
     })+"$");
     })
     return function(str, pattern){
     return cache(str).test(pattern)
     })
     })()
    

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.

John Dvorak
  • 1.2k
  • 9
  • 15
default

AltStyle によって変換されたページ (->オリジナル) /