Return to Revisions

3 of 4

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 11:33

function fuzzy_match(str,pattern){
 pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
 return (new RegExp(pattern)).test(str);
};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
 pattern = pattern.split("").join(".*")
```
The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
 pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
 var cache = _.memoize(function(pattern){
 return new RegExp(pattern.split("").reduce(function(a,b){
 return a+'[^'+b+']*'+b;
 })
 })
 function fuzzy_match(str,pattern){
 return cache(pattern).test(str)
 };
```
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
 var fuzzy_match = (function(){
 var cache = _.memoize(function(str){
 return new RexEgp("^"+str.replace(/./g, function(x){
 return /[\-\[\]\/\{\}\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
 })+"$");
 })
 return function(str, pattern){
 return cache(str).test(pattern)
 })
 })()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.

answered Mar 14, 2013 at 12:37

John Dvorak

1.2k
9
15

default