5
\$\begingroup\$

Task:

On an alphabet set [a-z], a simplified regular expression is much simpler than the normal regular expression.

It has only two meta characters: '.' and '*'.

'.' -- exact one arbitrary character match.

'*' -- zero or more arbitrary character match.

Note that it is different from the normal regular expression in that '' is on its own and does NOT act as a decorator on the leading character, I.e. in our simplified regular expression. "" is equivalent to ".*" in regular expression. Write a parser which takes a pattern string and a candidate string as input and decides to either accept or reject the candidate string based on the pattern string.

  1. Only the full match is acceptable. The parser rejects partial matches.
  2. You can safely assume all the characters in both input pattern string and candidate string are valid characters, i.e. you don't need to validate the input strings.
  3. Your parser need to at least pass all the following test cases.
  4. You can write the parser in any programing language you like.
  5. Write up an analysis on the run-time complexity of your code.

Example:

| pattern string | candidate string | result |
| abc | abc | accept |
| * | abc | accept |
| *abc | abc | accept |
| *abc | aaabbbabc | accept |
| a*bc | aaabbbabc | accept |
| a*bc | abc | accept |
| a* | abc | accept |
| a* | a | accept |
| a* | aa | accept |
| a* | abcdef | accept |
| *abc* | abc | accept |
| ***** | abc | accept |
| ... | abc | accept |
| .* | abc | accept |
| .bc* | abc | accept |
| .b*c*a | abca | accept |
| * | /EMPTY STRING/ | accept |
| abc | abcd | reject |
| *a | abcd | reject |
| a | /EMPTY STRING/ | reject |
| .a*c | abc | reject |
| a.*b | abc | reject |
| .. | abc | reject |
| /EMPTY STRING/ | /EMPTY STRING/ | reject |
| /EMPTY STRING/ | abc | reject |

My Solution:

public class RegxEngine
{
 public static boolean match(String regx, String candidate)
 {
 // Empty regx will always return false
 if (regx.isEmpty())
 {
 return false;
 }
 else
 {
 if (regx.charAt(0) == '*')
 {
 // Last * matches everything
 if (regx.length() == 1)
 {
 return true;
 }
 else
 {
 return matchStar(regx.substring(1), candidate);
 }
 }
 // Return false if text is empty but pattern is not *
 else if (candidate.isEmpty())
 {
 return false;
 }
 else if (regx.charAt(0) == '.' || regx.charAt(0) == candidate.charAt(0))
 {
 // If the last regx matches the last text
 if (regx.length() == 1 && candidate.length() == 1)
 {
 return true;
 }
 // If hasn't reached the end, try to match the rest strings
 else
 {
 return match(regx.substring(1), candidate.substring(1));
 }
 }
 else
 {
 return false; 
 }
 }
 }
 // Otherwise skip as many chars as required
 private static boolean matchStar(String regx, String candidate)
 {
 for (int i = 0; i < candidate.length(); i++)
 {
 if (match(regx, candidate.substring(i)))
 {
 return true;
 }
 else
 {
 continue;
 }
 }
 
 return false;
 }
}

This was an interview question and I did get through to the next round. I am asking simply looking for more ideas.

My Questions:

  1. Is my code \$O(m*n)\$, where \$m\$ is the length of the candidate string and \$n\$ is the length of the pattern string?
  2. In terms of performance/time complexity, could this be done better?
Peilonrayz
44.4k7 gold badges80 silver badges157 bronze badges
asked Apr 11, 2012 at 9:11
\$\endgroup\$
0

2 Answers 2

3
\$\begingroup\$

In answer to 1) - No. Consider the regex *********a and the string b. Your code will have to consider all 9! (362880) cases before returning false.

That being said, this solution seems to be going in the right direction, and for this restricted language we can stop short of actually building the DFA equivalent to the expression. (Which is expensive in space to create, but will be cheaper each time it is run against a candidate string).

We acheive this by caching the result of each regex-substring for future use (the dynamic programming approach). This prevents the problematic recursive explosion in the original problem and will guarantee an O(n*m) runtime, but could cost up to O(n*m) space for each regex candidate pair.

static Map<Pair<String, String>, boolean> result_cache;
//Assume we have a pair, as in:
//http://stackoverflow.com/a/677248/1331457
public static boolean match(String regx, String candidate)
{
 Pair<String, String> key = Pair(regx, candidate);
 if result_cache.containsKey(key){
 return true;
 }
 boolean result = false;
 // Empty regx will always return false
 if (regx.isEmpty())
 {
 result = false;
 }
 else
 {
 if (regx.charAt(0) == '*')
 {
 // Last * matches everything
 if (regx.length() == 1)
 {
 result = true;
 }
 else
 {
 result = matchStar(regx.substring(1), candidate);
 }
 }
 // Return false if text is empty but pattern is not *
 else if (candidate.isEmpty())
 {
 result = false;
 }
 else if (regx.charAt(0) == '.' || regx.charAt(0) == candidate.charAt(0))
 {
 // If the last regx matches the last text
 if (regx.length() == 1 && candidate.length() == 1)
 {
 result = true;
 }
 // If hasn't reached the end, try to match the rest strings
 else
 {
 result = match(regx.substring(1), candidate.substring(1));
 }
 }
 else
 {
 result = false; 
 }
 }
 result_cache.put(key, result);
 return result;
}
//Similarly for other methods.
answered Apr 14, 2012 at 0:13
\$\endgroup\$
4
  • \$\begingroup\$ Thanks for the answers. You really made me think when you mentioned *********a. What if we preprocess the pattern string to be *a which is equivalent to *********a? \$\endgroup\$ Commented Apr 14, 2012 at 15:35
  • \$\begingroup\$ This is a good first pass optimisation. But it won't help you in cases like *.*.*.*.*.*a, b. The dynamic programming approach that I suggest above should deal with both in linear time. \$\endgroup\$ Commented Apr 14, 2012 at 15:48
  • \$\begingroup\$ @cmh can you explain why its 9!. \$\endgroup\$ Commented Dec 23, 2014 at 11:06
  • \$\begingroup\$ regex *********a and string b,looks like only 10 matches. \$\endgroup\$ Commented Dec 23, 2014 at 11:13
1
\$\begingroup\$

Can you translate the regular expression into ad-hoc Java code, compile it on the fly, and then execute it?

After it is translated and compiled, which should take less than a second, the execution will be a couple orders of magnitude faster than any data-driven parser.

answered Apr 17, 2012 at 16:32
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.