After reading @JavaDeveloper's recent question, I was inspired1 to try my hand at writing code to accomplish the same task.
The "rules" for this code (i.e., its intent) is to match a subset of regular expressions, defined as follows:
.
(dot) means a any single character match*
(star) means 0 or more character match?
(question) means previous char is an option i.ecolou?r
matchescolor
andcolour
.
Just to be clear: I'm well aware that this is not how regular expressions are normally defined, and that the fundamental capability of this subset is exactly that--a subset of what full regexes can recognize/match.
As an aside: although tagging a question with both C and C++ is normally a mistake, I believe in this case it's justified. In particular, the match
function is at least intended to be work as either C or C++. The test rig (enabled by defining TEST
when compiling) uses features to specific to C++, but match
itself should not.
Without further ado, my attempt at implementing this functionality:
#include <string.h>
#ifndef __cplusplus
#include <stdbool.h>
#endif
bool match(char const *needle, char const *haystack) {
for (; *needle!='0円'; ++needle) {
switch (*needle) {
case '.': ++haystack;
case '?': break;
case '*': {
size_t max = strlen(needle);
for (size_t i = 0; i < max; i++)
if (match(needle + 1, haystack + i))
return true;
return false;
}
default:
if (*haystack == *needle)
++haystack;
else if (needle[1] != '?')
return false;
}
}
return *haystack == '0円';
}
#ifdef TEST
#include <iostream>
#include <vector>
#include <string>
template<class F>
void assertTrue(F f, char const *needle, char const *hay_stack) {
static const std::string names [] = { "Failure", "Success" };
std::cout << names[f(needle, hay_stack)];
std::cout << " for: " << needle << "->" << hay_stack << "\n";
}
template<class F>
void assertFalse(F f, char const *needle, char const *hay_stack) {
static const std::string names [] = { "Success", "Failure" };
std::cout << names[f(needle, hay_stack)];
std::cout << " for: " << needle << "->" << hay_stack << "\n";
}
int main(){
assertTrue(match,"colou?**rs", "colours");
assertTrue(match,"colou?**rs", "colors");
assertTrue(match,"colou**?rs", "colours");
// assertTrue(RegexToy.match,"colou**?rs", "colors"); <---- exlusive case.
assertTrue(match,"colou?**r", "colour");
assertTrue(match,"colou?**r", "color");
assertTrue(match,"colou?*", "colou");
assertTrue(match,"colou?*", "colo");
assertTrue(match,"colou?r", "colour");
assertTrue(match,"colou?r", "color");
assertTrue(match,"colou?*?r", "colour");
assertTrue(match,"colou?*?r", "color");
// Success cases
assertTrue(match,"", "");
assertTrue(match, "***", "");
std::vector<char const *> regexList { "abc****", "abc", "a*b*c", "****abc", "**a**b**c****", "abc*" };
char const *str1 = "abc";
for (auto regex : regexList)
assertTrue(match,regex, str1);
char const *regex = "abc****";
std::vector<char const *> strList1 { "abcxyz", "abcx", "abc" };
for (auto str : strList1)
assertTrue(match,regex, str);
regex = "***abc";
std::vector<char const *> strList2 { "xyzabc", "xabc", "abc" };
for (auto str : strList2)
assertTrue(match, regex, str);
assertTrue(match, "a.c", "abc");
assertTrue(match, "a*.*c", "abc");
assertTrue(match,"a*.b.*c", "axxbxxbxxc");
// Fail cases.
assertFalse(match,"abc", "abcd");
assertFalse(match,"*a", "abcd");
assertFalse(match,"a", "");
assertFalse(match,".a*c", "abc");
assertFalse(match,"a.*b", "abc");
assertFalse(match,"..", "abc");
assertFalse(match,"", "abc");
}
#endif
- Along with stealing the idea from @JavaDeveloper, I also stole his test rig, with only minor editing. I hope that's not a problem (I think my making attribution to him clear mans it falls within SE's licensing).
-
\$\begingroup\$ This is actually a complete wildcards grammar, I think. \$\endgroup\$AJMansfield– AJMansfield2014年03月07日 00:47:44 +00:00Commented Mar 7, 2014 at 0:47
-
\$\begingroup\$ @AJMansfield: It is much closer to globbing than normal regexes, but I decided to keep the same rules as the previous question. \$\endgroup\$Jerry Coffin– Jerry Coffin2014年03月07日 00:55:40 +00:00Commented Mar 7, 2014 at 0:55
2 Answers 2
The regex flavour is non-standard, as you noted. Just to be clear, ?
acts as a zero-or-one-of-the-previous-character modifier, while *
acts more like a shell glob.
You have an access-past-the-end-of-string error when testing for glob matches in a test such as this:
assertFalse(match, "**a", "b");
The for-loop speculatively matches max
substrings of the haystack
, where max
is based on the length of the needle
. If the matches don't succeed, you could easily walk past the end of the haystack.
Your switch
block is fragile: the '.'
case relies on the break
of the following '?'
case. That's a bit too clever and dangerous for my taste, and a warning comment would be useful.
-
\$\begingroup\$ Good point--I'd intended
strlen(haystack)
, but that's not quite right either. Thanks. \$\endgroup\$Jerry Coffin– Jerry Coffin2014年03月06日 20:10:35 +00:00Commented Mar 6, 2014 at 20:10
The output of your unit tests is unintuitive. For example,
Success for: ..->abc
Huh? That should not match. Oh, it turns out that it was an assertFalse()
. It would have been clearer to print
[PASS] Match ..->abc returns false