Check if simple regex matches string

Question 1

A simple regex is either:

_ (which matches the empty string)
Any lowercase letter a to z (which matches that letter)
r*, where r is a regex (which matches r any number of times)
(r|s), where r and s are regexes (which matches either r or s)
(r+s), where r and s are regexes (which matches r followed by s)

Note that due to the recursive definition, * can occur multiple times in a row.

Here are some examples of regexes and their matches:

(a+b) matches only ab
((a|_)+b*) matches , a, b, ab, bb, abb, but not baaa, aab
(c+(o+((l+(o+(u|_)))+r))) matches only color and colour
(a|b)* matches only strings containing letters a and b (so , ab, bab, but not abc)
(_***|(a+b***)) matches only the empty string or a followed by any number of bs.

Your task is to write a program that takes such a regex and a string of lowercase letters, and outputs whether or not the regex matches the entire string (output should be as described here).

The shortest code in bytes wins.

Question 2

Thanks! Yes, the shortest code wins. Added that to the question.

Question 3

Can we assume that the regex consists entirely of lowercase letters and _*(|)?

Question 4

Yes, you can assume that it's a string of lower case letters, and that the input regex will be a valid regex as described above.

Question 5

@ankh-morpork has pointed out that under the question's definition of a simple regex, any number of * in a row results in a valid simple regex; for example, qa***z should match qz and qaaz. Is this the definition we should stick to? If so, 8+ answers will need to be edited, as they are currently incorrect.

Question 6

In addition to Deadcode's comment above, _* is also a valid "simple regex" that matches only an empty string. Deleting _ would leave single *, which is a grammar error in most regex flavors (if not all).

Question 7

Haskell, 203 bytes

Nobody had done this by implementing a small regex engine yet, and I felt like it had to be done. This obviously won't win. but I'm hoping it will inspire someone to write an even more golfed regex engine.

I've rewritten my solution to avoid directly parsing the regular expression into its AST. Instead, the parsing process constructs a function that is used to match a string against the input regex.

The main function is (&) :: String -> String -> Bool which takes a string representation of a regex and a string to test, returning a boolean value. This calls to the next function which handles most of the work in parsing the regex and matching the string.

Function p :: String -> ([String] -> [String], String) takes a string representation of a regex and returns as the first element of a tuple a function that returns a list of all possible unmatched suffixes of strings in the input list after satisfying the regex parsed from the input string. The regex fully matches the string if the empty string is contained in the list of possible unmatched suffixes.

r&s=elem""$fst(p r)[s]
p(c:t)|c>'`'=t% \s->[t|h:t<-s,c==h]|c>'^'=t%id|(l,o:t)<-p t,(r,_:u)<-p t=u%last(r.l:[\s->r s++l s|o>'+'])
m#s=s++filter(`notElem`s)(m s)
('*':t)%m=t%until(\s->s==m#s)(m#)
s%m=(m,s)

Try it online!

To get rid of one byte, I replaced import Data.List; m#s=nub$s++m s with m#s=s++filter(`notElem`s)(m s). These functions aren't equivalent if there are duplicate elements in either s of m s. The new function does, however, remove all elements from m s that already exist in s, so until still terminates once no new suffixes are discovered by the application of m.

Ungolfed Code

import Data.List
match :: String -> String -> Bool
match r s =elem ""$(fst $ parseRegex r)[s]
parseRegex :: String -> ([String] -> [String], String)
parseRegex ('_':t) = parseKleene id t
parseRegex (c:t) | c >= 'a' = parseKleene (>>=p) t
 where p (c':t')| c==c' = [t']
 p _ = []
parseRegex ('(':t) =
 let (l, (o:t')) = parseRegex t in
 let (r, (_:t'')) = parseRegex t' in
 parseKleene (if o=='+' then (r.l) else (\ss-> (r ss)++(l ss))) t''
parseKleene :: ([String] -> [String]) -> String -> ([String] -> [String], String)
parseKleene p ('*':t) = parseKleene p' t
 where
 p' ss
 | ss' <- nub$ p ss,
 ss /= ss' = ss ++ (p' ss')
 | otherwise = ss
parseKleene p s = (p,s)

GolfScript, 198 bytes

I was able to beat my Haskell solution by implementing the first algorithm I tried in GolfScript instead of Haskell. I don't think it's interesting enough for a separate answer, so I'll just leave it here. There are likely some golfing opportunities since I learned GolfScript just for this.

This solution is in the form of a block that expects the test string on the top of the stack followed by the regex string.

{[.;]1円+{(.96>{0[\]}{2%0{r\(\r\:s;[@]s(;\}i}{if}:i~\{(.3%}{;2円[\]\}until[.;]\+\}:r~\;{.{(.{.4%{2%{)@\m\)\;m}{)\;{.@.@m 1$|.@={\;}{\o}i}:o~}i}{;)@.@m@@\)\;m|}i}{;(:c;;{,},{(\;c=},{(;}%}i}{;}i}:m~""?}

Try it online!

Question 8

Fantastic work and great golfing! It even works with

((((a|d)|(g|j))|((((c|f)|i)+((a|d)|(g|j))*)+((b|e)|h)))|(((((b|e)|h)|((((c|f)|i)+((a|d)|(g|j))*)+((c|f)|i)))+(((a|d)|(g|j))|((((b|e)|h)+((a|d)|(g|j))*)+((c|f)|i)))*)+(((c|f)|i)|((((b|e)|h)+((a|d)|(g|j))*)+((b|e)|h)))))*

, the regex that tests decimal numbers for divisibility by 3 (where 0-9 is replaced with a-j).

Question 9

@Deadcode thanks for bringing some attention to my solution. It's good to know that people are seeing it since I put a good bit of time into trying out different golfing strategies.

Question 10

@Deadcode The way I understand the problem, that pattern should be valid: _ is defined as a regex and r* is defined to be a regex for all regex r, so we can chain * as much as we like on top of _. With regards to hanging, I think (but am not confident) that it will terminate but in an embarrassingly exponential amount of time.

Question 11

@Deadcode The easy fix required importing Data.List for nub, but I'm fairly confident the problem is solved.

Question 12

You can change %(\s->...) to % \s->...

Question 13

APL (Dyalog Unicode), 39 bytes ^SBCS

Edit: now works with runs of * even after _

Full program. Prompts stdin for string and then for regex. Returns a list consisting of an empty list (by default, this prints as two spaces) for matches and an empty list (empty line) for non-matches.

(1⌽'$^','\*+' '_'⎕R'*' '()'⊢⍞~'+')⎕S⍬⊢⍞

Try it online! (output made easier to read by converting all output to JSON)

⍞ prompt stdin (for string)

⊢ on that, apply the following:

(...)⎕S⍬ PCRE Search for the following, returning an empty list for each match

~'+' remove all plusses from the following:

⍞ prompt stdin (for regex)

⊢ on that, apply the following:

'\*+' '_'⎕R'*' '()' PCRE Replace runs of * with * and _ with ()

'$^', prepend dollar sign and caret (indicating end and start)

1⌽ rotate the first character ($) to the end

Question 14

_*** is a valid simple regex. Please revise your answer to allow it.

Question 15

@Deadcode Fixed.

Question 16

APL (Dyalog Unicode), ^{(削除) 295 (削除ここまで)} 277 bytes

a←819⌶⎕A
E←{⍵{(⍺⊆⍨~⍵),⍺[⍸⍵]}(⍵∊'|+')∧0=+\-⌿'()'∘.=⍵}1↓ ̄1↓⊢
M←{c←⊃⌽⍵⋄c∊'0',a:0⋄c∊'_*':1⋄r s o←E⍵⋄o='|':∨/∇ ̈r s⋄∧/∇ ̈r s}
D←{c←⊃⌽⍵⋄c∊'0_':'0'⋄c=⍺:'_'⋄c∊a:'0'⋄c='*':1⌽∊')('(⍺∇ ̄1↓⍵)'+'⍵⋄r s o←E⍵⋄o='|':1⌽∊')('(⍺∇r)'|',⍺∇s⋄M r:1⌽∊')(('(⍺∇r)'+'s')|',⍺∇s⋄1⌽∊')('(⍺∇r)'+'s}
{M⊃D/(⌽⍵),⊂⍺}

Try it online!

^{-18 bytes thanks to @ngn.}

This is a proof of concept that we can do a "simple regex matching" without any backtracking, thus avoiding possible infinite loops due to _* or r**. This is also a showcase that APL is a general-purpose programming language.

The anonymous function at the last line does the regex matching; use it as (regex) f (input string). The return value is 1 if the match is successful, 0 otherwise.

Concept

Given a simple regex R and the first character c of input string, we can construct (or derive) another simple regex R' that matches exactly the strings s where the original R matches c+s.

$$ \forall R \in \text{simple regex}, c \in \text{[a-z]}, s \in \text{[a-z]*}, \\ \exists R' \in \text{simple regex}, R' =\sim s \iff R =\sim c+s $$

Combine this with a tester which checks if r matches an empty string (epsilon), and we get a fully working simple regex matcher: given a regex \$ R_0 \$ and string \$ s = c_1 c_2 \cdots c_n \$, sequentially derive \$ R_0, c_1 \rightarrow R_1, c_2 \rightarrow R_2 \cdots \rightarrow R_n \$ and then test if \$ R_n \$ matches epsilon.

My code uses the following algorithm for testing epsilon match (MatchEps) and computing R' from R and c (Derive).

T = True, F = False
0 = null regex (never matches)
_ = "empty string" regex
a = single-char regex
r, s = any (sub-)regex
MatchEps :: regex -> bool
MatchEps 0 = F # Null regex can't match empty string
MatchEps _ = T # Empty-string regex trivially matches empty string
MatchEps a = F # Single-char can't match
MatchEps r* = T # Kleene matches as zero iteration
MatchEps (r|s) = MatchEps r or MatchEps s
MatchEps (r+s) = MatchEps r and MatchEps s
Derive :: char -> regex -> regex
# No matching string at all
Derive c 0 = 0
# _ can't match any string that starts with c
Derive c _ = 0
# Single-char regex only matches itself followed by empty string
Derive c a = if c == 'a' then _ else 0
# r* matches either _ or (r+r*);
# _ can't start with c, so it must be first `r` of (r+r*) that starts with c
Derive c r* = ([Derive c r]+r*)
# r or s; simply derive from r or derive from s
Derive c (r|s) = ([Derive c r]|[Derive c s])
# r followed by s; it matters if r can match _
Derive c (r+s) =
 # if r matches _, either [r starts with c] or [r matches _ and s starts with c]
 if MatchEps r then (([Derive c r]+s)|[Derive c s])
 # otherwise, r always starts with c
 else ([Derive c r]+s)

Ungolfed, with comments

⍝ Unwrap single layer of (...) and extract (r, s, op) from (r|s) or (r+s)
ExtractRS←{⍵{(⍺⊆⍨~⍵),⍺[⍸⍵]}(⍵∊'|+')∧0=+\-⌿'()'∘.=⍵}1↓ ̄1↓⊢
 ⍝ 1↓ ̄1↓⊢ Drop the outermost ()
 ⍝ {...} Pass the result to the function as ⍵...
 ⍝ +\-⌿'()'∘.=⍵ Compute the layers of nested ()s
 ⍝ (⍵∊'|+')∧0= Locate the operator (`|` or `+`) as bool vector
 ⍝ ⍵{...} Pass to inner function again ⍵ as ⍺, above as ⍵
 ⍝ ⍺[⍸⍵] Extract the operator
 ⍝ (⍺⊆⍨~⍵), Prepend the left and right regexes
⍝ Tests if the given regex matches an empty string (epsilon, eps)
MatchEps←{
 c←⊃⌽⍵ ⍝ Classify the regex by last char
 c∊'0',819⌶⎕A:0 ⍝ 0(no match) or lowercase: false
 c∊'_*':1 ⍝ _(empty) or Kleene: true
 r s op←ExtractRS ⍵ ⍝ The rest is (r|s) or (r+s); extract it
 op='|': ∨/∇ ̈r s ⍝ (r|s): r =~ eps or s =~ eps
 ∧/∇ ̈r s ⍝ (r+s): r =~ eps and s =~ eps
}
⍝ Derives regex `R'` from original regex `R` and first char `c`
Derive←{
 c←⊃⌽⍵ ⍝ Classify the regex by last char
 c∊'0_':,'0' ⍝ 0 or _ doesn't start with any c
 c=⍺:,'_' ⍝ Single char that matches
 c∊819⌶⎕A:'0' ⍝ Single char that doesn't match
 c='*': '(',(⍺∇ ̄1↓⍵),'+',⍵,')' ⍝ One char from Kleene: (R*)' = (R'+R*)
 r s op←ExtractRS ⍵ ⍝ Extract (r|s) or (r+s)
 op='|': '(',(⍺∇r),'|',(⍺∇s),')' ⍝ (r|s): one char from either branch
 MatchEps r: '((',(⍺∇r),'+',s,')|',(⍺∇s),')' ⍝ (r+s) and r =~ eps: ((r'+s)|s')
 '(',(⍺∇r),'+',s,')' ⍝ (r+s) but not r =~ eps: (r'+s)
}
⍝ Main function: Fold the string by Derive with initial regex,
⍝ and then test if the result matches eps
f←{MatchEps⊃Derive/(⌽⍵),⊂⍺}

Final note

This is not an original idea of mine; it is part of a series of exercises on a theorem proving textbook. I can claim that the algorithm is proven to work (because I did complete the correctness proofs), though I can't open the entire proof to public.

Question 17

Very cool solution. I remember working through those exercises when I took a software verification course - Lots of fun.

Question 18

Python 3, 69 bytes

lambda r,s:re.match(re.sub('(?<=[_*])\*+|[_+]','',r)+'$',s)
import re

Try it online!

Simple - just convert it to ordinary regex, using ordinary regex!

Question 19

I think this is actually a 56 byte answer, because you don't have to count the f= for lambda answers, and can put the import after it: Try it online!

Question 20

_*** is a valid simple regex. Please revise your answer to allow it.

Question 21

@Deadcode it may be two years late... but I've done it.

Question 22

JavaScript (ES6), 45 bytes

Takes input as (regex)(string). Returns a Boolean value.

Applies the straightforward method of removing [_+] from the simple regex to turn it into a standard regex.

r=>s=>!!s.match(`^${r.replace(/[_+]/g,"")}$`)

Try it online!

Or 43 bytes by returning either null or an object.

Question 23

_*** is a valid simple regex. Please revise your answer to allow it.

Question 24

R, (削除) 55 (削除ここまで) 75 bytes

function(x,y)grepl(paste0("^",gsub("([+_]|(?<=\\*))\\**","",x,pe=T),"$"),y)

Try it online!

A function that takes a simple regex x and a vector of strings y and returns a vector of logical values the same length as y indicating whether x matches.

Question 25

_*** is a valid simple regex. Please revise your answer to allow it.

Question 26

@Deadcode fixed to allow this and ab****.

Question 27

Retina, (削除) 38 (削除ここまで) 35 bytes

*1A`
1G`
^
a`
_
()
\*+
*
"$-5"~`\+

Try it online! Takes the simple regex on the first line and the string to match on the second. Explanation:

*1A`

Delete the first line, but don't actually change the working string. The string to match still gets stored in the history, which allows us to refer to it later.

1G`

Keep only the first line.

^
a`

Prefix the a modifier to anchor the pattern to the whole string.

_
()

Turn the _s into ()s to match an empty string that can be "repeated" with *.

\*+
*

Reduces runs of * to a single *.

\+

Delete any +s.

"$-5"~`

Execute that as a stage, using the history as the working string.

Question 28

As specified in the problem, this is supposed to match the entire string (implied ^...$ around the pattern), but this is matching as if there is neither a ^ nor a $.

Question 29

@Deadcode Sorry, I might not have noticed that edit by the time I finished answering the question.

Question 30

Figured that might be it, which means you opened the question within within 27 minutes of when it was posted.

Question 31

_*** is a valid simple regex. Please revise your answer to allow it.

Question 32

@Deadcode It being the weekend, I probably had the "Newest Questions" tab open all day.

Question 33

Java 8, 55 bytes

r->s->s.matches(r.replaceAll("\\+|(_|(\\*))\\**","2ドル"))

Try it online.

Removes all +; all _ with zero or more trailing *; and changes all sequences of more than one subsequent * with a single *. Then it check if the String matches this modified regex. Note that in Java, the String#matches method implicitly adds a leading and trailing ^...$ to check the entire String.

Question 34

_*** is a valid simple regex. Please revise your answer to allow it.

Question 35

@Deadcode Both _*** and ab**** are fixed now at the cost of 15 bytes. Can probably be golfed a bit, though.

Question 36

PHP, (削除) 983 (削除ここまで) (削除) 976 (削除ここまで) (削除) 954 (削除ここまで) (削除) 930 (削除ここまで) (削除) 910 (削除ここまで) (削除) 892 (削除ここまで) 838 bytes

<?php list(,$s,$i)=$argv;$p=0;$u=[2,[3,[2,[1,'('],$o=[2,[4,[2,&$u,[1,'|+']]],&$u],[1,')']],[1,'_'.join(range('a','z'))]],[5,[1,'*']]];m($o,$a);$s=$i;$p=0;echo m(o($a))&&$p==strlen($s);function m($m,&$a=[]){global$p,$s;$r=$p;$n=array_shift($m);foreach($m as$t){$b=[];if($n==1)if(($c=$s[$p]??0)&&strpos($t,$c)!==!1){$a[]=$c;$p++;return 1;}if($n==2){if(!m($t,$b)){$p=$r;return!1;}$a[]=$b;}if($n==3){if(m($t,$b)){$a[]=$b;return 1;}}if($n==4){k:$b=[];$r=$p;if(!m($t,$b)){$p=$r;return 1;}$a[]=$b;goto k;}if($n==5){if(m($t,$b))$a[]=$b;else{$a[]=[];$p=$r;}return 1;}if($n==6)return 1;}return $n==2?:$p!=$p=$r;}function o($a){$e=$b=u($a[1]);if($a[0]){$e=[2];foreach($a[0]as$u){$e[]=u($u[0]);$e[0]=$u[1][0]=='+'?2:3;}$e[]=$b;}return$e;}function u($u){$w=$u[0][0];$v=$w[0][0];$t=$v!='('?($v=='_'?[6,0]:[1,$v]):o($w[1]);return$u[1][0]==[]?$t:[4,$t];}

Try it online!

Ungolfed

<?php
list($dummy,$string,$user_test)=$argv;
$pointer = 0;
//production rules
$unit = [];
$char = ['char','_abcdefghijklmnopqrstuvwxyz'];
$separator = ['char','|+'];
$unit_and_separator = ['and',&$unit,$separator];
$operators_list = ['list',$unit_and_separator];
$operators = ['and',$operators_list,&$unit];
$open_bracket = ['char','('];
$close_bracket = ['char',')'];
$brackets = ['and',$open_bracket,$operators,$close_bracket];
$atom = ['or',$brackets,$char];
$star = ['opt',['char','*']];
$unit = ['and',$atom,$star];
$ast = [];
match($operators, $ast);
$user_regex = buildoperators($ast);
$user_ast = [];
$string = $user_test;
$pointer = 0;
// answer here 1=matched blank=not matched
echo match($user_regex, $user_ast)&&($pointer==strlen($string));
// recursive descent parser 
function match($test_match, &$ast) {
 global $pointer,$string;
 $original_pointer = $pointer;
 foreach (array_slice($test_match,1) as $test) {
 switch ($test_match[0]) {
 case 'and':
 $sub_match = [];
 $pass = match($test,$sub_match);
 if (!$pass) {$pointer = $original_pointer;return false;}
 $ast[] = $sub_match;
 break;
 case 'or':
 $sub_match = [];
 $pass = match($test, $sub_match);
 if ($pass) {
 $ast[] = $sub_match;
 return true;
 }
 break;
 case 'list':
 do {
 $sub_match = [];
 $original_pointer=$pointer;
 $pass = match($test, $sub_match);
 if (!$pass) {
 $pointer = $original_pointer;
 return true;
 }
 $ast[] = $sub_match;
 } while (true);
 break;
 case 'char':
 $char = substr($string,$pointer,1);
 if ($char && @strpos($test,$char)!==false) {
 $ast[]=substr($string,$pointer,1);
 $pointer++;
 return true;
 }
 break;
 case 'emptystring':
 return true;
 break;
 case 'opt':
 $pass = match($test, $sub_match);
 if ($pass) {$ast[] = $sub_match;} else {$ast[] = []; $pointer = $original_pointer;}
 return true;
 break;
 }
 }
 if ($test_match[0] == 'and') {
 return true;
 } else {
 $pointer = $original_pointer;
 return false;
 }
}
// build user production rules
function buildoperators($ast) {
 if ($ast[0]) { 
 $engine = ['and'];
 foreach ($ast[0] as $unit_and_separator) {
 $engine[] = buildunit($unit_and_separator[0]);
 switch ($unit_and_separator[1][0]) {
 case '+':
 $engine[0]='and';
 break;
 case '|':
 $engine[0]='or';
 break;
 }
 }
 $engine[] = buildunit($ast[1]);
 } else {
 $engine = buildunit($ast[1]);
 }
 return $engine;
}
function buildunit($unit) {
 $star = !empty($unit[1][0]);
 if ($star) {
 return ['list',buildatom($unit[0][0])];
 } else {
 return buildatom($unit[0][0]);
 }
}
function buildatom($atom) {
 if ($atom[0][0]=='(') {
 return buildoperators($atom[1]);
 } elseif ($atom[0][0]=='_') {
 return ['emptystring',''];
 } else {
 return ['char',$atom[0][0]];
 }
}

Question 37

Very cool that you did this, and welcome to PPCG! Seems it needs some bugfixes though. It is missing the implied $ at the end of the pattern, e.g. (a|b) matches ac. And it seems using * anywhere except after the outermost expression results in error messages, e.g. (a*|b) crashes, but (a|b)* works.

Question 38

Also, (((a+a)|a)+(a+b)) should be able to backtrack and match aab, but currently only ((a|(a+a))+(a+b)) is matching aab.

Question 39

Thanks @Deadcode. Both bugs are fixed in the golfed version. It's larger, not sure how much more I can squeeze from the lemon. The ungolfed version was working correctly with (a*|b). The backtracking is not compatible with my overly simplistic recursive descent parser. As you've noted treat it gently and check shorter productions first to make it work.

Question 40

Just a few extras. It's worth noting that the original question did not specify the behaviour of something like a+b+c|d, since there are no precedence rules. My code will accept it, but treat it as a|b|c|d, which is probably undesirable. Also, my regex engine makes up for its deficiency in that it is fact more powerful than conventional regex, as it also returns an abstract syntax tree of the results (in $user_ast).

Question 41

Perl 5, 34 + `-p` flag = 35 bytes

Full program. Takes the simple regex pattern, followed by string to match against, from stdin as two separate lines, then loops and does it again, until EOF is encountered. Prints 1 for a match or nothing for a non-match (with no newline in either case).

@ankh-morpork has pointed out that technically, given the question's description of simple regexes, any number of * in a row makes a valid simple regex. @Bubbler has pointed out that _* also needs to work (and be equivalent to _). The other answers haven't taken these things into account yet, but I will do so:

s/[_+]/()/g;s/\*+/*/g;$_=<>=~/^$_/

Try it online!

To allow simple regexes such as (_***+a) to work, _ is changed to () instead of . For golf reasons, + is also changed to (), although changing it to would work.

This solution exploits the fact that valid input won't contain newlines, input can be assumed to be valid, and both the implicit <> (from -p) and the explicit <> include the terminating newline read from stdin, thus $ doesn't need to be added at the end of the regex (as both the pattern and the string ), only ^ needs to be inserted at the beginning.

Perl 5, 20 + `-p` flag = 21 bytes (looser, obsolete interpretation of question)

y/_+//d;$_=<>=~/^$_/

Try it online!

Like most of the other solutions, deletes the _ and + characters to turn the simple regex into a standard regex. This means that the simple regex (_*+a) will not work, as it becomes (*a) after the deletion. Anything containing ** won't work either; in standard regex, an already-quantified expression cannot be quantified again.

Question 42

You don't need to count the flags into bytes. Instead, you can write e.g. Perl 5 `-p`, 34 bytes.

Question 43

@Bubbler I know, but I've intentionally done this so that it is considered to be answer in the "Perl" language, and thus directly comparable to other answers in this language, rather than being in a different language, "Perl + -p".

Question 44

Perl 5, 47 + `-alp` flag = 50 bytes

$_=$F[0];s/[_+]/()/g;s/\*+/*/g;$_=$F[1]=~/^$_$/

Try it online!

Perl 5, 41 + `-alp` flag = 44 bytes

Obsolete: it not supports _***-like regexes

$_=eval'$F[1]=~/^'.($F[0]=~y/_+//rd).'$/'

Try it online!

Question 45

_*** is a valid simple regex. Please revise your answer to allow it.

Question 46

@Deadcode Fixed.

Question 47

C# (Visual C# Interactive Compiler), 522 bytes

a=>b=>{int y=0,e=a.Length,k=0,o,q;var z=new int[e];for(;k<e;k++)if(a[k]<41){for(o=q=1;o>0;q++)o+=a[q+k]<41?1:a[q+k]>41?0:-1;z[k+--q]=k+1;z[k]=k+q+2;}void t(string s,int j){for(;j<e;){var l=a[j++];var w=j<e&&a[j]==42;if(j>1&&a[j-2]<41)for(int r=j,d=0;r<z[j-2]-1;)if(a[r++]>123&z.Take(r).Skip(j-2).Count(x=>x>0)%2>0)t(s,r);if(l>96&l<124)do{if(w)t(s,j+1);if(s==""||s[0]!=l)return;s=s[1..];}while(w);if(l==42&&a[j-2]==41||l<41&z[j-1]<=e&&a[z[j-1]-1]==42)t(s,z[j-1]);j=l>123?a.IndexOf(')',j)+1:j;}y=s==""?1:y;}t(b,0);return y;}

Saved 6 bytes thanks to ceilingcat

Try it online!

Question 48

Japt `-¡`, 13 bytes

è^+'$iVr"_|%+

Try it

true

Question 49

_*** is a valid simple regex. Please revise your answer to allow it.

ankh-morpork 1,4851 gold badge12 silver badges15 bronze badges · Accepted Answer · 2019-12-18 15:57:21Z

Haskell, 203 bytes

Nobody had done this by implementing a small regex engine yet, and I felt like it had to be done. This obviously won't win. but I'm hoping it will inspire someone to write an even more golfed regex engine.

I've rewritten my solution to avoid directly parsing the regular expression into its AST. Instead, the parsing process constructs a function that is used to match a string against the input regex.

The main function is (&) :: String -> String -> Bool which takes a string representation of a regex and a string to test, returning a boolean value. This calls to the next function which handles most of the work in parsing the regex and matching the string.

Function p :: String -> ([String] -> [String], String) takes a string representation of a regex and returns as the first element of a tuple a function that returns a list of all possible unmatched suffixes of strings in the input list after satisfying the regex parsed from the input string. The regex fully matches the string if the empty string is contained in the list of possible unmatched suffixes.

r&s=elem""$fst(p r)[s]
p(c:t)|c>'`'=t% \s->[t|h:t<-s,c==h]|c>'^'=t%id|(l,o:t)<-p t,(r,_:u)<-p t=u%last(r.l:[\s->r s++l s|o>'+'])
m#s=s++filter(`notElem`s)(m s)
('*':t)%m=t%until(\s->s==m#s)(m#)
s%m=(m,s)

Try it online!

To get rid of one byte, I replaced import Data.List; m#s=nub$s++m s with m#s=s++filter(`notElem`s)(m s). These functions aren't equivalent if there are duplicate elements in either s of m s. The new function does, however, remove all elements from m s that already exist in s, so until still terminates once no new suffixes are discovered by the application of m.

Ungolfed Code

import Data.List
match :: String -> String -> Bool
match r s =elem ""$(fst $ parseRegex r)[s]
parseRegex :: String -> ([String] -> [String], String)
parseRegex ('_':t) = parseKleene id t
parseRegex (c:t) | c >= 'a' = parseKleene (>>=p) t
 where p (c':t')| c==c' = [t']
 p _ = []
parseRegex ('(':t) =
 let (l, (o:t')) = parseRegex t in
 let (r, (_:t'')) = parseRegex t' in
 parseKleene (if o=='+' then (r.l) else (\ss-> (r ss)++(l ss))) t''
parseKleene :: ([String] -> [String]) -> String -> ([String] -> [String], String)
parseKleene p ('*':t) = parseKleene p' t
 where
 p' ss
 | ss' <- nub$ p ss,
 ss /= ss' = ss ++ (p' ss')
 | otherwise = ss
parseKleene p s = (p,s)

GolfScript, 198 bytes

I was able to beat my Haskell solution by implementing the first algorithm I tried in GolfScript instead of Haskell. I don't think it's interesting enough for a separate answer, so I'll just leave it here. There are likely some golfing opportunities since I learned GolfScript just for this.

This solution is in the form of a block that expects the test string on the top of the stack followed by the regex string.

{[.;]1円+{(.96>{0[\]}{2%0{r\(\r\:s;[@]s(;\}i}{if}:i~\{(.3%}{;2円[\]\}until[.;]\+\}:r~\;{.{(.{.4%{2%{)@\m\)\;m}{)\;{.@.@m 1$|.@={\;}{\o}i}:o~}i}{;)@.@m@@\)\;m|}i}{;(:c;;{,},{(\;c=},{(;}%}i}{;}i}:m~""?}

Try it online!

Fantastic work and great golfing! It even works with ((((a|d)|(g|j))|((((c|f)|i)+((a|d)|(g|j))*)+((b|e)|h)))|(((((b|e)|h)|((((c|f)|i)+((a|d)|(g|j))*)+((c|f)|i)))+(((a|d)|(g|j))|((((b|e)|h)+((a|d)|(g|j))*)+((c|f)|i)))*)+(((c|f)|i)|((((b|e)|h)+((a|d)|(g|j))*)+((b|e)|h)))))*, the regex that tests decimal numbers for divisibility by 3 (where 0-9 is replaced with a-j).
@Deadcode thanks for bringing some attention to my solution. It's good to know that people are seeing it since I put a good bit of time into trying out different golfing strategies.
@Deadcode The way I understand the problem, that pattern should be valid: _ is defined as a regex and r* is defined to be a regex for all regex r, so we can chain * as much as we like on top of _. With regards to hanging, I think (but am not confident) that it will terminate but in an embarrassingly exponential amount of time.
@Deadcode The easy fix required importing Data.List for nub, but I'm fairly confident the problem is solved.

Stack Exchange Network

Check if simple regex matches string

13 Answers 13

Haskell, 203 bytes

Ungolfed Code

GolfScript, 198 bytes

APL (Dyalog Unicode), 39 bytes ^SBCS

APL (Dyalog Unicode), ^{(削除) 295 (削除ここまで)} 277 bytes

Concept

Ungolfed, with comments

Final note

Python 3, 69 bytes

JavaScript (ES6), 45 bytes

R, (削除) 55 (削除ここまで) 75 bytes

Retina, (削除) 38 (削除ここまで) 35 bytes

Java 8, 55 bytes

PHP, (削除) 983 (削除ここまで) (削除) 976 (削除ここまで) (削除) 954 (削除ここまで) (削除) 930 (削除ここまで) (削除) 910 (削除ここまで) (削除) 892 (削除ここまで) 838 bytes

Ungolfed

Perl 5, 34 + `-p` flag = 35 bytes

Perl 5, 20 + `-p` flag = 21 bytes (looser, obsolete interpretation of question)

Perl 5, 47 + `-alp` flag = 50 bytes

Perl 5, 41 + `-alp` flag = 44 bytes

C# (Visual C# Interactive Compiler), 522 bytes

Japt `-¡`, 13 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Check if simple regex matches string

13 Answers 13

Haskell, 203 bytes

Ungolfed Code

GolfScript, 198 bytes

APL (Dyalog Unicode), 39 bytes SBCS

APL (Dyalog Unicode), (削除) 295 (削除ここまで) 277 bytes

Concept

Ungolfed, with comments

Final note

Python 3, 69 bytes

JavaScript (ES6), 45 bytes

R, (削除) 55 (削除ここまで) 75 bytes

Retina, (削除) 38 (削除ここまで) 35 bytes

Java 8, 55 bytes

PHP, (削除) 983 (削除ここまで) (削除) 976 (削除ここまで) (削除) 954 (削除ここまで) (削除) 930 (削除ここまで) (削除) 910 (削除ここまで) (削除) 892 (削除ここまで) 838 bytes

Ungolfed

Perl 5, 34 + -p flag = 35 bytes

Perl 5, 20 + -p flag = 21 bytes (looser, obsolete interpretation of question)

Perl 5, 47 + -alp flag = 50 bytes

Perl 5, 41 + -alp flag = 44 bytes

C# (Visual C# Interactive Compiler), 522 bytes

Japt -¡, 13 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

APL (Dyalog Unicode), 39 bytes ^SBCS

APL (Dyalog Unicode), ^{(削除) 295 (削除ここまで)} 277 bytes

Perl 5, 34 + `-p` flag = 35 bytes

Perl 5, 20 + `-p` flag = 21 bytes (looser, obsolete interpretation of question)

Perl 5, 47 + `-alp` flag = 50 bytes

Perl 5, 41 + `-alp` flag = 44 bytes

Japt `-¡`, 13 bytes