A simple regex is either:
_(which matches the empty string)- Any lowercase letter
atoz(which matches that letter) r*, whereris a regex (which matchesrany number of times)(r|s), whererandsare regexes (which matches eitherrors)(r+s), whererandsare regexes (which matchesrfollowed bys)
Note that due to the recursive definition, * can occur multiple times in a row.
Here are some examples of regexes and their matches:
(a+b)matches onlyab((a|_)+b*)matches ,a,b,ab,bb,abb, but notbaaa,aab(c+(o+((l+(o+(u|_)))+r)))matches onlycolorandcolour(a|b)*matches only strings containing lettersaandb(so ,ab,bab, but notabc)(_***|(a+b***))matches only the empty string orafollowed by any number ofbs.
Your task is to write a program that takes such a regex and a string of lowercase letters, and outputs whether or not the regex matches the entire string (output should be as described here).
The shortest code in bytes wins.
13 Answers 13
Haskell, 203 bytes
Nobody had done this by implementing a small regex engine yet, and I felt like it had to be done. This obviously won't win. but I'm hoping it will inspire someone to write an even more golfed regex engine.
I've rewritten my solution to avoid directly parsing the regular expression into its AST. Instead, the parsing process constructs a function that is used to match a string against the input regex.
The main function is (&) :: String -> String -> Bool which takes a string representation of a regex and a string to test, returning a boolean value. This calls to the next function which handles most of the work in parsing the regex and matching the string.
Function p :: String -> ([String] -> [String], String) takes a string representation of a regex and returns as the first element of a tuple a function that returns a list of all possible unmatched suffixes of strings in the input list after satisfying the regex parsed from the input string. The regex fully matches the string if the empty string is contained in the list of possible unmatched suffixes.
r&s=elem""$fst(p r)[s]
p(c:t)|c>'`'=t% \s->[t|h:t<-s,c==h]|c>'^'=t%id|(l,o:t)<-p t,(r,_:u)<-p t=u%last(r.l:[\s->r s++l s|o>'+'])
m#s=s++filter(`notElem`s)(m s)
('*':t)%m=t%until(\s->s==m#s)(m#)
s%m=(m,s)
To get rid of one byte, I replaced import Data.List; m#s=nub$s++m s with m#s=s++filter(`notElem`s)(m s). These functions aren't equivalent if there are duplicate elements in either s of m s. The new function does, however, remove all elements from m s that already exist in s, so until still terminates once no new suffixes are discovered by the application of m.
Ungolfed Code
import Data.List
match :: String -> String -> Bool
match r s =elem ""$(fst $ parseRegex r)[s]
parseRegex :: String -> ([String] -> [String], String)
parseRegex ('_':t) = parseKleene id t
parseRegex (c:t) | c >= 'a' = parseKleene (>>=p) t
where p (c':t')| c==c' = [t']
p _ = []
parseRegex ('(':t) =
let (l, (o:t')) = parseRegex t in
let (r, (_:t'')) = parseRegex t' in
parseKleene (if o=='+' then (r.l) else (\ss-> (r ss)++(l ss))) t''
parseKleene :: ([String] -> [String]) -> String -> ([String] -> [String], String)
parseKleene p ('*':t) = parseKleene p' t
where
p' ss
| ss' <- nub$ p ss,
ss /= ss' = ss ++ (p' ss')
| otherwise = ss
parseKleene p s = (p,s)
GolfScript, 198 bytes
I was able to beat my Haskell solution by implementing the first algorithm I tried in GolfScript instead of Haskell. I don't think it's interesting enough for a separate answer, so I'll just leave it here. There are likely some golfing opportunities since I learned GolfScript just for this.
This solution is in the form of a block that expects the test string on the top of the stack followed by the regex string.
{[.;]1円+{(.96>{0[\]}{2%0{r\(\r\:s;[@]s(;\}i}{if}:i~\{(.3%}{;2円[\]\}until[.;]\+\}:r~\;{.{(.{.4%{2%{)@\m\)\;m}{)\;{.@.@m 1$|.@={\;}{\o}i}:o~}i}{;)@.@m@@\)\;m|}i}{;(:c;;{,},{(\;c=},{(;}%}i}{;}i}:m~""?}
-
3\$\begingroup\$ Fantastic work and great golfing! It even works with
((((a|d)|(g|j))|((((c|f)|i)+((a|d)|(g|j))*)+((b|e)|h)))|(((((b|e)|h)|((((c|f)|i)+((a|d)|(g|j))*)+((c|f)|i)))+(((a|d)|(g|j))|((((b|e)|h)+((a|d)|(g|j))*)+((c|f)|i)))*)+(((c|f)|i)|((((b|e)|h)+((a|d)|(g|j))*)+((b|e)|h)))))*, the regex that tests decimal numbers for divisibility by 3 (where0-9is replaced witha-j). \$\endgroup\$Deadcode– Deadcode2020年01月07日 09:58:29 +00:00Commented Jan 7, 2020 at 9:58 -
1\$\begingroup\$ @Deadcode thanks for bringing some attention to my solution. It's good to know that people are seeing it since I put a good bit of time into trying out different golfing strategies. \$\endgroup\$ankh-morpork– ankh-morpork2020年01月09日 00:03:32 +00:00Commented Jan 9, 2020 at 0:03
-
1\$\begingroup\$ @Deadcode The way I understand the problem, that pattern should be valid:
_is defined as a regex andr*is defined to be a regex for all regexr, so we can chain*as much as we like on top of_. With regards to hanging, I think (but am not confident) that it will terminate but in an embarrassingly exponential amount of time. \$\endgroup\$ankh-morpork– ankh-morpork2020年01月09日 01:05:02 +00:00Commented Jan 9, 2020 at 1:05 -
1\$\begingroup\$ @Deadcode The easy fix required importing
Data.Listfornub, but I'm fairly confident the problem is solved. \$\endgroup\$ankh-morpork– ankh-morpork2020年01月09日 15:21:25 +00:00Commented Jan 9, 2020 at 15:21 -
1\$\begingroup\$ You can change
%(\s->...)to% \s->...\$\endgroup\$H.PWiz– H.PWiz2020年01月10日 15:26:02 +00:00Commented Jan 10, 2020 at 15:26
APL (Dyalog Unicode), 39 bytes SBCS
Edit: now works with runs of * even after _
Full program. Prompts stdin for string and then for regex. Returns a list consisting of an empty list (by default, this prints as two spaces) for matches and an empty list (empty line) for non-matches.
(1⌽'$^','\*+' '_'⎕R'*' '()'⊢⍞~'+')⎕S⍬⊢⍞
Try it online! (output made easier to read by converting all output to JSON)
⍞ prompt stdin (for string)
⊢ on that, apply the following:
(...)⎕S⍬ PCRE Search for the following, returning an empty list for each match
~'+' remove all plusses from the following:
⍞ prompt stdin (for regex)
⊢ on that, apply the following:
'\*+' '_'⎕R'*' '()' PCRE Replace runs of * with * and _ with ()
'$^', prepend dollar sign and caret (indicating end and start)
1⌽ rotate the first character ($) to the end
-
\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:40:35 +00:00Commented Jan 9, 2020 at 5:40 -
\$\begingroup\$ @Deadcode Fixed. \$\endgroup\$Adám– Adám2020年01月09日 06:46:34 +00:00Commented Jan 9, 2020 at 6:46
APL (Dyalog Unicode), (削除) 295 (削除ここまで) 277 bytes
a←819⌶⎕A
E←{⍵{(⍺⊆⍨~⍵),⍺[⍸⍵]}(⍵∊'|+')∧0=+\-⌿'()'∘.=⍵}1↓ ̄1↓⊢
M←{c←⊃⌽⍵⋄c∊'0',a:0⋄c∊'_*':1⋄r s o←E⍵⋄o='|':∨/∇ ̈r s⋄∧/∇ ̈r s}
D←{c←⊃⌽⍵⋄c∊'0_':'0'⋄c=⍺:'_'⋄c∊a:'0'⋄c='*':1⌽∊')('(⍺∇ ̄1↓⍵)'+'⍵⋄r s o←E⍵⋄o='|':1⌽∊')('(⍺∇r)'|',⍺∇s⋄M r:1⌽∊')(('(⍺∇r)'+'s')|',⍺∇s⋄1⌽∊')('(⍺∇r)'+'s}
{M⊃D/(⌽⍵),⊂⍺}
-18 bytes thanks to @ngn.
This is a proof of concept that we can do a "simple regex matching" without any backtracking, thus avoiding possible infinite loops due to _* or r**. This is also a showcase that APL is a general-purpose programming language.
The anonymous function at the last line does the regex matching; use it as (regex) f (input string). The return value is 1 if the match is successful, 0 otherwise.
Concept
Given a simple regex R and the first character c of input string, we can construct (or derive) another simple regex R' that matches exactly the strings s where the original R matches c+s.
$$ \forall R \in \text{simple regex}, c \in \text{[a-z]}, s \in \text{[a-z]*}, \\ \exists R' \in \text{simple regex}, R' =\sim s \iff R =\sim c+s $$
Combine this with a tester which checks if r matches an empty string (epsilon), and we get a fully working simple regex matcher: given a regex \$ R_0 \$ and string \$ s = c_1 c_2 \cdots c_n \$, sequentially derive \$ R_0, c_1 \rightarrow R_1, c_2 \rightarrow R_2 \cdots \rightarrow R_n \$ and then test if \$ R_n \$ matches epsilon.
My code uses the following algorithm for testing epsilon match (MatchEps) and computing R' from R and c (Derive).
T = True, F = False
0 = null regex (never matches)
_ = "empty string" regex
a = single-char regex
r, s = any (sub-)regex
MatchEps :: regex -> bool
MatchEps 0 = F # Null regex can't match empty string
MatchEps _ = T # Empty-string regex trivially matches empty string
MatchEps a = F # Single-char can't match
MatchEps r* = T # Kleene matches as zero iteration
MatchEps (r|s) = MatchEps r or MatchEps s
MatchEps (r+s) = MatchEps r and MatchEps s
Derive :: char -> regex -> regex
# No matching string at all
Derive c 0 = 0
# _ can't match any string that starts with c
Derive c _ = 0
# Single-char regex only matches itself followed by empty string
Derive c a = if c == 'a' then _ else 0
# r* matches either _ or (r+r*);
# _ can't start with c, so it must be first `r` of (r+r*) that starts with c
Derive c r* = ([Derive c r]+r*)
# r or s; simply derive from r or derive from s
Derive c (r|s) = ([Derive c r]|[Derive c s])
# r followed by s; it matters if r can match _
Derive c (r+s) =
# if r matches _, either [r starts with c] or [r matches _ and s starts with c]
if MatchEps r then (([Derive c r]+s)|[Derive c s])
# otherwise, r always starts with c
else ([Derive c r]+s)
Ungolfed, with comments
⍝ Unwrap single layer of (...) and extract (r, s, op) from (r|s) or (r+s)
ExtractRS←{⍵{(⍺⊆⍨~⍵),⍺[⍸⍵]}(⍵∊'|+')∧0=+\-⌿'()'∘.=⍵}1↓ ̄1↓⊢
⍝ 1↓ ̄1↓⊢ Drop the outermost ()
⍝ {...} Pass the result to the function as ⍵...
⍝ +\-⌿'()'∘.=⍵ Compute the layers of nested ()s
⍝ (⍵∊'|+')∧0= Locate the operator (`|` or `+`) as bool vector
⍝ ⍵{...} Pass to inner function again ⍵ as ⍺, above as ⍵
⍝ ⍺[⍸⍵] Extract the operator
⍝ (⍺⊆⍨~⍵), Prepend the left and right regexes
⍝ Tests if the given regex matches an empty string (epsilon, eps)
MatchEps←{
c←⊃⌽⍵ ⍝ Classify the regex by last char
c∊'0',819⌶⎕A:0 ⍝ 0(no match) or lowercase: false
c∊'_*':1 ⍝ _(empty) or Kleene: true
r s op←ExtractRS ⍵ ⍝ The rest is (r|s) or (r+s); extract it
op='|': ∨/∇ ̈r s ⍝ (r|s): r =~ eps or s =~ eps
∧/∇ ̈r s ⍝ (r+s): r =~ eps and s =~ eps
}
⍝ Derives regex `R'` from original regex `R` and first char `c`
Derive←{
c←⊃⌽⍵ ⍝ Classify the regex by last char
c∊'0_':,'0' ⍝ 0 or _ doesn't start with any c
c=⍺:,'_' ⍝ Single char that matches
c∊819⌶⎕A:'0' ⍝ Single char that doesn't match
c='*': '(',(⍺∇ ̄1↓⍵),'+',⍵,')' ⍝ One char from Kleene: (R*)' = (R'+R*)
r s op←ExtractRS ⍵ ⍝ Extract (r|s) or (r+s)
op='|': '(',(⍺∇r),'|',(⍺∇s),')' ⍝ (r|s): one char from either branch
MatchEps r: '((',(⍺∇r),'+',s,')|',(⍺∇s),')' ⍝ (r+s) and r =~ eps: ((r'+s)|s')
'(',(⍺∇r),'+',s,')' ⍝ (r+s) but not r =~ eps: (r'+s)
}
⍝ Main function: Fold the string by Derive with initial regex,
⍝ and then test if the result matches eps
f←{MatchEps⊃Derive/(⌽⍵),⊂⍺}
Final note
This is not an original idea of mine; it is part of a series of exercises on a theorem proving textbook. I can claim that the algorithm is proven to work (because I did complete the correctness proofs), though I can't open the entire proof to public.
-
1\$\begingroup\$ Very cool solution. I remember working through those exercises when I took a software verification course - Lots of fun. \$\endgroup\$ankh-morpork– ankh-morpork2020年01月09日 16:33:04 +00:00Commented Jan 9, 2020 at 16:33
Python 3, 69 bytes
lambda r,s:re.match(re.sub('(?<=[_*])\*+|[_+]','',r)+'$',s)
import re
Simple - just convert it to ordinary regex, using ordinary regex!
-
\$\begingroup\$ I think this is actually a 56 byte answer, because you don't have to count the
f=for lambda answers, and can put the import after it: Try it online! \$\endgroup\$Deadcode– Deadcode2020年01月09日 00:18:42 +00:00Commented Jan 9, 2020 at 0:18 -
1\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:40:51 +00:00Commented Jan 9, 2020 at 5:40 -
2\$\begingroup\$ @Deadcode it may be two years late... but I've done it. \$\endgroup\$Miriam– Miriam2022年02月16日 22:44:10 +00:00Commented Feb 16, 2022 at 22:44
JavaScript (ES6), 45 bytes
Takes input as (regex)(string). Returns a Boolean value.
Applies the straightforward method of removing [_+] from the simple regex to turn it into a standard regex.
r=>s=>!!s.match(`^${r.replace(/[_+]/g,"")}$`)
Or 43 bytes by returning either null or an object.
-
\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:41:56 +00:00Commented Jan 9, 2020 at 5:41
R, (削除) 55 (削除ここまで) 75 bytes
function(x,y)grepl(paste0("^",gsub("([+_]|(?<=\\*))\\**","",x,pe=T),"$"),y)
A function that takes a simple regex x and a vector of strings y and returns a vector of logical values the same length as y indicating whether x matches.
-
\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:40:45 +00:00Commented Jan 9, 2020 at 5:40 -
1\$\begingroup\$ @Deadcode fixed to allow this and
ab****. \$\endgroup\$Nick Kennedy– Nick Kennedy2020年01月09日 07:18:25 +00:00Commented Jan 9, 2020 at 7:18
Retina, (削除) 38 (削除ここまで) 35 bytes
*1A`
1G`
^
a`
_
()
\*+
*
"$-5"~`\+
Try it online! Takes the simple regex on the first line and the string to match on the second. Explanation:
*1A`
Delete the first line, but don't actually change the working string. The string to match still gets stored in the history, which allows us to refer to it later.
1G`
Keep only the first line.
^
a`
Prefix the a modifier to anchor the pattern to the whole string.
_
()
Turn the _s into ()s to match an empty string that can be "repeated" with *.
\*+
*
Reduces runs of * to a single *.
\+
Delete any +s.
"$-5"~`
Execute that as a stage, using the history as the working string.
-
\$\begingroup\$ As specified in the problem, this is supposed to match the entire string (implied
^...$around the pattern), but this is matching as if there is neither a^nor a$. \$\endgroup\$Deadcode– Deadcode2020年01月09日 00:24:25 +00:00Commented Jan 9, 2020 at 0:24 -
\$\begingroup\$ @Deadcode Sorry, I might not have noticed that edit by the time I finished answering the question. \$\endgroup\$Neil– Neil2020年01月09日 01:05:16 +00:00Commented Jan 9, 2020 at 1:05
-
\$\begingroup\$ Figured that might be it, which means you opened the question within within 27 minutes of when it was posted. \$\endgroup\$Deadcode– Deadcode2020年01月09日 01:09:30 +00:00Commented Jan 9, 2020 at 1:09
-
\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:42:49 +00:00Commented Jan 9, 2020 at 5:42 -
\$\begingroup\$ @Deadcode It being the weekend, I probably had the "Newest Questions" tab open all day. \$\endgroup\$Neil– Neil2020年01月09日 10:41:19 +00:00Commented Jan 9, 2020 at 10:41
Java 8, 55 bytes
r->s->s.matches(r.replaceAll("\\+|(_|(\\*))\\**","2ドル"))
Removes all +; all _ with zero or more trailing *; and changes all sequences of more than one subsequent * with a single *. Then it check if the String matches this modified regex. Note that in Java, the String#matches method implicitly adds a leading and trailing ^...$ to check the entire String.
-
\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:42:10 +00:00Commented Jan 9, 2020 at 5:42 -
1\$\begingroup\$ @Deadcode Both
_***andab****are fixed now at the cost of 15 bytes. Can probably be golfed a bit, though. \$\endgroup\$Kevin Cruijssen– Kevin Cruijssen2020年01月09日 07:50:36 +00:00Commented Jan 9, 2020 at 7:50
PHP, (削除) 983 (削除ここまで) (削除) 976 (削除ここまで) (削除) 954 (削除ここまで) (削除) 930 (削除ここまで) (削除) 910 (削除ここまで) (削除) 892 (削除ここまで) 838 bytes
<?php list(,$s,$i)=$argv;$p=0;$u=[2,[3,[2,[1,'('],$o=[2,[4,[2,&$u,[1,'|+']]],&$u],[1,')']],[1,'_'.join(range('a','z'))]],[5,[1,'*']]];m($o,$a);$s=$i;$p=0;echo m(o($a))&&$p==strlen($s);function m($m,&$a=[]){global$p,$s;$r=$p;$n=array_shift($m);foreach($m as$t){$b=[];if($n==1)if(($c=$s[$p]??0)&&strpos($t,$c)!==!1){$a[]=$c;$p++;return 1;}if($n==2){if(!m($t,$b)){$p=$r;return!1;}$a[]=$b;}if($n==3){if(m($t,$b)){$a[]=$b;return 1;}}if($n==4){k:$b=[];$r=$p;if(!m($t,$b)){$p=$r;return 1;}$a[]=$b;goto k;}if($n==5){if(m($t,$b))$a[]=$b;else{$a[]=[];$p=$r;}return 1;}if($n==6)return 1;}return $n==2?:$p!=$p=$r;}function o($a){$e=$b=u($a[1]);if($a[0]){$e=[2];foreach($a[0]as$u){$e[]=u($u[0]);$e[0]=$u[1][0]=='+'?2:3;}$e[]=$b;}return$e;}function u($u){$w=$u[0][0];$v=$w[0][0];$t=$v!='('?($v=='_'?[6,0]:[1,$v]):o($w[1]);return$u[1][0]==[]?$t:[4,$t];}
Ungolfed
<?php
list($dummy,$string,$user_test)=$argv;
$pointer = 0;
//production rules
$unit = [];
$char = ['char','_abcdefghijklmnopqrstuvwxyz'];
$separator = ['char','|+'];
$unit_and_separator = ['and',&$unit,$separator];
$operators_list = ['list',$unit_and_separator];
$operators = ['and',$operators_list,&$unit];
$open_bracket = ['char','('];
$close_bracket = ['char',')'];
$brackets = ['and',$open_bracket,$operators,$close_bracket];
$atom = ['or',$brackets,$char];
$star = ['opt',['char','*']];
$unit = ['and',$atom,$star];
$ast = [];
match($operators, $ast);
$user_regex = buildoperators($ast);
$user_ast = [];
$string = $user_test;
$pointer = 0;
// answer here 1=matched blank=not matched
echo match($user_regex, $user_ast)&&($pointer==strlen($string));
// recursive descent parser
function match($test_match, &$ast) {
global $pointer,$string;
$original_pointer = $pointer;
foreach (array_slice($test_match,1) as $test) {
switch ($test_match[0]) {
case 'and':
$sub_match = [];
$pass = match($test,$sub_match);
if (!$pass) {$pointer = $original_pointer;return false;}
$ast[] = $sub_match;
break;
case 'or':
$sub_match = [];
$pass = match($test, $sub_match);
if ($pass) {
$ast[] = $sub_match;
return true;
}
break;
case 'list':
do {
$sub_match = [];
$original_pointer=$pointer;
$pass = match($test, $sub_match);
if (!$pass) {
$pointer = $original_pointer;
return true;
}
$ast[] = $sub_match;
} while (true);
break;
case 'char':
$char = substr($string,$pointer,1);
if ($char && @strpos($test,$char)!==false) {
$ast[]=substr($string,$pointer,1);
$pointer++;
return true;
}
break;
case 'emptystring':
return true;
break;
case 'opt':
$pass = match($test, $sub_match);
if ($pass) {$ast[] = $sub_match;} else {$ast[] = []; $pointer = $original_pointer;}
return true;
break;
}
}
if ($test_match[0] == 'and') {
return true;
} else {
$pointer = $original_pointer;
return false;
}
}
// build user production rules
function buildoperators($ast) {
if ($ast[0]) {
$engine = ['and'];
foreach ($ast[0] as $unit_and_separator) {
$engine[] = buildunit($unit_and_separator[0]);
switch ($unit_and_separator[1][0]) {
case '+':
$engine[0]='and';
break;
case '|':
$engine[0]='or';
break;
}
}
$engine[] = buildunit($ast[1]);
} else {
$engine = buildunit($ast[1]);
}
return $engine;
}
function buildunit($unit) {
$star = !empty($unit[1][0]);
if ($star) {
return ['list',buildatom($unit[0][0])];
} else {
return buildatom($unit[0][0]);
}
}
function buildatom($atom) {
if ($atom[0][0]=='(') {
return buildoperators($atom[1]);
} elseif ($atom[0][0]=='_') {
return ['emptystring',''];
} else {
return ['char',$atom[0][0]];
}
}
-
\$\begingroup\$ Very cool that you did this, and welcome to PPCG! Seems it needs some bugfixes though. It is missing the implied
$at the end of the pattern, e.g.(a|b)matchesac. And it seems using*anywhere except after the outermost expression results in error messages, e.g.(a*|b)crashes, but(a|b)*works. \$\endgroup\$Deadcode– Deadcode2020年01月09日 00:11:58 +00:00Commented Jan 9, 2020 at 0:11 -
\$\begingroup\$ Also,
(((a+a)|a)+(a+b))should be able to backtrack and matchaab, but currently only((a|(a+a))+(a+b))is matchingaab. \$\endgroup\$Deadcode– Deadcode2020年01月09日 03:30:00 +00:00Commented Jan 9, 2020 at 3:30 -
\$\begingroup\$ Thanks @Deadcode. Both bugs are fixed in the golfed version. It's larger, not sure how much more I can squeeze from the lemon. The ungolfed version was working correctly with (a*|b). The backtracking is not compatible with my overly simplistic recursive descent parser. As you've noted treat it gently and check shorter productions first to make it work. \$\endgroup\$Guillermo Phillips– Guillermo Phillips2020年01月09日 11:55:50 +00:00Commented Jan 9, 2020 at 11:55
-
\$\begingroup\$ Just a few extras. It's worth noting that the original question did not specify the behaviour of something like a+b+c|d, since there are no precedence rules. My code will accept it, but treat it as a|b|c|d, which is probably undesirable. Also, my regex engine makes up for its deficiency in that it is fact more powerful than conventional regex, as it also returns an abstract syntax tree of the results (in $user_ast). \$\endgroup\$Guillermo Phillips– Guillermo Phillips2020年01月09日 12:15:29 +00:00Commented Jan 9, 2020 at 12:15
Perl 5, 34 + -p flag = 35 bytes
Full program. Takes the simple regex pattern, followed by string to match against, from stdin as two separate lines, then loops and does it again, until EOF is encountered. Prints 1 for a match or nothing for a non-match (with no newline in either case).
@ankh-morpork has pointed out that technically, given the question's description of simple regexes, any number of * in a row makes a valid simple regex. @Bubbler has pointed out that _* also needs to work (and be equivalent to _). The other answers haven't taken these things into account yet, but I will do so:
s/[_+]/()/g;s/\*+/*/g;$_=<>=~/^$_/
To allow simple regexes such as (_***+a) to work, _ is changed to () instead of . For golf reasons, + is also changed to (), although changing it to would work.
This solution exploits the fact that valid input won't contain newlines, input can be assumed to be valid, and both the implicit <> (from -p) and the explicit <> include the terminating newline read from stdin, thus $ doesn't need to be added at the end of the regex (as both the pattern and the string ), only ^ needs to be inserted at the beginning.
Perl 5, 20 + -p flag = 21 bytes (looser, obsolete interpretation of question)
y/_+//d;$_=<>=~/^$_/
Like most of the other solutions, deletes the _ and + characters to turn the simple regex into a standard regex. This means that the simple regex (_*+a) will not work, as it becomes (*a) after the deletion. Anything containing ** won't work either; in standard regex, an already-quantified expression cannot be quantified again.
-
\$\begingroup\$ You don't need to count the flags into bytes. Instead, you can write e.g.
Perl 5 `-p`, 34 bytes. \$\endgroup\$Bubbler– Bubbler2020年01月09日 05:31:29 +00:00Commented Jan 9, 2020 at 5:31 -
\$\begingroup\$ @Bubbler I know, but I've intentionally done this so that it is considered to be answer in the "Perl" language, and thus directly comparable to other answers in this language, rather than being in a different language, "Perl +
-p". \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:38:23 +00:00Commented Jan 9, 2020 at 5:38
Perl 5, 47 + -alp flag = 50 bytes
$_=$F[0];s/[_+]/()/g;s/\*+/*/g;$_=$F[1]=~/^$_$/
Perl 5, 41 + -alp flag = 44 bytes
Obsolete: it not supports _***-like regexes
$_=eval'$F[1]=~/^'.($F[0]=~y/_+//rd).'$/'
-
1\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:42:25 +00:00Commented Jan 9, 2020 at 5:42 -
\$\begingroup\$ @Deadcode Fixed. \$\endgroup\$Denis Ibaev– Denis Ibaev2020年01月09日 18:33:14 +00:00Commented Jan 9, 2020 at 18:33
C# (Visual C# Interactive Compiler), 522 bytes
a=>b=>{int y=0,e=a.Length,k=0,o,q;var z=new int[e];for(;k<e;k++)if(a[k]<41){for(o=q=1;o>0;q++)o+=a[q+k]<41?1:a[q+k]>41?0:-1;z[k+--q]=k+1;z[k]=k+q+2;}void t(string s,int j){for(;j<e;){var l=a[j++];var w=j<e&&a[j]==42;if(j>1&&a[j-2]<41)for(int r=j,d=0;r<z[j-2]-1;)if(a[r++]>123&z.Take(r).Skip(j-2).Count(x=>x>0)%2>0)t(s,r);if(l>96&l<124)do{if(w)t(s,j+1);if(s==""||s[0]!=l)return;s=s[1..];}while(w);if(l==42&&a[j-2]==41||l<41&z[j-1]<=e&&a[z[j-1]-1]==42)t(s,z[j-1]);j=l>123?a.IndexOf(')',j)+1:j;}y=s==""?1:y;}t(b,0);return y;}
Saved 6 bytes thanks to ceilingcat
-
\$\begingroup\$
_***is a valid simple regex. Please revise your answer to allow it. \$\endgroup\$Deadcode– Deadcode2020年01月09日 05:41:22 +00:00Commented Jan 9, 2020 at 5:41
_*(|)? \$\endgroup\$*in a row results in a valid simple regex; for example,qa***zshould matchqzandqaaz. Is this the definition we should stick to? If so, 8+ answers will need to be edited, as they are currently incorrect. \$\endgroup\$_*is also a valid "simple regex" that matches only an empty string. Deleting_would leave single*, which is a grammar error in most regex flavors (if not all). \$\endgroup\$