Your task
Given a simple regular expression, you have to count how many strings of length n have a match of length n with the given simple regex. This will just be a subset of regexs. Like, no lookaheads or named groups or recursion or whatever weird things regexs have.
Simple regular expression
For the purposes of this challenge, a regex is said to be simple if, for one, it only contains characters in the ASCII range 32-126. Furthermore, it should only use the following functionalities:
- match literal characters, much like the regex
abcwould only match the string "abc"; - match options, like
abc|defwould match "abc" and "def"; - match exactly 0 or 1 occurrence of something, e.g.
https?matches "http" and "https"; - match 1 or more occurrences of something, e.g.
ah+would match "ah", "ahh", "ahhh", "ahhhh", etc; - match any amount of occurrences of something, e.g.
1*matches "", "1", "11", "111", "1111", etc; - match between
nandmoccurrences of something, e.g.lo{1,4}lmatches only "lol", "lool", "loool" and "looool". Ifnis ommited, it matches up tomoccurrences. Ifmis ommited, it matches at leastnoccurrences. Assume at least one ofnormis present; - use
()to group, e.g.ab(c|d)efwould match "abcef" and "abdef" (c.f. 2nd item in this list) or(10)+would match "10", "1010", "101010", "10101010", etc; - use
.to match any character (in the ASCII range[32, 126]), soab.would match "abc", "ab9", "ab)", etc; - use
\to escape the special meaning of a character, e.g.ab?would match "a" and "ab", whileab\?only matches "ab?"; - use
[]as a group of possible characters. Inside the brackets, all characters lose their special behaviours, except for-and\. This means that, for one,ab[cde]is shorthand forab(c|d|e)and secondly,ab[?+*]matches "ab?", "ab+" and "ab*"; also related to[]: - use
-to specify a character range within brackets. The ranges you have to support area-z,A-Zand0-9, as well as their subsets, likeh-zor3-8. E.g., the regexab[c-g]matches "abc", "abd", "abe", "abf" and "abg"; Note that-has no special meaning outside of[]soa-zwould only match "a-z".
Input
The input for your program/function/routine/etc should be a string representing the regex and an integer n. For the regex, you can further assume:
- all characters that show up are in the ASCII range
[32, 126] - if
{n,m}is used, then \$n \leq m \$ - if
-is used inside[]then the specified range is well-formed
Output
The number of strings of length n that match the given regex. You only have to account for characters in the ASCII range [32, 126].
Test cases
".*", 0 -> 1
".*", 1 -> 95
".*", 2 -> 9025
".*", 3 -> 857375
".*", 4 -> 81450625
"abc", 2 -> 0
"abc", 4 -> 0
"ab|ac|ad", 2 -> 3
"a(b|c)", 2 -> 2
"hell(o|oo)", 5 -> 1
"https?", 5 -> 1
"ho{1,4}ly", 6 -> 1
"ho{3,}ly", 137 -> 1
"[abcde]{,2}", 2 -> 25
"(10)+", 7 -> 0
"(10)+", 8 -> 1
"ab\?", 3 -> 1
"[t7]h[i1][s5] is c[0o]d[Ee3] g[0oO][l1L]f", 17 -> 432
"\+351 9[1236] [0-9]{3,3} [0-9]{2,2} [0-9][0-9]", 17 -> 40000000
"-", 1 -> 1
"\\", 1 -> 1
"[+?*]", 1 -> 3
"Abc([d-z]*|(.H)+)", 11 -> 5132188812826241
"ab|ab", 2 -> 1
".(.(.(.(.|a))))|hello", 5 -> 7737809375
This is code code-golf so shortest solution in bytes, wins. If you like this challenge, consider upvoting it... And happy golfing!
4 Answers 4
Java 10, (削除) 356 (削除ここまで) (削除) 342 (削除ここまで) (削除) 284 (削除ここまで) (削除) 275 (削除ここまで) (削除) 233 (削除ここまで) (削除) 225 (削除ここまで) 220 bytes
import java.util.*;List S;r->k->{S=new Stack();p("",k);k=0;for(var p:S)k+=(p+"").matches(r.replaceAll("\\{(,\\d+\\})","{01ドル"))?1:0;return k;}void p(String p,int k){if(k<1)S.add(p);else for(char i=32;i<127;)p(p+i++,k-1);}
Works, but way too slow for \$n\geq4\$.
Explanation:
import java.util.*; // Required import for Stack and List
List S; // List on class-level, uninitialized
k->r->{ // Method with integer and String parameter and integer return-type
S=new Stack(); // Create a new List
p("",k); // Put all permutations of length `k` consisting of printable ASCII
// characters in this list
k=0; // Re-use `k` as counter-integer, starting at 0
for(var p:S) // Loop over all permutations in the List:
k+= // Increase the counter by:
(p+"") // Convert the permutation from Object to String
.matches(r // If it matches the input-regex,
.replaceAll("\\{(,\\d+\\})","{01ドル"))?
// after we've replaced all {,m} with {0,m}
1 // Increase the counter by 1
: // Else:
0; // Leave the count the same by increasing with 0
return k;} // And return the counter as result
void p( // Separated method with two parameters and no return-type, where:
String p, // `p` is the prefix-String, starting empty
int k){ // `k` is the length the permutations we want to generate
if(k<1) // If `k` is 0:
S.add(p); // Add the current prefix-String to the List
else // Else:
for(char i=32;i<127;)
// Loop over the printable ASCII characters:
p( // And do a recursive call, with:
p+i++, // The prefix-String appended with the current character
k-1);} // And `k` decreased by 1
Some notes:
- Java's regex only supports
{n,m}and{n,}, since{,m}could be written as{0,m}. - And Java's
String#matchesbuiltin implicitly adds a leading^and trailing$to check the entire String.
-
1\$\begingroup\$ Will wait for further golfing and explanation :D \$\endgroup\$RGS– RGS2020年02月13日 09:33:56 +00:00Commented Feb 13, 2020 at 9:33
-
1\$\begingroup\$ Will leave a comment appearing to be commenting on the answer but actually end up being a meta continuation of a now existent joke. \$\endgroup\$2020年02月13日 10:09:24 +00:00Commented Feb 13, 2020 at 10:09
-
\$\begingroup\$ Good luck with that, hope it beats my brute-force solution! \$\endgroup\$RGS– RGS2020年02月13日 12:10:22 +00:00Commented Feb 13, 2020 at 12:10
-
\$\begingroup\$ @RGS I doubt it.. Java's regex uses
{n,m}instead of{n:m}, so I have to account for that. In addition, it only supports{n,}and{n,m}(since{,m}can be{0,m}). So some bytes are already lost there in comparison to your answer. And Python just has better/shorter builtins in general tbh.. :) \$\endgroup\$Kevin Cruijssen– Kevin Cruijssen2020年02月13日 12:14:29 +00:00Commented Feb 13, 2020 at 12:14
JavaScript (Node.js), 86 bytes
Another brute-force solution taking input as (n)(regex).
n=>g=(p,s='',c=127)=>n*!s[n-1]?--c>>5&&g(p,s+Buffer([c]))+g(p,s,c):!!s.match(`^${p}$`)
Commented
n => // n = target string length
g = ( // g is a recursive function taking:
p, // p = regex pattern
s = '', // s = current string
c = 127 // c = current ASCII code
) => //
n * !s[n - 1] ? // if n is not equal to 0 and s is less than n char. long:
--c >> 5 && // decrement c; if it's greater than or equal to 32:
g( // do a 1st recursive call where:
p, //
s + Buffer([c]) // the character with ASCII code c is added to s
// c is not passed so that it's reset to 127
) + //
g( // do a 2nd recursive call where:
p, s, c // this value of c is discarded
) //
: // else:
!!s.match(`^${p}$`) // return true (1) if s matches p, or false (0) otherwise
-
\$\begingroup\$ +1 Looking good! I see you also had to do the
^_$tweak :) \$\endgroup\$RGS– RGS2020年02月13日 14:30:20 +00:00Commented Feb 13, 2020 at 14:30 -
1\$\begingroup\$ I have the feeling you can contribute so many tips to the JavaScript tips page xD I.e. the
Buffer([c])to convert from integer to character. PS: Why do codegolfers always like to obscure their code? ;) The>>5can simply be>31(which imo makes it a lot more readable). Not that it matters here, since the byte-count remains the same, but stil. \$\endgroup\$Kevin Cruijssen– Kevin Cruijssen2020年02月13日 15:00:48 +00:00Commented Feb 13, 2020 at 15:00 -
1\$\begingroup\$ @KevinCruijssen
Buffer()doesn't exist in JS. It's a Node thing. So maybe we need a Node tips page, but I'm not sure there are enough Node-specific tricks to justify it. Or maybe we should include Node in Tips for Golfing in ECMAScript 6 and above. \$\endgroup\$Arnauld– Arnauld2020年02月13日 15:43:55 +00:00Commented Feb 13, 2020 at 15:43
Python 3, (削除) 139 (削除ここまで) (削除) 137 (削除ここまで) 129 bytes
Saved 2 naughty bytes thanks to @Kevin, then 6 thanks to @ovs and then 2 more by rewriting f, thanks to @wilkben!
lambda r,n:sum(1for s in map(''.join,i.product(map(chr,range(32,127)),repeat=n))if re.match(f"^{r}$",s))
import re,itertools as i
You can try the smaller test cases online. My solution generates all strings of length n with the generator g and then tries to match the regex to the whole string. If the test case is the regex r, I use the regex ^r$ to force the match to be on the whole string.
-
1\$\begingroup\$ This fails on any test case that uses the
{n:m}syntax, since Python regexes don't have that (see the[abcde]{:2}, 2 -> 0 Xin the results of your test suite). \$\endgroup\$Grimmy– Grimmy2020年02月13日 12:17:09 +00:00Commented Feb 13, 2020 at 12:17 -
1\$\begingroup\$ @Grimmy you are absolutely right! My problem was that I wrote the challenge believing the syntax I knew was
{n:m}and didn't even check it. Since there is only one other answer that also benefits from me correcting the syntax to{n,m}which was what I intended, I edited the OP. Thanks for being on the lookout! Now the test cases are fine :) \$\endgroup\$RGS– RGS2020年02月13日 12:36:18 +00:00Commented Feb 13, 2020 at 12:36 -
1\$\begingroup\$
gcan be 6 bytes shorter with some rearrangements:g=lambda n:[""][n:]or(s+chr(c+32)for c in range(95)for s in g(n-1)). \$\endgroup\$ovs– ovs2020年02月13日 16:15:00 +00:00Commented Feb 13, 2020 at 16:15 -
\$\begingroup\$ -8 bytes using itertools
lambda r,n:sum(1for s in map(''.join,i.product(map(chr,range(32,127)),repeat=n))if re.match(f"^{r}$",s)) import re,itertools as i\$\endgroup\$wilkben– wilkben2020年02月13日 18:32:07 +00:00Commented Feb 13, 2020 at 18:32
perl -E, 79 bytes
(,,ドル$")=@ARGV;@@="";@@=map{,ドル=$_;map,ドル.chr,32..126}@@for 1..,ドル;say~~grep/$"/,@@
-
1\$\begingroup\$ Welcome to this community! Can you please include a tio.run link, so that people can check your code passes the test cases? :) \$\endgroup\$RGS– RGS2020年02月13日 17:12:11 +00:00Commented Feb 13, 2020 at 17:12
lambda s,n:4\$\endgroup\$4. :P \$\endgroup\${n:m}withoutn(i.e."[abcde]{:2}", 2 -> 25) \$\endgroup\$