Count regex matches

Question 1

Your task

Given a simple regular expression, you have to count how many strings of length n have a match of length n with the given simple regex. This will just be a subset of regexs. Like, no lookaheads or named groups or recursion or whatever weird things regexs have.

Simple regular expression

For the purposes of this challenge, a regex is said to be simple if, for one, it only contains characters in the ASCII range 32-126. Furthermore, it should only use the following functionalities:

match literal characters, much like the regex abc would only match the string "abc";
match options, like abc|def would match "abc" and "def";
match exactly 0 or 1 occurrence of something, e.g. https? matches "http" and "https";
match 1 or more occurrences of something, e.g. ah+ would match "ah", "ahh", "ahhh", "ahhhh", etc;
match any amount of occurrences of something, e.g. 1* matches "", "1", "11", "111", "1111", etc;
match between n and m occurrences of something, e.g. lo{1,4}l matches only "lol", "lool", "loool" and "looool". If n is ommited, it matches up to m occurrences. If m is ommited, it matches at least n occurrences. Assume at least one of n or m is present;
use () to group, e.g. ab(c|d)ef would match "abcef" and "abdef" (c.f. 2nd item in this list) or (10)+ would match "10", "1010", "101010", "10101010", etc;
use . to match any character (in the ASCII range [32, 126]), so ab. would match "abc", "ab9", "ab)", etc;
use \ to escape the special meaning of a character, e.g. ab? would match "a" and "ab", while ab\? only matches "ab?";
use [] as a group of possible characters. Inside the brackets, all characters lose their special behaviours, except for - and \. This means that, for one, ab[cde] is shorthand for ab(c|d|e) and secondly, ab[?+*] matches "ab?", "ab+" and "ab*"; also related to []:
use - to specify a character range within brackets. The ranges you have to support are a-z, A-Z and 0-9, as well as their subsets, like h-z or 3-8. E.g., the regex ab[c-g] matches "abc", "abd", "abe", "abf" and "abg"; Note that - has no special meaning outside of [] so a-z would only match "a-z".

Input

The input for your program/function/routine/etc should be a string representing the regex and an integer n. For the regex, you can further assume:

all characters that show up are in the ASCII range [32, 126]
if {n,m} is used, then \$n \leq m \$
if - is used inside [] then the specified range is well-formed

Output

The number of strings of length n that match the given regex. You only have to account for characters in the ASCII range [32, 126].

Test cases

".*", 0 -> 1
".*", 1 -> 95
".*", 2 -> 9025
".*", 3 -> 857375
".*", 4 -> 81450625
"abc", 2 -> 0
"abc", 4 -> 0
"ab|ac|ad", 2 -> 3
"a(b|c)", 2 -> 2
"hell(o|oo)", 5 -> 1
"https?", 5 -> 1
"ho{1,4}ly", 6 -> 1
"ho{3,}ly", 137 -> 1
"[abcde]{,2}", 2 -> 25
"(10)+", 7 -> 0
"(10)+", 8 -> 1
"ab\?", 3 -> 1
"[t7]h[i1][s5] is c[0o]d[Ee3] g[0oO][l1L]f", 17 -> 432
"\+351 9[1236] [0-9]{3,3} [0-9]{2,2} [0-9][0-9]", 17 -> 40000000
"-", 1 -> 1
"\\", 1 -> 1
"[+?*]", 1 -> 3
"Abc([d-z]*|(.H)+)", 11 -> 5132188812826241
"ab|ab", 2 -> 1
".(.(.(.(.|a))))|hello", 5 -> 7737809375

This is code code-golf so shortest solution in bytes, wins. If you like this challenge, consider upvoting it... And happy golfing!

Question 2

Okay, I count 4 regex matches. Do I win? :P

Question 3

@Lyxal python, 12 bytes, lambda s,n:4

Question 4

Keg, 1 byte: 4. :P

Question 5

Suggested test case: one containing {n:m} without n (i.e. "[abcde]{:2}", 2 -> 25)

Question 6

Was the notion of simple regex an attempt at allowing someone to solve this by other means than brute force? (If yes, I think it's still much too complicated for that, as almost all key features of regular expressions are included.)

Question 7

Java 10, (削除) 356 (削除ここまで) (削除) 342 (削除ここまで) (削除) 284 (削除ここまで) (削除) 275 (削除ここまで) (削除) 233 (削除ここまで) (削除) 225 (削除ここまで) 220 bytes

import java.util.*;List S;r->k->{S=new Stack();p("",k);k=0;for(var p:S)k+=(p+"").matches(r.replaceAll("\\{(,\\d+\\})","{01ドル"))?1:0;return k;}void p(String p,int k){if(k<1)S.add(p);else for(char i=32;i<127;)p(p+i++,k-1);}

Works, but way too slow for \$n\geq4\$.

Try it online.

Explanation:

import java.util.*; // Required import for Stack and List
List S; // List on class-level, uninitialized
k->r->{ // Method with integer and String parameter and integer return-type
 S=new Stack(); // Create a new List
 p("",k); // Put all permutations of length `k` consisting of printable ASCII
 // characters in this list
 k=0; // Re-use `k` as counter-integer, starting at 0
 for(var p:S) // Loop over all permutations in the List:
 k+= // Increase the counter by:
 (p+"") // Convert the permutation from Object to String
 .matches(r // If it matches the input-regex,
 .replaceAll("\\{(,\\d+\\})","{01ドル"))?
 // after we've replaced all {,m} with {0,m}
 1 // Increase the counter by 1
 : // Else:
 0; // Leave the count the same by increasing with 0
 return k;} // And return the counter as result
void p( // Separated method with two parameters and no return-type, where:
 String p, // `p` is the prefix-String, starting empty
 int k){ // `k` is the length the permutations we want to generate
 if(k<1) // If `k` is 0:
 S.add(p); // Add the current prefix-String to the List
 else // Else:
 for(char i=32;i<127;)
 // Loop over the printable ASCII characters:
 p( // And do a recursive call, with:
 p+i++, // The prefix-String appended with the current character
 k-1);} // And `k` decreased by 1

Some notes:

Java's regex only supports {n,m} and {n,}, since {,m} could be written as {0,m}.
And Java's String#matches builtin implicitly adds a leading ^ and trailing $ to check the entire String.

Question 8

Will wait for further golfing and explanation :D

Question 9

Will leave a comment appearing to be commenting on the answer but actually end up being a meta continuation of a now existent joke.

Question 10

Good luck with that, hope it beats my brute-force solution!

Question 11

@RGS I doubt it.. Java's regex uses {n,m} instead of {n:m}, so I have to account for that. In addition, it only supports {n,} and {n,m} (since {,m} can be {0,m}). So some bytes are already lost there in comparison to your answer. And Python just has better/shorter builtins in general tbh.. :)

Question 12

JavaScript (Node.js), 86 bytes

Another brute-force solution taking input as (n)(regex).

n=>g=(p,s='',c=127)=>n*!s[n-1]?--c>>5&&g(p,s+Buffer([c]))+g(p,s,c):!!s.match(`^${p}$`)

Try it online!

Commented

n => // n = target string length
 g = ( // g is a recursive function taking:
 p, // p = regex pattern
 s = '', // s = current string
 c = 127 // c = current ASCII code
 ) => //
 n * !s[n - 1] ? // if n is not equal to 0 and s is less than n char. long:
 --c >> 5 && // decrement c; if it's greater than or equal to 32:
 g( // do a 1st recursive call where:
 p, //
 s + Buffer([c]) // the character with ASCII code c is added to s
 // c is not passed so that it's reset to 127
 ) + //
 g( // do a 2nd recursive call where:
 p, s, c // this value of c is discarded
 ) //
 : // else:
 !!s.match(`^${p}$`) // return true (1) if s matches p, or false (0) otherwise

Question 13

+1 Looking good! I see you also had to do the ^_$ tweak :)

Question 14

I have the feeling you can contribute so many tips to the JavaScript tips page xD I.e. the Buffer([c]) to convert from integer to character. PS: Why do codegolfers always like to obscure their code? ;) The >>5 can simply be >31 (which imo makes it a lot more readable). Not that it matters here, since the byte-count remains the same, but stil.

Question 15

@KevinCruijssen Buffer() doesn't exist in JS. It's a Node thing. So maybe we need a Node tips page, but I'm not sure there are enough Node-specific tricks to justify it. Or maybe we should include Node in Tips for Golfing in ECMAScript 6 and above.

Question 16

Python 3, (削除) 139 (削除ここまで) (削除) 137 (削除ここまで) 129 bytes

Saved 2 naughty bytes thanks to @Kevin, then 6 thanks to @ovs and then 2 more by rewriting f, thanks to @wilkben!

lambda r,n:sum(1for s in map(''.join,i.product(map(chr,range(32,127)),repeat=n))if re.match(f"^{r}$",s))
import re,itertools as i

You can try the smaller test cases online. My solution generates all strings of length n with the generator g and then tries to match the regex to the whole string. If the test case is the regex r, I use the regex ^r$ to force the match to be on the whole string.

Question 17

This fails on any test case that uses the {n:m} syntax, since Python regexes don't have that (see the [abcde]{:2}, 2 -> 0 X in the results of your test suite).

Question 18

@Grimmy you are absolutely right! My problem was that I wrote the challenge believing the syntax I knew was {n:m} and didn't even check it. Since there is only one other answer that also benefits from me correcting the syntax to {n,m} which was what I intended, I edited the OP. Thanks for being on the lookout! Now the test cases are fine :)

Question 19

g can be 6 bytes shorter with some rearrangements: g=lambda n:[""][n:]or(s+chr(c+32)for c in range(95)for s in g(n-1)).

Question 20

-8 bytes using itertools

lambda r,n:sum(1for s in map(''.join,i.product(map(chr,range(32,127)),repeat=n))if re.match(f"^{r}$",s)) import re,itertools as i

Question 21

perl -E, 79 bytes

(,,ドル$")=@ARGV;@@="";@@=map{,ドル=$_;map,ドル.chr,32..126}@@for 1..,ドル;say~~grep/$"/,@@

Question 22

Welcome to this community! Can you please include a tio.run link, so that people can check your code passes the test cases? :)

Kevin Cruijssen 136k14 gold badges154 silver badges394 bronze badges · Accepted Answer · 2020-02-13 09:29:30Z

Java 10, (削除) 356 (削除ここまで) (削除) 342 (削除ここまで) (削除) 284 (削除ここまで) (削除) 275 (削除ここまで) (削除) 233 (削除ここまで) (削除) 225 (削除ここまで) 220 bytes

import java.util.*;List S;r->k->{S=new Stack();p("",k);k=0;for(var p:S)k+=(p+"").matches(r.replaceAll("\\{(,\\d+\\})","{01ドル"))?1:0;return k;}void p(String p,int k){if(k<1)S.add(p);else for(char i=32;i<127;)p(p+i++,k-1);}

Works, but way too slow for \$n\geq4\$.

Try it online.

Explanation:

import java.util.*; // Required import for Stack and List
List S; // List on class-level, uninitialized
k->r->{ // Method with integer and String parameter and integer return-type
 S=new Stack(); // Create a new List
 p("",k); // Put all permutations of length `k` consisting of printable ASCII
 // characters in this list
 k=0; // Re-use `k` as counter-integer, starting at 0
 for(var p:S) // Loop over all permutations in the List:
 k+= // Increase the counter by:
 (p+"") // Convert the permutation from Object to String
 .matches(r // If it matches the input-regex,
 .replaceAll("\\{(,\\d+\\})","{01ドル"))?
 // after we've replaced all {,m} with {0,m}
 1 // Increase the counter by 1
 : // Else:
 0; // Leave the count the same by increasing with 0
 return k;} // And return the counter as result
void p( // Separated method with two parameters and no return-type, where:
 String p, // `p` is the prefix-String, starting empty
 int k){ // `k` is the length the permutations we want to generate
 if(k<1) // If `k` is 0:
 S.add(p); // Add the current prefix-String to the List
 else // Else:
 for(char i=32;i<127;)
 // Loop over the printable ASCII characters:
 p( // And do a recursive call, with:
 p+i++, // The prefix-String appended with the current character
 k-1);} // And `k` decreased by 1

Some notes:

Java's regex only supports {n,m} and {n,}, since {,m} could be written as {0,m}.
And Java's String#matches builtin implicitly adds a leading ^ and trailing $ to check the entire String.

Will leave a comment appearing to be commenting on the answer but actually end up being a meta continuation of a now existent joke.
@RGS I doubt it.. Java's regex uses {n,m} instead of {n:m}, so I have to account for that. In addition, it only supports {n,} and {n,m} (since {,m} can be {0,m}). So some bytes are already lost there in comparison to your answer. And Python just has better/shorter builtins in general tbh.. :)

Stack Exchange Network

Count regex matches

Your task

Simple regular expression

Input

Output

Test cases

4 Answers 4

Java 10, (削除) 356 (削除ここまで) (削除) 342 (削除ここまで) (削除) 284 (削除ここまで) (削除) 275 (削除ここまで) (削除) 233 (削除ここまで) (削除) 225 (削除ここまで) 220 bytes

JavaScript (Node.js), 86 bytes

Commented

Python 3, (削除) 139 (削除ここまで) (削除) 137 (削除ここまで) 129 bytes

perl -E, 79 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Count regex matches

Your task

Simple regular expression

Input

Output

Test cases

4 Answers 4

Java 10, (削除) 356 (削除ここまで) (削除) 342 (削除ここまで) (削除) 284 (削除ここまで) (削除) 275 (削除ここまで) (削除) 233 (削除ここまで) (削除) 225 (削除ここまで) 220 bytes

JavaScript (Node.js), 86 bytes

Commented

Python 3, (削除) 139 (削除ここまで) (削除) 137 (削除ここまで) 129 bytes

perl -E, 79 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions