I have this regex:
(\d?)[A-Za-z](?=(\d?)(.*))(?=(.* ){5})
I use it on chess FEN strings.
Is the character class [A-Za-z] faster than [RNBQKrnbqk]? I only need to match the listed letters (no other letters will appear).
My thoughts are:
- [A-Za-z]: the regex engine can match if a character c satisfies 65 <= c <= 90 or 97 <= c <= 122. Worst case is 4 comparisons.
- [RNBQKrnbqk]: the engine checks whether the inspected character equals each character in the group in turn. Worst case is 10 comparisons.
Am I understanding regex correctly?
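The counting argument above can be modeled in a few lines of Python. This is a hypothetical sketch: in_ranges and in_list are illustrative names that implement the two naive strategies the question describes, not how any particular regex engine works internally; real engines may compile a character class differently (for example into a lookup table).

```python
# Hypothetical models of the two strategies described in the question.
# Each function returns (is_member, number_of_comparisons_made).


def in_ranges(c: str):
    """[A-Za-z] as two range checks: 65 <= c <= 90 or 97 <= c <= 122."""
    code = ord(c)
    comparisons = 0
    for lo, hi in ((65, 90), (97, 122)):  # 'A'..'Z', then 'a'..'z'
        comparisons += 1
        if code >= lo:
            comparisons += 1
            if code <= hi:
                return True, comparisons
    return False, comparisons


def in_list(c: str, members: str = "RNBQKrnbqk"):
    """[RNBQKrnbqk] as a linear scan over the listed characters."""
    comparisons = 0
    for m in members:
        comparisons += 1
        if c == m:
            return True, comparisons
    return False, comparisons


if __name__ == "__main__":
    ascii_chars = [chr(i) for i in range(128)]
    print("worst case, ranges:", max(in_ranges(ch)[1] for ch in ascii_chars))
    print("worst case, list:  ", max(in_list(ch)[1] for ch in ascii_chars))
```

Run as a script, it reports a worst case of 4 comparisons for the two ranges and 10 for the linear scan, matching the estimates in the question.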
No. Matching [a-zA-Z] would be slower than the exact character set you supply, [RNBQKrnbqk].
You can observe this by checking the number of steps the engine takes (its backtracking trace). I compared three patterns, two of them yours and the third one I found on chess.stackexchange.com:
(\d?)[a-z](?=(\d?)(.*))(?=(.* ){5})
produces 64 matches in 14323 steps
(\d?)[rnbqk](?=(\d?)(.*))(?=(.* ){5})
produces 32 matches in 7512 steps
and the pattern from chess.stackexchange.com:
([rnbqkp1-8]+\/){7}([rnbqkp1-8]+)\s[bw]\s(-|K?Q?k?q?)\s(-|[a-h][36])
produces 2 matches in 95 steps
Note that I enabled the ignorecase flag for all three, so [a-z] behaves like [A-Za-z] and [rnbqk] like [RNBQKrnbqk].
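One caveat when reading those step counts: with ignorecase on, [a-z] also matches the pawns (p and P), so on a typical position it produces more matches than [rnbqk], and every extra match costs extra steps. The sketch below is a hedged Python example that assumes the standard starting position as test input (the answer's own test string is not shown); it counts matches for both classes on the same FEN and times them with timeit. Absolute timings will vary by machine and regex engine.

```python
import re
import timeit

# Assumed test input: the standard starting position.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Shared tail of the question's pattern (both lookaheads).
TAIL = r"(?=(\d?)(.*))(?=(.* ){5})"

# With IGNORECASE, [a-z] behaves like [A-Za-z] and [rnbqk] like [RNBQKrnbqk].
WIDE = re.compile(r"(\d?)[a-z]" + TAIL, re.IGNORECASE)
NARROW = re.compile(r"(\d?)[rnbqk]" + TAIL, re.IGNORECASE)


def count_matches(pattern: re.Pattern, text: str) -> int:
    """Number of non-overlapping matches of pattern in text."""
    return sum(1 for _ in pattern.finditer(text))


if __name__ == "__main__":
    for label, pattern in (("[a-z]", WIDE), ("[rnbqk]", NARROW)):
        n = count_matches(pattern, START_FEN)
        # Default argument binds the current pattern for timeit's no-arg call.
        seconds = timeit.timeit(lambda p=pattern: count_matches(p, START_FEN), number=1_000)
        print(f"{label:8} {n:2} matches, {seconds:.4f} s per 1,000 runs")
```

On the starting position this counts 32 matches for [a-z] and 16 for [rnbqk]; the 16 pawns account for the difference. Letters after the board (w, K, Q, k, q) never match because fewer than five spaces follow them.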
- Slight improvement to your pattern: ([rnbqkp1-8]+\/){7}([rnbqkp1-8]+) [bw] (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q) (-|[a-h][36]) (Superluminal, Aug 2, 2018 at 12:52)
- Also, \d+ \d+ at the end (the halfmove clock and fullmove number) makes sense. It also seems to make it somewhat faster: regex101.com/r/ykc7s9/2 (Superluminal, Aug 2, 2018 at 12:52)
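Putting the two suggestions above together, a single pattern can check a whole FEN with re.fullmatch. This is a minimal Python sketch, adapted rather than copied: the board class is spelled out in both cases ([rnbqkpRNBQKP1-8]) instead of relying on the ignorecase flag, so the side-to-move and castling fields stay case-sensitive. The name looks_like_fen is hypothetical. Like the original, it checks syntax only; it does not verify that each rank describes eight squares or that the position is legal.

```python
import re

# Adapted from the comments: board class written in both cases instead of
# using IGNORECASE; " \d+ \d+" covers the halfmove clock and fullmove number.
FEN_RE = re.compile(
    r"([rnbqkpRNBQKP1-8]+/){7}([rnbqkpRNBQKP1-8]+)"  # piece placement
    r" [bw]"                                          # side to move
    r" (-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q)"           # castling availability
    r" (-|[a-h][36])"                                 # en passant target
    r" \d+ \d+"                                       # halfmove, fullmove
)


def looks_like_fen(text: str) -> bool:
    """Syntactic check only: rank lengths and legality are not verified."""
    return FEN_RE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w  - 0 1",  # empty castling field
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",  # counters missing
    ]
    for s in samples:
        print(looks_like_fen(s), s)
```

Using fullmatch rather than search means trailing garbage or a missing field makes the whole check fail instead of silently matching a prefix.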
- @S.G. Adding multiple KQkq cases defeats the purpose of the K?Q?k?q? pattern. (hjpotter92, Aug 2, 2018 at 13:39)
- Yes, but K?Q?k?q? also matches the empty string, which is not a valid castling field. For example, " 2P5/6P1/P1R5/RP6/3r4/p5bp/4p3/6P1 b a6 17 63" is accepted but is not valid. (Superluminal, Aug 2, 2018 at 14:35)
- regex101.com/r/ykc7s9/9 (Superluminal, Aug 2, 2018 at 14:40)
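The disagreement above can be checked on the castling field alone. In this small Python sketch the three pattern names are hypothetical: LOOSE is the field from the chess.stackexchange pattern, STRICT is the alternation from the first comment, and COMPACT is a third option not proposed in the thread, which keeps the short K?Q?k?q? form but puts a lookahead (?=[KQkq]) in front so at least one castling letter must be present.

```python
import re

LOOSE = re.compile(r"-|K?Q?k?q?")                          # from the answer's pattern
STRICT = re.compile(r"-|KQ?k?q?|K?Qk?q?|K?Q?kq?|K?Q?k?q")   # from the first comment
COMPACT = re.compile(r"-|(?=[KQkq])K?Q?k?q?")              # alternative, not from the thread

FIELDS = ["-", "KQkq", "Kq", "q", ""]

if __name__ == "__main__":
    for field in FIELDS:
        # fullmatch: the whole field must match, not just a prefix.
        row = [bool(p.fullmatch(field)) for p in (LOOSE, STRICT, COMPACT)]
        print(repr(field), row)
```

Only LOOSE accepts the empty string; STRICT and COMPACT accept exactly the same fields, so the lookahead version addresses the empty-field problem while keeping the short form.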
- The difference between [A-Za-z] and [RNBQKrnbqk] won't be noticeable. I'd take a look at your lookaheads instead: (?=(\d?)(.*)) can match zero or more of any character, and since (\d?) matches zero or one digit, .* still matches when there is no digit. Also, the {5} in (?=(.* ){5}) is redundant, as .* can match nothing up to the end of the input, so {5} can match five lots of nothing.
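The two lookaheads can be probed in isolation. In this short Python sketch the pattern names are hypothetical. The first lookahead does succeed at every position, but each repetition of (.* ) in the second ends with a literal space, so (?=(.* ){5}) only succeeds where at least five spaces still follow. In a full six-field FEN that is true only up to the end of the piece-placement field, which is what keeps the matches on the board.

```python
import re

ALWAYS = re.compile(r"(?=(\d?)(.*))")       # first lookahead from the question
FIVE_SPACES = re.compile(r"(?=(.* ){5})")   # second lookahead from the question

FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

if __name__ == "__main__":
    # Zero-width match everywhere, even on the empty string.
    print(ALWAYS.match("").groups(), ALWAYS.match("3r4").groups())
    # Positions where at least five spaces still lie ahead.
    ok = [i for i in range(len(FEN) + 1) if FIVE_SPACES.match(FEN, i)]
    print("last such position:", max(ok), "first space:", FEN.index(" "))
```

Both numbers printed on the last line are the same: the last position that passes is the space right after the board. Without the space inside the group, (?=(.*){5}) would succeed at every position.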