regular expression problem

MRAB python at mrabarnett.plus.com
Sun Oct 28 16:04:37 EDT 2018


On 2018-10-28 18:51, Karsten Hilbert wrote:
> Dear list members,
>
> I cannot figure out why my regular expression does not work as I expect it to:
>
> #---------------------------
> #!/usr/bin/python
>
> from __future__ import print_function
> import re as regex
>
> rx_works = '\$<[^<:]+?::.*?::\d*?>\$|\$<[^<:]+?::.*?::\d+-\d+>\$'
> # it fails if switched around:
> rx_fails = '\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'
> line = 'junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk'
>
> print ('')
> print ('line:', line)
> print ('expected: $<match_A::options A::4>$')
> print ('expected: $<match_B::options B::4-5>$')
>
> print ('')
> placeholders_in_line = regex.findall(rx_works, line, regex.IGNORECASE)
> print('found (works):')
> for ph in placeholders_in_line:
> 	print (ph)
>
> print ('')
> placeholders_in_line = regex.findall(rx_fails, line, regex.IGNORECASE)
> print('found (fails):')
> for ph in placeholders_in_line:
> 	print (ph)
>
> #---------------------------
>
> I am sure I simply don't see the problem ?
>
Here are some of the steps while matching the second regex. (View this
in a monospaced font.)
1:
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
^
2:
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
3:
The .*? matches as few characters as possible, initially none.
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
4:
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
At this point it can't match, so it backtracks.
5:
The .*? matches more characters, including the ":".
After more matching it's like the following.
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
6:
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
Again it can't match, so it backtracks.
7:
The .*? matches more characters, including the ":".
After more matching it's like the following.
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
8:
junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk
 ^
\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$
 ^
Success!
The first choice has matched this:
$<match_A::options A::4>$ junk $<match_B::options B::4-5>$
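
The result above can be checked directly, and one possible way to avoid it
is sketched below. The sketch assumes the options field never contains a
">" character (that is an assumption, not something stated in the thread);
rx_fixed is a hypothetical name, and the patterns are written as raw
strings. With that assumption, replacing .*? by [^>]*? keeps the options
field from running past the end of the first placeholder.

```python
import re

line = 'junk $<match_A::options A::4>$ junk $<match_B::options B::4-5>$ junk'

# The failing pattern from the original post, written as a raw string.
rx_fails = r'\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'

# Assumption (not from the thread): the options field never contains ">".
# Replacing ".*?" with "[^>]*?" stops that field from running past the
# end of the first placeholder into the second one.
rx_fixed = r'\$<[^<:]+?::[^>]*?::\d+-\d+>\$|\$<[^<:]+?::[^>]*?::\d*?>\$'

found_fails = re.findall(rx_fails, line)
found_fixed = re.findall(rx_fixed, line)

print('found (fails):', found_fails)
print('found (fixed):', found_fixed)
```

With the restricted character class the first alternative can no longer
backtrack its way across ">$", so it fails at the first placeholder and
the second alternative gets its turn there.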

