Groups in regular expressions don't repeat as expected

John Nagle nagle at animats.com
Sun Apr 24 15:43:41 EDT 2011


On 4/21/2011 6:16 AM, Neil Cerutti wrote:
> On 2011-04-20, John Nagle <nagle at animats.com> wrote:
>> Findall does something a bit different. It returns a list of
>> matches of the entire pattern, not repeats of groups within
>> the pattern.
>>
>> Consider a regular expression for matching domain names:
>>
>> >>> kre = re.compile(r'^([a-zA-Z0-9\-]+)(?:\.([a-zA-Z0-9\-]+))+$')
>> >>> s = 'www.example.com'
>> >>> ms = kre.match(s)
>> >>> ms.groups()
>> ('www', 'com')
>> >>> msall = kre.findall(s)
>> >>> msall
>> [('www', 'com')]
>>
>> This is just a simple example. But it illustrates an unnecessary
>> limitation. The matcher can do the repeated matching; you just can't
>> get the results out.
>
> Thanks for the further explanation.
>
> Assuming a fake API that returned multiple group matches as a
> tuple:
>
> >>> print(re.match(r"^([a-z])+$", "abcdef").groups())
> (('a', 'b', 'c', 'd', 'e', 'f'),)
>
> I was thinking of applying findall something like this, but you
> have to make multiple calls:
>
> >>> s = "abcdef"
> >>> m = re.match(r"^[a-z]+$", s)
> >>> if m:
> ...     print(re.findall(r"[a-z]", m.group()))
> ...
> ['a', 'b', 'c', 'd', 'e', 'f']
>
> I can see that getting really annoying. Is there a better way to
> make multiple group matches accessible without adding a third
> element type as a group element?

 The most elegant solution would be to have a regular expression
function that returned a tree of tuples or lists. Then you could
express an entire language syntax as a regular expression and
get out a parse tree.
 Since the regular expression system is actually doing that work,
then discarding the results, it seems a reasonable extension.
I'm not suggesting extending regular expression matching itself,
just the way the results are stored.
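
 For comparison, here is a rough sketch of the two-pass workaround
that the current re module forces for the domain-name pattern quoted
above. The helper name domain_labels and the LABEL constant are made
up for illustration; they are not part of re. The first pass checks
the whole string, the second recovers the repeats that groups()
throws away.

```python
import re

# Character class for one domain label, taken from the quoted pattern.
LABEL = r'[a-zA-Z0-9\-]+'

# Same pattern as in the quoted example: a label, then one or more ".label".
kre = re.compile(r'^(' + LABEL + r')(?:\.(' + LABEL + r'))+$')


def domain_labels(s):
    """Return every label in s, or None if s is not matched by kre.

    Hypothetical helper: re keeps only the last repeat of a group,
    so the labels are recovered by re-scanning the matched text.
    """
    m = kre.match(s)
    if m is None:
        return None
    # Second pass over the already-validated text.
    return re.findall(LABEL, m.group())


print(kre.match('www.example.com').groups())  # ('www', 'com')
print(domain_labels('www.example.com'))       # ['www', 'example', 'com']
```

 This works only because the repeated piece is simple enough to
re-scan on its own. With nested repeats the second pass has to
duplicate the grammar, which is the case where getting a tree back
from the matcher would help.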
				John Nagle

