Nothing to repeat

Martin Gregorie martin at address-in-sig.invalid
Sun Jan 9 13:05:46 EST 2011


On 9 Jan 2011 16:49:35 +0000, Tom Anderson wrote:
> Any thoughts on what I should do? Do I have to bite the bullet and apply
> some cleverness in my pattern generation to avoid situations like this?

This sort of works:
 
import re

p = re.compile(r"(spam*)*")
with open("test.txt") as f:
    for line in f:
        print("input line: %s" % line.strip())
        for m in p.findall(line):
            if m != "":  # skip the empty results of zero-width matches
                print("==> %s" % m)
when I feed it 
=======================test.txt===========================
a line with no match
spa should match
spam should match
so should all of spaspamspammspammm
and so should all of spa spam spamm spammm
no match again.
=======================test.txt===========================
it produces: 
input line: a line with no match
input line: spa should match
==> spa
input line: spam should match
==> spam
input line: so should all of spaspamspammspammm
==> spammm
input line: and so should all of spa spam spamm spammm
==> spa
==> spam
==> spamm
==> spammm
input line: no match again.
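So obviously there's a problem with greedy matching where there are no
separators between adjacent matching strings: the outer * lets a single match
swallow the whole run, and findall() only reports the last repetition of the
group. I tried non-greedy matching, e.g. r'(spam*?)*', but this was worse,
because the lazy m*? never needs to consume an 'm', so every result is just
'spa'. I'll be interested to see how the real regex mavens do it.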
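For illustration, a minimal sketch (Python 3; not code from the thread, and the helper nonempty() is a hypothetical name) that runs the original pattern, the non-greedy variant, and the pattern without the outer repetition over the run-together string from test.txt. It leans on documented behaviour of the re module: when a pattern has exactly one group, findall() returns that group's text, and a group repeated by * keeps only its final repetition.

```python
import re

# The run of adjacent matches with no separators, taken from test.txt.
RUN = "spaspamspammspammm"


def nonempty(matches):
    """Drop the empty strings findall() returns for zero-width matches."""
    return [m for m in matches if m != ""]


# Original pattern: the outer * lets one match swallow the whole run, and
# the repeated capturing group only keeps its last repetition.
greedy = nonempty(re.findall(r"(spam*)*", RUN))

# Non-greedy variant: m*? never has to consume an "m", so every
# repetition of the group is just "spa".
lazy = nonempty(re.findall(r"(spam*?)*", RUN))

# Without the outer repetition (and the group), each occurrence is its
# own match, so findall() returns the whole matched text each time.
separate = re.findall(r"spam*", RUN)

if __name__ == "__main__":
    print(greedy)    # ['spammm']
    print(lazy)      # ['spa', 'spa', 'spa']
    print(separate)  # ['spa', 'spam', 'spamm', 'spammm']
```

Dropping the outer * is only one option and assumes each occurrence is wanted separately; if the generated patterns need to keep their groups, iterating with re.finditer() and reading m.group(0) gives the whole matched text instead of the last group capture.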
-- 
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |


More information about the Python-list mailing list
