Message 307441 - Python tracker



In-reply-to
Author	serhiy.storchaka
Recipients	Alcolo Alcolo, ezio.melotti, martin.panter, mrabarnett, r.david.murray, serhiy.storchaka
Date	2017-12-02 17:37:24
Message-id	<1512236244.88.0.213398074469.issue25054@psf.upfronthosting.co.za>


Content

Good point. Neither the old behavior nor the new one (which matches regex) conforms to the documentation: "Empty matches are included in the result unless they touch the beginning of another match." It is easy to exclude empty matches that touch the *ending* of another match. This would be consistent with the new behavior of split() and sub().
But this would break one existing test for issue817234, though the test for that issue shouldn't rely on this detail. It should just check that iterating doesn't hang.
It would also break a regular expression in pprint.
PR 4678 implements this version. I don't know which version is better.
>>> list(re.finditer(r"\b|:+", "a::bc"))
[<re.Match object; span=(0, 0), match=''>, <re.Match object; span=(1, 1), match=''>, <re.Match object; span=(1, 3), match='::'>, <re.Match object; span=(5, 5), match=''>]
>>> re.sub(r"(\b|:+)", r"[\1]", "a::bc")
'[]a[][::]bc[]'
With PR 4471 the result of re.sub() is the same, but the result of re.finditer() is as in msg307424.
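
For anyone who wants to compare the two variants by hand, the rule "exclude empty matches that touch the ending of another match" can be roughly approximated as a filter on top of re.finditer(). This is only a sketch, not the code in PR 4678: the helper name finditer_skip_trailing_empty is hypothetical, and it assumes Python 3.7 or later, where finditer() itself yields empty matches adjacent to a previous match.

```python
import re


def finditer_skip_trailing_empty(pattern, string):
    """Yield matches, dropping empty ones that start where the previous match ended.

    Hypothetical helper for illustration only; not part of re or of any PR.
    """
    prev_end = None
    for m in re.finditer(pattern, string):
        if m.start() == m.end() and m.start() == prev_end:
            # Empty match touching the end of the previous match: skip it.
            continue
        prev_end = m.end()
        yield m


spans = [m.span() for m in finditer_skip_trailing_empty(r"\b|:+", "a::bc")]
print(spans)  # [(0, 0), (1, 1), (1, 3), (5, 5)]
```

On Python 3.7+, plain re.finditer() additionally yields an empty match with span (3, 3) for this input. That is the match the filter drops.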

