Message 227508 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

In-reply-to
Author	serhiy.storchaka
Recipients	ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
Date	2014-09-25 06:56:26
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za>

Content

Currently regular expressions support only '\n' as a line boundary. To meet requirement RL1.6 of Unicode Technical Standard #18 [1], all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029', and the two-character sequence '\r\n'. It is also recommended that '.' in "dotall" mode match '\r\n' as a single unit. Support for a '\R' pattern that matches any line separator is strongly recommended as well (equivalent to '(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])').
>>> import re
>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)]
[1, 2, 4] # should be [0, 2, 3, 4]
>>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)]
[0, 2, 3] # should be [0, 2, 3, 4]
>>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)]
['\r', '\n', '\n', '\r'] # should be ['\r\n', '\n', '\r']
>>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')]
[] # should be ['\r\n', '\n', '\r']
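As a rough illustration, the expected results above can be approximated with the existing re syntax by substituting lookaround-based patterns. This is a sketch under stated assumptions, not part of the re module: the names LINE_SEP, DOT_CRLF, BOL and EOL are hypothetical, and they are meant to stand in for '\R', '.' (with re.S), '^' and '$' (with re.M) respectively.

```python
import re

# Hypothetical stand-ins (not provided by the re module) that approximate
# the UTS #18 RL1.6 line-boundary behavior using existing re syntax.

# '\R': a CRLF pair, or any single line-separator character. The negative
# lookahead stops the second alternative from matching the '\r' of a CRLF
# when the engine backtracks.
LINE_SEP = r'(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])'

# '.' in DOTALL mode, treating CRLF as one unit (same backtracking guard).
DOT_CRLF = r'(?:\r\n|(?!\r\n).)'

# '^' in MULTILINE mode: start of string, after any separator other than
# '\r', or after a '\r' that does not begin a CRLF pair.
BOL = r'(?:\A|(?<=[\n\v\f\x85\u2028\u2029])|(?<=\r)(?!\n))'

# '$' in MULTILINE mode: end of string (\Z is the absolute end in Python),
# before any separator other than '\n', or before a '\n' not preceded by '\r'.
EOL = r'(?:\Z|(?=[\r\v\f\x85\u2028\u2029])|(?<!\r)(?=\n))'

text = '\r\n\n\r'
print([m.start() for m in re.finditer(EOL, text)])             # [0, 2, 3, 4]
print([m.start() for m in re.finditer(BOL, text)])             # [0, 2, 3, 4]
print([m.group() for m in re.finditer(DOT_CRLF, text, re.S)])  # ['\r\n', '\n', '\r']
print([m.group() for m in re.finditer(LINE_SEP, text)])        # ['\r\n', '\n', '\r']
```

Because Python's lookbehind must be fixed-width, the CRLF special cases are expressed as a one-character lookbehind paired with a one-character lookahead rather than as a two-character lookbehind.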
[1] http://www.unicode.org/reports/tr18/#RL1.6

History
Date	User	Action	Args
2014-09-25 06:56:27	serhiy.storchaka	set	recipients: + serhiy.storchaka, pitrou, ezio.melotti, mrabarnett
2014-09-25 06:56:27	serhiy.storchaka	set	messageid: <1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za>
2014-09-25 06:56:26	serhiy.storchaka	link	issue22491 messages
2014-09-25 06:56:26	serhiy.storchaka	create
