Message227508
| Author | serhiy.storchaka |
| Recipients | ezio.melotti, mrabarnett, pitrou, serhiy.storchaka |
| Date | 2014-09-25.06:56:26 |
| Message-id | <1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Currently regular expressions support only '\n' as a line boundary. To meet Unicode standard requirement RL1.6 [1], all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and the two-character '\r\n'. It is also recommended that '.' in "dotall" mode match '\r\n' as a single unit, and strongly recommended to support the '\R' pattern, which matches all line separators (equivalent to '(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])').
>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)]
[1, 2, 4] # should be [0, 2, 3, 4]
>>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)]
[0, 2, 3] # should be [0, 2, 3, 4]
>>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)]
['\r', '\n', '\n', '\r'] # should be ['\r\n', '\n', '\r']
>>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')]
[] # should be ['\r\n', '\n', '\r']
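The requested behaviour can be approximated today with explicit patterns. The following is only a rough sketch, not the proposed implementation: the names LINE_SEP, EOL, BOL and ANY are hypothetical, and the lookaround spellings of '$' and '^' are my assumption about how to express the RL1.6 rules using the existing re syntax.

```python
import re

# Hypothetical names for illustration; only the re module calls are real API.

# '\R' as described in RL1.6, spelled with escapes that re already accepts.
LINE_SEP = r'(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])'

# Multiline '$': just before any line separator (but never between
# '\r' and '\n'), or at the very end of the string.
EOL = r'(?:(?=[\r\v\f\x85\u2028\u2029]|(?<!\r)\n)|\Z)'

# Multiline '^': at the very start, or just after any line separator
# (again never between '\r' and '\n').
BOL = r'(?:\A|(?<=[\n\v\f\x85\u2028\u2029])|(?<=\r)(?!\n))'

# Dotall '.': treat '\r\n' as one unit, otherwise match any single character.
ANY = r'(?:\r\n|.)'

text = '\r\n\n\r'
seps = re.findall(LINE_SEP, text)
ends = [m.start() for m in re.finditer(EOL, text)]
starts = [m.start() for m in re.finditer(BOL, text)]
units = re.findall(ANY, text, re.S)

print(seps, ends, starts, units)
```

For the test string used above, these patterns give the "should be" values from the examples: ['\r\n', '\n', '\r'] for the separators and the dotall units, and [0, 2, 3, 4] for both the line ends and the line starts.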
[1] http://www.unicode.org/reports/tr18/#RL1.6 |
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2014-09-25 06:56:27 | serhiy.storchaka | set | recipients: + serhiy.storchaka, pitrou, ezio.melotti, mrabarnett |
| 2014-09-25 06:56:27 | serhiy.storchaka | set | messageid: <1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za> |
| 2014-09-25 06:56:26 | serhiy.storchaka | link | issue22491 messages |
| 2014-09-25 06:56:26 | serhiy.storchaka | create | |