Created on 2014-09-25 06:56 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin.
| Messages (4) | |||
|---|---|---|---|
| msg227508 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-09-25 06:56 | |
Currently, regular expressions support only '\n' as a line boundary. To meet Unicode standard requirement RL1.6 [1], all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and the two-character sequence '\r\n'. It is also recommended that '.' in "dotall" mode match '\r\n' as a single unit. It is also strongly recommended to support the '\R' pattern, which matches all line separators (equivalent to '(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])').
>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)]
[1, 2, 4] # should be [0, 2, 3, 4]
>>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)]
[0, 2, 3] # should be [0, 2, 3, 4]
>>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)]
['\r', '\n', '\n', '\r'] # should be ['\r\n', '\n', '\r']
>>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')]
[] # should be ['\r\n', '\n', '\r']
[1] http://www.unicode.org/reports/tr18/#RL1.6
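
For illustration, the expected results listed above can be approximated with the current `re` module by spelling out the separators explicitly. This is only a minimal sketch under stated assumptions: the names `LINEBREAK`, `EOL` and `BOL` are hypothetical and not part of `re`, and the lookaround patterns are one possible emulation of Unicode-aware `\R`, `$` and `^`, not a proposed implementation for `Modules/_sre.c`.

```python
import re

# Single-character line separators named in RL1.6 (written as regex escapes).
_SEP = r'[\n\v\f\r\x85\u2028\u2029]'
# Never match at the position between '\r' and '\n': '\r\n' counts as one separator.
_NOT_INSIDE_CRLF = r'(?!(?<=\r)\n)'

# Hypothetical stand-in for '\R': '\r\n' as a unit, otherwise any single separator.
LINEBREAK = re.compile(r'(?:\r\n|(?!\r\n)' + _SEP + r')')
# Hypothetical stand-in for multiline '$': before any separator or at end of string.
EOL = re.compile(_NOT_INSIDE_CRLF + r'(?=' + _SEP + r'|\Z)')
# Hypothetical stand-in for multiline '^': at start of string or after any separator.
BOL = re.compile(_NOT_INSIDE_CRLF + r'(?:(?<=' + _SEP + r')|\A)')

text = '\r\n\n\r'
print([m.group() for m in LINEBREAK.finditer(text)])  # ['\r\n', '\n', '\r']
print([m.start() for m in EOL.finditer(text)])        # [0, 2, 3, 4]
print([m.start() for m in BOL.finditer(text)])        # [0, 2, 3, 4]
```

The sketch does not cover '.' in "dotall" mode; treating '\r\n' as a single match there would need a separate alternation such as `\r\n|.` in the caller's pattern.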
|
|||
| msg227523 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2014-09-25 11:04 | |
For reference, the regex module normally considers the line ending to be '\n', but it has a WORD flag ('(?w)') that turns on the Unicode definition of a 'word' character as well as Unicode line separators.
|
|||
| msg348310 - (view) | Author: Zackery Spytz (ZackerySpytz) * (Python triager) | Date: 2019-07-22 23:33 | |
> To meet Unicode standard requirement RL1.6 [1] all Unicode line separators should be supported:
It seems that large portions of Modules/_sre.c would have to be rewritten in order to do this. |
|||
| msg355473 - (view) | Author: Lewis Gaul (LewisGaul) * | Date: 2019-10-27 15:32 | |
Hi there, I'm running 'EnHackathon' in a couple of weeks, and was wondering if this could be a good issue for a small team of first-time contributors with experience in C to work on. Would anyone be able to offer any guidance for where to start in Modules/_sre.c? |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:08 | admin | set | github: 66681 |
| 2019-10-27 15:32:36 | LewisGaul | set | nosy: + LewisGaul messages: + msg355473 |
| 2019-07-22 23:33:37 | ZackerySpytz | set | nosy: + ZackerySpytz messages: + msg348310 |
| 2014-09-25 11:04:06 | mrabarnett | set | messages: + msg227523 |
| 2014-09-25 06:56:27 | serhiy.storchaka | create | |