This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Unicode word boundaries
Type: behavior Stage: resolved
Components: Regular Expressions Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, ezio.melotti, mrabarnett, revo
Priority: normal Keywords:

Created on 2016-08-27 14:36 by revo, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg273782 - Author: mohammad (revo) Date: 2016-08-27 14:36
According to [UAX #29](http://unicode.org/reports/tr29) on Unicode word boundaries (rule WB5a), the apostrophe covers both U+0027 ( ' ) APOSTROPHE and U+2019 ( ’ ) RIGHT SINGLE QUOTATION MARK (curly apostrophe).
However, the regex module only handles U+0027; the second kind (U+2019) is missing:
/* Break between apostrophe and vowels (French, Italian). */
/* WB5a */
if (pos_m1 >= 0 && char_at(state->text, pos_m1) == '\'' &&
 is_unicode_vowel(char_at(state->text, text_pos)))
 return TRUE;
[Source code](https://bitbucket.org/mrabarnett/mrab-regex/src/f21447bf288780d8dd9b1633820480484ce8f677/regex_3/regex/_regex.c?at=default&fileviewer=file-view-default#_regex.c-1657)
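To illustrate the change being asked for, here is a minimal Python sketch of the same check that accepts both apostrophe code points. It is not the regex module's code: the names `APOSTROPHES`, `is_vowel` and `breaks_after_apostrophe` are hypothetical, and the vowel test (strip diacritics, compare against a, e, i, o, u) is only an assumed stand-in for `is_unicode_vowel`.

```python
import unicodedata

# Hypothetical sketch; names and the vowel set are illustrative assumptions,
# not the regex module's actual implementation.
APOSTROPHES = {"\u0027", "\u2019"}  # APOSTROPHE, RIGHT SINGLE QUOTATION MARK
BASE_VOWELS = set("aeiou")


def is_vowel(ch):
    """Rough vowel test: decompose, then compare the base letter."""
    decomposed = unicodedata.normalize("NFD", ch.lower())
    return decomposed[:1] in BASE_VOWELS


def breaks_after_apostrophe(text, pos):
    """Return True if pos sits between an apostrophe and a following vowel."""
    return (
        0 < pos < len(text)
        and text[pos - 1] in APOSTROPHES
        and is_vowel(text[pos])
    )
```

With this sketch, `breaks_after_apostrophe("l\u2019eau", 2)` and `breaks_after_apostrophe("l'eau", 2)` give the same answer, which is the behavior the report expects.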
msg273783 - Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-08-27 14:56
The regex module is not in the standard library; on the latest 3.6 branch, the re module breaks on the curly apostrophe just fine. Perhaps try reporting this issue on the Bitbucket tracker?
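The standard library behavior mentioned above can be checked with a short snippet. This is only a sketch: the sample word and the helper name `words` are chosen for illustration. Note that `re` does not implement UAX #29; its `\b` is simply the boundary between `\w` and `\W` characters, and both apostrophes count as `\W`.

```python
import re


def words(text):
    """Split text into runs of word characters using the stdlib re module."""
    return re.findall(r"\w+", text)


curly = "aujourd\u2019hui"   # U+2019 RIGHT SINGLE QUOTATION MARK
straight = "aujourd'hui"     # U+0027 APOSTROPHE

print(words(curly))
print(words(straight))

# \b matches on both sides of either apostrophe, so "hui" is found as a word.
print(re.search(r"\bhui\b", curly) is not None)
```

Both apostrophes split the word the same way here, because neither is matched by `\w`.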
History
Date User Action Args
2022-04-11 14:58:35 admin set github: 72065
2016-08-27 14:56:48 SilentGhost set status: open -> closed; versions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6; nosy: + SilentGhost; messages: + msg273783; resolution: not a bug; stage: resolved
2016-08-27 14:36:09 revo create