
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Better explain re.LOCALE and re.UNICODE for \S and \W
Type: behavior Stage: resolved
Components: Documentation Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: ezio.melotti, orsenthil, python-dev
Priority: low Keywords: patch

Created on 2012-03-12 03:22 by orsenthil, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
issue14258.diff orsenthil, 2012-04-06 05:38
Messages (5)
msg155434 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-12 03:22
Opening this bug following this discussion: http://mail.python.org/pipermail/docs/2012-March/007829.html
library/re.html
\S
When the LOCALE and UNICODE flags are not specified, matches any non-whitespace character; this is equivalent to the set [^ \t\n\r\f\v]. With LOCALE, it will match any character not in this set, and not defined as space in the current locale. If UNICODE is set, this will match anything other than [ \t\n\r\f\v] and characters marked as space in the Unicode character properties database.
This is wrong. With LOCALE set, it should be [^ \t\n\r\f\v] plus any non-space character in that locale.
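As a rough illustration of how the flag changes what \S matches, here is a minimal sketch. It is written for Python 3, not 2.7, and rests on two assumptions: that re.ASCII approximates the 2.7 behavior with no flags set, and that the Python 3 default for str patterns approximates re.UNICODE. The sample string is made up for the example.

```python
import re

# Hypothetical sample: NO-BREAK SPACE (U+00A0), EM SPACE (U+2003), ASCII space.
text = "a\u00a0b\u2003c d"

# ASCII-only matching: \S is the complement of [ \t\n\r\f\v], so the two
# non-ASCII spaces count as non-whitespace and stay inside the first token.
ascii_tokens = re.findall(r"\S+", text, flags=re.ASCII)

# Unicode matching (the Python 3 default for str patterns): \S also excludes
# characters that the Unicode database marks as whitespace.
unicode_tokens = re.findall(r"\S+", text)

print(ascii_tokens)
print(unicode_tokens)
```

The first list keeps the non-ASCII spaces inside a token; the second splits on them.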
msg155435 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-12 03:37
New changeset 2d2a972b7523 by Senthil Kumaran in branch '2.7':
Fix closes issue14258 - added clarification to \W and \S flags
http://hg.python.org/cpython/rev/2d2a972b7523 
msg155437 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-12 03:44
This clarification is specific to Python 2.7.
For Python 3, the use of the LOCALE flag is explicitly discouraged, and
confusing references to its meaning are not present in the docs.
msg157645 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-06 05:38
Well, I would like to correct this further and add clarification based on the current implementation (_sre.c).
The definition of LOCALE space is this:
 #define SRE_LOC_IS_SPACE(ch) (!((ch) & ~255) ? isspace((ch)) : 0)
And the definition of the NON_SPACE category is the negation of space. That's it.
Now, given that definition, we see that for character values higher than 255 the check is not made at all. It is the simple ASCII isspace that is considered when the LOCALE flag is set. In effect, the re.LOCALE flag has no extra effect on the matching of space or non-whitespace characters.
After realizing this, I propose the following changes, attached in the patch, as a documentation fix.
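For readers who want to experiment with the logic of the macro quoted above, here is a small Python model of it. This is only a sketch: the function names are hypothetical, and the C library's isspace() is replaced by a fixed set of the six whitespace characters of the "C" locale, which is an assumption (the real isspace() consults the active locale).

```python
# Assumption: model isspace() as it behaves in the "C" locale.
C_LOCALE_SPACES = frozenset(" \t\n\r\f\v")


def sre_loc_is_space(ch: int) -> bool:
    """Model of SRE_LOC_IS_SPACE(ch) for an integer character value."""
    # (ch & ~255) is non-zero for any value above 255; the macro then
    # yields 0 without calling isspace() at all.
    if ch & ~255:
        return False
    return chr(ch) in C_LOCALE_SPACES


def sre_loc_is_not_space(ch: int) -> bool:
    """The NON_SPACE category is simply the negation of space."""
    return not sre_loc_is_space(ch)


print(sre_loc_is_space(ord(" ")))    # ASCII space
print(sre_loc_is_space(0x2003))      # EM SPACE, above 255, never checked
print(sre_loc_is_not_space(0x2003))
```

In this model any value above 255 is always classified as non-space, which matches the observation that the check is skipped for such characters.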
msg157978 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-10 19:23
New changeset 4d49a2415ced by Senthil Kumaran in branch '2.7':
Fix closes Issue14258 - Clarify the re.LOCALE and re.UNICODE flags for \S class
http://hg.python.org/cpython/rev/4d49a2415ced 
History
Date | User | Action | Args
2022-04-11 14:57:27 | admin | set | github: 58466
2012-04-10 19:23:22 | python-dev | set | status: open -> closed; resolution: fixed; messages: + msg157978
2012-04-06 05:38:26 | orsenthil | set | status: closed -> open; files: + issue14258.diff; messages: + msg157645; keywords: + patch; resolution: fixed -> (no value)
2012-03-12 03:44:20 | orsenthil | set | messages: + msg155437
2012-03-12 03:37:58 | python-dev | set | status: open -> closed; nosy: + python-dev; messages: + msg155435; resolution: fixed; stage: resolved
2012-03-12 03:23:41 | ezio.melotti | set | nosy: + ezio.melotti
2012-03-12 03:22:06 | orsenthil | create |
