This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: re module doesn't describe string boundaries for \b
Type: enhancement
Stage: resolved
Components: Documentation
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: ezio.melotti
Nosy List: Ron.Ridley, docs@python, eric.araujo, ezio.melotti, poolie, python-dev, ralph.corderoy
Priority: normal
Keywords: easy, patch

Created on 2010-12-16 01:05 by ralph.corderoy, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                    Uploaded                        Description
20110822-1604-re-docs.diff   poolie, 2011-08-22 06:05
issue10713.diff              ezio.melotti, 2012-02-27 12:24  Patch against 3.2.
Messages (8)
msg124097 - Author: Ralph Corderoy (ralph.corderoy) Date: 2010-12-16 01:05
The re module defines \b in a regexp to need \w one side and \W the other. What about when the end of the string or line is involved? perlre(1) says that's treated as a \W. Python should precisely document that case too.
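To make the question concrete, here is a minimal sketch that lists every index where `\b` matches, using only `re.finditer` from the standard library. The helper name `word_boundary_indexes` is made up for this illustration. If the ends of the string behave like `\W`, index 0 and the final index should show up whenever the string starts or ends with a word character.

```python
import re

def word_boundary_indexes(text):
    # \b is zero-width, so each match is empty; only its position matters.
    return [m.start() for m in re.finditer(r'\b', text)]

print(word_boundary_indexes('ab cd'))  # [0, 2, 3, 5]
print(word_boundary_indexes('ab'))     # [0, 2]
print(word_boundary_indexes('--'))     # []
```

Indexes 0 and 5 in the first call are the start and end of the string, which is consistent with treating everything outside the string as a non-word character.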
msg135466 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-05-07 15:09
Thanks for the report. Would you be interested in experimenting and/or reading the code to find the answer and propose a doc patch?
msg135524 - Author: Ralph Corderoy (ralph.corderoy) Date: 2011-05-08 14:27
Examining the source of Ubuntu's python2.6 2.6.6-5ubuntu1 package
suggests that anything beyond the limits of the string is considered \W, as in Perl.
 Modules/_sre.c:
 336 LOCAL(int)
 337 SRE_AT(SRE_STATE* state, SRE_CHAR* ptr, SRE_CODE at)
 338 {
 339     /* check if pointer is at given position */
 340
 341     Py_ssize_t thisp, thatp;
 ...
 365     case SRE_AT_BOUNDARY:
 366         if (state->beginning == state->end)
 367             return 0;
 368         thatp = ((void*) ptr > state->beginning) ?
 369             SRE_IS_WORD((int) ptr[-1]) : 0;
 370         thisp = ((void*) ptr < state->end) ?
 371             SRE_IS_WORD((int) ptr[0]) : 0;
 372         return thisp != thatp;
SRE_IS_WORD() returns 16 for the 63 \w characters and 0 otherwise.
This is borne out by tests.
Note, 366 above confirms it's never true for an empty string. The
documentation states that \B "is just the opposite of \b" yet
re.match(r'\b', '') returns None and so does \B so \B isn't the opposite
of \b in all cases.
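The boundary check quoted from `_sre.c` can be modelled in a few lines of Python and compared against the real engine. This is only a sketch of the quoted logic, not the actual implementation: the names `at_boundary`, `model_positions` and `engine_positions` are hypothetical, the word set is assumed to be the 63 ASCII `\w` characters mentioned above, and `re.ASCII` is passed so that the engine uses the same set.

```python
import re
import string

# Assumption: the 63 ASCII word characters (letters, digits, underscore).
WORD_CHARS = set(string.ascii_letters + string.digits + '_')

def at_boundary(text, pos):
    # Mirrors the quoted SRE_AT_BOUNDARY case.
    if not text:
        return False  # like line 366: never a boundary in an empty string
    that = text[pos - 1] in WORD_CHARS if pos > 0 else False    # char before pos
    this = text[pos] in WORD_CHARS if pos < len(text) else False  # char at pos
    return this != that

def model_positions(text):
    return [p for p in range(len(text) + 1) if at_boundary(text, p)]

def engine_positions(text):
    return [m.start() for m in re.finditer(r'\b', text, re.ASCII)]

for sample in ['', 'foo', 'foo bar', ' x ', '...', 'a_1-b']:
    print(repr(sample), model_positions(sample), engine_positions(sample))
```

For these ASCII samples the two lists should agree, with positions past either end of the string counted as non-word.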
msg142679 - Author: Martin Pool (poolie) Date: 2011-08-22 06:05
> Note, 366 above confirms it's never true for an empty string. The
> documentation states that \B "is just the opposite of \b" yet
> re.match(r'\b', '') returns None and so does \B so \B isn't the opposite
> of \b in all cases.
This is also a bit strange if you follow the Perl line of reasoning of imagining there are non-word characters outside the string. And, indeed, in Perl, 
 "" =~ /\B/
is true.
So this patch adds some tests for \b behaviour and some docs. I think possibly \B should actually change, but that would be a bigger (perhaps impossible?) change.
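As a rough illustration of the Perl line of reasoning, the sketch below treats everything outside the string as a non-word character and has no special case for the empty string. The helpers `is_word`, `perl_style_boundary` and `perl_style_not_boundary` are hypothetical names for this example, and `str.isalnum()` plus underscore is only an approximation of `\w`. The `\B` result from Python's engine is printed rather than asserted, because it is the behaviour under discussion here and may differ between Python versions.

```python
import re

def is_word(ch):
    # Approximation of \w for this sketch: alphanumerics plus underscore.
    return ch.isalnum() or ch == '_'

def perl_style_boundary(text, pos):
    # Positions outside the string count as non-word; no empty-string special case.
    before = is_word(text[pos - 1]) if pos > 0 else False
    after = is_word(text[pos]) if pos < len(text) else False
    return before != after

def perl_style_not_boundary(text, pos):
    # Under this model \B is exactly the negation of \b.
    return not perl_style_boundary(text, pos)

print(perl_style_boundary('', 0))
print(perl_style_not_boundary('', 0))
print(re.match(r'\b', ''))
print(re.match(r'\B', ''))  # reported as None in this thread; may vary by version
```

Under this model the empty string has no `\b` position but does have a `\B` position, which matches the Perl result mentioned above.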
msg154470 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-02-27 12:24
This is a new patch based on Martin's work.
I don't think it's necessary to explain what happens while using r'\b' or r'\B' on an empty string in the doc -- that's not a common case and it might end up confusing users. I think however that a couple of examples might help them figure out what they are useful for.
Mentioning that they work with the beginning/end of the string too is a reasonable request, so I tweaked the doc to point that out.
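Examples of the kind described might look like the following sketch. The helper `matching` and the sample strings are chosen for illustration and are not taken from the attached patch.

```python
import re

def matching(pattern, candidates):
    # Keep the candidates in which the pattern is found anywhere.
    return [c for c in candidates if re.search(pattern, c)]

# \b: 'foo' must stand alone as a word, including at the start or end of the string.
whole_word = matching(r'\bfoo\b',
                      ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3'])

# \B: 'py' must be followed by another word character.
inside_word = matching(r'py\B',
                       ['python', 'py3', 'py2', 'py', 'py.', 'py!'])

print(whole_word)   # ['foo', 'foo.', '(foo)', 'bar foo baz']
print(inside_word)  # ['python', 'py3', 'py2']
```

The bare string 'foo' matches `\bfoo\b` because both ends of the string count as boundaries, while 'py' fails `py\B` for the same reason.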
msg154479 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-27 13:28
Like it.
msg154607 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-29 09:50
New changeset fc89e09ca2fc by Ezio Melotti in branch '2.7':
#10713: Improve documentation for \b and \B and add a few tests. Initial patch and tests by Martin Pool.
http://hg.python.org/cpython/rev/fc89e09ca2fc
New changeset cde7fa40b289 by Ezio Melotti in branch '3.2':
#10713: Improve documentation for \b and \B and add a few tests. Initial patch and tests by Martin Pool.
http://hg.python.org/cpython/rev/cde7fa40b289
New changeset b78ca038e468 by Ezio Melotti in branch 'default':
#10713: merge with 3.2.
http://hg.python.org/cpython/rev/b78ca038e468 
msg154608 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-02-29 09:51
Fixed, thanks for the patch!
History
Date                 User            Action  Args
2022-04-11 14:57:10  admin           set     github: 54922
2012-02-29 09:51:34  ezio.melotti    set     status: open -> closed; messages: + msg154608; assignee: docs@python -> ezio.melotti; resolution: fixed; stage: patch review -> resolved
2012-02-29 09:50:09  python-dev      set     nosy: + python-dev; messages: + msg154607
2012-02-27 13:28:43  eric.araujo     set     messages: + msg154479
2012-02-27 12:24:21  ezio.melotti    set     files: + issue10713.diff; versions: - Python 3.1; messages: + msg154470; type: enhancement; stage: needs patch -> patch review
2011-08-22 06:05:57  poolie          set     files: + 20110822-1604-re-docs.diff; nosy: + poolie; messages: + msg142679; keywords: + patch
2011-05-12 17:29:03  Ron.Ridley      set     nosy: + Ron.Ridley
2011-05-08 14:27:09  ralph.corderoy  set     messages: + msg135524
2011-05-07 15:10:31  ezio.melotti    set     nosy: + ezio.melotti
2011-05-07 15:09:45  eric.araujo     set     versions: + Python 3.1, Python 2.7, Python 3.2, Python 3.3; nosy: + eric.araujo; messages: + msg135466; keywords: + easy; stage: needs patch
2010-12-16 01:05:33  ralph.corderoy  create