Created on 2013-01-30 23:53 by rhettinger, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| sre.patch | Claudiu.Popa, 2013-08-20 13:23 |
| sre_repr2.patch | Claudiu.Popa, 2013-09-13 16:53 |
| sre_repr3.patch | Claudiu.Popa, 2013-09-13 17:33 |
| sre_repr4.patch | Claudiu.Popa, 2013-10-16 11:28 |
| sre_repr5.patch | Claudiu.Popa, 2013-10-16 14:26 |
| sre_match_repr.patch | serhiy.storchaka, 2013-10-17 19:57 |
| sre_repr6.patch | Claudiu.Popa, 2013-10-19 06:57 |
| Messages (24) | |||
|---|---|---|---|
| msg180999 | Author: Raymond Hettinger (rhettinger) (Python committer) | Date: 2013-01-30 23:53 | |
Experience teaching Python has shown that people have a hard time learning to work with match objects. A contributing cause is the opaque repr:
>>> import re
>>> s = 'On 3/14/2013, Python celebrate Pi day.'
>>> mo = re.search(r'\d+/\d+/\d+', s)
>>> mo
<_sre.SRE_Match object at 0x100456100>
They could explore the match object with dir() and help() and the match object methods and attributes:
>>> dir(mo)
['__class__', '__copy__', '__deepcopy__', ... 'end', 'endpos', 'expand', 'group', ... ]
>>> mo.start()
3
>>> mo.end()
12
>>> mo.group(0)
'3/14/2013'
However, this gets old when experimenting with alternative regular expressions. A better solution is to improve the repr:
>>> re.search(r'\d+/\d+/\d+', s)
<SRE Match object: start=3, stop=12, group(0)='3/14/2013'>
This would make the regular expression module much easier to work with.
|
|||
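The exploration described in the opening message can be reproduced as a short script. This is only a sketch, assuming Python 3.4 or later (the version this thread targets); the class name shown in the repr differs between Python versions, so the script looks only at the `span=` and `match=` parts.

```python
import re

s = 'On 3/14/2013, Python celebrate Pi day.'
mo = re.search(r'\d+/\d+/\d+', s)

# The pieces of information the issue asks the repr to surface.
print(mo.start(), mo.end())   # offsets of the match within s
print(mo.span())              # the same offsets as a tuple
print(mo.group(0))            # the matched text

# Assumption: Python 3.4+, where the repr carries the span and matched text.
print(repr(mo))
```

Printing `repr(mo)` on an older interpreter would instead show the opaque `object at 0x...` form quoted in the message.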
| msg181001 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2013-01-31 00:07 | |
Showing start and stop would be OK, but there might be many groups and they might contain a lot of text, so they can't simply be included in the repr as they are. FWIW there was another issue about changing _sre.SRE_Match to something better, but I can't find it right now. |
|||
| msg181002 | Author: Chris Jerdonek (chris.jerdonek) (Python committer) | Date: 2013-01-31 00:59 | |
Is this a duplicate of issue 13592? |
|||
| msg181004 | Author: Raymond Hettinger (rhettinger) (Python committer) | Date: 2013-01-31 02:05 | |
Just showing group(0) should be helpful. And perhaps the number of groups. If a string is really long, we can truncate it like reprlib does. The main goal is to make it easier to work with match objects at the interactive prompt. They are currently too opaque. |
|||
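As a rough illustration of the "truncate it like reprlib does" idea, the standard-library `reprlib.repr()` function can be applied to `group(0)`. This is a sketch only: `short_group0` is a hypothetical helper named for this example, and it is not how any patch on this issue truncates text.

```python
import re
import reprlib


def short_group0(match):
    """Return a length-limited repr of group(0) (hypothetical helper)."""
    # reprlib.repr() abbreviates long strings, inserting '...' in the middle.
    return reprlib.repr(match.group(0))


long_text = 'a' * 100
mo = re.search(r'a+', long_text)
print(short_group0(mo))                          # abbreviated, contains '...'
print(short_group0(re.search(r'\d+', 'x 42 y')))  # short matches are unchanged
```

With the default `reprlib` settings, short matches pass through untouched and only long ones are abbreviated, which is the behaviour the message is asking for at the interactive prompt.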
| msg181043 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2013-01-31 20:57 | |
#13592 is indeed the issue I was thinking about, but apparently that's about _sre.SRE_Pattern, so it's not the same thing.
> Just showing group(0) should be helpful.
Often the interesting group is group(1), so showing only group(0) seems a bit arbitrary.
> And perhaps the number of groups.
If we show only group(0), this might be useful as an indication that there are(n't) other groups.
> If a string is really long, we can truncate it like reprlib does.
That's certainly an option. FWIW I don't usually care about the start/end, and, if included, these values could be included as span=(3,12).
|
|||
| msg195687 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-08-20 13:23 | |
Here's my patch attempt. The repr of a match object has the following format: (groups=\d+, span=(start, end), group0=the entire group or the first X characters, where X is represented by a new constant in sre_constants.h, SRE_MATCH_REPR_SIZE). |
|||
| msg197546 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-09-13 04:16 | |
What about output like this?
>>> re.search('p((a)|(b))(c)?', 'unpack')
<SRE Match object: [2: 5]: 'p'(('a')())('c')>
Or maybe ('p', [['a'], []], ['c']) if you prefer a legal Python expression.
|
|||
| msg197579 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-09-13 14:15 | |
Serhiy, at first glance that repr doesn't make sense to me; it seems a little difficult to comprehend. |
|||
| msg197609 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-09-13 16:23 | |
Well, then we will commit a simpler patch first. I left comments on Rietveld. |
|||
| msg197610 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-09-13 16:53 | |
Here's the new version. I added a few replies on Rietveld. |
|||
| msg197618 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-09-13 17:33 | |
Added the new version. |
|||
| msg200042 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-10-16 09:02 | |
Serhiy, are there any remaining issues with my latest patch? It would be nice if we could get this into 3.4. |
|||
| msg200053 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-10-16 11:28 | |
Added the new patch, which addresses Serhiy's comments. Also, this approach fails when bytes are involved:
>>> import re
>>> re.search(b"a", b"a")
Assertion failed: (PyUnicode_Check(op)), function _PyUnicode_CheckConsistency, file Objects/unicodeobject.c, line 309.
Should a check be added for this also?
|
|||
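The bytes case that tripped the assertion in this draft patch can be checked from Python. A minimal sketch, assuming Python 3.4 or later, where the committed version of the repr is present; it checks only that building the repr of a bytes match works and shows the matched bytes.

```python
import re

# A bytes pattern searched against bytes input, as in the failing report.
bytes_mo = re.search(b"a", b"a")
print(repr(bytes_mo))

# The equivalent str case, for comparison.
text_mo = re.search("a", "a")
print(repr(text_mo))
```

The only visible difference between the two reprs should be the `b` prefix on the matched value.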
| msg200055 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-16 12:26 | |
Use correct first argument to getslice(). |
|||
| msg200059 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-10-16 14:26 | |
Latest patch attached. |
|||
| msg200065 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-16 17:45 | |
It is too complicated (and perhaps erroneous). Why not use just self->pattern->logical_charsize? |
|||
| msg200111 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-10-17 06:19 | |
I could use self->pattern->logical_charsize, but it seems that I still need the call to getstring for bytes & co. to obtain the view of the underlying buffer (otherwise the group0 part of the repr will contain random bytes). I didn't find a simpler way to achieve this. |
|||
| msg200157 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-17 19:57 | |
Well. Here is a patch. I have changed repr() a little. repr() now contains the match type's qualified name (_sre.SRE_Match). "groups" now equals len(m.groups()). The "span" representation now contains a comma (as in repr(m.span())). Raymond, Ezio, does it look good to you? |
|||
| msg200356 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2013-10-19 01:40 | |
I discussed this briefly with Serhiy on IRC and I think the repr can be improved.
Currently it looks like:
>>> re.compile(r'[/\\]([.]svn)').match('/.svn')
<_sre.SRE_Match object: groups=1, span=(0, 5), group0='/.svn'>
One problem is that the group count doesn't include group 0, so from the example repr one would expect the information to be about the one (and only) group counted in "groups=", whereas it is actually about group 0, and there's an additional group 1 that is not included in the repr.
A possible solution is to separate the group count from the info about group 0:
<_sre.SRE_Match object (1 group); group0='/.svn', span=(0, 5)>
To make things even less confusing we could avoid calling it group0 and use something like "match=", or alternatively remove the group count (doesn't the count depend only on the regex, and not on the string?).
|
|||
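The closing question in the message above (whether the group count depends only on the regex) can be checked with the compiled pattern's `groups` attribute. A small sketch using the pattern from that message plus the pattern from the earlier `'unpack'` example; the second input string is made up for illustration.

```python
import re

pattern = re.compile(r'[/\\]([.]svn)')
m1 = pattern.match('/.svn')
m2 = pattern.search('path\\.svn')   # hypothetical second input

# The count is a property of the compiled pattern, not of the matched string.
print(pattern.groups)
print(len(m1.groups()), len(m2.groups()))

# Groups that did not participate still count; they show up as None.
optional = re.compile(r'p((a)|(b))(c)?')
m3 = optional.search('unpack')
print(optional.groups, m3.groups())
```

Because `len(m.groups())` always equals `pattern.groups`, a count in the match repr would repeat information already available from the pattern.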
| msg200378 | Author: PCManticore (Claudiu.Popa) (Python triager) | Date: 2013-10-19 06:57 | |
Added patch based on Serhiy's, which addresses your comments. It drops the group count and renames group0 to `match`. |
|||
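The shape agreed on at this point (a span plus a `match` field, with no group count) can be approximated in pure Python. This is a sketch for illustration, not the C code from the patch: `describe` is a hypothetical helper, and it ignores any truncation of long matches.

```python
import re


def describe(match):
    """Build the 'span=..., match=...' part of the repr (hypothetical helper)."""
    return "span={!r}, match={!r}".format(match.span(), match.group(0))


m = re.compile(r'[/\\]([.]svn)').match('/.svn')
print(describe(m))

# Assumption: Python 3.4+, where the built-in repr contains the same fields.
print(describe(m) in repr(m))
```

Group 1 is still reachable through `m.group(1)`; it is simply no longer hinted at in the repr.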
| msg200549 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-20 07:33 | |
LGTM (except unrelated empty line at the end of Modules/_sre.c). |
|||
| msg200558 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-10-20 10:13 | |
New changeset 29764a7bd6ba by Serhiy Storchaka in branch 'default': Issue #17087: Improved the repr for regular expression match objects. http://hg.python.org/cpython/rev/29764a7bd6ba |
|||
| msg200559 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-20 10:16 | |
Thanks to all participants for the discussion. |
|||
| msg204412 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-11-25 21:20 | |
New changeset 4ba7a29fe02c by Ezio Melotti in branch 'default': #13592, #17087: add whatsnew entry about regex/match object repr improvements. http://hg.python.org/cpython/rev/4ba7a29fe02c |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:41 | admin | set | github: 61289 |
| 2013-11-25 21:20:38 | python-dev | set | messages: + msg204412 |
| 2013-10-20 10:16:20 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg200559; stage: patch review -> resolved |
| 2013-10-20 10:13:52 | python-dev | set | nosy: + python-dev; messages: + msg200558 |
| 2013-10-20 07:33:33 | serhiy.storchaka | set | messages: + msg200549 |
| 2013-10-19 06:57:08 | Claudiu.Popa | set | files: + sre_repr6.patch; messages: + msg200378 |
| 2013-10-19 01:40:32 | ezio.melotti | set | messages: + msg200356 |
| 2013-10-17 19:57:36 | serhiy.storchaka | set | files: + sre_match_repr.patch; assignee: serhiy.storchaka; messages: + msg200157 |
| 2013-10-17 06:19:16 | Claudiu.Popa | set | messages: + msg200111 |
| 2013-10-16 17:45:55 | serhiy.storchaka | set | messages: + msg200065 |
| 2013-10-16 14:26:12 | Claudiu.Popa | set | files: + sre_repr5.patch; messages: + msg200059 |
| 2013-10-16 12:26:45 | serhiy.storchaka | set | messages: + msg200055 |
| 2013-10-16 11:28:34 | Claudiu.Popa | set | files: + sre_repr4.patch; messages: + msg200053 |
| 2013-10-16 09:02:33 | Claudiu.Popa | set | messages: + msg200042 |
| 2013-09-13 17:33:25 | Claudiu.Popa | set | files: + sre_repr3.patch; messages: + msg197618 |
| 2013-09-13 16:53:36 | Claudiu.Popa | set | files: + sre_repr2.patch; messages: + msg197610 |
| 2013-09-13 16:23:14 | serhiy.storchaka | set | messages: + msg197609 |
| 2013-09-13 14:15:36 | Claudiu.Popa | set | messages: + msg197579 |
| 2013-09-13 04:16:39 | serhiy.storchaka | set | messages: + msg197546 |
| 2013-08-20 14:08:37 | serhiy.storchaka | set | nosy: + serhiy.storchaka; stage: needs patch -> patch review |
| 2013-08-20 13:23:35 | Claudiu.Popa | set | files: + sre.patch; nosy: + Claudiu.Popa; messages: + msg195687; keywords: + patch |
| 2013-01-31 20:57:32 | ezio.melotti | set | messages: + msg181043 |
| 2013-01-31 02:05:11 | rhettinger | set | messages: + msg181004 |
| 2013-01-31 00:59:43 | chris.jerdonek | set | nosy: + chris.jerdonek; messages: + msg181002 |
| 2013-01-31 00:07:35 | ezio.melotti | set | nosy: + ezio.melotti, mrabarnett; messages: + msg181001; components: + Regular Expressions; stage: needs patch |
| 2013-01-30 23:53:02 | rhettinger | create | |