This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015年06月15日 03:47 by rhettinger, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue-24454-0.diff | eric.smith, 2015年12月07日 13:31 | review | ||
| issue-24454-1.diff | eric.smith, 2016年09月10日 03:30 | review | ||
| Messages (19) | |||
|---|---|---|---|
| msg245361 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年06月15日 03:47 | |
The usability, learnability, and readability of match object code would be improved by giving it a more Pythonic API (inspired by ElementTree).
Given a search like:
data = 'Answer found on row 8 column 12.'
mo = re.search(r'row (?P<row>\d+) column (?P<col>\d+)', data)
We currently access results with:
print(mo.group('col'))
print(len(mo.groups())
A better way would look like this:
print(mo['col'])
print(len(mo))
This would work nicely with string formatting:
print('Located coordinate at (%(row)s, %(col)s)' % mo)
print('Located coordinate at ({row}, {col})'.format_map(mo))
|
|||
| msg245362 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年06月15日 04:29 | |
You can use mo.groupdict().
print('Located coordinate at (%(row)s, %(col)s)' % mo.groupdict())
print('Located coordinate at ({row}, {col})'.format_map(mo.groupdict()))
As for len(mo), this is ambiguous, as well as indexing with integer indices. You suggest len(mo) be equal len(mo.groups()) and this looks reasonable to me, but in regex len(mo) equals to len(mo.groups())+1 (because mo[1] equals to mo.group(1) or mo.groups()[0]). If indexing will work only with named groups, it would be expected that len(mo) will be equal to len(mo.groupdict()).
|
|||
| msg245363 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2015年06月15日 04:39 | |
> print(mo['col']) > print(len(mo)) This has already been discussed in another issue. As Serhiy mentioned, len(mo) and mo[num] would be ambiguous because of the group 0, but mo[name] might be ok. |
|||
| msg245375 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2015年06月15日 13:50 | |
I'd definitely be for mo['col']. I can't say I've ever used len(mo.groups()).
I do have lots of code like:
return mo.group('col'), mo.group('row'), mo.group('foo')
Using groupdict there is doable but not great. But:
return mo['col'], mo['row'], mo['foo']
would be a definite improvement.
|
|||
| msg245376 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2015年06月15日 14:14 | |
I agree that it would be nice if len(mo) == len(mo.groups()), but Serhiy has explained why that's not the case in the regex module.
The regex module does support mo[name], so:
print('Located coordinate at (%(row)s, %(col)s)' % mo)
print('Located coordinate at ({row}, {col})'.format_map(mo))
already work.
|
|||
| msg245379 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年06月15日 14:47 | |
The disadvantage of supporting len() is its ambiguousness. Supporting indexing with group name also has disadvantages (benefits already was mentioned above). 1. Indexing with string keys is a part of mapping protocol, and it would be expected that other parts of the protocol if not all are supported (at least len() and iteration), but they are not. 2. If indexing with group names would be supported, it would be expected the support of integer indexes. But this is ambiguous too. This feature would improve the access to named groups (6 characters less to type for every case and better readability), but may be implementing access via attributes would be even better? mo.groupnamespace().col or mo.ns.col? |
|||
| msg245381 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年06月15日 15:36 | |
> but may be implementing access via attributes would > be even better? mo.groupnamespace().col or mo.ns.col? The whole point is to eliminate the unnecessary extra level. Contrast using DOM with using ElementTree. The difference in usability and readability is huge. If you do much in the way of regex work, this will be a win: mo['name'], mo['rank'], mo['serialnumber'] There are several problems with trying to turn this into attribute access. One of the usual ones are the conflict between the user fieldnames and the actual methods and attributes of the objects (that is why named tuples have the irritating leading underscore for its own attributes and methods). The other problem is that it interferes with usability when the fieldname is stored in a variable. Contrast, "fieldname='rank'; print(mo[fieldname])" with "fieldname='rank'; print(getattr(mo, fieldname))". I'm happy to abandon the "len(mo)" suggestion, but mo[groupname] would be really nice. |
|||
| msg256030 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2015年12月06日 20:54 | |
I've been playing around with this. My implementation is basically the naive:
def __getitem__(self, value):
return self.group(value)
I have the following tests passing:
def test_match_getitem(self):
pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
m = pat.match('a')
self.assertEqual(m['a1'], 'a')
self.assertEqual(m['b2'], None)
self.assertEqual(m['c3'], None)
self.assertEqual(m[0], 'a')
self.assertEqual(m[1], 'a')
self.assertEqual(m[2], None)
self.assertEqual(m[3], None)
with self.assertRaises(IndexError):
m['X']
m = pat.match('ac')
self.assertEqual(m['a1'], 'a')
self.assertEqual(m['b2'], None)
self.assertEqual(m['c3'], 'c')
self.assertEqual(m[0], 'ac')
self.assertEqual(m[1], 'a')
self.assertEqual(m[2], None)
self.assertEqual(m[3], 'c')
with self.assertRaises(IndexError):
m['X']
# Cannot assign.
with self.assertRaises(TypeError):
m[0] = 1
# No len().
self.assertRaises(TypeError, len, m)
But because I'm just calling group(), you'll notice a few oddities. Namely:
- indexing with integers works, as would group(i). See the discussion of omitting len().
- for defined but missing group names, you get None instead of KeyError
- for invalid group names, you get IndexError instead of KeyError
I can't decide if these are good (because they're consistent with group()), or bad (because they're surprising. I'm interested in your opinions.
I'll attach the patch when I'm at a better computer.
|
|||
| msg256061 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2015年12月07日 13:31 | |
Here's the patch. I added some more tests, including tests for ''.format_map(). I think the format_map() tests convince me that keeping None for non-matched groups makes sense. |
|||
| msg275557 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2016年09月10日 03:30 | |
Updated patch, with docs. I'd like to get this in to 3.6. Can anyone take a look? |
|||
| msg275785 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年09月11日 12:55 | |
New changeset ac0643314d12 by Eric V. Smith in branch 'default': Issue 24454: Improve the usability of the re match object named group API https://hg.python.org/cpython/rev/ac0643314d12 |
|||
| msg275788 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年09月11日 13:19 | |
Please document this feature in What's News Eric. There is no need to add the __getitem__ method explicitly the the list of methods match_methods. It would be added automatically. |
|||
| msg275789 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2016年09月11日 13:27 | |
I added a note in Misc/NEWS. Is that not what you mean? I'll look at match_methods. |
|||
| msg275790 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年09月11日 13:40 | |
I meant Doc/whatsnew/3.6.rst. |
|||
| msg275792 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年09月11日 13:50 | |
New changeset 3265247e08f0 by Eric V. Smith in branch 'default': Issue 24454: Added whatsnew entry, removed __getitem__ from match_methods. Thanks Serhiy Storchaka. https://hg.python.org/cpython/rev/3265247e08f0 |
|||
| msg275793 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2016年09月11日 13:51 | |
Fixed. Thanks, Serhiy. |
|||
| msg275794 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年09月11日 14:05 | |
./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [- Wunused-variable] |
|||
| msg275797 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2016年09月11日 14:18 | |
On 9/11/2016 10:05 AM, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > ./Modules/_sre.c:2425:14: warning: ‘match_getitem_doc’ defined but not used [- > Wunused-variable] Not in Visual Studio! Standby. |
|||
| msg275798 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年09月11日 14:20 | |
New changeset 9eb38e0f1cad by Eric V. Smith in branch 'default': Issue 24454: Removed unused match_getitem_doc. https://hg.python.org/cpython/rev/9eb38e0f1cad |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:18 | admin | set | github: 68642 |
| 2016年10月06日 08:43:48 | THRlWiTi | set | nosy:
+ THRlWiTi |
| 2016年09月11日 14:20:39 | python-dev | set | messages: + msg275798 |
| 2016年09月11日 14:18:16 | eric.smith | set | messages: + msg275797 |
| 2016年09月11日 14:05:13 | serhiy.storchaka | set | messages: + msg275794 |
| 2016年09月11日 13:51:17 | eric.smith | set | messages: + msg275793 |
| 2016年09月11日 13:50:56 | python-dev | set | messages: + msg275792 |
| 2016年09月11日 13:40:33 | serhiy.storchaka | set | messages: + msg275790 |
| 2016年09月11日 13:27:10 | eric.smith | set | messages: + msg275789 |
| 2016年09月11日 13:19:28 | serhiy.storchaka | set | messages: + msg275788 |
| 2016年09月11日 12:59:03 | eric.smith | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2016年09月11日 12:55:49 | python-dev | set | nosy:
+ python-dev messages: + msg275785 |
| 2016年09月10日 03:30:12 | eric.smith | set | files:
+ issue-24454-1.diff messages: + msg275557 |
| 2016年06月27日 06:41:43 | berker.peksag | link | issue19536 superseder |
| 2015年12月07日 13:31:21 | eric.smith | set | files:
+ issue-24454-0.diff messages: + msg256061 assignee: eric.smith keywords: + patch, easy, needs review stage: patch review |
| 2015年12月06日 20:54:29 | eric.smith | set | messages: + msg256030 |
| 2015年06月15日 15:36:54 | rhettinger | set | messages: + msg245381 |
| 2015年06月15日 14:47:43 | serhiy.storchaka | set | messages: + msg245379 |
| 2015年06月15日 14:14:00 | mrabarnett | set | messages: + msg245376 |
| 2015年06月15日 13:50:13 | eric.smith | set | nosy:
+ eric.smith messages: + msg245375 |
| 2015年06月15日 11:20:35 | barry | set | nosy:
+ barry |
| 2015年06月15日 04:39:52 | ezio.melotti | set | messages: + msg245363 |
| 2015年06月15日 04:29:28 | serhiy.storchaka | set | nosy:
+ ezio.melotti, serhiy.storchaka, mrabarnett messages: + msg245362 components: + Regular Expressions |
| 2015年06月15日 03:47:52 | rhettinger | create | |