This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2016年11月21日 15:57 by vstinner, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| sre_types.patch | vstinner, 2016年11月21日 15:57 | review | ||
| sre-concrete.patch | serhiy.storchaka, 2016年11月23日 08:09 | review | ||
| sre-concrete-2.patch | serhiy.storchaka, 2016年11月23日 13:03 | review | ||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 1009 | merged | serhiy.storchaka, 2017年04月05日 18:35 | |
| Messages (21) | |||
|---|---|---|---|
| msg281373 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月21日 15:57 | |
Attached patch makes _sre.compile() more strict on types: groupindex must be a dictionary and indexgroup must be a tuple. Currently, indexgroup is passed as a list. I chose to convert it to a tuple to use less memory and to prevent unwanted changes. For unwanted changed, I'm not sure because groupindex remains a mutable dictionary. Do you think that it's worth it to require a tuple? Another option is to accept a list but to convert it to a list, but this change is more specific to the current implementation. |
|||
| msg281379 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月21日 16:35 | |
If make groupindex and indexgroup a dict and a tuple, we could use concrete dict and tuple API instead of generic mapping and sequence APIs. Note that groups, groupindex and indexgroup are not independent. Actually indexgroup is derived from groups and groupindex (in Python code), but groups and groupindex could be derived from indexgroup. groups could be derived from code. The interface of _sre.compile() can be simplified by passing only groupindex or indexgroup. Current interface is only for backward compatibility. I don't know whether this still is needed. |
|||
| msg281524 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年11月22日 22:20 | |
New changeset 1addc5d2c246 by Victor Stinner in branch 'default': Issue #28765: _sre.compile() now checks the type of groupindex and indexgroup https://hg.python.org/cpython/rev/1addc5d2c246 |
|||
| msg281525 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月22日 22:21 | |
sre_compile_remove_groups.patch removes the groups parameter from _sre.compile(). A first step to simplify the API. I prefer to keep most of the code in pure Python, to have code easier to maintain. So I prefer to not accept only groupindex. I prefer to build both (indexgroup, groupindex) in pure Python and pass them to the C code. I pushed my patch sre_types.patch. |
|||
| msg281542 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月23日 08:09 | |
Following patch replaces abstract types API with concrete types API and makes the memory consumption of the pattern object smaller if there are no named groups.
$ ./python -m perf timeit -s "import re; m = re.match('(?P<first>first) (?P<second>second)', 'first second')" -- "m.lastgroup"
Unpatched: Median +- std dev: 89.8 ns +- 1.8 ns
Patched: Median +- std dev: 80.5 ns +- 3.3 ns
$ ./python -m perf timeit -s "import re; m = re.match('(?P<first>first) (?P<second>second)', 'first second')" -- "m.groupdict()"
Unpatched: Median +- std dev: 803 ns +- 16 ns
Patched: Median +- std dev: 711 ns +- 16 ns
$ ./python -m perf timeit -s "import re; m = re.match('(?P<first>first) (?P<second>second)', 'first second')" -- "m['first']"
Unpatched: Median +- std dev: 228 ns +- 14 ns
Patched: Median +- std dev: 217 ns +- 11 ns
|
|||
| msg281547 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月23日 09:21 | |
sre-concrete.patch is wrong: if groupindex is a subtype of dict, you should use PyObject_GetItem. I like the idea of using PyDict_GetItem, you should just implement stricter checks in _sre.compile(). Maybe using PyDict_CheckExact? Since it's a private module, we don't have to support all types. It's ok to implement further optimizations. |
|||
| msg281548 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月23日 09:40 | |
1addc5d2c246 broke support of non-dict mappings, this patch brakes support of dict subclasses with overridden __getitem__. I think we can ignore this as well. |
|||
| msg281549 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月23日 09:47 | |
Sorry, my comment is unclear: I'm only asking your to add PyDict_CheckExact check in _sre.compile() for sre-concrete.patch. |
|||
| msg281550 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月23日 10:05 | |
Should we check that code is exact list? |
|||
| msg281551 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月23日 10:23 | |
> Should we check that code is exact list? Do you mean a tuple for indexgroup? Yeah, it also seems worth it. I just checked: --- class Tuple(tuple): def __getitem__(self, index): return 9 a = Tuple([1, 2, 3]) print(a[0]) --- Output: --- 9 --- |
|||
| msg281553 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月23日 10:35 | |
It is too. But I meant the code parameter. |
|||
| msg281559 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月23日 13:03 | |
Updated patch restricts _sre.compile() to accepting only exact collection types. I don't like this patch. |
|||
| msg281561 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月23日 13:15 | |
> It is too. But I meant the code parameter. Oh ok. It seems like we use PyList_xxx() functions, so yeah, it seems safer (to respect the Python semantics) to only accept exactly the list type. > Updated patch restricts _sre.compile() to accepting only exact collection types. I don't like this patch. Why not? Do you prefer to accept subtypes? sre-concrete-2.patch LGTM, I just added a minor comment. |
|||
| msg281564 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年11月23日 13:33 | |
I don't like it because it adds a cumbersome code for guarding against a case that doesn't happen in practice. This is just unpythonic. _sre.compile() is a private function and it is called only with exact types. Even if it would called with subtypes, most subtypes don't override __len__ and __getitem__. This restriction is too strong. |
|||
| msg281588 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月23日 22:03 | |
> I don't like it because it adds a cumbersome code for guarding against a case that doesn't happen in practice. Hum, sorry, I don't understand what is the issue with adding a few addition checks in the constructor? Is it a matter of speed? > This is just unpythonic. Wait, what? I asked you to add checks to not break the Python semantics. If you want to be "Pythonic": you must support the Python semantics, so support when __getitem__, __len__, etc. are replaced. I don't understand your "unpythonic" argument. > _sre.compile() is a private function and it is called only with exact types. Right. So what is the problem with adding more sanity checks? > Even if it would called with subtypes, most subtypes don't override __len__ and __getitem__. This restriction is too strong. It's a matter of respecting the Python semantics. |
|||
| msg281589 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年11月23日 22:04 | |
An optimization must not change the behaviour of Python. |
|||
| msg282433 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年12月05日 16:43 | |
sre-concrete-2.patch LGTM, so do you want to push it Serhiy? |
|||
| msg283441 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年12月16日 21:30 | |
> Hum, sorry, I don't understand what is the issue with adding a few addition checks in the constructor? Is it a matter of speed? It is a matter of readability and maintainability. Additional checks adds 21 lines of the code, and they hardcode error messages. I copied currently produced error messages, with hardcoded function name and argument numbers. Changes in PyArg* functions or Argument Clinic that could globally enhance error messages would not affect these ones. Changing the function name or the arguments order would need to change error messages. > I don't understand your "unpythonic" argument. Sorry, I used bad wording. I meant that this doesn't follow the practice of CPython code. Builtins and extensions often ignore subclassing and use concrete API for simplicity and speed. Especially in internal functions. The interface of _sre.compile() already is made less general, it no longer works with general sequences and mappings. Ignoring subclassing don't make it much worse. |
|||
| msg291191 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年04月05日 18:36 | |
The patch makes some operations slightly faster.
$ ./python -m perf timeit -s 'import re; m = re.match(r"(?P<int>\d+)(?:\.(?P<frac>\d*))?", "12.34")' -- --duplicate 100 'm.lastgroup'
Unpatched: Median +- std dev: 204 ns +- 1 ns
Patched: Median +- std dev: 167 ns +- 2 ns
$ ./python -m perf timeit -s 'import re; m = re.match(r"(?P<int>\d+)(?:\.(?P<frac>\d*))?", "12.34")' -- --duplicate 100 'm.groupdict()'
Unpatched: Median +- std dev: 2.78 us +- 0.05 us
Patched: Median +- std dev: 1.98 us +- 0.04 us
$ ./python -m perf timeit -s 'import re; m = re.match(r"(?P<int>\d+)(?:\.(?P<frac>\d*))?", "12.34")' -- --duplicate 100 'm.start("frac")'
Unpatched: Median +- std dev: 638 ns +- 22 ns
Patched: Median +- std dev: 576 ns +- 16 ns
$ ./python -m perf timeit -s 'import re; m = re.match(r"(?P<int>\d+)(?:\.(?P<frac>\d*))?", "12.34")' -- --duplicate 100 'm.group("frac")'
Unpatched: Median +- std dev: 1.15 us +- 0.03 us
Patched: Median +- std dev: 1.07 us +- 0.03 us
|
|||
| msg291632 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年04月13日 21:14 | |
Since _sre.compile is not a public API, and is called only with exact list/dict/tuple types in the stdlib, I prefer to ignore possible differences between base classes and subclasses. This makes the code cleaner. Searching on GitHub shows that usages of _sre.compile() in third-party code are not compatible with current _sre.compile() since all arguments now are mandatory and indexgroup must be a tuple, not None. |
|||
| msg291741 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年04月16日 06:39 | |
New changeset cd85d0b90b39310c8ca7329bd35e82c2c1c8f4ad by Serhiy Storchaka in branch 'master': bpo-28765: Use concrete types API in _sre.c. (#1009) https://github.com/python/cpython/commit/cd85d0b90b39310c8ca7329bd35e82c2c1c8f4ad |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:39 | admin | set | github: 72951 |
| 2017年04月16日 07:06:52 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017年04月16日 06:39:32 | serhiy.storchaka | set | messages: + msg291741 |
| 2017年04月13日 21:14:50 | serhiy.storchaka | set | assignee: serhiy.storchaka messages: + msg291632 |
| 2017年04月05日 18:36:50 | serhiy.storchaka | set | messages: + msg291191 |
| 2017年04月05日 18:35:18 | serhiy.storchaka | set | pull_requests: + pull_request1177 |
| 2016年12月16日 21:30:51 | serhiy.storchaka | set | messages: + msg283441 |
| 2016年12月05日 16:57:07 | serhiy.storchaka | set | superseder: _sre.compile(): be more strict on types of indexgroup and groupindex -> |
| 2016年12月05日 16:57:07 | serhiy.storchaka | unlink | issue28765 superseder |
| 2016年12月05日 16:56:58 | serhiy.storchaka | set | superseder: _sre.compile(): be more strict on types of indexgroup and groupindex |
| 2016年12月05日 16:56:58 | serhiy.storchaka | link | issue28765 superseder |
| 2016年12月05日 16:43:43 | vstinner | set | messages: + msg282433 |
| 2016年11月23日 22:04:10 | vstinner | set | messages: + msg281589 |
| 2016年11月23日 22:03:43 | vstinner | set | messages: + msg281588 |
| 2016年11月23日 13:33:40 | serhiy.storchaka | set | messages: + msg281564 |
| 2016年11月23日 13:15:46 | vstinner | set | messages: + msg281561 |
| 2016年11月23日 13:03:45 | serhiy.storchaka | set | files:
+ sre-concrete-2.patch messages: + msg281559 |
| 2016年11月23日 10:35:12 | serhiy.storchaka | set | messages: + msg281553 |
| 2016年11月23日 10:23:19 | vstinner | set | messages: + msg281551 |
| 2016年11月23日 10:05:07 | serhiy.storchaka | set | messages: + msg281550 |
| 2016年11月23日 09:47:37 | vstinner | set | messages: + msg281549 |
| 2016年11月23日 09:40:48 | serhiy.storchaka | set | messages: + msg281548 |
| 2016年11月23日 09:21:43 | vstinner | set | messages: + msg281547 |
| 2016年11月23日 08:09:29 | serhiy.storchaka | set | files:
+ sre-concrete.patch nosy: + ezio.melotti, mrabarnett messages: + msg281542 components: + Regular Expressions stage: patch review |
| 2016年11月22日 22:21:36 | vstinner | set | messages: + msg281525 |
| 2016年11月22日 22:20:35 | python-dev | set | nosy:
+ python-dev messages: + msg281524 |
| 2016年11月21日 16:35:56 | serhiy.storchaka | set | messages: + msg281379 |
| 2016年11月21日 15:57:37 | vstinner | create | |