
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: re.groupindex is available for modification and continues to work, having incorrect data inside it
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Abel.Farias, eric.araujo, eric.snow, ezio.melotti, georg.brandl, gvanrossum, mrabarnett, py.user, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: needs review, patch

Created on 2012-03-12 05:28 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                  Uploaded
re_groupindex_copy.patch   serhiy.storchaka, 2014-11-01 15:27
re_groupindex_proxy.patch  serhiy.storchaka, 2014-11-01 15:27
Messages (18)
msg155442 - Author: py.user (py.user) * Date: 2012-03-12 05:28
>>> import re
>>> p = re.compile(r'abc(?P<n>def)')
>>> p.sub(r'\g<n>', 'abcdef123abcdef')
'def123def'
>>> p.groupindex['n'] = 2
>>> p.sub(r'\g<n>', 'abcdef123abcdef')
'def123def'
>>> p.groupindex
{'n': 2}
>>>
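For comparison, here is a minimal sketch of the same session on Python 3.5 and later, the version in which this issue was fixed. There, `groupindex` is a read-only `types.MappingProxyType` in CPython, so the assignment raises `TypeError` instead of silently leaving a mapping that disagrees with the compiled pattern:

```python
import re
import types

p = re.compile(r'abc(?P<n>def)')

# Since the fix, groupindex is a read-only mapping proxy rather than a dict.
print(type(p.groupindex) is types.MappingProxyType)  # True

rejected = False
try:
    p.groupindex['n'] = 2  # the mutation from the session above
except TypeError:
    rejected = True
print(rejected)  # True

# The mapping the user sees stays consistent with the compiled pattern.
print(dict(p.groupindex))                  # {'n': 1}
print(p.sub(r'\g<n>', 'abcdef123abcdef'))  # def123def
```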
msg155459 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-03-12 18:16
The re module creates the dict purely for the benefit of the user, and as it's a normal dict, it's mutable.
An alternative would be to use an immutable dict or dict-like object, but Python doesn't have such a class, and it's probably not worth writing one just for this use-case.
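Python 3.3, released later in 2012, exposes such a class as `types.MappingProxyType`. It wraps a dict in a read-only but live view. The sketch below uses a hypothetical private `_index` dict as a stand-in for the pattern object's internal group table. Users get only the view, while the owner can still update the underlying data:

```python
from types import MappingProxyType

# Hypothetical private table owned by the pattern object.
_index = {'n': 1}

# Read-only, live view handed out to users instead of the dict itself.
groupindex = MappingProxyType(_index)

rejected = False
try:
    groupindex['n'] = 2  # users cannot write through the view
except TypeError:
    rejected = True

# The owner can still update the table, and the view reflects it at once.
_index['m'] = 2
print(rejected)          # True
print(groupindex['m'])   # 2
print(dict(groupindex))  # {'n': 1, 'm': 2}
```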
msg155484 - Author: py.user (py.user) * Date: 2012-03-12 21:37
Matthew Barnett wrote:
> The re module creates the dict purely for the benefit of the user
this dict affects regex.sub()
>>> import re
>>> p = re.compile(r'abc(?P<n>def)')
>>> p.groupindex
{'n': 1}
>>> p.groupindex['n'] = 2
>>> p.sub(r'\g<n>', 'abcdef')
Traceback (most recent call last):
 File "/usr/local/lib/python3.2/sre_parse.py", line 811, in expand_template
 literals[index] = s = g(group)
IndexError: no such group
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/local/lib/python3.2/re.py", line 286, in filter
 return sre_parse.expand_template(template, match)
 File "/usr/local/lib/python3.2/sre_parse.py", line 815, in expand_template
 raise error("invalid group reference")
sre_constants.error: invalid group reference
>>>
msg155546 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-03-13 00:52
It appears I was wrong. :-(
The simplest solution in that case is for it to return a _copy_ of the dict.
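A minimal sketch of the copy approach, assuming a hypothetical `CompiledPattern` class in place of the real C-level pattern object. Each read of `groupindex` returns a fresh shallow copy, so a caller's edits never reach the table that matching relies on. The cost is a new dict on every access, and the edit is still accepted silently:

```python
class CompiledPattern:
    """Hypothetical stand-in for a compiled pattern object."""

    def __init__(self, groupindex):
        # Private table that the matching/substitution code would rely on.
        self._groupindex = dict(groupindex)

    @property
    def groupindex(self):
        # Every access returns a new shallow copy of the private table.
        return dict(self._groupindex)


p = CompiledPattern({'n': 1})
gi = p.groupindex
gi['n'] = 2           # changes only the caller's copy
print(p.groupindex)   # {'n': 1}
print(gi)             # {'n': 2}
```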
msg155560 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-13 02:34
But regex.sub is affected only if you manually muck with the dict, right? If so, then it looks like a case of "it hurts when I do this" (the doctor’s reply: "Don’t do this.")
msg155570 - Author: py.user (py.user) * Date: 2012-03-13 05:02
the first message shows how it keeps working with a broken dict
Éric Araujo wrote:
> But regex.sub is affected only if you manually muck with the dict, right?
I can get code from anywhere
msg155593 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-13 11:54
> I can get code from anywhere
I am afraid I don’t understand. Could you start again and explain what bug you ran into, i.e. what behavior does not match what the docs say? At present this report looks like it is saying "when I put random things in internal data structures, bad things happen", and I don’t think Python promises to not break when people do random editions to internal data structures.
msg155702 - Author: py.user (py.user) * Date: 2012-03-14 00:56
I take someone's code
make tests for its behavior
all tests say "the code is working"
I continue to write the code
make tests for its behavior
all tests say "the code is working"
I install it somewhere and it crashes
now whether this exception is raised depends on the cache
Éric Araujo wrote:
>and I don’t think Python promises to not break when people do random editions
when people do something wrong, Python should raise an exception
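The cache dependence described here can be sketched without touching re internals. `re.compile` caches recently compiled patterns, so two callers that compile the same source may receive the same object. The sketch below assumes a hypothetical `Pattern` class with a plain, mutable `groupindex` dict, which was the pre-fix situation. It also uses a hypothetical `cached_compile` built on `functools.lru_cache` to play the role of that cache. Whether module B sees module A's edit depends only on whether the cached object is still there:

```python
import functools


class Pattern:
    """Hypothetical pattern object exposing a plain, mutable dict."""

    def __init__(self, source):
        self.source = source
        self.groupindex = {'n': 1}


@functools.lru_cache(maxsize=None)
def cached_compile(source):
    # Like a compile cache: identical sources share one object.
    return Pattern(source)


# Module A edits the mapping on the object it received...
a = cached_compile(r'abc(?P<n>def)')
a.groupindex['n'] = 2

# ...and unrelated module B, compiling the same source, inherits the edit.
b = cached_compile(r'abc(?P<n>def)')
print(b is a)        # True
print(b.groupindex)  # {'n': 2}

# After the cache is cleared, B would get a fresh, correct object instead.
cached_compile.cache_clear()
print(cached_compile(r'abc(?P<n>def)').groupindex)  # {'n': 1}
```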
msg155734 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-03-14 07:26
Looks like a case for a read-only dict/dictproxy :)
msg175494 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-13 10:14
I fully agree with Éric. Just don't do this.
msg175497 - Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-13 11:06
I'm not so sure. If dicts or classes are used for configuration
or informational purposes, I prefer them to be locked down.
An example of the first is the decimal context, where it was possible
to write context.emax = 9 instead of context.Emax = 9 without getting
an error. This is an easy mistake to make and can be hard to track
down in a large program.
The mistake here is maybe less likely, but I agree with Georg that
it's a case for a read-only dict/dictproxy.
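One general way to lock down a configuration object, so that a typo like `emax` for `Emax` fails loudly, is `__slots__`. The class below is a hypothetical `Settings` object, not the decimal module's own `Context` implementation. Because it has no per-instance `__dict__`, assigning to any name outside the declared slots raises `AttributeError`:

```python
class Settings:
    """Hypothetical locked-down configuration object."""

    # __slots__ removes the per-instance __dict__, so only these names can be set.
    __slots__ = ('prec', 'Emax')

    def __init__(self, prec, Emax):
        self.prec = prec
        self.Emax = Emax


ctx = Settings(prec=28, Emax=999)
ctx.Emax = 9  # the intended attribute: accepted

caught = False
try:
    ctx.emax = 9  # the typo from the message above: there is no such slot
except AttributeError:
    caught = True

print(ctx.Emax, caught)  # 9 True
```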
msg175502 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-11-13 16:01
I propose using a MappingProxy type in 3.4 and adding an example to the docs for stable versions.
msg175505 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-13 16:48
Copy or proxy may affect performance. We will need to run benchmarks to see by how much.
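A rough micro-benchmark sketch of the two options. It assumes that each read of the attribute either copies the whole table or wraps it in a new `MappingProxyType`. The 100-entry `table` is hypothetical, sized like Serhiy's later `timeit` example. The sketch measures only attribute access, not matching. Absolute timings depend entirely on the machine, so no expected numbers are given:

```python
import timeit
from types import MappingProxyType

# Hypothetical group table: 100 named groups g0..g99 mapped to group numbers.
table = {'g%d' % i: i + 1 for i in range(100)}


def via_copy():
    # Copy approach: each access builds a fresh dict with all 100 entries.
    return dict(table)['g50']


def via_proxy():
    # Proxy approach: each access wraps the same dict; no entries are copied.
    return MappingProxyType(table)['g50']


n = 20_000
t_copy = timeit.timeit(via_copy, number=n)
t_proxy = timeit.timeit(via_proxy, number=n)
print(f'copy:  {t_copy:.4f} s for {n} accesses')
print(f'proxy: {t_proxy:.4f} s for {n} accesses')
```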
msg230450 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-01 15:27
Here are two patches which implement two alternative solutions. They are based on the code of the regex module.
The dict-copying patch matches the current behavior of the regex module, but other code needs to be modified to avoid a small slowdown. An artificial example:
$ ./python -m timeit -s 'import re; n = 100; m = re.match("".join("(?P<g%d>.)" % g for g in range(n)), "x" * n); t = ",".join(r"\g<g%d>" % g for g in range(n))' -- 'm.expand(t)'
Without patch: 7.48 msec per loop
With re_groupindex_copy.patch but without modifying _expand: 9.61 msec per loop
With re_groupindex_copy.patch and with modifying _expand: 7.41 msec per loop
While the stdlib code can be modified, this patch can cause a small slowdown in some third-party code.
The dict-proxying patch has no performance effect, but it is slightly less compatible: some code may accept a dict but not a dict-like object.
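The compatibility cost of the proxy can be seen on Python 3.5 and later, which adopted it. The proxy supports the read-only `collections.abc.Mapping` API but is not a `dict` instance. Code that insists on a real dict, such as `json.dumps`, therefore rejects it until the caller makes an explicit copy. A minimal sketch:

```python
import json
import re
from collections.abc import Mapping

p = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')
gi = p.groupindex  # a read-only mappingproxy on Python 3.5+

print(isinstance(gi, dict))     # False: not a real dict
print(isinstance(gi, Mapping))  # True: the read-only mapping API works
print(gi['month'])              # 2

# Code that only handles real dicts, such as json.dumps, rejects the proxy...
failed = False
try:
    json.dumps(gi)
except TypeError:
    failed = True
print(failed)                   # True

# ...so callers that need a dict must make an explicit copy.
print(json.dumps(dict(gi)))     # {"year": 1, "month": 2}
```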
msg234878 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-28 09:23
Ping.
msg239041 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-23 15:27
What approach looks better, a copy or a read-only proxy?
msg239094 - Author: py.user (py.user) * Date: 2015-03-24 07:35
@Serhiy Storchaka
> What approach looks better, a copy or a read-only proxy?
ISTM your proxy patch is better, because it raises an exception rather than failing silently.
msg239526 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-03-29 22:02
New changeset 4d5826fa77a1 by Serhiy Storchaka in branch 'default':
Issue #14260: The groupindex attribute of regular expression pattern object
https://hg.python.org/cpython/rev/4d5826fa77a1 
History
Date User Action Args
2022-04-11 14:57:27 admin set github: 58468
2015-03-30 07:08:36 serhiy.storchaka set status: open -> closed
assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2015-03-29 22:02:50 python-dev set nosy: + python-dev
messages: + msg239526
2015-03-24 07:35:37 py.user set messages: + msg239094
2015-03-23 15:27:01 serhiy.storchaka set messages: + msg239041
2015-01-28 09:23:56 serhiy.storchaka set keywords: + needs review
messages: + msg234878
2014-11-05 20:17:53 serhiy.storchaka set stage: needs patch -> patch review
versions: + Python 3.5, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2014-11-01 15:27:02 serhiy.storchaka set files: + re_groupindex_copy.patch, re_groupindex_proxy.patch
keywords: + patch
messages: + msg230450
2014-10-14 15:13:45 skrah set nosy: - skrah
2012-11-13 22:03:11 py.user set title: re.groupindex available for modification and continues to work, having incorrect data inside it -> re.groupindex is available for modification and continues to work, having incorrect data inside it
2012-11-13 16:48:56 serhiy.storchaka set messages: + msg175505
2012-11-13 16:01:30 eric.araujo set resolution: not a bug -> (no value)
stage: needs patch
messages: + msg175502
versions: + Python 2.7, Python 3.3, Python 3.4
2012-11-13 11:18:05 skrah set title: re.grupindex available for modification and continues to work, having incorrect data inside it -> re.groupindex available for modification and continues to work, having incorrect data inside it
2012-11-13 11:12:13 Abel.Farias set nosy: + Abel.Farias
2012-11-13 11:11:28 Abel.Farias set title: re.groupindex available for modification and continues to work, having incorrect data inside it -> re.grupindex available for modification and continues to work, having incorrect data inside it
2012-11-13 11:06:20 skrah set status: pending -> open
nosy: + skrah
messages: + msg175497
2012-11-13 10:14:55 serhiy.storchaka set status: open -> pending
nosy: + serhiy.storchaka
messages: + msg175494
resolution: not a bug
2012-11-13 03:04:49 eric.snow set nosy: + eric.snow
2012-03-14 12:15:04 vstinner set nosy: + gvanrossum
2012-03-14 07:26:21 georg.brandl set nosy: + georg.brandl, vstinner
messages: + msg155734
2012-03-14 00:56:37 py.user set messages: + msg155702
2012-03-13 11:54:14 eric.araujo set messages: + msg155593
2012-03-13 05:02:23 py.user set messages: + msg155570
2012-03-13 02:34:27 eric.araujo set nosy: + eric.araujo
messages: + msg155560
2012-03-13 00:52:18 mrabarnett set messages: + msg155546
2012-03-12 21:37:18 py.user set messages: + msg155484
2012-03-12 18:16:40 mrabarnett set messages: + msg155459
2012-03-12 05:35:27 eric.smith set title: regex.groupindex available for modification and continues to work, having incorrect data inside it -> re.groupindex available for modification and continues to work, having incorrect data inside it
2012-03-12 05:28:22 py.user create
