homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Unicode-width dependent optimization leads to non-portable pyc file
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: Arfrever, amaury.forgeotdarc, arigo, barry, ezio.melotti, lemburg, pitrou, python-dev, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords: needs review, patch

Created on 2009年01月25日 16:16 by pitrou, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue5057.diff ezio.melotti, 2011年04月14日 11:32 Proof of concept against py3k. review
issue5057-2.diff ezio.melotti, 2011年04月14日 12:56 review
issue5057-3.diff ezio.melotti, 2012年05月14日 12:32 Patch against 3.2.
Messages (25)
msg80514 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年01月25日 16:16
The peephole optimizer can optimize indexed access to an unicode
constant which does not give the same result in UCS-2 and UCS-4 builds.
As a result, the pyc file is not portable across those builds.
This is something I witnessed when recompiling in UCS-2 rather than
UCS-4 mode, and having a strange failure in test_multibytecodec. Erasing
test_multibytecodec.pyc suppressed the failure.
Here is a small demonstration of the problem:
>>> def f():
... return '\U00012345'[0]
... 
>>> import dis
>>> dis.dis(f)
 2 0 LOAD_CONST 3 ('\ud808') 
 3 RETURN_VALUE 
For reference, here is the error I had in test_multibytecodec:
======================================================================
FAIL: test_gb18030 (test.test_multibytecodec.Test_StreamWriter)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home/antoine/py3k/__svn__/Lib/test/test_multibytecodec.py",
line 185, in test_gb18030
 self.assertEqual(s.getvalue(), b'123\x907\x959')
AssertionError: b'123\x907\x959\x907\x959' != b'123\x907\x959'
----------------------------------------------------------------------
msg80719 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009年01月28日 17:46
Is this related to issue3297 ?
msg80720 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年01月28日 18:33
I don't think so. Issue 3297 seems related to the way unicode objects
are marshalled/unmarshalled, even if the build settings don't change.
msg95367 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009年11月17日 08:23
I have the same failure on trunk (narrow build).
msg133724 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011年04月14日 11:32
The attached patch skips the peepholer optimizations for BINARY_SUBSCR if the resulting char is a surrogate on narrow builds or a non-bmp char in wide builds.
Note that this affects the optimization of lone surrogates on narrow builds too, but I think it's not worth to adding more complexity on the peepholer and check if they are part of a surrogate pair.
The patch still lacks comments and could have better tests.
msg133726 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011年04月14日 11:48
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> The attached patch skips the peepholer optimizations for BINARY_SUBSCR if the resulting char is a surrogate on narrow builds or a non-bmp char in wide builds.
> Note that this affects the optimization of lone surrogates on narrow builds too, but I think it's not worth to adding more complexity on the peepholer and check if they are part of a surrogate pair.
> The patch still lacks comments and could have better tests.
 newconst = PyObject_GetItem(v, w);
+ if (PyUnicode_Check(v)) {
+ Py_UNICODE ch = PyUnicode_AS_UNICODE(newconst)[0];
Without checking, you shouldn't assume that newconst is a PyUnicodeObject.
Other than that the patch looks fine.
msg133727 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011年04月14日 11:52
Are there any cases where v[w] -- where v is a unicode object -- returns a non-unicode object?
msg133729 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011年04月14日 11:58
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> Are there any cases where v[w] -- where v is a unicode object -- returns a non-unicode object?
There could be: either from subclasses or from buggy code. In any case,
macros should only be used if you're certain that the object cannot
be anything else.
Also note that newconst can well be NULL.
msg133733 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011年04月14日 12:56
Here's a new patch that checks that newconst is not NULL and that it's a unicode object.
I added a test for the case where it's NULL. I don't think it's possible to test the case when newconst is not unicode though, because unicode subclasses are not literals and don't get optimized in the first place.
msg133734 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011年04月14日 13:02
> ... which does not give the same result in UCS-2 and UCS-4 builds.
> As a result, the pyc file is not portable across those builds.
Since Python 3.2, the pyc filename contains a tag (u) to indicate wide build (sys.maxunicode==0x10FFFF), instead of narrow (sys.maxunicode==0xFFFF).
I think we can keep the optimizer for Python >= 3.2. I suppose that Python 3.1 has the bug.
msg133738 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011年04月14日 13:15
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> Here's a new patch that checks that newconst is not NULL and that it's a unicode object.
> I added a test for the case where it's NULL. I don't think it's possible to test the case when newconst is not unicode though, because unicode subclasses are not literals and don't get optimized in the first place.
Thank you.
msg133781 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011年04月15日 04:33
PEP 3147 says[0]:
"""
For backward compatibility, Python will still support pyc-only distributions, however it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. pyc file outside of __pycache__ will only be imported if the py source file is missing.
"""
Does that mean that there could be cases where untagged pyc files are used in 3.2+?
In that case the patch should be ported to 3.2 and 3.3 too.
[0]: http://www.python.org/dev/peps/pep-3147/#rationale 
msg133796 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011年04月15日 07:58
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> PEP 3147 says[0]:
> """
> For backward compatibility, Python will still support pyc-only distributions, however it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. pyc file outside of __pycache__ will only be imported if the py source file is missing.
> """
> 
> Does that mean that there could be cases where untagged pyc files are used in 3.2+?
Yes... even though we did discuss using the same tagging support
in that scenario as well, at least for 3.3.
> In that case the patch should be ported to 3.2 and 3.3 too.
> 
> [0]: http://www.python.org/dev/peps/pep-3147/#rationale 
msg133810 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011年04月15日 11:53
Do you think this should go in 3.1 too?
msg133811 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011年04月15日 12:00
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> Do you think this should go in 3.1 too?
If the problem triggers there as well: Yes.
Is the problem also visible on Python 2.7 ?
msg133812 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011年04月15日 12:02
Yes. The original report was for 2.6.
I will apply the patch on all the 4 branches then.
msg133824 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011年04月15日 13:53
New changeset 3cffa2009a92 by Ezio Melotti in branch '2.7':
Issue #5057: fix a bug in the peepholer that led to non-portable pyc files between narrow and wide builds while optimizing BINARY_SUBSCR on non-BMP chars (e.g. u"\U00012345"[0]).
http://hg.python.org/cpython/rev/3cffa2009a92
New changeset 4679d0fef389 by Ezio Melotti in branch '3.1':
Issue #5057: fix a bug in the peepholer that led to non-portable pyc files between narrow and wide builds while optimizing BINARY_SUBSCR on non-BMP chars (e.g. "\U00012345"[0]).
http://hg.python.org/cpython/rev/4679d0fef389
New changeset 503578ddf286 by Ezio Melotti in branch '3.2':
#5057: Merge with 3.1.
http://hg.python.org/cpython/rev/503578ddf286
New changeset 9801e1f78264 by Ezio Melotti in branch 'default':
#5057: Merge with 3.2.
http://hg.python.org/cpython/rev/9801e1f78264 
msg159179 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2012年04月24日 18:03
Sorry to re-open this issue. The following example shows that it was not fully resolved:
 def f():
 return u'\U00023456abcdef'[3]
 import dis; dis.dis(f)
 print f()
On a wide build it should print 'c' and on a narrow build it should print 'b'. But if the .pyc file was created on the other platform, it behaves like the other platform would.
msg160612 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012年05月14日 12:20
This whole issue doesn't affect 3.3.
For 2.7/3.2 there are three possible options:
 1) remove constant folding altogether on unicode (this is the solution adopted by PyPy);
 2) scan the string up to the index looking for non-BMP chars and disable the constant folding if they are found (probably not very efficient);
 3) leave the "buggy" code there (might lead to obscure failures in remote cases);
Any opinions?
msg160613 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012年05月14日 12:32
Attached a patch that implements option 1).
msg160614 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年05月14日 12:39
Option 2) would have my preference.
msg160622 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2012年05月14日 13:08
Did anyone ever show that this particular detail, which looks like a completely obscure case to me, has any measurable effect on any code whatsoever? Just coming up with numbers, but I'm sure it gives you 5% on the most specially tuned micro-benchmark, and nothing at all in all other cases.
Just saying my vote goes for option 1, but I won't argue if people feel that it's a good investment of their time :-)
msg170561 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012年09月16日 17:14
I prefer option (1), remove the buggy optimization. Python 3.3 does solve correctly this issue.
msg174833 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012年11月04日 21:40
issue5057-3.diff LGTM.
I added debug output in peepholer, ran tests and found that this optimization happened for unicode strings only in test_multibytecodec (where it used deliberately) and test_peepholer. Seems as this is very rare case.
msg174835 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年11月04日 22:15
New changeset 9481e062fe26 by Ezio Melotti in branch '2.7':
#5057: the peepholer no longer optimizes subscription on unicode literals (e.g. u"foo"[0]) in order to produce compatible pyc files between narrow and wide builds.
http://hg.python.org/cpython/rev/9481e062fe26
New changeset 56bc323288d1 by Ezio Melotti in branch '3.2':
#5057: the peepholer no longer optimizes subscription on unicode literals (e.g. u"foo"[0]) in order to produce compatible pyc files between narrow and wide builds.
http://hg.python.org/cpython/rev/56bc323288d1
New changeset 3b4f2f9272b4 by Ezio Melotti in branch '3.3':
#5057: null merge with 3.2 (only add tests).
http://hg.python.org/cpython/rev/3b4f2f9272b4
New changeset 0790c16bb275 by Ezio Melotti in branch 'default':
#5057: null merge with 3.3 (only add tests).
http://hg.python.org/cpython/rev/0790c16bb275 
History
Date User Action Args
2022年04月11日 14:56:44adminsetgithub: 49307
2012年11月04日 22:16:20ezio.melottisetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2012年11月04日 22:15:07python-devsetmessages: + msg174835
2012年11月04日 21:40:09serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg174833
2012年09月16日 19:32:13Arfreversetnosy: + Arfrever
2012年09月16日 17:14:15vstinnersetmessages: + msg170561
2012年05月14日 13:08:42arigosetmessages: + msg160622
2012年05月14日 12:39:56pitrousetmessages: + msg160614
2012年05月14日 12:32:02ezio.melottisetfiles: + issue5057-3.diff

messages: + msg160613
stage: needs patch -> commit review
2012年05月14日 12:20:07ezio.melottisetmessages: + msg160612
versions: - Python 3.1, Python 3.3
2012年04月24日 18:03:12arigosetstatus: closed -> open

nosy: + arigo
messages: + msg159179

resolution: fixed -> (no value)
stage: resolved -> needs patch
2011年04月15日 13:54:25ezio.melottisetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2011年04月15日 13:53:02python-devsetnosy: + python-dev
messages: + msg133824
2011年04月15日 12:02:24ezio.melottisetmessages: + msg133812
2011年04月15日 12:00:48lemburgsetmessages: + msg133811
title: Unicode-width dependent optimization leads to non-portable pyc file -> Unicode-width dependent optimization leads to non-portable pyc file
2011年04月15日 11:53:45ezio.melottisetmessages: + msg133810
versions: + Python 3.2, Python 3.3
2011年04月15日 07:58:27lemburgsetmessages: + msg133796
title: Unicode-width dependent optimization leads to non-portable pyc file -> Unicode-width dependent optimization leads to non-portable pyc file
2011年04月15日 04:33:27ezio.melottisetassignee: ezio.melotti

messages: + msg133781
nosy: + barry
2011年04月14日 13:15:55lemburgsetmessages: + msg133738
title: Unicode-width dependent optimization leads to non-portable pyc file -> Unicode-width dependent optimization leads to non-portable pyc file
2011年04月14日 13:02:41vstinnersetnosy: + vstinner

messages: + msg133734
versions: + Python 3.1, - Python 3.2, Python 3.3
2011年04月14日 12:56:36ezio.melottisetfiles: + issue5057-2.diff

messages: + msg133733
2011年04月14日 11:58:15lemburgsetmessages: + msg133729
title: Unicode-width dependent optimization leads to non-portable pyc file -> Unicode-width dependent optimization leads to non-portable pyc file
2011年04月14日 11:57:14ezio.melottisetversions: + Python 3.3, - Python 2.6, Python 3.1
2011年04月14日 11:52:15ezio.melottisetmessages: + msg133727
2011年04月14日 11:48:48lemburgsetnosy: + lemburg
title: Unicode-width dependent optimization leads to non-portable pyc file -> Unicode-width dependent optimization leads to non-portable pyc file
messages: + msg133726
2011年04月14日 11:32:05ezio.melottisetkeywords: + needs review, patch
files: + issue5057.diff
messages: + msg133724

stage: needs patch -> patch review
2009年11月17日 08:23:40ezio.melottisetpriority: normal

stage: needs patch
messages: + msg95367
versions: + Python 3.2, - Python 3.0
2009年01月28日 18:33:20pitrousetmessages: + msg80720
2009年01月28日 17:46:14amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg80719
2009年01月26日 11:17:53ezio.melottisetnosy: + ezio.melotti
2009年01月25日 16:21:33pitrousetversions: + Python 2.6, Python 3.0, Python 2.7
2009年01月25日 16:16:11pitroucreate

AltStyle によって変換されたページ (->オリジナル) /