This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: distutils command build_scripts fails with UnicodeDecodeError
Type: behavior
Stage: resolved
Components: Distutils, Distutils2
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7, 3rd party
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: tarek
Nosy List: Arfrever, alexis, benjamin.peterson, eric.araujo, georg.brandl, hagen, lemburg, mgorny, python-dev, tarek, vstinner
Priority: release blocker
Keywords: patch

Created on 2010-11-14 20:32 by hagen, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
surrogateescape.patch | hagen, 2010-11-14 20:32 | use surrogateescape for reading and writing script files
build_scripts-binary_mode.patch | Arfrever, 2011-04-28 14:29 | Use binary mode for reading and writing script files
Messages (20)
msg121207 - Author: Hagen Fürstenau (hagen) Date: 2010-11-14 20:32
As suggested in issue 9561, I'm creating a new bug for the encoding problem in build_scripts: If a script file can't be decoded with the (locale-dependent) standard encoding, then "build_scripts" fails with UnicodeDecodeError. Reproducible e.g. with LANG=C and a script file containing non-ASCII chars near the beginning (so that they're read on a single readline()).
Attaching a patch that uses "surrogateescape", as proposed for issue 6011.
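A minimal sketch of the failure and of the surrogateescape approach. It assumes a script containing a Latin-1 byte and uses an explicit "ascii" codec to stand in for the encoding that LANG=C typically selects; the sample bytes and variable names are illustrative and not taken from the attached patch.

```python
# Illustrative script content: a shebang line followed by a Latin-1 encoded comment.
raw = b"#!/usr/bin/python\n# caf\xe9\nprint('hi')\n"

# Strict decoding with ASCII (an assumption standing in for the locale
# encoding under LANG=C) fails on the 0xE9 byte.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape, each undecodable byte becomes a lone surrogate
# (0xE9 -> U+DCE9) and is restored when encoding with the same handler.
text = raw.decode("ascii", "surrogateescape")
round_tripped = text.encode("ascii", "surrogateescape")
```

The round trip is what lets a text-mode copy rewrite the first line without knowing the script's real encoding.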
msg134630 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-04-27 23:38
I’m not sure how I feel about using surrogateescape. The distutils source is very similar across 2.7, 3.1, 3.2 and default, especially after the Great Revert and freeze last year to restore buggy-but-known behavior while the distutils2 project was created and allowed to fix things and break stuff. Haypo added a fix using surrogateescape in 3.2, so it couldn’t be backported to all stable branches. You may say that at least it was fixed in one version, which is something good. I don’t know if I’d prefer to apply the patch (if a test is provided) or to raise an exception instead of silently changing behavior.
msg134661 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2011-04-28 08:20
Éric Araujo wrote:
> 
> Éric Araujo <merwok@netwok.org> added the comment:
> 
> I’m not sure how I feel about using surrogateescape. The distutils source is very similar across 2.7, 3.1, 3.2 and default, especially after the Great Revert and freeze last year to restore buggy-but-known behavior while the distutils2 project was created and allowed to fix things and break stuff. Haypo added a fix using surrogateescape in 3.2, so it couldn’t be backported to all stable branches. You may say that at least it was fixed in one version, which is something good. I don’t know if I’d prefer to apply the patch (if a test is provided) or to raise an exception instead of silently changing behavior.
I think this patch should be applied to all 3.x versions, since
all of them are affected by the same problem: reading a file with
unknown encoding, adding a shebang and writing it back again.
Python shouldn't really care about the script file's encoding and
since the "surrogateescape" error handler is the only way to
more or less cleanly get around the problem, I'm +1 on adding the
patch to the 3.x series.
I don't think this is needed for 2.7, since Python 2.x's open()
doesn't care about the file encoding anyway.
msg134678 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2011-04-28 14:29
Alternatively it's possible to use binary mode. I'm attaching the patch, which shows this possibility.
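A rough sketch of what a binary-mode shebang rewrite could look like. The helper name `adjust_shebang` is hypothetical, and the bytes pattern is only modelled on distutils' `first_line_re`; the attached patch may differ in its details.

```python
import re

# Bytes pattern modelled on distutils' first_line_re (assumption: the
# pattern in the patch may not be identical).
first_line_re = re.compile(rb"^#!.*python[0-9.]*([ \t].*)?$")

def adjust_shebang(data, executable):
    """Return script bytes whose first line points at `executable` (bytes)."""
    first, sep, rest = data.partition(b"\n")
    match = first_line_re.match(first)
    if not match:
        return data  # no Python shebang: copy the script unchanged
    post_interp = match.group(1) or b""  # interpreter options, e.g. b" -O"
    return b"#!" + executable + post_interp + sep + rest
```

Because nothing is decoded, the rest of the script is copied byte for byte regardless of its encoding.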
msg134680 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-04-28 14:48
Was the patch tested in 2.7 only? I think the first_line_re needs to be changed to bytes too. (3.x would have disallowed mixing bytes and str for a regex.)
msg134681 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2011-04-28 14:52
Which patch do you mean?
(My patch already changes first_line_re to bytes. My patch was tested only with 3.2. Lib/distutils/command/build_scripts.py is currently identical in 3.1, 3.2 and 3.3.)
msg134773 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-04-29 15:19
Indeed, I missed those two lines.
msg134894 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2011-04-30 23:43
Apparently setuptools.command.easy_install.get_script_header() imports distutils.command.build_scripts.first_line_re and checks if this regex matches a str object, which results in TypeError. If breaking compatibility is not acceptable, then the surrogateescape patch should be applied.
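The incompatibility can be reproduced in isolation. This is a small illustration using a locally compiled bytes pattern rather than the real `first_line_re` object; the variable names are illustrative.

```python
import re

# A bytes pattern, as first_line_re would be after a binary-mode change.
bytes_re = re.compile(b"^#!.*python[0-9.]*([ \t].*)?$")

# Matching bytes against a bytes pattern works.
matched_bytes = bytes_re.match(b"#!/usr/bin/python") is not None

# Matching str against the same pattern raises TypeError on Python 3.
try:
    bytes_re.match("#!/usr/bin/python")
    raised = False
except TypeError:
    raised = True
```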
msg134934 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-05-01 22:21
Hey, I already had this bug and I also wrote a patch: copy_script-2.patch attached to #6011. It is very similar to build_scripts-binary_mode.patch (read the file in binary mode to avoid the encode/decode dance). But it also checks that the path to the Python program is decodable from UTF-8 and from the script encoding.
Éric Araujo doesn't want to apply copy_script-2.patch on Python 3 before distutils2 is ported to Python 3 and included in Python (3.3): read msg124648. Five months later, distutils2 is not yet included in Python 3, the patch is not committed yet, and we now have a duplicate issue (and 3 patches for a single bug) :-)
This situation sucks. How can we move forward? What is the status of distutils2? Is it ported to Python3? Is it ready for an inclusion into Python3?
When distutils2 is part of Python 3.3, should we fix distutils bugs or not? I suppose that few people use Python 3.3, maybe because it will not be released before August 2012 (PEP 398) :-) So users will continue to have this bug until everybody moves to 3.3 (or later)...
I think that we should fix this bug today. I don't really care about distutils2 today because it is not yet part of Python.
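A sketch of the decodability check described in the message above. The function name `check_executable` is hypothetical, and using `tokenize.detect_encoding()` to find the script encoding is an assumption about how such a check could be written, not a quotation from copy_script-2.patch.

```python
import io
import tokenize

def check_executable(executable, script_bytes):
    """Raise ValueError if `executable` (bytes) cannot be decoded from
    UTF-8 or from the encoding declared by the script."""
    # detect_encoding() reads the BOM or coding cookie; it defaults to UTF-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script_bytes).readline)
    for codec in ("utf-8", encoding):
        try:
            executable.decode(codec)
        except UnicodeDecodeError:
            raise ValueError(
                "The shebang {!r} is not decodable from {}".format(
                    executable, codec))
    return encoding
```

The point of checking both codecs is that the rewritten first line has to stay readable both by the kernel-facing tools that treat it as a path and by Python when it parses the script in its declared encoding.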
msg134936 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-05-01 22:27
> Apparently setuptools.command.easy_install.get_script_header() imports
> distutils.command.build_scripts.first_line_re and checks if this regex
> matches a str object, which results in TypeError. If breaking
> compatibility is not acceptable, then the surrogateescape patch should
> be applied.
Setuptools is not compatible with 3.x TTBOMK; distribute is, but could
be fixed quickly, so there is no compat problem with this (these)
library(ries). However, the public/private status of first_line_re is
unclear, so there could be other projects out there depending on its
type. Given that there is already one patch in distutils that uses
surrogateescape, I think we could accept another similar patch.
msg134937 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-05-01 22:35
> the patch is not committed yet,
> and we have now a duplicate issue (and 3 patches for a single bug) :-)
Feel free to close duplicate issues.
Looks like you’re not following PyCon reports, or Tarek’s mails to
python-dev. distutils2 has been ported to 3.3 under the name
"packaging"; there is a repo on bitbucket (tarek/cpython) with this
code. Tarek will produce a patch from this repo and push it to the main
repository soon.
Yes: we’ll fix bugs in packaging and distutils. Packaging releases will
be backported for 2.4-3.2 under the name "distutils2".
msg134971 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2011-05-02 13:51
copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1.
msg134972 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-05-02 13:53
> copy_script-2.patch uses os.fsencode(), which doesn't exist in Python 3.1.
Correct, with Python 3.1, you can use filename.encode(sys.getfilesystemencoding(), 'surrogateescape'). But you must use os.fsencode() with Python >= 3.2 because on Windows, you cannot use surrogateescape with MBCS (you should use the strict error handler).
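The two variants can be combined in a small compatibility helper. This is a sketch; the name `fsencode_compat` is hypothetical and does not appear in any of the patches.

```python
import os
import sys

def fsencode_compat(name):
    """Encode a str filename to bytes for the filesystem."""
    if hasattr(os, "fsencode"):
        # Python >= 3.2: os.fsencode() picks the right error handler
        # for the platform (strict with MBCS on Windows).
        return os.fsencode(name)
    # Python 3.1 fallback, as described in the message above.
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")
```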
msg135374 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2011-05-06 22:15
Please commit any patch before releases of Python 3.1.4 and 3.2.1. (3.2.1 rc1 is planned for 2011-05-14.)
msg135749 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-10 22:15
New changeset 6ad356525381 by Victor Stinner in branch 'default':
Close #10419, issue #6011: build_scripts command of distutils handles correctly
http://hg.python.org/cpython/rev/6ad356525381 
msg135752 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-10 22:32
New changeset 47236a0cfb15 by Victor Stinner in branch '3.2':
Close #10419, issue #6011: build_scripts command of distutils handles correctly
http://hg.python.org/cpython/rev/47236a0cfb15 
msg135754 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-10 22:59
New changeset fd7d4639dae2 by Victor Stinner in branch '3.1':
Issue #10419: Fix build_scripts command of distutils to handle correctly
http://hg.python.org/cpython/rev/fd7d4639dae2 
msg135756 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-05-10 23:06
Issue fixed in Python 3.1, 3.2, 3.3.
Thanks to Arfrever, I realized that this issue not only concerns the compilation of Python itself with a non-ASCII prefix (issue #6011), but the installation of any Python script containing a non-ASCII character. So I also fixed it in Python 3.1. I replaced os.fsencode(name) by name.encode(sys.getfilesystemencoding(), 'surrogateescape') in 3.1.
msg135786 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2011-05-11 17:10
I have committed the fix for Distribute:
https://bitbucket.org/tarek/distribute/changeset/97f12f8f6bf1
(However Distribute would fail to create entry points scripts if sys.executable contained unencodable characters.)
msg136289 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-19 13:18
New changeset cc5cfeaa4a8d by Victor Stinner in branch 'default':
Issue #10419, issue #6011: port 6ad356525381 fix from distutils to packaging
http://hg.python.org/cpython/rev/cc5cfeaa4a8d 
History
Date | User | Action | Args
2022-04-11 14:57:08 | admin | set | github: 54628
2011-05-19 13:18:46 | python-dev | set | messages: + msg136289
2011-05-11 17:10:22 | Arfrever | set | messages: + msg135786
2011-05-10 23:06:21 | vstinner | set | messages: + msg135756
2011-05-10 22:59:44 | python-dev | set | messages: + msg135754
2011-05-10 22:32:15 | python-dev | set | messages: + msg135752
2011-05-10 22:15:39 | python-dev | set | status: open -> closed; nosy: + python-dev; messages: + msg135749; resolution: fixed; stage: resolved
2011-05-07 09:49:32 | vstinner | set | priority: normal -> release blocker; nosy: + benjamin.peterson, georg.brandl
2011-05-06 22:15:08 | Arfrever | set | messages: + msg135374
2011-05-02 13:53:21 | vstinner | set | messages: + msg134972
2011-05-02 13:51:24 | Arfrever | set | messages: + msg134971
2011-05-01 22:35:59 | eric.araujo | set | messages: + msg134937
2011-05-01 22:27:47 | eric.araujo | set | messages: + msg134936
2011-05-01 22:21:05 | vstinner | set | messages: + msg134934
2011-04-30 23:43:37 | Arfrever | set | messages: + msg134894
2011-04-29 15:19:19 | eric.araujo | set | messages: + msg134773
2011-04-28 14:52:40 | Arfrever | set | messages: + msg134681
2011-04-28 14:48:59 | eric.araujo | set | messages: + msg134680
2011-04-28 14:29:28 | Arfrever | set | files: + build_scripts-binary_mode.patch; messages: + msg134678; title: distutils command build_scripts fails with UnicodeDecodeError -> distutils command build_scripts fails with UnicodeDecodeError
2011-04-28 08:20:33 | lemburg | set | nosy: + lemburg; title: distutils command build_scripts fails with UnicodeDecodeError -> distutils command build_scripts fails with UnicodeDecodeError; messages: + msg134661
2011-04-27 23:38:56 | eric.araujo | set | versions: + 3rd party, Python 2.7; nosy: + alexis; messages: + msg134630; components: + Distutils2
2011-04-27 17:16:50 | Arfrever | set | nosy: + vstinner, Arfrever; versions: + Python 3.3
2011-02-04 03:44:00 | belopolsky | set | nosy: tarek, eric.araujo, hagen, mgorny; type: crash -> behavior
2010-11-15 08:13:31 | mgorny | set | nosy: + mgorny
2010-11-14 20:32:31 | hagen | create |
