
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Py3k fails to parse a file with an iso-8859-1 string
Type: behavior
Stage: test needed
Components: 2to3 (2.x to 3.x conversion tool), Unicode
Versions: Python 3.1, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ajaksu2, azverkan, benjamin.peterson, collinwinter, vstinner
Priority: high
Keywords: patch

Created on 2008-04-19 21:04 by azverkan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name            Uploaded                      Description
2to3bug.py           azverkan, 2008-04-19 21:04    test case
2to3_encoding.patch  vstinner, 2009-05-04 20:55
Messages (8)
msg65637 - Author: Brandon Ehle (azverkan) Date: 2008-04-19 21:04
While running the 2to3 script on the SCons codebase, I ran into a
UnicodeDecodeError.
Attached is just the portion of the script that causes the error.
2to3 throws an error on the string regardless of whether the u'' (Unicode)
prefix is prepended to the string literal.
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: ws_comma
Traceback (most recent call last):
  File "/usr/local/bin/2to3", line 5, in <module>
    sys.exit(refactor.main())
  File "/usr/local/lib/python3.0/lib2to3/refactor.py", line 81, in main
    rt.refactor_args(args)
  File "/usr/local/lib/python3.0/lib2to3/refactor.py", line 188, in refactor_args
    self.refactor_file(arg)
  File "/usr/local/lib/python3.0/lib2to3/refactor.py", line 217, in refactor_file
    input = f.read() + "\n" # Silence certain parse errors
  File "/usr/local/lib/python3.0/io.py", line 1611, in read
    decoder.decode(self.buffer.read(), final=True))
  File "/usr/local/lib/python3.0/io.py", line 1199, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/local/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 59-60: invalid data
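The traceback shows the file's bytes being decoded as UTF-8. The sketch below shows why that fails for ISO-8859-1 content. It assumes the file contains Latin-1-encoded accented characters. The attached 2to3bug.py is not reproduced here, so the sample line is hypothetical. In ISO-8859-1, "é" is the single byte 0xE9. UTF-8 reads that byte as the start of a three-byte sequence, so the ASCII quote that follows makes the sequence invalid.

```python
# Hypothetical source line with an ISO-8859-1 (Latin-1) encoded "é".
# This is NOT the attached 2to3bug.py; it only mimics the kind of content
# that triggered the error.
latin1_bytes = "s = u'caf\u00e9'\n".encode("iso-8859-1")


def decode_as(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError raised while decoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Decoding with the wrong codec (UTF-8) fails at the 0xE9 byte...
utf8_result = decode_as(latin1_bytes, "utf-8")
# ...while the codec the file was actually written in succeeds.
latin1_result = decode_as(latin1_bytes, "iso-8859-1")

print(type(utf8_result).__name__)  # UnicodeDecodeError
print(repr(latin1_result))
```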
msg65638 - Author: Collin Winter (collinwinter) * (Python committer) Date: 2008-04-19 21:48
2to3 running under Python 2.5.1 handles this file just fine. 2to3
running under 3.0a4+ (r62404) fails as detailed below. However, that
file doesn't run correctly under Python itself:
collinwinter@Silves:~/src/python/py3k$ ./python /home/collinwinter/Desktop/2to3bug.py
  File "/home/collinwinter/Desktop/2to3bug.py", line 3
collinwinter@Silves:~/src/python/py3k
This suggests this problem isn't 2to3-specific. Refiling this issue
against py3k's Unicode support.
msg65641 - Author: Brandon Ehle (azverkan) Date: 2008-04-20 01:38
Someone on the #python IRC channel suggested that Python 3.0 reverses
Python 2.5's default for string literals: plain literals are Unicode in 3.0.
If you remove the u'' prefix from the front of the string, the file runs
fine under Python 3.0 and fails under 2.5 and 2.6 instead.
msg65642 - Author: Brandon Ehle (azverkan) Date: 2008-04-20 01:40
Also, I can confirm that running 2to3 with Python 2.6 correctly converts
the script but running 2to3 with Python 3.0 results in a
UnicodeDecodeError exception.
msg86641 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-04-27 01:42
Confirmed in py3k on rev71995.
msg86643 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-04-27 02:39
The problem is that 2to3 just reads the file with whatever
locale.getpreferredencoding() returns. It should use
tokenize.detect_encoding() to discover the correct encoding to open it with.
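The approach suggested above can be sketched for Python 3 as follows. `read_source` is a hypothetical helper, not lib2to3's actual code. It opens the file in binary mode and lets tokenize.detect_encoding() find the declared encoding. It then decodes the text with that encoding instead of the locale default.

```python
import io
import os
import tempfile
import tokenize


def read_source(path):
    """Read a Python source file using the encoding it declares.

    tokenize.detect_encoding() inspects at most the first two lines for a
    UTF-8 BOM or a PEP 263 coding cookie and falls back to UTF-8 otherwise.
    Returns (text, encoding).
    """
    # Hypothetical helper: mirrors the idea of the fix, not lib2to3's code.
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    # Re-open in text mode with the detected encoding, not the locale default.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read(), encoding


if __name__ == "__main__":
    # Hypothetical Latin-1 file with a coding cookie (not the attached 2to3bug.py).
    source = "# -*- coding: iso-8859-1 -*-\ns = 'caf\u00e9'\n"
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
        tmp.write(source.encode("iso-8859-1"))
    try:
        text, encoding = read_source(tmp.name)
        print(encoding)  # iso-8859-1
        print(text == source)  # True
    finally:
        os.remove(tmp.name)
```

On Python 3.2 and later, tokenize.open() combines both steps in a single call.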
msg87175 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 20:55
Patch using tokenize.detect_encoding() to detect the encoding of Python
scripts instead of using the default io.open() encoding (utf-8).
We might want to write a unit test.
See also related issue: #5093 
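Such a test might check the following, assuming the fix relies on tokenize.detect_encoding(). The `DetectEncodingTest` class and `detect()` helper are hypothetical and are not the test committed in r72491. The test checks that a Latin-1 coding cookie is honored and that source without a cookie defaults to UTF-8. It also checks that a UTF-8 BOM yields 'utf-8-sig'.

```python
import io
import tokenize
import unittest


def detect(source_bytes):
    """Return the encoding tokenize.detect_encoding() finds for in-memory source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


class DetectEncodingTest(unittest.TestCase):
    def test_latin1_cookie(self):
        src = "# -*- coding: iso-8859-1 -*-\ns = 'caf\u00e9'\n".encode("iso-8859-1")
        self.assertEqual(detect(src), "iso-8859-1")
        # Decoding with the detected encoding recovers the accented character.
        self.assertIn("caf\u00e9", src.decode(detect(src)))

    def test_no_cookie_defaults_to_utf8(self):
        self.assertEqual(detect(b"x = 1\n"), "utf-8")

    def test_utf8_bom(self):
        self.assertEqual(detect(b"\xef\xbb\xbfx = 1\n"), "utf-8-sig")


if __name__ == "__main__":
    # Run the suite without calling sys.exit(), unlike unittest.main().
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DetectEncodingTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```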
msg87481 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-05-09 00:33
Fixed in r72491.
History
Date                 User               Action  Args
2022-04-11 14:56:33  admin              set     github: 46912
2009-05-09 00:33:45  benjamin.peterson  set     status: open -> closed
                                                resolution: fixed
                                                messages: + msg87481
2009-05-04 20:55:19  vstinner           set     files: + 2to3_encoding.patch
                                                nosy: + vstinner
                                                messages: + msg87175
                                                keywords: + patch
2009-04-27 02:39:29  benjamin.peterson  set     messages: + msg86643
2009-04-27 01:42:30  ajaksu2            set     type: behavior
                                                components: + 2to3 (2.x to 3.x conversion tool)
                                                versions: + Python 2.6, Python 3.1, - Python 3.0
                                                nosy: + ajaksu2, benjamin.peterson
                                                messages: + msg86641
                                                stage: test needed
2008-04-20 01:40:01  azverkan           set     messages: + msg65642
2008-04-20 01:38:09  azverkan           set     messages: + msg65641
2008-04-19 22:16:59  collinwinter       set     title: 2to3 throws a utf8 decode error on a iso-8859-1 string -> Py3k fails to parse a file with an iso-8859-1 string
2008-04-19 21:48:49  collinwinter       set     priority: high
                                                assignee: collinwinter ->
                                                messages: + msg65638
                                                components: + Unicode, - 2to3 (2.x to 3.x conversion tool)
2008-04-19 21:04:59  azverkan           create
