Created on 2008-12-24 22:39 by sjmachin, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| py3encbug2.zip | sjmachin, 2008-12-24 23:28 | |
| x9d.py | sjmachin, 2008-12-30 12:20 | |
| encoding.issue.patch | tarek, 2008-12-30 12:45 | |
| Messages (8) | |||
|---|---|---|---|
| msg78273 | Author: John Machin (sjmachin) | Date: 2008-12-24 22:39 | |
File foo3.py is [cut down (orig 87Kb)] output of 2to3 conversion tool and (coincidentally) is still valid 2.x syntax. There are no syntax errors reported by any of the following:
\python26\python -c "import foo3"
\python26\python foo3.py
\python26\python setup.py install
\python30\python -c "import foo3"
\python30\python foo3.py
However 3.0 install
\python30\python setup.py install
produces:
"""
[snip]
running install_lib
copying build\lib\foo3.py -> C:\python30\Lib\site-packages
byte-compiling C:\python30\Lib\site-packages\foo3.py to foo3.pyc
File "C:\python30\Lib\site-packages\foo3.py", line 0
### Note also "line 0" above ###
SyntaxError: unknown encoding: cp1252
"""
Same happens if alternative name windows-1252 is used instead of cp1252.
NOTE: file foo3.py actually does have some non-ASCII characters (\xa0, \x93, \x94), in comments. Another file (bar3.py) from the same package contains \xb7 twice, but doesn't have the unknown encoding problem. There are several other files in the same package that start with "# -*- coding: windows-1252 -*-" (or cp1252, or even cp1251(!)) but have no non-ASCII characters in them. They don't get this incorrect error message either.
|
|||
| msg78275 | Author: John Machin (sjmachin) | Date: 2008-12-24 23:28 | |
A clue:
>>> print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252')))
'\xa0\u201c\u201d\xb7'
Could be that it only happens where there's a cp1252 character that's
not in latin1; see files x93.py and x94.py (have problem) and xa0b7.py
(doesn't have problem).
|
|||
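The clue in msg78275 can be checked across the whole upper byte range. The following is a minimal sketch and not part of the original report; the helper name `cp1252_only_bytes` is made up for illustration. It lists the byte values that cp1252 decodes to a code point above U+00FF, that is, characters that latin1 cannot represent.

```python
def cp1252_only_bytes():
    """Return byte values whose cp1252 decoding lies outside latin1 (code point > 0xFF)."""
    found = []
    for value in range(0x80, 0x100):
        try:
            char = bytes([value]).decode('cp1252')
        except UnicodeDecodeError:
            # A few bytes, such as 0x9d, are undefined in Python's cp1252 codec.
            continue
        if ord(char) > 0xFF:
            found.append(value)
    return found


outside_latin1 = cp1252_only_bytes()
print([hex(value) for value in outside_latin1])
```

In this sketch 0x93 and 0x94 appear in the list while 0xa0 and 0xb7 do not, which matches the split between the files reported as failing and the one reported as working.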
| msg78518 | Author: Tarek Ziadé (tarek) (Python committer) | Date: 2008-12-30 10:42 | |
Here's a status:
The problem is located in the codec that decodes the data (called by the compile builtin). It throws an error:
*** UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 853: character maps to <undefined>
Which is caught by compile and translated into:
SyntaxError: unknown encoding: cp1252
So I see two problems:
1/ why compile throws such an error when there's a UnicodeDecodeError
2/ why compile works well under Py2, since 0x9d is not part of the cp1252 mapping
I have written a test that reproduces the problem, and I am still investigating. If I can't find the problem I will ask for help on python-dev because I have no knowledge of the compiler internals yet.
|
|||
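The two failure modes discussed in msg78518 are distinct at the codec level: an unknown encoding name fails at lookup time, while a known encoding can still reject an individual byte. A small sketch of that distinction follows; the function name `classify` and its return strings are assumptions made for illustration and are not how compile() reports errors.

```python
import codecs


def classify(encoding, data):
    """Tell an unknown codec name apart from a byte the codec cannot decode."""
    try:
        codecs.lookup(encoding)
    except LookupError:
        # The codec registry has no entry for this name.
        return "unknown encoding"
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        # The codec exists, but the data contains a byte it does not map.
        return "undecodable byte"
    return "ok"


print(classify('cp1252', b'\x9d'))
print(classify('cp1252', b'\x94'))
print(classify('no-such-codec-xyz', b''))
```

Under this sketch cp1252 is a known encoding, so a message saying it is unknown cannot come from the codec lookup itself.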
| msg78522 | Author: John Machin (sjmachin) | Date: 2008-12-30 12:20 | |
TWO POINTS:
(1) I am not very concerned about chars like \x9d which are not valid in the declared encoding; I am more concerned with chars like \x93 and \x94 which *ARE* valid in the declared encoding. Please ensure that these cases are included in tests.
(2) Please check your test data and test results. I get different results. I have created a file x9d.py by making the minimal changes to x94.py. For me, this blows up on bytecompiling with *both* 3.0 (UnicodeDecodeError, as expected) and 2.x (Syntax Error unknown encoding cp1252, wrong message) -- see below.
byte-compiling C:\python30\Lib\site-packages\x9d.py to x9d.pyc
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    py_modules = ["foo3", "bar3", "x93", "x94", "x9d", "xa0b7"]
  File "C:\python30\lib\distutils\core.py", line 149, in setup
    dist.run_commands()
  File "C:\python30\lib\distutils\dist.py", line 942, in run_commands
    self.run_command(cmd)
  File "C:\python30\lib\distutils\dist.py", line 962, in run_command
    cmd_obj.run()
  File "C:\python30\lib\distutils\command\install.py", line 571, in run
    self.run_command(cmd_name)
  File "C:\python30\lib\distutils\cmd.py", line 317, in run_command
    self.distribution.run_command(command)
  File "C:\python30\lib\distutils\dist.py", line 962, in run_command
    cmd_obj.run()
  File "C:\python30\lib\distutils\command\install_lib.py", line 91, in run
    self.byte_compile(outfiles)
  File "C:\python30\lib\distutils\command\install_lib.py", line 125, in byte_compile
    dry_run=self.dry_run)
  File "C:\python30\lib\distutils\util.py", line 520, in byte_compile
    compile(file, cfile, dfile)
  File "C:\python30\lib\py_compile.py", line 137, in compile
    codestring = f.read()
  File "C:\python30\lib\io.py", line 1724, in read
    decoder.decode(self.buffer.read(), final=True))
  File "C:\python30\lib\io.py", line 1295, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 64: character maps to <undefined>
byte-compiling C:\python26\Lib\site-packages\x9d.py to x9d.pyc
SyntaxError: ('unknown encoding: cp1252', ('C:\\python26\\Lib\\site-packages\\x9d.py', 0, 0, None))
byte-compiling c:\python25\Lib\site-packages\x9d.py to x9d.pyc
File "c:\python25\Lib\site-packages\x9d.py", line 0
SyntaxError: ('unknown encoding: cp1252', ('c:\\python25\\Lib\\site-packages\\x9d.py', 0, 0, None))
|
|||
| msg78524 | Author: Marc-Andre Lemburg (lemburg) (Python committer) | Date: 2008-12-30 12:41 | |
On 2008-12-30 13:20, John Machin wrote:
> byte-compiling C:\python26\Lib\site-packages\x9d.py to x9d.pyc
> SyntaxError: ('unknown encoding: cp1252',
> ('C:\\python26\\Lib\\site-packages\\x9d.py', 0, 0, None))
>
> byte-compiling c:\python25\Lib\site-packages\x9d.py to x9d.pyc
> File "c:\python25\Lib\site-packages\x9d.py", line 0
> SyntaxError: ('unknown encoding: cp1252',
> ('c:\\python25\\Lib\\site-packages\\x9d.py', 0, 0, None))
>
> Added file: http://bugs.python.org/file12492/x9d.py
FWIW, I've tried that file with Python 2.5 and 2.6 on my machine:
lemburg/tmp> python2.5 ~/bin/pycompile.py x9d.py
compiling x9d.py -> x9d.pyc
XXX <type 'exceptions.SyntaxError'>: unknown encoding: cp1252 (x9d.py, line 0)
lemburg/tmp> python2.6 ~/bin/pycompile.py x9d.py
compiling x9d.py -> x9d.pyc
XXX <type 'exceptions.SyntaxError'>: unknown encoding: cp1252 (x9d.py, line 0)
Note that the line number is wrong in both messages.
It is interesting that simply running the files gives a more correct
error message:
lemburg/tmp> python2.5 x9d.py
File "x9d.py", line 2
SyntaxError: 'charmap' codec can't decode byte 0x9d in position 0: character
maps to <undefined>
lemburg/tmp> python2.6 x9d.py
File "x9d.py", line 2
SyntaxError: 'charmap' codec can't decode byte 0x9d in position 0: character
maps to <undefined>
The character position is wrong again in both messages.
Needless to say that the encoding "cp1252" is *not* unknown. It looks
like compile() causes the decoding error to be overwritten with a
misleading error message.
|
|||
| msg78525 | Author: Tarek Ziadé (tarek) (Python committer) | Date: 2008-12-30 12:44 | |
Yup, here's the test I have written to demonstrate the problem. In any case, compile doesn't behave the right way in the first place. |
|||
| msg78528 | Author: John Machin (sjmachin) | Date: 2008-12-30 13:16 | |
(1) What am I supposed to infer from "Yup"?? That all of that \x9d stuff was a mistake?
(2)
+ def tearDown(self):
+     pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc')
+     if os.path.exists(pyc_file):
+         os.patth.remove(pyc_file)
os.patth is novel :-)
|
|||
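Besides the typo pointed out in msg78528, the standard library removes files with `os.remove`; there is no `os.path.remove`. A minimal corrected sketch follows, assuming the test only needs to delete a leftover `cp1252.pyc`. The class name `EncodingCompileTest` and the `pyc_file` attribute are hypothetical and are not taken from the attached patch, which derives the path from `os.path.dirname(__file__)`.

```python
import os
import unittest


class EncodingCompileTest(unittest.TestCase):
    # Hypothetical location of the byte-compiled file left behind by a test.
    pyc_file = os.path.join(os.getcwd(), 'cp1252.pyc')

    def tearDown(self):
        # Remove the compiled file only if a test actually produced it.
        if os.path.exists(self.pyc_file):
            os.remove(self.pyc_file)
```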
| msg78535 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) | Date: 2008-12-30 14:37 | |
This is a duplicate of issue4626.
Here, the content is correctly decoded with cp1252, then passed to compile(); but compile() works on the internal utf-8 representation, and tries to decode it again with cp1252!
Yes, the error message is overwritten. If I remove the code that sets the "unknown encoding" exception, I get:
>>> compile(open("c:/temp/t1252.py", encoding="cp1252").read(), "t1252.py", "exec")
SyntaxError: 'charmap' codec can't decode byte 0x9d in position 35: character maps to <undefined>
The 0x9d is easily explained:
>>> b"\x94".decode('cp1252').encode('utf8')
b'\xe2\x80\x9d'
|
|||
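The double decoding described in msg78535 can be imitated outside the compiler. This is only a sketch of the mechanism as that message describes it, not the code path compile() uses; the helper names `redecode` and `survives_redecode` are made up for illustration.

```python
def redecode(raw, encoding='cp1252'):
    """Decode, re-encode as UTF-8, then wrongly decode again with the source encoding."""
    text = raw.decode(encoding)        # first, correct decoding of the source bytes
    internal = text.encode('utf-8')    # the UTF-8 form compile() is said to work on
    return internal.decode(encoding)   # second, erroneous decoding


def survives_redecode(raw, encoding='cp1252'):
    """Return True if the second decoding does not raise UnicodeDecodeError."""
    try:
        redecode(raw, encoding)
    except UnicodeDecodeError:
        return False
    return True


for sample in (b'\x94', b'\xa0', b'\xb7'):
    print(sample, survives_redecode(sample))
```

In this sketch `b'\x94'` fails because its UTF-8 form ends in 0x9d, which cp1252 does not define, while `b'\xa0'` and `b'\xb7'` pass, consistent with xa0b7.py not showing the problem.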
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:43 | admin | set | github: 48992 |
| 2009-01-01 12:32:40 | georg.brandl | set | status: open -> closed |
| 2008-12-30 17:13:37 | tarek | set | resolution: duplicate |
| 2008-12-30 14:37:58 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; superseder: compile() doesn't ignore the source encoding when a string is passed in; messages: + msg78535 |
| 2008-12-30 13:16:39 | sjmachin | set | messages: + msg78528 |
| 2008-12-30 12:45:00 | tarek | set | files: + encoding.issue.patch |
| 2008-12-30 12:44:43 | tarek | set | files: - encoding.issue.patch |
| 2008-12-30 12:44:05 | tarek | set | files: + encoding.issue.patch; keywords: + patch; messages: + msg78525 |
| 2008-12-30 12:41:36 | lemburg | set | nosy: + lemburg; messages: + msg78524 |
| 2008-12-30 12:20:30 | sjmachin | set | files: + x9d.py; messages: + msg78522 |
| 2008-12-30 10:42:22 | tarek | set | messages: + msg78518 |
| 2008-12-30 09:41:23 | tarek | set | priority: normal; assignee: tarek; type: crash; nosy: + tarek |
| 2008-12-24 23:28:40 | sjmachin | set | files: - py3encbug.zip |
| 2008-12-24 23:28:07 | sjmachin | set | files: + py3encbug2.zip; messages: + msg78275 |
| 2008-12-24 22:39:26 | sjmachin | create | |