Created on 2008-11-08 02:49 by shidot, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| profile_encoding.patch | vstinner, 2008-11-10 00:40 | profile module: open input file (from the command line) in binary mode |
| profile_encoding-2.patch | vstinner, 2009-03-20 01:44 | |
| Messages (9) | |||
|---|---|---|---|
| msg75627 | Author: Takafumi SHIDO (shidot) | Date: 2008-11-08 02:49 | |
The profile module of Python 3 doesn't understand the character set of the script. When the profiler is run (e.g. $ python -m profile -o prof.dat foo.py) on a script (say foo.py) that declares its character set on the second line (e.g. #coding:utf-8), the profiler crashes with an error message like: "SyntaxError: unknown encoding: utf-8" |
|||
| msg75676 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008-11-10 00:40 | |
exec() doesn't work if the argument is a Unicode string that contains a #coding: declaration. Here is a workaround for the profile module (open the file in binary mode), but it doesn't fix the exec() problem. |
|||
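The binary-mode workaround described above can be illustrated with a minimal sketch. It assumes a current CPython 3, where compile() accepts bytes and decodes them according to the coding declaration. The in-memory `source_bytes` stands in for the script file, and all names here are illustrative rather than taken from the patch.

```python
# Sketch only: "source_bytes" plays the role of a script read in binary mode.
# The text is encoded as ISO-8859-1, so "\xe9" becomes the single byte 0xE9.
source_bytes = '# coding: ISO-8859-1\nmessage = "h\xe9 h\xe9"\n'.encode("iso-8859-1")

# Passing bytes lets the parser read the coding declaration and decode the
# source itself, instead of the caller assuming UTF-8.
code = compile(source_bytes, "<example>", "exec")

namespace = {}
exec(code, namespace)
print(ascii(namespace["message"]))
```

Handing raw bytes to the compiler keeps the decoding decision in one place, which is presumably why the workaround avoids opening the file as text.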
| msg75677 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008-11-10 01:03 | |
Example of the problem: exec('#header\n# encoding: ISO-8859-1\nprint("h\xe9 h\xe9")\n')
exec(unicode) calls source_as_string(), which converts the unicode string to bytes
using _PyUnicode_AsDefaultEncodedString() (UTF-8 charset). Then
PyRun_StringFlags() is called with the UTF-8 byte string and the
PyCF_SOURCE_IS_UTF8 flag. But in the parser, get_coding_spec() recognizes
the "#coding:" header and converts the bytes to unicode using the specified
charset (which may differ from UTF-8).
The problem is in the function PyAST_FromNode(): the flag is not used in
the tokenizer but only in the AST parser. I also see:

    if (flags && flags->cf_flags & PyCF_SOURCE_IS_UTF8) {
        c.c_encoding = "utf-8";
        if (TYPE(n) == encoding_decl) {
    #if 0
            ast_error(n, "encoding declaration in Unicode string");
            goto error;
    #endif
            n = CHILD(n, 0);
        }
    } else if (TYPE(n) == encoding_decl) {
        c.c_encoding = STR(n);
        n = CHILD(n, 0);
    } else {
        /* PEP 3120 */
        c.c_encoding = "utf-8";
    }

The disabled ast_error() call (inside #if 0) could be re-enabled.
|
|||
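For comparison, here is a small sketch of how the exec() example from the message above behaves once a str argument is treated as already-decoded text. It assumes a current CPython 3, where a coding declaration inside a str is treated as an ordinary comment. The print() call from the original example is replaced by an assignment so the result can be inspected, and the names are illustrative.

```python
# The argument is a str, i.e. text that is already decoded. The
# "# encoding:" line should therefore have no effect on the literal below.
source_text = '#header\n# encoding: ISO-8859-1\nmessage = "h\xe9 h\xe9"\n'

namespace = {}
exec(source_text, namespace)

# ascii() avoids depending on the terminal's own encoding when printing.
print(ascii(namespace["message"]))
```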
| msg83842 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-03-20 01:25 | |
This bug was a duplicate of #4626 which was fixed by r70113 ;-) |
|||
| msg83843 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-03-20 01:30 | |
Oops, I misread this issue (wrong title!). #4626 is related, but this issue is about the profile module. The problem is that profile opens the source code as text (with the default charset: UTF-8). Attached patch fixes the problem.
Example:
--- x.py (ISO-8859-1 text file) ---
#coding: ISO-8859-1
print("hé hé")
-----------------------------------
Run: python -m profile x.py
Current result:
(...)
File ".../py3k/Lib/profile.py", line 614, in main
script = fp.read()
File ".../Lib/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes (...)
With my patch, it works as expected. |
|||
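The decoding failure in the traceback above can be reproduced without the profile module. This sketch builds the bytes of an ISO-8859-1 script in memory and decodes them as UTF-8, which is what opening the file as text with a UTF-8 default amounts to. The variable names are illustrative.

```python
# Bytes of a script saved as ISO-8859-1: "\xe9" is stored as the byte 0xE9.
raw = '#coding: ISO-8859-1\nprint("h\xe9 h\xe9")\n'.encode("iso-8859-1")

try:
    # Decoding with the wrong charset: 0xE9 followed by a space is not a
    # valid UTF-8 sequence.
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc.reason)
```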
| msg83844 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-03-20 01:44 | |
Oops, Benjamin noticed that it doesn't work with Windows line endings (\r\n). The new patch reads the file encoding instead of reading the file content as bytes. |
|||
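The idea of detecting the encoding first and then reading the script as text can be sketched as follows. The second patch itself is not shown in this thread, so this is not its implementation: it substitutes the standard library's `tokenize.detect_encoding()` on a current CPython 3, and uses an in-memory buffer in place of a real file. All names are illustrative.

```python
import io
import tokenize

# Stand-in for a script saved as ISO-8859-1 with Windows line endings.
raw = '#coding: ISO-8859-1\r\nmessage = "h\xe9 h\xe9"\r\n'.encode("iso-8859-1")

# detect_encoding() reads at most two lines to find the coding declaration.
encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)

# Reading through a text wrapper applies universal newlines by default,
# so "\r\n" is translated to "\n" while decoding with the detected charset.
text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding).read()

namespace = {}
exec(compile(text, "<example>", "exec"), namespace)
print(ascii(namespace["message"]))
```

Decoding in text mode handles both the charset and the line endings before the source reaches compile(), which addresses the two problems raised in the messages above.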
| msg83846 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-03-20 01:56 | |
This regression was introduced by the removal of execfile() in Python 3. The proposed replacement for execfile() is wrong. I propose a generic fix in issue #5524. |
|||
| msg83933 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-03-21 10:51 | |
After some discussions, I think that my first patch (profile_encoding.patch) was correct but we also have to fix #4628. |
|||
| msg101477 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-03-22 02:00 | |
Fixed by r79271 (py3k), r79272 (3.1). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:41 | admin | set | github: 48532 |
| 2010-03-22 02:00:33 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg101477 |
| 2009-03-21 10:51:36 | vstinner | set | dependencies: + No universal newline support for compile() when using bytes; messages: + msg83933 |
| 2009-03-20 01:56:57 | vstinner | set | messages: + msg83846 |
| 2009-03-20 01:44:43 | vstinner | set | files: + profile_encoding-2.patch; keywords: + patch; messages: + msg83844 |
| 2009-03-20 01:38:47 | brett.cannon | set | keywords: - patch; stage: test needed -> patch review |
| 2009-03-20 01:30:45 | vstinner | set | keywords: + needs review |
| 2009-03-20 01:30:35 | vstinner | set | status: closed -> open; title: exec(unicode): invalid charset when #coding:xxx spec is used -> profile doesn't support non-UTF8 source code; messages: + msg83843; dependencies: + compile() doesn't ignore the source encoding when a string is passed in; resolution: fixed -> (no value) |
| 2009-03-20 01:25:22 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg83842 |
| 2008-11-10 09:48:00 | vstinner | set | title: (Python3) The profile module deesn't understand the character set definition -> exec(unicode): invalid charset when #coding:xxx spec is used |
| 2008-11-10 09:46:37 | vstinner | set | nosy: + brett.cannon |
| 2008-11-10 01:03:28 | vstinner | set | messages: + msg75677 |
| 2008-11-10 00:40:37 | vstinner | set | files: + profile_encoding.patch; keywords: + patch; messages: + msg75676; nosy: + vstinner |
| 2008-11-09 17:39:21 | christian.heimes | set | priority: normal; nosy: + christian.heimes; type: crash -> behavior; components: + Library (Lib); stage: test needed |
| 2008-11-08 02:49:31 | shidot | create | |