Message75677
| Author |
vstinner |
| Recipients |
christian.heimes, shidot, vstinner |
| Date |
2008-11-10.01:03:25 |
| Message-id |
<1226279009.92.0.547150221285.issue4282@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Example of the problem:
exec('#header\n# encoding: ISO-8859-1\nprint("h\xe9 h\xe9")\n')
exec(unicode) calls source_as_string(), which converts the unicode string to bytes
using _PyUnicode_AsDefaultEncodedString() (UTF-8 charset). Then
PyRun_StringFlags() is called with the UTF-8 byte string and the
PyCF_SOURCE_IS_UTF8 flag. But in the parser, get_coding_spec() recognizes
the "#coding:" header and converts the bytes to unicode using the declared
charset (which may differ from UTF-8).
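The double conversion described above can be reproduced at the Python level. This is only a sketch of the effect, not the C code path: tokenize.detect_encoding() is used here as a stand-in for get_coding_spec(), and the variable names are illustrative.

```python
import io
import tokenize

# Source text as passed to exec(): a str whose coding cookie says ISO-8859-1.
source = '#header\n# encoding: ISO-8859-1\nprint("h\xe9 h\xe9")\n'

# Step 1: the str is encoded to UTF-8 bytes (what source_as_string() does
# according to the message).
utf8_bytes = source.encode("utf-8")

# Step 2: the parser finds the coding cookie in those bytes.
# detect_encoding() is a Python-level stand-in for get_coding_spec().
declared, _lines = tokenize.detect_encoding(io.BytesIO(utf8_bytes).readline)

# Step 3: the UTF-8 bytes are decoded with the declared charset, which
# does not match the charset used in step 1.
redecoded = utf8_bytes.decode(declared)

print(declared)
print(repr(redecoded.splitlines()[2]))
```

Each "\xe9" becomes two bytes in UTF-8, and decoding those bytes as ISO-8859-1 yields two characters per original one, so the string literal the parser sees no longer matches what the caller wrote.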
The problem is in the function PyAST_FromNode(): the flag is not used in
the tokenizer but only in the AST parser. I also see:
    if (flags && flags->cf_flags & PyCF_SOURCE_IS_UTF8) {
        c.c_encoding = "utf-8";
        if (TYPE(n) == encoding_decl) {
    #if 0
            ast_error(n, "encoding declaration in Unicode string");
            goto error;
    #endif
            n = CHILD(n, 0);
        }
    } else if (TYPE(n) == encoding_decl) {
        c.c_encoding = STR(n);
        n = CHILD(n, 0);
    } else {
        /* PEP 3120 */
        c.c_encoding = "utf-8";
    }
The ast_error() call, currently disabled by #if 0, could be re-enabled. |
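A small check can tell which behaviour a given interpreter exhibits for a str source that carries a coding cookie. This is a hypothetical helper written for illustration (check_exec_with_cookie and its return strings are not part of CPython): the source is either rejected, executed with the cookie ignored, or executed with the cookie applied to the re-encoded bytes.

```python
def check_exec_with_cookie():
    """Report how exec() treats a coding cookie inside a str source."""
    source = '#header\n# encoding: ISO-8859-1\nvalue = "h\xe9 h\xe9"\n'
    namespace = {}
    try:
        exec(source, namespace)
    except SyntaxError:
        # Behaviour if the encoding declaration is treated as an error.
        return "rejected"
    if namespace["value"] == "h\xe9 h\xe9":
        # The literal survived unchanged: the cookie had no effect.
        return "cookie ignored"
    # The literal was altered by the UTF-8 / ISO-8859-1 mismatch.
    return "cookie applied (mojibake)"


outcome = check_exec_with_cookie()
print(outcome)
```

Only the last outcome corresponds to the bug reported here; the other two are the possible fixes discussed in this message.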
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2008-11-10 01:03:30 | vstinner | set | recipients: + vstinner, christian.heimes, shidot |
| 2008-11-10 01:03:29 | vstinner | set | messageid: <1226279009.92.0.547150221285.issue4282@psf.upfronthosting.co.za> |
| 2008-11-10 01:03:28 | vstinner | link | issue4282 messages |
| 2008-11-10 01:03:26 | vstinner | create | |