Message 75677 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

In-reply-to
Author	vstinner
Recipients	christian.heimes, shidot, vstinner
Date	2008-11-10.01:03:25
Message-id	<1226279009.92.0.547150221285.issue4282@psf.upfronthosting.co.za>

Content

Example of the problem:
    exec('#header\n# encoding: ISO-8859-1\nprint("h\xe9 h\xe9")\n')
exec(unicode) calls source_as_string(), which converts the unicode string
to bytes using _PyUnicode_AsDefaultEncodedString() (UTF-8 charset).
PyRun_StringFlags() is then called with the UTF-8 byte string and the
PyCF_SOURCE_IS_UTF8 flag. But in the parser, get_coding_spec() recognizes
the "#coding:" header and converts the bytes to unicode using the declared
charset (which may be different from UTF-8).
The problem is in the function PyAST_FromNode(): the flag is not used in
the tokenizer, only in the AST parser. I also see:
    if (flags && flags->cf_flags & PyCF_SOURCE_IS_UTF8) {
        c.c_encoding = "utf-8";
        if (TYPE(n) == encoding_decl) {
#if 0
            ast_error(n, "encoding declaration in Unicode string");
            goto error;
#endif
            n = CHILD(n, 0);
        }
    } else if (TYPE(n) == encoding_decl) {
        c.c_encoding = STR(n);
        n = CHILD(n, 0);
    } else {
        /* PEP 3120 */
        c.c_encoding = "utf-8";
    }
The ast_error() call could be re-enabled by removing the #if 0 guard.
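
The double conversion described in this message can be imitated in pure Python. This is only a sketch of the effect, not the interpreter's actual code path: tokenize.detect_encoding() is used as a stand-in for the C-level get_coding_spec(), and the variable names are made up for illustration.

```python
import io
import tokenize

# Same source text as in the report; "\xe9" is the character e-acute.
source = '#header\n# encoding: ISO-8859-1\nprint("h\xe9 h\xe9")\n'

# Step 1: exec(str) hands the parser UTF-8 encoded bytes.
utf8_bytes = source.encode("utf-8")

# Step 2: the coding cookie on the second line is still honoured.
# (Assumption: detect_encoding() approximates what get_coding_spec() finds.)
declared, _ = tokenize.detect_encoding(io.BytesIO(utf8_bytes).readline)

# Step 3: decoding the UTF-8 bytes with the declared charset garbles the
# non-ASCII characters, while decoding them as UTF-8 round-trips cleanly.
wrong = utf8_bytes.decode(declared)
right = utf8_bytes.decode("utf-8")

print(declared)
print(repr(wrong.splitlines()[2]))
print(repr(right.splitlines()[2]))
```

Each e-acute comes back as two characters in the wrongly decoded text, which would be consistent with the garbled output reported for the exec() call above.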

History
Date	User	Action	Args
2008-11-10 01:03:30	vstinner	set	recipients: + vstinner, christian.heimes, shidot
2008-11-10 01:03:29	vstinner	set	messageid: <1226279009.92.0.547150221285.issue4282@psf.upfronthosting.co.za>
2008-11-10 01:03:28	vstinner	link	issue4282 messages
2008-11-10 01:03:26	vstinner	create
