Message115282
| Author |
vstinner |
| Recipients |
ideasman42, vstinner |
| Date |
2010-08-31.22:29:49 |
| Message-id |
<1283293792.55.0.37414088951.issue9713@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
The problem is not specific to Py_CompileString(): all functions based (indirectly) on PyParser_ASTFromString() and PyParser_ASTFromFile() expect filenames encoded in utf-8 with the strict error handler.
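To make the constraint concrete, here is a minimal Python sketch of what "utf-8 with the strict error handler" means for a filename given as bytes. The helper name `decode_filename_strict` and the sample filename are hypothetical, chosen only for illustration; this mimics the decoding step and is not the parser's actual code.

```python
def decode_filename_strict(raw: bytes) -> str:
    """Decode a byte filename the way a strict UTF-8 consumer would."""
    return raw.decode("utf-8", "strict")


# A filename encoded in ISO-8859-1: the byte 0xE9 is not valid UTF-8 here.
latin1_name = "caf\u00e9.py".encode("iso-8859-1")  # b'caf\xe9.py'

try:
    decode_filename_strict(latin1_name)
    strict_ok = True
except UnicodeDecodeError:
    # Strict decoding rejects the name instead of guessing.
    strict_ok = False

# A pure ASCII name is valid UTF-8 and decodes without trouble.
ascii_name = decode_filename_strict(b"ok.py")
```

Any function that decodes the filename this way fails on such a name before it can even be reported in an error message.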
If we choose to use something other than utf-8 in strict mode, here is an incomplete list of functions that would have to be patched:
- parser:
  * initerr()
  * err_input()
- ast:
  * ast_error_finish()
And the list of impacted functions (parsing functions accepting filenames):
- PyParser_ParseStringFlagsFilename()
- PyParser_ParseFile*()
- PyParser_ASTFromString(), PyParser_ASTFromFile()
- PyAST_FromNode()
- PyRun_SimpleFile*()
- PyRun_AnyFile*()
- PyRun_InteractiveOneFlags()
- etc.
All these functions are public, and I don't think it would be a good idea to change the encoding (e.g. to iso-8859-1). We can use a different error handler (especially surrogateescape, as suggested in the initial message) and/or create new functions accepting unicode filenames.
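As a rough illustration of why the surrogateescape error handler is attractive here, the following Python sketch round-trips an undecodable byte filename through a `str`. The helper names `decode_filename` and `encode_filename` are hypothetical; only the standard `bytes.decode` and `str.encode` calls with the `surrogateescape` handler are used.

```python
def decode_filename(raw: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte to a lone surrogate."""
    return raw.decode("utf-8", "surrogateescape")


def encode_filename(name: str) -> bytes:
    """Reverse decode_filename(): lone surrogates become the original bytes."""
    return name.encode("utf-8", "surrogateescape")


raw = b"caf\xe9.py"           # not valid UTF-8 (ISO-8859-1 encoded name)
name = decode_filename(raw)   # the 0xE9 byte becomes U+DCE9
round_trip = encode_filename(name)

# The escaped name cannot be encoded with the strict handler, which is why
# every function that re-encodes the filename would need the same handler.
try:
    name.encode("utf-8", "strict")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

The encoding stays utf-8, so valid names behave exactly as before; only names that strict mode would reject take the escape path.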
--
I'm working on undecodable filenames in issues #8611 and #9425, especially on the import machinery part. Once the import machinery is fully unicode compliant, the last remaining part will be the "parser machinery" (Parser/*.c). Patching the parser is a little more complex because of a bootstrap problem: the parser is compiled twice, once with a small subset of the Python C API (using some mockups) and once with the full API. |
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2010-08-31 22:29:52 | vstinner | set | recipients: + vstinner, ideasman42 |
| 2010-08-31 22:29:52 | vstinner | set | messageid: <1283293792.55.0.37414088951.issue9713@psf.upfronthosting.co.za> |
| 2010-08-31 22:29:51 | vstinner | link | issue9713 messages |
| 2010-08-31 22:29:49 | vstinner | create | |