This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2010-08-30 07:46 by ideasman42, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (5) | |||
|---|---|---|---|
| msg115202 - (view) | Author: Campbell Barton (ideasman42) * | Date: 2010-08-30 07:46 | |
On Linux I have a path which Python reads as...
/data/test/num\udce9ro_bad/untitled.blend
os.listdir("/data/test/") returns this: ['num\udce9ro_bad']
But the same path can't be given to the C API's Py_CompileString,
where fn is '/data/test/num\udce9ro_bad/untitled.blend/test.py':
Py_CompileString(buf, fn, Py_file_input);
...gives this error:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 14-16: invalid data
According to this PEP, non-decodable paths should use the surrogateescape error handler:
http://www.python.org/dev/peps/pep-0383/
|
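A minimal Python sketch of the behaviour described in the report, assuming the directory name on disk contains the single byte 0xE9 (which is not valid UTF-8 on its own). The helper names `decode_strict` and `decode_escaped` are hypothetical and only illustrate the two error handlers; they are not part of the C API.

```python
# Raw bytes of the path as the kernel would hand them to Python.
# Assumption: the "é" was written as the single byte 0xE9.
raw = b"/data/test/num\xe9ro_bad/untitled.blend"


def decode_strict(path_bytes):
    """Decode as utf-8 with the strict error handler."""
    return path_bytes.decode("utf-8", "strict")


def decode_escaped(path_bytes):
    """Decode as utf-8 with the PEP 383 surrogateescape error handler."""
    return path_bytes.decode("utf-8", "surrogateescape")


# surrogateescape maps the undecodable byte 0xE9 to the lone surrogate U+DCE9.
escaped = decode_escaped(raw)

# Encoding with the same handler restores the original bytes.
roundtrip = escaped.encode("utf-8", "surrogateescape")

# The strict handler rejects the same input.
try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

The strict decode fails in the same way as the error quoted above, while the surrogateescape decode yields the `num\udce9ro_bad` form that os.listdir() reports.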
|||
| msg115282 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-08-31 22:29 | |
The problem is not specific to Py_CompileString(): all functions based (indirectly) on PyParser_ASTFromString() and PyParser_ASTFromFile() expect filenames encoded in utf-8 with the strict error handler.

If we choose to use something other than utf-8 in strict mode, here is an incomplete list of functions that have to be patched:

- parser:
  * initerr()
  * err_input()
- ast:
  * ast_error_finish()

And the list of impacted functions (parsing functions accepting filenames):

- PyParser_ParseStringFlagsFilename()
- PyParser_ParseFile*()
- PyParser_ASTFromString(), PyParser_ASTFromFile()
- PyAST_FromNode()
- PyRun_SimpleFile*()
- PyRun_AnyFile*()
- PyRun_InteractiveOneFlags()
- etc.

All these functions are public and I don't think that it would be a good idea to change the encoding (e.g. to iso-8859-1). We can use a different error handler (especially surrogateescape, as suggested in the initial message) and/or create new functions accepting unicode filenames.

--

I'm working on undecodable filenames in issues #8611 and #9425, especially on the import machinery part. Once the import machinery is fully Unicode compliant, the last part will be the "parser machinery" (Parser/*.c). It is a little more complex to patch the parser because of the bootstrap problem: the parser is compiled twice, once with a small subset of the C Python API (using some mockups), and once with the full API. |
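A rough Python-level illustration of the "functions accepting unicode filenames" idea, assuming a Python 3 version in which the built-in compile() accepts filenames containing lone surrogates. It uses compile() rather than the C functions listed above, so it only sketches the intended behaviour and says nothing about how those C functions are implemented.

```python
# Path from the original report; \udce9 is the lone surrogate that
# surrogateescape produces for the undecodable byte 0xE9.
fn = "/data/test/num\udce9ro_bad/untitled.blend/test.py"

# compile() receives the filename as a Python str rather than a C char*.
code = compile("x = 1\n", fn, "exec")

# The filename is kept unchanged on the resulting code object.
stored = code.co_filename

# Encoding the same name as strict utf-8 fails on the lone surrogate ...
try:
    fn.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ... while the surrogateescape handler restores the original byte.
raw = fn.encode("utf-8", "surrogateescape")
```

Passing the filename as a str sidesteps the question of which byte encoding a `char *` argument uses, which is the trade-off the message weighs against changing the error handler of the existing public functions.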
|||
| msg115943 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-09-09 12:49 | |
#6543 changed the encoding of the filename argument of PyRun_SimpleFileExFlags() (and of all functions based on PyRun_SimpleFileExFlags()) and of the c_filename attribute of the compiler's (private) structure in Python 3.1.3: they use utf-8 in strict mode instead of the filesystem encoding with surrogateescape. |
|||
| msg118838 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-10-15 22:26 | |
See also issue #10114. |
|||
| msg119103 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-10-19 02:04 | |
See issue #10114: fixed in Python 3.1 (r85716) and in Python 3.2 (r85569+r85570). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:05 | admin | set | github: 53922 |
| 2010-10-19 02:04:37 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg119103 |
| 2010-10-15 22:26:46 | vstinner | set | messages: + msg118838 |
| 2010-09-09 12:49:26 | vstinner | set | messages: + msg115943 |
| 2010-08-31 22:30:08 | vstinner | set | components: + Unicode, - None; versions: + Python 3.2 |
| 2010-08-31 22:29:51 | vstinner | set | messages: + msg115282 |
| 2010-08-30 12:25:16 | eric.araujo | set | nosy: + vstinner |
| 2010-08-30 07:46:51 | ideasman42 | create | |