Issue 19518: Add new PyRun_xxx() functions to not encode the filename


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/63717

classification

Title:	Add new PyRun_xxx() functions to not encode the filename
Type:	enhancement	Stage:	test needed
Components:	Interpreter Core	Versions:	Python 3.5

process

Dependencies:	Superseder:
Status:	closed	Resolution:	out of date
Assigned To:	Nosy List:	Arfrever, Drekin, eric.snow, georg.brandl, larry, ncoghlan, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2013-11-07 11:47 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
pyrun_object.patch	vstinner, 2013-11-07 11:47	review
pyrun_object-2.patch	vstinner, 2013-11-07 22:42	review

Messages (32)
msg202326 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-07 11:47
Changeset af822a6c9faf from issue #19512 added the function PyRun_InteractiveOneObject(). By the way, I forgot to document that function; this issue is also a reminder to do so.

The purpose of the new function is to avoid creating temporary Unicode strings and making useless calls to the Unicode encoder/decoder. I propose to generalize the change to the other PyRun_xxx() functions. The attached patch adds the following functions:

- PyRun_AnyFileObject()
- PyRun_SimpleFileObject()
- PyRun_InteractiveLoopObject()
- PyRun_FileObject()

On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).

TODO: I should document all these new functions.
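
To illustrate what "unencodable filename" means here, the following is a minimal sketch rather than anything taken from the patch. The script name, the helper `try_encode`, and the choice of cp1252 as the code page of an English Windows setup are all assumptions made for the example.

```python
# Hypothetical example: a Japanese script name on a setup whose
# 8-bit code page is cp1252 (assumed here for an English Windows install).
script_name = "スクリプト.py"
ansi_code_page = "cp1252"


def try_encode(name, encoding):
    """Return the encoded bytes, or None if the name cannot be encoded."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


# The katakana characters have no cp1252 representation, so any code
# path that encodes the filename to char* loses it.
print(try_encode(script_name, ansi_code_page))  # None
print(try_encode("script.py", ansi_code_page))  # b'script.py'
```

Keeping the filename as a Unicode object from end to end avoids this encoding step entirely, which is the motivation given above for the new functions.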
msg202329 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-11-07 12:32
> On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).

Doesn't the surrogateescape error handler solve this issue?
msg202335 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-07 13:01
2013-11-07 Serhiy Storchaka <report@bugs.python.org>:
>> On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).
>
> Doesn't the surrogateescape error handler solve this issue?

surrogateescape is very specific to UNIX, or more generally to systems using bytes filenames. The native Windows type for filenames is Unicode. To support any Unicode filename on Windows, you must never encode a filename.

surrogateescape avoids decoding errors; here the problem is an encoding error. For example, "abé" cannot be encoded to ASCII, and "abé".encode("ascii", "surrogateescape") doesn't help here.
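
The distinction between the two directions can be checked with a short sketch. The variable names are chosen for the example; only the "abé" case comes from the message above.

```python
# Decoding direction: surrogateescape maps each undecodable byte to a
# lone surrogate, so arbitrary bytes filenames survive a round trip.
raw = b"ab\xe9"  # Latin-1 bytes for "abé", not valid ASCII
decoded = raw.decode("ascii", "surrogateescape")
roundtrip = decoded.encode("ascii", "surrogateescape")
print(ascii(decoded))  # 'ab\udce9'
print(roundtrip)       # b'ab\xe9'

# Encoding direction: the handler only converts lone surrogates
# (U+DC80..U+DCFF) back to bytes. A real character such as "é" is still
# unencodable, so the error is raised as usual.
try:
    "abé".encode("ascii", "surrogateescape")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```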
msg202338 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-11-07 13:31
I added some comments on Rietveld. Please do not commit without documentation and tests.
msg202392 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-07 22:42
Updated patch addressing some of Serhiy's remarks and adding documentation.
msg202393 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-07 22:43
> Updated patch addressing some of Serhiy's remarks and adding documentation.

Oh, and it also adds a unit test. I haven't run the unit test on Windows yet.
msg202397 - (view)	Author: Eric Snow (eric.snow) * (Python committer)	Date: 2013-11-08 00:05
PEP 432 relates pretty closely here.
msg202398 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-08 00:07
> PEP 432 relates pretty closely here.

What is the relation between this issue and PEP 432?
msg202399 - (view)	Author: Eric Snow (eric.snow) * (Python committer)	Date: 2013-11-08 00:27
PEP 432 is all about the PyRun_* API and especially relates to refactoring it with the goal of improving extensibility and maintainability. I'm sure Nick could expound, but the PEP is a response to the cruft that has accumulated over the years in Python/pythonrun.c. The result of that organic growth makes it harder than necessary to do things like adding new command-line options. While I haven't looked closely at the new function you added, I expect PEP 432 would have simplified things or even removed the need for a new function.
msg202411 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2013-11-08 09:45
PEP 432 doesn't really touch the PyRun_* APIs - it's all about refactoring Py_Initialize so you can use most of the C API during the latter parts of the configuration process (e.g. setting up the path for the import system). pythonrun.c is just a monstrous beast that covers the entire interpreter lifecycle from initialisation through script execution through to termination.
msg203447 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-19 23:57
> Updated patch addressing some of Serhiy's remarks and adding documentation.

Anyone for a new review?
msg203464 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-11-20 07:45
PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.
msg203474 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-20 13:38
> PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.

I simply replaced the current suffix with Object(). Only the filename is converted from char* to PyObject*.

Do you have a better suggestion for the new name?
msg203476 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-11-20 13:48
No, I don't have a better suggestion. But I'm afraid that one day you will want to extend the PyRun_File*() functions to work with a general Python file object (perhaps there is such an issue already), and then you will run into a problem.
msg203480 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2013-11-20 14:13
Perhaps we could use the suffix "Unicode" rather than "Object"? These don't work with arbitrary objects, they expect a unicode string. PyRun_InteractiveOneObject would be updated to use the new suffix as well.

That would both be clearer for the user, and address Serhiy's concern about the possible ambiguity: PyRun_FileUnicode still isn't crystal clear, but it's clearer than PyRun_FileObject.
msg203481 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-20 14:17
FYI I already added a bunch of new functions with the Object suffix when I replaced char* with PyObject*. Example:
http://hg.python.org/cpython/rev/df2fdd42b375
http://bugs.python.org/issue11619
msg203489 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2013-11-20 15:03
Hmm, reading more of those and I think Serhiy is definitely right - Object is the wrong suffix. Unicode isn't right either, since the main problem is the ambiguity around which parameter is a Python Unicode object. The API names that end in StringObject or FileObject don't give the right idea at all.

The shortest accurate suffix I can come up with at the moment is the verbose "WithUnicodeFilename":

PyParser_ParseStringObject
vs
PyParser_ParseStringWithUnicodeFilename

Other possibilities:

PyParser_ParseStringUnicode # Huh?
PyParser_ParseStringDecodedFilename # Slight fib on Windows, but mostly accurate
PyParser_ParseStringAnyFilename

Inserting an underscore before the suffix is another option (although I don't think it much matters either way).
msg203490 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-11-20 15:11
> FYI I already added a bunch of new functions with the Object suffix when I replaced char* with PyObject*.

Most of them were added in 3.4. Unfortunately several functions were added earlier (e.g. PyImport_ExecCodeModuleObject, PyErr_SetFromErrnoWithFilenameObject).
msg203592 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-21 09:09
So, which suffix should be used?
msg203593 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-11-21 09:18
The "Unicode" suffix in existing functions means a Py_UNICODE argument. Maybe "*Ex2"? It can't be misinterpreted, but it looks ugly.
msg203608 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-11-21 10:36
> The "Unicode" suffix in existing functions means a Py_UNICODE argument.

Yes, this is why I chose the Object() suffix. Are you still opposed to the "Object" suffix?

(Yes, "*Ex2" is really ugly.)
msg203618 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2013-11-21 12:04
How about "ExName"?

This patch:

PyRun_AnyFileExName
PyRun_SimpleFileExName
PyRun_InteractiveOneExName
PyRun_InteractiveLoopExName
PyRun_FileExName

Previous patch:

Py_CompileStringExName
PyAST_FromNodeExName
PyAST_CompileExName
PyFuture_FromASTExName
PyParser_ParseFileExName
PyParser_ParseStringExName
PyErr_SyntaxLocationExName
PyErr_ProgramTextExName
PyParser_ASTFromStringExName
PyParser_ASTFromFileExName

- "Ex" has precedent as indicating a largely functionally equivalent API with a different signature
- "Name" strongly suggests that we're tinkering with the filename (since these APIs don't accept another name)
- "ExName" is the same length as "Object" but far more explicit

Thoughts?
msg206391 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2013-12-16 23:22
Sorry, but because of the bikeshedding, I'm no longer interested in working on this issue. Don't hesitate to rework my patch if you want to fix the bug ("On Windows, these changes should allow to pass an unencodable filename on the command line").
msg206396 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2013-12-17 02:50
Just getting this on Larry's radar and summarising the current position.

The original problem: using "char *" to pass filenames around doesn't work properly on Windows, we need to use Unicode objects.

The solution: parallel APIs that accept PyObject * rather than char * for the filename parameters.

The new problem: both Serhiy and I find the *Object() suffix currently used for those "filename as Unicode object instead of C string" parallel APIs to be ambiguous and confusing.

However, the problem the parallel APIs solve is real, and reverting or excessively modifying any of the work Victor has already done would be silly. That means we're now in a situation where we have to either:

* accept *Object as the suffix for all of these APIs indefinitely, even though it's ambiguous and confusing
* choose a new suffix and use that for the APIs already added in 3.4 and add compatibility aliases for the older APIs to make them consistent
* change the public API additions already made for 3.4 to new private APIs by adding an underscore prefix, and then reconsider the public API naming question for 3.5
* accept *Object as the suffix for the moment, but aim to replace it with something more descriptive in Python 3.5

Neither Serhiy nor I are comfortable with the first option, and making a decision in haste for the second option doesn't seem like a good idea. Option 3 seems like far too much work to make things less useful (a capability that works, but has an ambiguous and confusing name, is better than a capability that isn't provided at all).

That leaves option number 4: don't change anything further now, but revisit it for 3.5, including changing the preferred name of the existing APIs. I like that approach, so I'm assigning to myself to take a closer look at how some of the suggestions above read in the docs once 3.4 is out the door.
msg206449 - (view)	Author: Larry Hastings (larry) * (Python committer)	Date: 2013-12-17 14:38
So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet? Option 4 is silly--I don't think we should ship them as public APIs in 3.4 if we're planning to rename them. I prefer the previous options. p.s. fwiw I hate "ExName".
msg206453 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-12-17 14:55
> So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?

Not all. Only the following functions are new in 3.4:

Parser/parsetok.c:PyParser_ParseStringObject
Parser/parsetok.c:PyParser_ParseFileObject
Python/future.c:PyFuture_FromASTObject
Python/symtable.c:PySymtable_BuildObject
Python/compile.c:PyAST_CompileObject
Python/_warnings.c:PyErr_WarnExplicitObject
Python/ast.c:PyAST_FromNodeObject
Python/errors.c:PyErr_SyntaxLocationObject
Python/errors.c:PyErr_ProgramTextObject
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:Py_CompileStringObject
Python/pythonrun.c:Py_SymtableStringObject
Python/pythonrun.c:PyParser_ASTFromStringObject
Python/pythonrun.c:PyParser_ASTFromFileObject

The following functions existed in 3.3:

Objects/moduleobject.c:PyModule_NewObject
Objects/moduleobject.c:PyModule_GetNameObject
Objects/moduleobject.c:PyModule_GetFilenameObject
Objects/abstract.c:PyObject_CallObject
Objects/bytesobject.c:PyBytes_FromObject
Objects/fileobject.c:PyFile_WriteObject
Objects/memoryobject.c:PyMemoryView_FromObject
Objects/longobject.c:PyLong_FromUnicodeObject
Objects/weakrefobject.c:PyWeakref_GetObject
Objects/exceptions.c:PyUnicodeEncodeError_GetObject
Objects/exceptions.c:PyUnicodeDecodeError_GetObject
Objects/exceptions.c:PyUnicodeTranslateError_GetObject
Objects/unicodeobject.c:PyUnicode_FromObject
Objects/unicodeobject.c:PyUnicode_FromEncodedObject
Objects/unicodeobject.c:PyUnicode_AsDecodedObject
Objects/unicodeobject.c:PyUnicode_AsEncodedObject
Objects/bytearrayobject.c:PyByteArray_FromObject
Python/sysmodule.c:PySys_GetObject
Python/sysmodule.c:PySys_SetObject
Python/errors.c:PyErr_SetObject
Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:_PyImport_FixupExtensionObject
Python/import.c:_PyImport_FindExtensionObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject
Python/modsupport.c:PyModule_AddObject
Python/pyarena.c:PyArena_AddPyObject
msg206456 - (view)	Author: Larry Hastings (larry) * (Python committer)	Date: 2013-12-17 14:58
Are all the functions that use "Object" to indicate "Unicode object instead of string" new in 3.4? Of those, how many are undocumented?
msg206460 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-12-17 15:16
> Are all the functions that use "Object" to indicate "Unicode object instead
> of string" new in 3.4? Of those, how many are undocumented?

The following 5 functions work with PyObject* filenames and have Object-less variants that work with char * filenames:

Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject

The private _PyImport_FixupExtensionObject and _PyImport_FindExtensionObject have no Object-less variants.

All other *Object functions are unrelated.
msg206462 - (view)	Author: Larry Hastings (larry) * (Python committer)	Date: 2013-12-17 15:33
Are those five functions new in 3.4 and undocumented?
msg206464 - (view)	Author: Larry Hastings (larry) * (Python committer)	Date: 2013-12-17 15:34
Are we proposing renaming any functions that are either a) not new in 3.4, or b) were documented as of 3.4 beta 1?
msg206466 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-12-17 15:45
> Are those five functions new in 3.4 and undocumented?

PyErr_SetFromErrnoWithFilenameObject exists even in 2.7. The other 4 PyImport_*Object functions were all added in 3.3 (see issue3080). All 5 functions are documented.

14 new functions were added in 3.4.
msg247988 - (view)	Author: Adam Bartoš (Drekin) *	Date: 2015-08-04 12:20
I'm not sure this is the right issue. The support for Unicode filenames is not ideal (at least on Windows).

Let α.py be a Python script with invalid syntax.

> py α.py
  File "<encoding error>", line 2
    as as compile error
     ^
SyntaxError: invalid syntax

On the other hand, if run.py does something like

    import __main__
    import sys
    import tokenize

    path = sys.argv[1]
    with tokenize.open(path) as f:
        source = f.read()
    code = compile(source, path, "exec")
    exec(code, __main__.__dict__)

we get

> py run.py α.py
  File "Python Unicode\\u03b1.py", line 2
    as as compile error
     ^
SyntaxError: invalid syntax

(or 'File "Python Unicode\α.py", line 2' depending on whether sys.stdout can encode the string). So the "<encoding error>" in the first example is unfortunate, as it is easy to get a better result even with a simple pure Python approach.
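
The pure Python behaviour described above can be reproduced without touching the file system. This is a minimal sketch, not the reporter's script: the helper name `syntax_error_filename` and the inline source string are assumptions made for the example.

```python
def syntax_error_filename(source, filename):
    """Compile source and return the filename recorded on the SyntaxError."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.filename
    return None


# compile() receives the filename as a str and never encodes it, so the
# non-ASCII name is preserved on the exception.
reported = syntax_error_filename("x = 1\nas as compile error\n", "α.py")
print(ascii(reported))  # '\u03b1.py'
```

ascii() is used for printing so that the example does not depend on whether sys.stdout can encode the name.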

History
Date	User	Action	Args
2022-04-11 14:57:53	admin	set	github: 63717
2015-10-02 21:09:44	vstinner	set	status: open -> closed resolution: out of date
2015-08-04 12:20:07	Drekin	set	nosy: + Drekin messages: + msg247988
2015-06-28 03:03:46	ncoghlan	set	assignee: ncoghlan ->
2013-12-17 15:45:44	serhiy.storchaka	set	messages: + msg206466
2013-12-17 15:34:42	larry	set	messages: + msg206464
2013-12-17 15:33:16	larry	set	messages: + msg206462
2013-12-17 15:16:12	serhiy.storchaka	set	messages: + msg206460
2013-12-17 14:58:23	larry	set	messages: + msg206456
2013-12-17 14:55:48	serhiy.storchaka	set	messages: + msg206453
2013-12-17 14:38:11	larry	set	messages: + msg206449
2013-12-17 02:54:09	ncoghlan	set	priority: normal
2013-12-17 02:50:48	ncoghlan	set	priority: normal -> (no value) nosy: + larry versions: + Python 3.5, - Python 3.4 messages: + msg206396 assignee: ncoghlan
2013-12-16 23:22:32	vstinner	set	nosy: - vstinner
2013-12-16 23:22:20	vstinner	set	nosy: georg.brandl, ncoghlan, vstinner, Arfrever, eric.snow, serhiy.storchaka messages: + msg206391
2013-11-21 12:04:49	ncoghlan	set	messages: + msg203618
2013-11-21 10:36:47	vstinner	set	messages: + msg203608
2013-11-21 09:18:38	serhiy.storchaka	set	messages: + msg203593
2013-11-21 09:09:13	vstinner	set	messages: + msg203592
2013-11-20 15:11:51	serhiy.storchaka	set	messages: + msg203490
2013-11-20 15:03:34	ncoghlan	set	messages: + msg203489
2013-11-20 14:17:03	vstinner	set	messages: + msg203481
2013-11-20 14:13:57	ncoghlan	set	messages: + msg203480
2013-11-20 13:48:25	serhiy.storchaka	set	messages: + msg203476
2013-11-20 13:38:59	vstinner	set	messages: + msg203474
2013-11-20 07:45:57	serhiy.storchaka	set	messages: + msg203464
2013-11-19 23:57:54	vstinner	set	messages: + msg203447
2013-11-08 09:45:35	ncoghlan	set	messages: + msg202411
2013-11-08 00:27:12	eric.snow	set	messages: + msg202399
2013-11-08 00:07:30	vstinner	set	messages: + msg202398
2013-11-08 00:05:16	eric.snow	set	nosy: + eric.snow, ncoghlan messages: + msg202397
2013-11-07 22:43:22	vstinner	set	messages: + msg202393
2013-11-07 22:42:52	vstinner	set	files: + pyrun_object-2.patch messages: + msg202392
2013-11-07 16:48:24	Arfrever	set	nosy: + Arfrever
2013-11-07 13:31:46	serhiy.storchaka	set	messages: + msg202338
2013-11-07 13:02:54	vstinner	set	nosy: + georg.brandl
2013-11-07 13:01:19	vstinner	set	messages: + msg202335
2013-11-07 12:32:41	serhiy.storchaka	set	messages: + msg202329
2013-11-07 12:30:46	serhiy.storchaka	set	type: enhancement components: + Interpreter Core stage: test needed
2013-11-07 11:48:00	vstinner	create
