
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: compile() should not encode 'filename' (at least on Windows)
Type: behavior Stage: test needed
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Drekin, terry.reedy, vstinner
Priority: normal Keywords:

Created on 2012-01-11 03:46 by terry.reedy, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg151034 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-11 03:46
The 3.2.2 doc for compile() says "The filename argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file ('<string>' is commonly used)."
I am not sure what 'recognizable' is supposed to mean, but as I understand it, it would be user-specific and any string containing a fake 'filename' should be accepted and attached to the output code object as the .co_filename attribute. (At least on Windows.)
In fact, compile() has a hidden restriction: it encodes 'filename' with the local filesystem encoding. It tosses the bytes result (at least on Windows) but lets a UnicodeEncodeError terminate compilation. The effect is to add an undocumented and spurious dependency to code that has nothing to do with real files or the local machine.
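As a rough model of the check described above (a hypothetical sketch, not CPython's C implementation): `legacy_filename_check` and `accepted` are illustrative names, and the explicit `encoding` argument stands in for the filesystem encoding, which on Windows at the time was `mbcs`. The portable `cp936` (GBK) and `cp1252` codecs are assumed stand-ins for a Chinese and a Western European Windows code page, so the sketch runs on any platform.

```python
def legacy_filename_check(filename: str, encoding: str) -> str:
    """Hypothetical model of the pre-3.4 compile() filename check.

    The filename is encoded with the (assumed) filesystem encoding, the
    resulting bytes are thrown away, and only a UnicodeEncodeError can
    escape to abort compilation.
    """
    filename.encode(encoding)  # result discarded; only the exception matters
    return filename  # the original str is what would end up as co_filename


def accepted(filename: str, encoding: str) -> bool:
    """Return True if the modeled check lets `filename` through."""
    try:
        legacy_filename_check(filename, encoding)
    except UnicodeEncodeError:
        return False
    return True


name = "中文.py"
# The same fake filename passes with a GBK (cp936) code page
# but is rejected with the Western European cp1252 code page.
print(accepted(name, "cp936"))   # True
print(accepted(name, "cp1252"))  # False
```

The model makes the machine dependency visible: the same string is accepted on one system and rejected on another, although no file is ever opened.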
In #10114, msg118845, Victor Stinner justified this with 
"co_filename attribute is used to display the traceback: Python opens the related file, read the source code line and display it."
If the filename is fake, it cannot do that. (Perhaps the doc should warn users to make sure that fake filenames do not match any possibly real filenames ;-). The traceback mechanism could ignore UnicodeEncodeErrors just as well as it now ignores IO(?)Errors when open('fakename') does not work.
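The tolerance for missing source can be observed with the standard `traceback` and `linecache` modules. A minimal sketch, assuming Python 3.4 or later (where a non-ASCII fake filename is accepted on every platform); `FAKE` is an arbitrary made-up name that does not exist on disk.

```python
import linecache
import traceback

# A deliberately fake, non-ASCII filename that does not exist on disk.
FAKE = "<假的-fake-filename>"

code = compile("1 / 0", FAKE, "exec")

captured = None
try:
    exec(code)
except ZeroDivisionError:
    # The innermost frame belongs to the compiled code.
    captured = traceback.extract_tb(traceback.sys.exc_info()[2])[-1]
    # Prints the traceback; the source line is simply omitted, no error raised.
    traceback.print_exc()

print(captured.filename, captured.lineno)  # <假的-fake-filename> 1
# linecache quietly returns an empty string when the source cannot be found.
print(repr(linecache.getline(FAKE, 1)))    # ''
```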
Victor continues "On Windows, co_filename is directly used because Windows accepts unicode for filenames." This is not true: on at least some Windows systems, compile() tries to encode with the mbcs codec, which in turn uses the hidden local codepage. I believe that for most or all codepages, this will even raise errors for some valid Unicode filenames.
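To illustrate the claim that each code page rejects some valid filename, the sketch below tries a few sample names against several code pages. Assumptions: Python's `cp*` codecs serve as portable stand-ins for the Windows ANSI code pages that `mbcs` consults (the real `mbcs` codec exists only on Windows and follows the system setting), and the sample names are arbitrary; `rejected_names` is a hypothetical helper.

```python
# Portable stand-ins for common Windows ANSI code pages (an assumption;
# the real 'mbcs' codec is Windows-only and follows the system setting).
CODE_PAGES = ["cp1252", "cp1251", "cp874", "cp932", "cp936"]

# Arbitrary names that are all valid on NTFS, which stores names as UTF-16.
SAMPLE_NAMES = ["中文.py", "файл.py", "ไฟล์.py", "😀.py"]


def rejected_names(codec, names):
    """Return the names that cannot be encoded with `codec` in strict mode."""
    bad = []
    for name in names:
        try:
            name.encode(codec)
        except UnicodeEncodeError:
            bad.append(name)
    return bad


for cp in CODE_PAGES:
    print(cp, rejected_names(cp, SAMPLE_NAMES))
```

On this sample every listed code page rejects at least the emoji name, while UTF-8 accepts all of them.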
I do not know whether the stored .co_filename attribute type for *nix is str, as on Windows, or bytes. If the latter, the doc should say so.
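For reference, a quick check of the attribute type on Python 3.4 or later (after the change mentioned further down this thread): `co_filename` is a `str` and holds the name exactly as passed. The fake name here is arbitrary.

```python
import sys

# Assumes Python 3.4 or later, where compile() no longer encodes `filename`.
assert sys.version_info >= (3, 4)

fake_name = "<中文 файл 😀>"
code = compile("x = 1", fake_name, "exec")

# co_filename is a str, stored exactly as passed in.
print(type(code.co_filename).__name__)  # str
print(code.co_filename == fake_name)    # True
```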
If compile() continues to filter fake filenames, which I oppose, the doc should also say so and say what it does.
This issue came up on python-list when someone used a Chinese filename and mbcs rejected it.
msg151076 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-11 18:41
My supposition that compile() rejects some real file names appears correct: from python-list
ME: Is this a filename that could be an actual, valid filename on your system?
OP: Yes it is. open works on that file.
msg195954 - Author: Adam Bartoš (Drekin) * Date: 2013-08-23 09:07
Hello. Will this be fixed? It's really annoying that you cannot pass valid unicode filename to compile(). I'm using a workaround: I just pass "<placeholder>" and then "update" the resulting code object recursively to set the correct co_filename. Afterwards the code object can be executed and produces correct tracebacks. (I'm using Windows.)
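The workaround can be sketched as follows. This is a reconstruction under assumptions, not Drekin's code (which the thread does not show): `with_filename` is a hypothetical helper, and it relies on `types.CodeType.replace()`, which only exists in Python 3.8 and later; in 2013 the same effect required calling the `types.CodeType` constructor directly.

```python
import types


def with_filename(code: types.CodeType, filename: str) -> types.CodeType:
    """Return a copy of `code` whose co_filename, and that of every nested
    code object (functions, classes, comprehensions), is `filename`."""
    new_consts = tuple(
        with_filename(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=new_consts)


source = "def f():\n    return 1 / 0\n"
real_name = "中文.py"  # hypothetical filename

# Compile under a placeholder, then patch the real name in recursively.
code = with_filename(compile(source, "<placeholder>", "exec"), real_name)
namespace = {}
exec(code, namespace)
print(namespace["f"].__code__.co_filename)  # 中文.py
```

On Python 3.4 and later the detour is unnecessary, since compile() accepts such a name directly.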
Fixing this will probably also fix http://bugs.python.org/issue17588 . It doesn't bother just me. See e.g. http://stackoverflow.com/questions/8798591/unicodeencodeerror-when-using-the-compile-function .
Thank you. Drekin
msg195983 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-08-23 15:56
Victor, do you have any opinion on this unicode filename issue?
msg196005 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-23 19:02
> Victor, do you have any opinion on this unicode filename issue?
I closed the issue #11619 in January 2013 because there was no user requesting the feature. I just reopened the issue because users now ask for it.
msg196247 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-26 20:39
This issue has been fixed in issue #11619 by:
New changeset df2fdd42b375 by Victor Stinner in branch 'default':
Close #11619: The parser and the import machinery do not encode Unicode
http://hg.python.org/cpython/rev/df2fdd42b375
Thanks for the report!
(I don't plan to backport the fix to Python 3.3, it's a huge patch for a rare use case.)
msg197706 - Author: Adam Bartoš (Drekin) * Date: 2013-09-14 12:18
Since this issue was fixed, shouldn't it be marked fixed here?
msg197709 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-14 13:47
Closed.
History
Date | User | Action | Args
2022-04-11 14:57:25 | admin | set | github: 57967
2013-09-14 13:47:09 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg197709
2013-09-14 12:18:17 | Drekin | set | messages: + msg197706
2013-08-26 20:39:04 | vstinner | set | messages: + msg196247; versions: + Python 3.4, - Python 3.2, Python 3.3
2013-08-23 19:02:03 | vstinner | set | messages: + msg196005
2013-08-23 15:56:08 | terry.reedy | set | nosy: + vstinner; messages: + msg195983
2013-08-23 09:07:05 | Drekin | set | nosy: + Drekin; messages: + msg195954
2012-01-11 18:41:30 | terry.reedy | set | messages: + msg151076
2012-01-11 03:46:43 | terry.reedy | create |
