Issue 3975: PyTraceBack_Print() doesn't respect # coding: xxx header

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48225

classification

Type:	Stage:
Title:	PyTraceBack_Print() doesn't respect # coding: xxx header
Components:	Versions:	Python 3.0

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	Nosy List:	amaury.forgeotdarc, vstinner
Priority:	normal	Keywords:	patch

Created on 2008年09月26日 15:12 by vstinner, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
unicode.py	vstinner, 2008年10月08日 10:51
traceback_unicode-5.patch	vstinner, 2008年10月08日 11:16	New version of _Py_DisplaySourceLine() supporting coding header (patch version 5)

Messages (16)
msg73851 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月26日 15:12
PyTraceBack_Print() doesn't take care of the "# coding: xxx" header of a Python script. It calls _Py_DisplaySourceLine() which opens the file as a byte stream (and not an unicode characters stream). Because of this problem, the traceback maybe truncated or invalid. Example (write it into a file and execute the file): ---- from sys import executable from os import execvpe filename = "pouet.py" out = open(filename, "wb") out.write(b"""# -- coding: GBK -- print("--asc\xA1\xA7i--") raise Exception("--asc\xA1\xA7i--")""") out.close() execvpe(executable, [executable, filename], None) ---- This issue depends on issue2384 (line number). Note: Python 2.6 may also has the problem but it doesn't parse "# coding: GBK". So it's a different problem (issue?).
msg73859 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月26日 16:26
Here is a new version of _Py_DisplaySourceLine() using PyTokenizer_FindEncoding() to read the coding header, and PyFile_FromFd() to create an "unicode-awake" file. The code could be optimized, but it least it displays correctly the file line ;-) The code is based on the original _Py_DisplaySourceLine() and call_find_module() (import.c). Notes: * The code is young and new, it might be delayed until python 3.1 * Some functions may raise new exceptions (eg. MemoryError). I don't know how Python will react if an exception is raised during PyTraceBack_Print() ? * The return code is 0 for success, but is it -1 or 1 for an error? It looks like error is "result != 0", so both -1 and 1 should be valid. I used -1.
msg73862 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月26日 16:28
(oops, first patch included an useless whitespace change)
msg73928 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月27日 15:27
Ooops, my first version introduces a regression: if file open fails, the traceback printing was stopped. Here is a new version of my patch to support #coding: header in _Py_DisplaySourceLine(). It doesn't print the line of file open fails, but continue to display the end of the traceback. But print still stops on PyFile_WriteObject() or PyFile_WriteString(). If PyFile fails, I guess that next print will also fails. (it's also the current behaviour of PyTraceBack_Print). Example: ---- Python 3.0rc1+ (py3k:66643M, Sep 27 2008, 17:11:51) >>> raise Exception('err') Traceback (most recent call last): File "<stdin>", line 1, in <module> Exception: err ---- The line is not displayed (why? no idea), but the exception ("Exception: err") is still displayed.
msg74441 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月07日 12:21
> The line is not displayed (why? no idea) Well, it's difficult to re-open <stdin>...
msg74443 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月07日 13:05
The patch goes in the good direction, here are some remarks: - The two calls to open() are missing the O_BINARY flag, necessary on Windows to avoid newline translation. This may be important for some codecs. - the GIL should be released around calls to open(); searching the whole sys.path can be long. Besides this, the "char* filename" is relevant only on utf8-encoded filesystems, and will not work on windows with non-ascii filenames. But this is another issue.
msg74450 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月07日 13:50
Le Tuesday 07 October 2008 15:06:01 Amaury Forgeot d'Arc, vous avez écrit : > - The two calls to open() are missing the O_BINARY flag, necessary on > Windows to avoid newline translation. This may be important for some > codecs. Oops, I always forget Windows specific options :-/ > - the GIL should be released around calls to open(); searching the whole > sys.path can be long. Ok, O_BINARY and GIL fixed. I just realized that sys.path is considered as a bytes string (PyBytes_Check) whereas Python3 uses an unicode string! ['', '/home/haypo/prog/python-ptrace', (...)] >>> sys.path.append(b'bytes'); sys.path ['', '/home/haypo/prog/python-ptrace', '(...)', b'bytes'] Oops, so it's possible to mix strings and bytes. But the normal case is a list of unicode strings. Another fix is needed :-/ I don't know how to get into "if (fd < 0)" (code using sys.path). Any clue? > Besides this, the "char* filename" is relevant only on utf8-encoded > filesystems, and will not work on windows with non-ascii filenames. But > this is another issue. I don't know the Windows API enough to answer. Doesn't Python provide "high level" functions to open a file?
msg74455 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月07日 14:54
> Ok, O_BINARY and GIL fixed. which patch? ;-) > I just realized that sys.path is considered as a bytes string > (PyBytes_Check) whereas Python3 uses an unicode string! > > ['', '/home/haypo/prog/python-ptrace', (...)] > >>> sys.path.append(b'bytes'); sys.path > ['', '/home/haypo/prog/python-ptrace', '(...)', b'bytes'] > > Oops, so it's possible to mix strings and bytes. But the normal case is a list > of unicode strings. Another fix is needed :-/ You could do like the import function, which converts unicode with the fs encoding. See the code in import.c, below "PyList_GetItem(path, i)" > I don't know how to get into "if (fd < 0)" (code using sys.path). > Any clue? The .pyc stores the full path of the .py, at compilation time. If the directory is moved, or accessed with another name (via mounts or soft links), the traceback displays the filename as stored in the .pyc file, but the content can still be displayed. It does not work every time, though, for example when the directory is shared between Windows & Unix. > > Besides this, the "char* filename" is relevant only on utf8-encoded > > filesystems, and will not work on windows with non-ascii filenames. > > But this is another issue. > > I don't know the Windows API enough to answer. Windows interprets the char* variables with its system-set "mbcs" encoding. On my machine, it is equivalent to the cp1252 encoding (almost equivalent latin-1). Since the 'filename' comes from a call to _PyUnicode_AsString(tb->tb_frame->f_code->co_filename), it will be wrong with any accented letter. > Doesn't Python provide "high level" functions to open a file? io.open()?
msg74460 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月07日 15:45
I still have remarks about traceback_unicode-3.patch, that I did not see before: - (a minor thing) PyMem_FREE(found_encoding) could appear only once, just after PyFile_FromFd. - I feel it dangerous to call the PyUnicode_AS_UNICODE() macro without checking that PyFile_GetLine() actually returned a PyUnicode object. - If the "truncated" variable is NULL, an exception has been set; it is necessary to exit the function.
msg74493 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月07日 22:45
Thanks for your remarks amaury. I improved my patch: - PyMem_FREE(found_encoding) is called just after PyFile_FromFd() - Create static subfunction _Py_FindSourceFile(): find a file in sys.path - Consider that sys.path contains only unicode (and not bytes) string - Clear exception on "truncated = PyUnicode_FromUnicode(p, len);" error, but continue the execution - I added PyUnicode_check(lineobj) - Replace open(filename) by open(namebuf) (while searching the full path of the file) <= fix a regression introduced by my patch Should I stop on the first error instead of using PyErr_Clear()? I would like to display the traceback even if an function failed. Sum up of the patch version 4: - use open(O_RDONLY \| O_BINARY) + PyFile_FromFd() instead of fopen() - open the file in unicode mode using the Python encoding found in the "#coding:..." header - consider sys.path as a list of unicode strings (and not of bytes strings) I used _PyUnicode_AsStringAndSize() to convert unicode to string to be consistent with tb_printinternal(). As you noticed, it uses UTF-8 which should is on Windows :-/ I propose to open a new issue when this one will be closed :-)
msg74499 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月08日 00:07
One last thing: there is a path where lineobj is not freed (when PyUnicode_Check(lineobj) is false); I suggest to move Py_XDECREF(lineobj) just before the final return statement. Reference counting is fun ;-) > Should I stop on the first error instead of using PyErr_Clear()? > I would like to display the traceback even if an function failed. You are right. Common failures should clear the error and return 0. In your patch, there is one remaining place, the call to PyFile_GetLine(). More fun will arise when my Windows terminal (encoding=cp1252) will try to display Chinese characters. Let's pretend this is yet another issue.
msg74524 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月08日 10:51
> More fun will arise when my Windows terminal (encoding=cp1252) > will try to display Chinese characters. Let's pretend this is > yet another issue. I tried the patch using a script with unicode characters (character not representable in ISO-8859-1 like polish characters ł and Ł). Result in an UTF-8 terminal (my default locale): Traceback (most recent call last): File "unicode.py", line 2, in <module> raise ValueError("unicode: Łł") ValueError: unicode: Łł => correct Result in an ISO-8859-1 terminal (I changed the encoding in my Konsole configuration): Traceback (most recent call last): File "unicode.py", line 2, in <module> raise ValueError("unicode: \u0141\u0142") ValueError: unicode: \u0141\u0142 => correct Why does it work? It's because PyErr_Display() uses sys.stderr instead of sys.stdout and sys.stderr uses a different unicode error mechanism: >>> import sys >>> sys.stdout.errors 'strict' >>> sys.stderr.errors 'backslashreplace' Which is a great idea :-) You can try on Windows using the new attached file unicode.py.
msg74525 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月08日 11:15
@amaury: Oops, yes, I introduced a refleak in the version 4 with the PyUnicode_Check(). Instead of just moved Py_(X)RECREF(lineobj);, I could not not resist to refactor the code to remove one more indentation level (I prefer if (...) return; instead of if (...) { very long block; }). Changes in version 5: - rename 'namebuf' buffer to 'buf', it's used for the filename and to display the indentation space (strcpy(buf, ' ');). - move Py_DECREF(fob); at the end of the GetLine loop - return on lineobj error I think that the new version is easier to read than the current code because they are few indentation and no more local variables (if (...) { local var; ... })
msg74536 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月08日 17:17
The code is indeed easier to follow. I don't have any more remark, thanks to you perseverance! Now, is there some unit test we could provide? #2384 depends on this issue, it should be easy to extract a small test case.
msg74539 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月08日 17:37
My patch for #2384 contains a testcase which require #3975 and #2384 to be fixed (you have to apply both patches to test it).
msg74610 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月09日 23:39
Committed r66867, together with #2384. Thanks for your perseverance ;-)

History
Date	User	Action	Args
2022年04月11日 14:56:39	admin	set	github: 48225
2008年10月09日 23:39:24	amaury.forgeotdarc	set	status: open -> closed resolution: fixed messages: + msg74610
2008年10月08日 17:37:39	vstinner	set	messages: + msg74539
2008年10月08日 17:17:49	amaury.forgeotdarc	set	messages: + msg74536
2008年10月08日 11:16:04	vstinner	set	files: + traceback_unicode-5.patch
2008年10月08日 11:15:54	vstinner	set	files: - traceback_unicode-5.patch
2008年10月08日 11:15:34	vstinner	set	files: - traceback_unicode-4.patch
2008年10月08日 11:15:28	vstinner	set	files: + traceback_unicode-5.patch messages: + msg74525
2008年10月08日 10:51:41	vstinner	set	files: + unicode.py messages: + msg74524
2008年10月08日 00:12:54	vstinner	set	files: - traceback_unicode-3.patch
2008年10月08日 00:12:52	vstinner	set	files: - traceback_unicode-2.patch
2008年10月08日 00:12:49	vstinner	set	files: - traceback_unicode.patch
2008年10月08日 00:07:35	amaury.forgeotdarc	set	messages: + msg74499
2008年10月07日 23:44:00	amaury.forgeotdarc	link	issue2384 dependencies
2008年10月07日 22:45:52	vstinner	set	files: + traceback_unicode-4.patch messages: + msg74493
2008年10月07日 15:45:29	amaury.forgeotdarc	set	messages: + msg74460
2008年10月07日 15:00:32	vstinner	set	files: + traceback_unicode-3.patch
2008年10月07日 14:55:00	amaury.forgeotdarc	set	messages: + msg74455
2008年10月07日 13:50:10	vstinner	set	messages: + msg74450
2008年10月07日 13:05:58	amaury.forgeotdarc	set	messages: + msg74443
2008年10月07日 12:21:36	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg74441
2008年09月27日 15:27:30	vstinner	set	files: + traceback_unicode-2.patch messages: + msg73928
2008年09月26日 16:28:20	vstinner	set	files: + traceback_unicode.patch messages: + msg73862
2008年09月26日 16:27:52	vstinner	set	files: - traceback_unicode.patch
2008年09月26日 16:26:51	vstinner	set	files: + traceback_unicode.patch keywords: + patch messages: + msg73859
2008年09月26日 15:12:28	vstinner	create

homepage