This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年09月26日 15:12 by vstinner, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| unicode.py | vstinner, 2008年10月08日 10:51 | |||
| traceback_unicode-5.patch | vstinner, 2008年10月08日 11:16 | New version of _Py_DisplaySourceLine() supporting coding header (patch version 5) | ||
| Messages (16) | |||
|---|---|---|---|
| msg73851 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年09月26日 15:12 | |
PyTraceBack_Print() doesn't take care of the "# coding: xxx" header of
a Python script. It calls _Py_DisplaySourceLine() which opens the file
as a byte stream (and not an unicode characters stream). Because of
this problem, the traceback maybe truncated or invalid. Example (write
it into a file and execute the file):
----
from sys import executable
from os import execvpe
filename = "pouet.py"
out = open(filename, "wb")
out.write(b"""# -*- coding: GBK -*-
print("--asc\xA1\xA7i--")
raise Exception("--asc\xA1\xA7i--")""")
out.close()
execvpe(executable, [executable, filename], None)
----
This issue depends on issue2384 (line number).
Note: Python 2.6 may also has the problem but it doesn't parse "#
coding: GBK". So it's a different problem (issue?).
|
|||
| msg73859 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年09月26日 16:26 | |
Here is a new version of _Py_DisplaySourceLine() using PyTokenizer_FindEncoding() to read the coding header, and PyFile_FromFd() to create an "unicode-awake" file. The code could be optimized, but it least it displays correctly the file line ;-) The code is based on the original _Py_DisplaySourceLine() and call_find_module() (import.c). Notes: * The code is young and new, it might be delayed until python 3.1 * Some functions may raise new exceptions (eg. MemoryError). I don't know how Python will react if an exception is raised during PyTraceBack_Print() ? * The return code is 0 for success, but is it -1 or 1 for an error? It looks like error is "result != 0", so both -1 and 1 should be valid. I used -1. |
|||
| msg73862 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年09月26日 16:28 | |
(oops, first patch included an useless whitespace change) |
|||
| msg73928 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年09月27日 15:27 | |
Ooops, my first version introduces a regression: if file open fails,
the traceback printing was stopped. Here is a new version of my patch
to support #coding: header in _Py_DisplaySourceLine(). It doesn't
print the line of file open fails, but continue to display the end of
the traceback.
But print still stops on PyFile_WriteObject() or PyFile_WriteString().
If PyFile fails, I guess that next print will also fails. (it's also
the current behaviour of PyTraceBack_Print).
Example:
----
Python 3.0rc1+ (py3k:66643M, Sep 27 2008, 17:11:51)
>>> raise Exception('err')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: err
----
The line is not displayed (why? no idea), but the exception
("Exception: err") is still displayed.
|
|||
| msg74441 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月07日 12:21 | |
> The line is not displayed (why? no idea) Well, it's difficult to re-open <stdin>... |
|||
| msg74443 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月07日 13:05 | |
The patch goes in the good direction, here are some remarks: - The two calls to open() are missing the O_BINARY flag, necessary on Windows to avoid newline translation. This may be important for some codecs. - the GIL should be released around calls to open(); searching the whole sys.path can be long. Besides this, the "char* filename" is relevant only on utf8-encoded filesystems, and will not work on windows with non-ascii filenames. But this is another issue. |
|||
| msg74450 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年10月07日 13:50 | |
Le Tuesday 07 October 2008 15:06:01 Amaury Forgeot d'Arc, vous avez écrit : > - The two calls to open() are missing the O_BINARY flag, necessary on > Windows to avoid newline translation. This may be important for some > codecs. Oops, I always forget Windows specific options :-/ > - the GIL should be released around calls to open(); searching the whole > sys.path can be long. Ok, O_BINARY and GIL fixed. I just realized that sys.path is considered as a bytes string (PyBytes_Check) whereas Python3 uses an unicode string! ['', '/home/haypo/prog/python-ptrace', (...)] >>> sys.path.append(b'bytes'); sys.path ['', '/home/haypo/prog/python-ptrace', '(...)', b'bytes'] Oops, so it's possible to mix strings and bytes. But the normal case is a list of unicode strings. Another fix is needed :-/ I don't know how to get into "if (fd < 0)" (code using sys.path). Any clue? > Besides this, the "char* filename" is relevant only on utf8-encoded > filesystems, and will not work on windows with non-ascii filenames. But > this is another issue. I don't know the Windows API enough to answer. Doesn't Python provide "high level" functions to open a file? |
|||
| msg74455 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月07日 14:54 | |
> Ok, O_BINARY and GIL fixed. which patch? ;-) > I just realized that sys.path is considered as a bytes string > (PyBytes_Check) whereas Python3 uses an unicode string! > > ['', '/home/haypo/prog/python-ptrace', (...)] > >>> sys.path.append(b'bytes'); sys.path > ['', '/home/haypo/prog/python-ptrace', '(...)', b'bytes'] > > Oops, so it's possible to mix strings and bytes. But the normal case is a list > of unicode strings. Another fix is needed :-/ You could do like the import function, which converts unicode with the fs encoding. See the code in import.c, below "PyList_GetItem(path, i)" > I don't know how to get into "if (fd < 0)" (code using sys.path). > Any clue? The .pyc stores the full path of the .py, at compilation time. If the directory is moved, or accessed with another name (via mounts or soft links), the traceback displays the filename as stored in the .pyc file, but the content can still be displayed. It does not work every time, though, for example when the directory is shared between Windows & Unix. > > Besides this, the "char* filename" is relevant only on utf8-encoded > > filesystems, and will not work on windows with non-ascii filenames. > > But this is another issue. > > I don't know the Windows API enough to answer. Windows interprets the char* variables with its system-set "mbcs" encoding. On my machine, it is equivalent to the cp1252 encoding (almost equivalent latin-1). Since the 'filename' comes from a call to _PyUnicode_AsString(tb->tb_frame->f_code->co_filename), it will be wrong with any accented letter. > Doesn't Python provide "high level" functions to open a file? io.open()? |
|||
| msg74460 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月07日 15:45 | |
I still have remarks about traceback_unicode-3.patch, that I did not see before: - (a minor thing) PyMem_FREE(found_encoding) could appear only once, just after PyFile_FromFd. - I feel it dangerous to call the PyUnicode_AS_UNICODE() macro without checking that PyFile_GetLine() actually returned a PyUnicode object. - If the "truncated" variable is NULL, an exception has been set; it is necessary to exit the function. |
|||
| msg74493 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年10月07日 22:45 | |
Thanks for your remarks amaury. I improved my patch: - PyMem_FREE(found_encoding) is called just after PyFile_FromFd() - Create static subfunction _Py_FindSourceFile(): find a file in sys.path - Consider that sys.path contains only unicode (and not bytes) string - Clear exception on "truncated = PyUnicode_FromUnicode(p, len);" error, but continue the execution - I added PyUnicode_check(lineobj) - Replace open(filename) by open(namebuf) (while searching the full path of the file) <= fix a regression introduced by my patch Should I stop on the first error instead of using PyErr_Clear()? I would like to display the traceback even if an function failed. Sum up of the patch version 4: - use open(O_RDONLY | O_BINARY) + PyFile_FromFd() instead of fopen() - open the file in unicode mode using the Python encoding found in the "#coding:..." header - consider sys.path as a list of unicode strings (and not of bytes strings) I used _PyUnicode_AsStringAndSize() to convert unicode to string to be consistent with tb_printinternal(). As you noticed, it uses UTF-8 which should is on Windows :-/ I propose to open a new issue when this one will be closed :-) |
|||
| msg74499 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月08日 00:07 | |
One last thing: there is a path where lineobj is not freed (when PyUnicode_Check(lineobj) is false); I suggest to move Py_XDECREF(lineobj) just before the final return statement. Reference counting is fun ;-) > Should I stop on the first error instead of using PyErr_Clear()? > I would like to display the traceback even if an function failed. You are right. Common failures should clear the error and return 0. In your patch, there is one remaining place, the call to PyFile_GetLine(). More fun will arise when my Windows terminal (encoding=cp1252) will try to display Chinese characters. Let's pretend this is yet another issue. |
|||
| msg74524 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年10月08日 10:51 | |
> More fun will arise when my Windows terminal (encoding=cp1252)
> will try to display Chinese characters. Let's pretend this is
> yet another issue.
I tried the patch using a script with unicode characters (character
not representable in ISO-8859-1 like polish characters ł and Ł).
Result in an UTF-8 terminal (my default locale):
Traceback (most recent call last):
File "unicode.py", line 2, in <module>
raise ValueError("unicode: Łł")
ValueError: unicode: Łł
=> correct
Result in an ISO-8859-1 terminal (I changed the encoding in my Konsole
configuration):
Traceback (most recent call last):
File "unicode.py", line 2, in <module>
raise ValueError("unicode: \u0141\u0142")
ValueError: unicode: \u0141\u0142
=> correct
Why does it work? It's because PyErr_Display() uses sys.stderr instead
of sys.stdout and sys.stderr uses a different unicode error mechanism:
>>> import sys
>>> sys.stdout.errors
'strict'
>>> sys.stderr.errors
'backslashreplace'
Which is a great idea :-)
You can try on Windows using the new attached file unicode.py.
|
|||
| msg74525 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年10月08日 11:15 | |
@amaury: Oops, yes, I introduced a refleak in the version 4 with the
PyUnicode_Check(). Instead of just moved Py_(X)RECREF(lineobj);, I
could not not resist to refactor the code to remove one more
indentation level (I prefer if (...) return; instead of if (...) {
very long block; }).
Changes in version 5:
- rename 'namebuf' buffer to 'buf', it's used for the filename and to
display the indentation space (strcpy(buf, ' ');).
- move Py_DECREF(fob); at the end of the GetLine loop
- return on lineobj error
I think that the new version is easier to read than the current code
because they are few indentation and no more local variables (if (...)
{ local var; ... })
|
|||
| msg74536 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月08日 17:17 | |
The code is indeed easier to follow. I don't have any more remark, thanks to you perseverance! Now, is there some unit test we could provide? #2384 depends on this issue, it should be easy to extract a small test case. |
|||
| msg74539 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2008年10月08日 17:37 | |
My patch for #2384 contains a testcase which require #3975 and #2384 to be fixed (you have to apply both patches to test it). |
|||
| msg74610 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年10月09日 23:39 | |
Committed r66867, together with #2384. Thanks for your perseverance ;-) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:39 | admin | set | github: 48225 |
| 2008年10月09日 23:39:24 | amaury.forgeotdarc | set | status: open -> closed resolution: fixed messages: + msg74610 |
| 2008年10月08日 17:37:39 | vstinner | set | messages: + msg74539 |
| 2008年10月08日 17:17:49 | amaury.forgeotdarc | set | messages: + msg74536 |
| 2008年10月08日 11:16:04 | vstinner | set | files: + traceback_unicode-5.patch |
| 2008年10月08日 11:15:54 | vstinner | set | files: - traceback_unicode-5.patch |
| 2008年10月08日 11:15:34 | vstinner | set | files: - traceback_unicode-4.patch |
| 2008年10月08日 11:15:28 | vstinner | set | files:
+ traceback_unicode-5.patch messages: + msg74525 |
| 2008年10月08日 10:51:41 | vstinner | set | files:
+ unicode.py messages: + msg74524 |
| 2008年10月08日 00:12:54 | vstinner | set | files: - traceback_unicode-3.patch |
| 2008年10月08日 00:12:52 | vstinner | set | files: - traceback_unicode-2.patch |
| 2008年10月08日 00:12:49 | vstinner | set | files: - traceback_unicode.patch |
| 2008年10月08日 00:07:35 | amaury.forgeotdarc | set | messages: + msg74499 |
| 2008年10月07日 23:44:00 | amaury.forgeotdarc | link | issue2384 dependencies |
| 2008年10月07日 22:45:52 | vstinner | set | files:
+ traceback_unicode-4.patch messages: + msg74493 |
| 2008年10月07日 15:45:29 | amaury.forgeotdarc | set | messages: + msg74460 |
| 2008年10月07日 15:00:32 | vstinner | set | files: + traceback_unicode-3.patch |
| 2008年10月07日 14:55:00 | amaury.forgeotdarc | set | messages: + msg74455 |
| 2008年10月07日 13:50:10 | vstinner | set | messages: + msg74450 |
| 2008年10月07日 13:05:58 | amaury.forgeotdarc | set | messages: + msg74443 |
| 2008年10月07日 12:21:36 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages: + msg74441 |
| 2008年09月27日 15:27:30 | vstinner | set | files:
+ traceback_unicode-2.patch messages: + msg73928 |
| 2008年09月26日 16:28:20 | vstinner | set | files:
+ traceback_unicode.patch messages: + msg73862 |
| 2008年09月26日 16:27:52 | vstinner | set | files: - traceback_unicode.patch |
| 2008年09月26日 16:26:51 | vstinner | set | files:
+ traceback_unicode.patch keywords: + patch messages: + msg73859 |
| 2008年09月26日 15:12:28 | vstinner | create | |