This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: parser: store the filename as an unicode object
Type: Stage:
Components: Interpreter Core, Unicode Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, benjamin.peterson, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2010-12-28 02:40 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
parser_filename_obj-3.patch vstinner, 2011-01-05 04:26
Messages (9)
msg124755 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 02:40
The Python parser stores the filename as a byte string, but decodes it when an error occurs because most Python functions now use Unicode strings. Decoding the filename at error time may itself raise a new error, so I propose decoding it when the parser object is created and storing it only as Unicode.
This issue would prepare the last part of the full unicode support (#3080).
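
To illustrate the proposal above, here is a minimal Python sketch of the two strategies. It is not the patch itself: the class names `LazyParser` and `EagerParser` are hypothetical, and strict UTF-8 decoding is an assumption standing in for the filesystem encoding so that the failure is easy to reproduce.

```python
class LazyParser:
    """Hypothetical: keeps the filename as bytes, decodes only on error."""

    def __init__(self, filename: bytes):
        self.filename = filename

    def syntax_error(self, msg: str) -> SyntaxError:
        # Decoding at error time can raise UnicodeDecodeError, which
        # replaces the syntax error the caller wanted to report.
        decoded = self.filename.decode("utf-8")
        return SyntaxError(msg, (decoded, 1, 0, ""))


class EagerParser:
    """Hypothetical: decodes the filename once, when the parser is created."""

    def __init__(self, filename: bytes):
        # An undecodable filename is rejected here, before any parsing.
        self.filename = filename.decode("utf-8")

    def syntax_error(self, msg: str) -> SyntaxError:
        # No decoding is needed, so building the error cannot fail on it.
        return SyntaxError(msg, (self.filename, 1, 0, ""))
```

With an undecodable name such as `b"caf\xe9.py"`, `LazyParser` only fails once an error is being reported, while `EagerParser` fails at construction, so later error reporting never has to decode anything.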
msg124823 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-28 22:14
I like the idea, but I don't like the trend that parser code continues to diverge from pgen. I understand that most of the Python runtime is not available to pgen, but maybe a more elegant solution than changing the type conditional on PGEN can be found. For example, maybe filename could be decoded from FS encoding to UTF-8?
msg124826 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 23:15
> maybe a more elegant solution than changing the type conditional 
> on PGEN can be found
In pgen, the filename is only used to display the following warning, in indenterror():
 <filename>: inconsistent use of tabs and spaces in indentation
In practice, this warning never occurs on Grammar/Grammar: this file doesn't use indentation at all, only continuation lines.
A better solution is maybe just to drop the filename for pgen. Anyway, pgen only compiles *one* file (Grammar/Grammar), so we don't need the input filename ;-)
msg124827 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 23:16
When testing my patch, I found and fixed two bugs in pgen:
 - r87557: PGEN was not defined to compile pgenmain.c and printgrammar.c
 - r87558: pgen error was ignored on "make Parser/pgen.stamp" (when executing pgen to compile the grammar)
msg124828 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-28 23:32
Version 2 of the patch:
 - remove filename attribute from perrdetail and tok_state structure in PGEN mode, and add a comment to explain why
 - rename filename_obj to filename
 - indenterror() no longer prints the input filename in PGEN mode
msg125302 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-04 11:02
err_clear() should set err->filename to NULL.
msg125409 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-05 04:26
Version 3 of the patch, which also fixes #9319.
msg130937 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-15 01:02
@Benjamin: You told me that you don't want two versions of pgen, but I don't remember why. As my work on #3080 is mostly done, I now plan to patch the Python parser to store the filename as Unicode. So could you please review the patch attached to this issue?
msg132990 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-04 23:48
New changeset 6e9dc970ac0e by Victor Stinner in branch 'default':
Issue #10785: Store the filename as Unicode in the Python parser.
http://hg.python.org/cpython/rev/6e9dc970ac0e 
History
Date | User | Action | Args
2022-04-11 14:57:10 | admin | set | github: 54994
2011-04-04 23:56:24 | vstinner | set | status: open -> closed; resolution: fixed
2011-04-04 23:48:20 | python-dev | set | nosy: + python-dev; messages: + msg132990
2011-03-15 01:02:57 | vstinner | set | nosy: belopolsky, vstinner, benjamin.peterson; messages: + msg130937
2011-01-06 13:03:32 | pitrou | set | nosy: + benjamin.peterson
2011-01-05 04:26:52 | vstinner | set | files: - parser_filename_obj-2.patch; nosy: belopolsky, vstinner
2011-01-05 04:26:50 | vstinner | set | files: - parser_filename_obj.patch; nosy: belopolsky, vstinner
2011-01-05 04:26:45 | vstinner | set | files: + parser_filename_obj-3.patch; nosy: belopolsky, vstinner; messages: + msg125409
2011-01-04 11:02:42 | vstinner | set | nosy: belopolsky, vstinner; messages: + msg125302; versions: - Python 3.2
2010-12-28 23:32:43 | vstinner | set | files: + parser_filename_obj-2.patch; nosy: belopolsky, vstinner; messages: + msg124828
2010-12-28 23:16:39 | vstinner | set | nosy: belopolsky, vstinner; messages: + msg124827
2010-12-28 23:15:11 | vstinner | set | nosy: belopolsky, vstinner; messages: + msg124826
2010-12-28 22:14:19 | belopolsky | set | nosy: + belopolsky; messages: + msg124823
2010-12-28 02:50:16 | vstinner | set | files: + parser_filename_obj.patch
2010-12-28 02:49:34 | vstinner | set | files: - parse_filename_obj.patch
2010-12-28 02:40:20 | vstinner | create |
