This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: ast.literal_eval confused by coding declarations
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: davidhalter, jorgenschaefer, python-dev, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2014-08-17 19:53 by jorgenschaefer, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
source_encoding_second_line-2.7.patch serhiy.storchaka, 2014-08-23 07:44 review
Messages (8)
msg225464 - (view) Author: Jorgen Schäfer (jorgenschaefer) Date: 2014-08-17 19:53
The ast module seems to get confused for certain strings which contain coding declarations.
>>> import ast
>>> s = u'"""\\\n# -*- coding: utf-8 -*-\n"""'
>>> print s
"""\
# -*- coding: utf-8 -*-
"""
>>> ast.literal_eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/forcer/Programs/Python/python2.7/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/forcer/Programs/Python/python2.7/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 0
SyntaxError: encoding declaration in Unicode string
msg225469 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-17 20:31
eval() is affected too. 3.x isn't affected.
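The "3.x isn't affected" observation can be checked with a minimal sketch like the one below. It assumes a current CPython 3 interpreter; the variable names are illustrative and do not come from the report.

```python
import ast

# The source text from the report: a single triple-quoted string literal
# whose content happens to look like a coding declaration.
source = '"""\\\n# -*- coding: utf-8 -*-\n"""'

# With str input, Python 3 does not treat text inside the literal as an
# encoding cookie, so both calls simply return the string's contents.
literal_result = ast.literal_eval(source)
eval_result = eval(source)

print(repr(literal_result))
print(literal_result == eval_result)
```

The backslash-newline after the opening quotes is a line continuation inside the literal, so the resulting string starts directly with the `#` character.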
msg225701 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-22 19:59
This issue is about the SyntaxError message for eval functions, not the ast module per se. My first response is that the reported message is not a bug and that this issue should be closed as 'not a bug'.
(General reason) Trying to eval an expression preceded by a comment on its own line or followed by a comment works.
>>> eval("#before\n'string'#after")
'string'
Trying to eval a bare comment *is* a syntax error.
>>> eval("#comment\n")
...
SyntaxError: unexpected EOF while parsing
So the issue as presented is the special-case message. However, messages are not part of the language specification and improving them is often/usually/always? treated as an enhancement. Changing them will break code and tests that depend on the exact wording. 2.7 does not get enhancements.
(Specific reason) In 2.x, the input to (literal-)eval is either latin-1 encoded bytes or unicode. 'Latin-1' input could potentially consist of an encoding declaration on one line followed on the next line by a literal string encoded as indicated.
>>> le("# -*- coding: utf-8 -*-\n'string'")
'string'
Unicode input, the subject of this issue, is encoded to latin-1, which means that any literal string in the expression has to be latin-1 encoded. Therefore, a latin-1 encoding declaration is redundant and anything else is either redundant (if the original unicode only contains characters that encode the same in latin-1, as in the example above) or wrong, with hard to predict behavior. Someone thought it worthwhile to add the special case check. I think it should be left as is.
Jorgen, please either close this or explain why you think not, in light of the above.
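The general behavior described in this message (comments around an expression, a bare comment, and a real encoding declaration on its own line) can be sketched as follows. This is an illustration written for Python 3, not the 2.7 session quoted above; it uses eval() with bytes input in place of the `le` shorthand, and all variable names are assumptions.

```python
# A comment before or after the expression is ignored by eval().
with_comments = eval("#before\n'string'#after")

# A bare comment contains no expression, so evaluating it is a syntax error.
# The exact message differs between Python versions, so only the exception
# type is recorded here.
try:
    eval("#comment\n")
    bare_comment_error = None
except SyntaxError as exc:
    bare_comment_error = exc

# Bytes input may begin with a genuine encoding declaration on its own line.
declared = eval(b"# -*- coding: utf-8 -*-\n'string'")

print(with_comments, type(bare_comment_error).__name__, declared)
```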
msg225704 - (view) Author: Jorgen Schäfer (jorgenschaefer) Date: 2014-08-22 20:27
I do not understand how your comments apply to this bug. There is no
comment anywhere. There is a single string literal whose contents look
like a comment. The expression parses correctly without syntax error if you
add a few newlines in front. Could you clarify your objection?
msg225716 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-22 22:48
[When responding, please do not quote more than a line or two. If responding by email, please delete the rest. Otherwise, the result is extra noise when viewing online.]
You are right, I missed the outer 's, though my examples are not completely irrelevant. Eval looks inside the inner quotes for a coding line in certain circumstances, or maybe it always looks and we do not notice when there is no problem. Here are some of my results on US Win 7, cp1252, 3.4.1, interactive prompt, IDLE
pass: eval(u'"""# -*- coding: utf-8 -*-\na"""')
fail: eval(u'"""\n# -*- coding: utf-8 -*-\na"""')
 since coding can be on line 1 or 2, these should be same
pass: eval(u'"""\n\n# -*- coding: utf-8 -*-\na"""')
 coding on 3rd line should be ignored
fail: eval(u'"""\\\n# -*- coding: utf-8 -*-\na"""')
 logically, this matches the first example; physically, the second
pass: eval(u'"""# -*- coding: utf-8 -*-\na€"""')
 but € prints as \xc2\x80', its utf-8 encoding as pasted in
From file, saved from Idle editor as cp1252
pass: print(eval("# -*- coding: utf-8 -*-\n'euro€'"))
 no u prefix, € prints as €
fail: print(eval(u"# -*- coding: utf-8 -*-\n'euro€'"))
Save the following two lines in one file as utf-8
pass: print(eval("# -*- coding: utf-8 -*-\n'euro€'"))
print(eval(u"# -*- coding: utf-8 -*-\n'euro∢'"))
 but € & ∢ print as '€' & '∢'
 adding # -*- coding: utf-8 -*- line makes no difference
 adding u prefix fails either way
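For comparison, the position-dependent cases above can be run as one table-driven check. The sketch below assumes a current CPython 3 interpreter and str input, where none of the variants should fail; the dictionary keys and variable names are illustrative only.

```python
COOKIE = "# -*- coding: utf-8 -*-"

# Each source is a single triple-quoted literal; only the position of the
# cookie-like text inside the literal changes.
variants = {
    "first line": '"""' + COOKIE + '\na"""',
    "second line": '"""\n' + COOKIE + '\na"""',
    "third line": '"""\n\n' + COOKIE + '\na"""',
    "after backslash-newline": '"""\\\n' + COOKIE + '\na"""',
}

# Evaluate every variant; a SyntaxError here would indicate that the text
# inside the literal was mistaken for an encoding declaration.
results = {name: eval(src) for name, src in variants.items()}

for name, value in results.items():
    print(name, repr(value))
```

Keeping the inputs in one mapping makes it easy to see that the logical content of the literal, not the physical line the text sits on, decides the result.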
msg225735 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-23 07:44
This is the same issue as issue18960. Here is backported patch with additional test.
msg226404 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-05 07:26
New changeset dd1e21f17b1c by Serhiy Storchaka in branch '2.7':
Issue #22221: Backported fixes from Python 3 (issue #18960).
http://hg.python.org/cpython/rev/dd1e21f17b1c 
msg226407 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-05 08:11
New changeset 13cd8ea4cafe by Serhiy Storchaka in branch '3.4':
Issue #22221: Add tests for compile() with source encoding cookie.
http://hg.python.org/cpython/rev/13cd8ea4cafe
New changeset 9d335a54d728 by Serhiy Storchaka in branch 'default':
Issue #22221: Add tests for compile() with source encoding cookie.
http://hg.python.org/cpython/rev/9d335a54d728 
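The kind of behavior such tests for compile() with a source encoding cookie can exercise is sketched below. This is an illustrative approximation for a current CPython 3 interpreter, not a copy of the committed tests; the byte values and variable names are assumptions chosen for the example.

```python
# A string literal containing the two bytes C2 A4, which decode to one
# character (U+00A4) as UTF-8 but to two characters as Latin-1.
raw = b'"\xc2\xa4"\n'

# Without a cookie, bytes source is decoded as UTF-8.
no_cookie = eval(raw)

# A real cookie on the first line selects the source encoding.
as_utf8 = eval(b"# -*- coding: utf-8 -*-\n" + raw)
as_latin1 = eval(b"# -*- coding: latin1 -*-\n" + raw)

# Cookie-like text inside a literal is part of the string, not a
# declaration, so the default UTF-8 decoding still applies.
inside = eval(b'"""\\\n# -*- coding: latin1 -*-\n\xc2\xa4"""\n')

print(ascii(no_cookie), ascii(as_utf8), ascii(as_latin1), ascii(inside))
```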
History
Date User Action Args
2022-04-11 14:58:07 admin set github: 66417
2014-09-05 08:28:34 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2014-09-05 08:11:59 python-dev set messages: + msg226407
2014-09-05 07:26:24 python-dev set nosy: + python-dev; messages: + msg226404
2014-08-23 07:44:39 serhiy.storchaka set files: + source_encoding_second_line-2.7.patch; keywords: + patch; messages: + msg225735; stage: needs patch -> patch review
2014-08-22 22:48:40 terry.reedy set messages: + msg225716
2014-08-22 20:27:04 jorgenschaefer set messages: + msg225704
2014-08-22 19:59:06 terry.reedy set nosy: + terry.reedy; messages: + msg225701
2014-08-17 20:31:57 serhiy.storchaka set assignee: serhiy.storchaka; type: behavior; components: + Interpreter Core, - Library (Lib); nosy: + serhiy.storchaka; messages: + msg225469; stage: needs patch
2014-08-17 20:11:27 davidhalter set nosy: + davidhalter
2014-08-17 19:53:49 jorgenschaefer create
