Created on 2013-09-07 15:35 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| nonutf8_coding_line.py | serhiy.storchaka, 2013-09-07 15:35 | |
| Messages (6) | |||
|---|---|---|---|
| msg197169 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-09-07 15:35 | |
Here is a file that the Python interpreter accepts but the tokenize module rejects.

    $ ./python nonutf8_coding_line.py
    $ ./python -m tokenize nonutf8_coding_line.py
    nonutf8_coding_line.py: error: invalid or missing encoding declaration for 'nonutf8_coding_line.py'

Python itself checks that a line is UTF-8 encoded only if no magic comment is found. The tokenize module checks it before searching for one (issue14629). |
|||
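The difference described above can be probed without the attached file. The following is a minimal sketch, not the original reproducer: `SOURCE` is a hypothetical stand-in for `nonutf8_coding_line.py` (its exact contents are not shown in this thread), and `compile()` on bytes only approximates how the interpreter reads a file from disk. The helper names `try_interpreter` and `try_tokenize` are made up for illustration. The outcome may differ between Python versions, so the sketch reports whatever happens rather than assuming a result.

```python
import io
import tokenize

# Hypothetical stand-in for nonutf8_coding_line.py: the coding line itself
# carries a byte (0xA4) that is not valid UTF-8.
SOURCE = b"# coding: iso8859-15 \xa4\nx = 1\n"


def try_interpreter(source):
    """Compile raw bytes, letting Python apply its own PEP 263 handling."""
    try:
        compile(source, "nonutf8_coding_line.py", "exec")
    except (SyntaxError, ValueError) as exc:
        return ("error", str(exc))
    return ("ok", None)


def try_tokenize(source):
    """Ask the tokenize module which encoding it detects for the same bytes."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return ("error", str(exc))
    return ("ok", encoding)


print("compile :", try_interpreter(SOURCE))
print("tokenize:", try_tokenize(SOURCE))
```

If the two lines of output disagree, the interpreter's check and the tokenize module's check are still applied in a different order on that Python version.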
| msg197621 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2013-09-13 17:48 | |
Which behavior do you propose to change? Does PEP 263 specify the response to a self-contradictory encoding comment? What do you think it should say? I think raising is the better behavior. Idle notices that it cannot save the file with iso8859-15 encoding, so it saves it with utf-8 encoding instead. Any Python-aware editor that recognizes and uses the encoding cookie would have to either do the same or refuse to save until the encoding specification were changed. Trying to run the file brings up an error box with SyntaxError: encoding problem: utf-8. Note that Idle runs a syntax check before creating a new subprocess, connecting to it, and trying to actually run the file. Closing the box takes one back to the editor at the spot where the error was noticed, which in this case is at the end of the line. |
|||
| msg197918 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2013-09-16 16:44 | |
Terry: the comment isn't self-contradictory. Instead, the file constitutes a perfectly fine iso-8859-15 byte sequence (albeit a meaningless one: any byte sequence is perfectly fine iso-8859-15, and, in a comment, any characters are allowed by the Python syntax). AFAICT, the file also follows the wording of PEP 263, as it does not disallow additional text on the encoding line. Therefore, I think tokenize should accept the file. |
|||
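The point that any byte sequence is valid iso-8859-15 while the same bytes may not be valid UTF-8 can be checked directly. This is a small sketch under stated assumptions: `RAW` is a hypothetical coding line (not the attached file), and `decodes_as` is a helper invented for this example.

```python
# Hypothetical coding line: 0xA4 is the euro sign in ISO-8859-15,
# but a lone 0xA4 byte is not valid UTF-8.
RAW = b"# coding: iso8859-15 \xa4\n"


def decodes_as(data, encoding):
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


every_byte = bytes(range(256))
print(decodes_as(every_byte, "iso8859-15"))  # all 256 byte values
print(decodes_as(RAW, "iso8859-15"))
print(decodes_as(RAW, "utf-8"))
```

A check that decodes the coding line as UTF-8 before looking for the cookie therefore rejects a line that is well-formed in the encoding it declares.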
| msg197925 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013-09-16 18:27 | |
What about the first line? Currently both the Python interpreter and the tokenize module decode it from UTF-8 (actually, due to bug #18960, Python interprets it twice, in different encodings). PEP 263 says: "1. The complete Python source file should use a single encoding. Embedding of differently encoded data is not allowed and will result in a decoding error during compilation of the Python source code." I conclude that the first line should be decoded with the encoding specified in the second line. We should first read the first line and check whether it is not a comment or contains an encoding cookie; if neither holds, read the second line, determine the encoding, and then decode the lines already read. Perhaps this will untangle issue18960 too. |
|||
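The order of operations proposed above (search the raw bytes for the cookie first, decode afterwards) might look roughly like the sketch below. It is an illustration only, not the tokenize module's code and not a patch from this issue: the function name `sketch_detect_encoding` is hypothetical, BOM handling is omitted, and the patterns are byte-string adaptations of the cookie pattern given in PEP 263.

```python
import codecs
import io
import re

# PEP 263 cookie pattern, applied to raw bytes so no decoding is needed first.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# A line that is blank or holds only a comment; a cookie may follow it.
BLANK_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)")


def sketch_detect_encoding(readline, default="utf-8"):
    """Find the declared encoding first, then decode the lines already read."""
    lines = []
    encoding = default
    for _ in range(2):  # the cookie may only appear on line 1 or line 2
        line = readline()
        if not line:
            break
        lines.append(line)
        match = COOKIE_RE.match(line)
        if match:
            encoding = match.group(1).decode("ascii")
            codecs.lookup(encoding)  # raises LookupError for unknown names
            break
        if not BLANK_RE.match(line):
            break  # real code on this line: no cookie can follow
    # Decode everything read so far with the single declared encoding.
    return encoding, [line.decode(encoding) for line in lines]


source = b"#!/usr/bin/env python \xa4\n# coding: iso8859-15 \xa4\nprint('hi')\n"
encoding, decoded = sketch_detect_encoding(io.BytesIO(source).readline)
print(encoding, decoded)
```

With this ordering the first line is decoded with the encoding named on the second line, so a file that deviates from its declared encoding still fails, but only against that encoding rather than against UTF-8.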
| msg197953 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2013-09-16 23:45 | |
One issue at a time, please, and issue18960 is already its own issue. In any case, item "1" in "Concepts" of PEP 263 is clear that any deviation from the declared encoding should cause a decoding error. |
|||
| msg396104 | Author: Irit Katriel (iritkatriel) * (Python committer) | Date: 2021-06-18 22:17 | |
I've reproduced the same in 3.11:

    > .\python.bat nonutf8_coding_line.py
    Running Release|x64 interpreter...
    > .\python.bat -m tokenize nonutf8_coding_line.py
    Running Release|x64 interpreter...
    nonutf8_coding_line.py: error: invalid or missing encoding declaration for 'nonutf8_coding_line.py'
|
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:50 | admin | set | github: 63161 |
| 2021-06-18 22:17:37 | iritkatriel | set | nosy: + iritkatriel; messages: + msg396104; versions: + Python 3.11, - Python 3.3, Python 3.4 |
| 2020-03-06 20:30:24 | brett.cannon | set | nosy: - brett.cannon |
| 2016-12-06 12:14:59 | serhiy.storchaka | link | issue28884 dependencies |
| 2013-09-16 23:45:25 | loewis | set | messages: + msg197953 |
| 2013-09-16 18:27:41 | serhiy.storchaka | set | assignee: serhiy.storchaka; messages: + msg197925; stage: needs patch |
| 2013-09-16 16:44:24 | loewis | set | messages: + msg197918 |
| 2013-09-13 22:05:35 | vstinner | set | nosy: + vstinner |
| 2013-09-13 17:48:59 | terry.reedy | set | nosy: + terry.reedy; messages: + msg197621 |
| 2013-09-07 15:36:32 | serhiy.storchaka | set | nosy: + meador.inge |
| 2013-09-07 15:35:49 | serhiy.storchaka | create | |