Created on 2019-11-09 12:26 by Andrew Ushakov, last changed 2022-04-11 14:59 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| tst112.py | Andrew Ushakov, 2019-11-09 12:26 | |
| Messages (7) | |||
|---|---|---|---|
| msg356298 | Author: Andrew Ushakov (Andrew Ushakov) | Date: 2019-11-09 12:26 | |
Not very long unicode comment #, space and then 170 or more repetitions of the utf8 symbol ░ (b'\xe2\x96\x91'.decode()) # ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ causes syntax error: SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details Python file is attached. Second example is similar, but here unicode string with similar length is used as an argument of a print function. print('\n░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░') Similar Issue34979 was submitted one year ago... |
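The 170-repetition threshold in the report can be checked with a little byte arithmetic. The following is a minimal sketch, not the tokenizer's actual code: the helper `cut_is_mid_character` and the constant `ASSUMED_LIMIT` are hypothetical names, and the value 511 assumes a reader that stores at most 511 payload bytes in a 512-byte buffer (as C `fgets` does with a buffer of that size).

```python
BLOCK = "\u2591"  # "░", encoded as b'\xe2\x96\x91' (3 bytes) in UTF-8


def cut_is_mid_character(line: str, limit: int) -> bool:
    """Return True if cutting the UTF-8 encoding of `line` after `limit`
    bytes would land inside a multi-byte character."""
    encoded = line.encode("utf-8")
    if len(encoded) <= limit:
        return False  # the whole line fits, so nothing is cut
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (encoded[limit] & 0xC0) == 0x80


# Assumption: 511 usable bytes per read (512-byte buffer minus terminator).
ASSUMED_LIMIT = 511

for repetitions in (169, 170):
    line = "# " + BLOCK * repetitions
    size = len(line.encode("utf-8"))
    print(repetitions, size, cut_is_mid_character(line, ASSUMED_LIMIT))
```

Under that assumption, 169 repetitions (509 bytes) fit within the limit, while 170 repetitions (512 bytes) are cut inside the last character, which would match the "170 or more" observation.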
|||
| msg356709 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2019-11-15 19:45 | |
I think that this should be closed as a duplicate of #34979 and this example posted there, with the OS and Python version included. On Windows, with 3.7, 3.8.0, and master, neither the posted comment, the one in the file, nor the initial statement in #34979 gives the SyntaxError. |
|||
| msg356715 | Author: Andrew Ushakov (Andrew Ushakov) | Date: 2019-11-15 20:16 | |
> On Windows, with 3.7, 3.8.0, and master, neither the posted comment, the one in the file, nor the initial statement in #34979 gives the SyntaxError. Just tried again on my corporate laptop with the file downloaded from this site: Microsoft Windows [Version 10.0.16299.1451] (c) 2017 Microsoft Corporation. All rights reserved. D:\Downloads>py Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> quit() D:\Downloads>py tst112.py File "tst112.py", line 1 SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details d:\Downloads>py -3.7 Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> quit() d:\Downloads>py -3.7 tst112.py File "tst112.py", line 1 SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details |
|||
| msg390931 | Author: Andrew Ushakov (Andrew Ushakov) | Date: 2021-04-13 07:09 | |
Just tested again: D:\Downloads>py Python 3.9.4 (tags/v3.9.4:1f2e308, Apr 4 2021, 13:27:16) [MSC v.1928 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> quit() D:\Downloads>py tst112.py SyntaxError: Non-UTF-8 code starting with '\xe2' in file D:\Downloads\tst112.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS. |
|||
| msg390942 | Author: Eryk Sun (eryksun) * (Python triager) | Date: 2021-04-13 09:37 | |
> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS. The issue is that the line length is limited to BUFSIZ, which ends up splitting the UTF-8 sequence b'\xe2\x96\x91'. BUFSIZ is only 512 bytes in Windows. It's 8192 bytes in Linux, in which case you need a line that's 16 times longer in order to reproduce the error. For example: $ stat -c "%s" test.py 8194 $ python3.9 test.py SyntaxError: Non-UTF-8 code starting with '\xe2' in file /home/someone/test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details This has been fixed in a rewrite of the tokenizer (bpo-25643), for which the PR was recently merged into the main branch for 3.10a7+. Maybe a minimal backport to keep reading up to "\n" can be applied to 3.8 and 3.9. |
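The failure mode described above, a fixed-size read ending in the middle of the sequence b'\xe2\x96\x91', can be reproduced in pure Python. This is only an illustrative sketch and not the CPython tokenizer or its fix: the function names are hypothetical, the 511-byte chunk size is an assumption (a 512-byte buffer minus one terminator byte), and the incremental decoder from the standard `codecs` module is shown as one general way to tolerate split sequences, whereas the fix mentioned in the message reads up to "\n" instead.

```python
import codecs

BLOCK = "\u2591"  # "░", 3 bytes in UTF-8: b'\xe2\x96\x91'


def decode_chunks_naively(data: bytes, chunk_size: int) -> str:
    """Decode every fixed-size chunk on its own.

    Raises UnicodeDecodeError if a chunk ends inside a multi-byte character.
    """
    return "".join(
        data[i:i + chunk_size].decode("utf-8")
        for i in range(0, len(data), chunk_size)
    )


def decode_chunks_incrementally(data: bytes, chunk_size: int) -> str:
    """Decode the same chunks with a stateful decoder that buffers
    incomplete sequences until the remaining bytes arrive."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for i in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[i:i + chunk_size]))
    parts.append(decoder.decode(b"", final=True))  # flush, report leftovers
    return "".join(parts)


# Line shaped like the report: "# " plus 170 block characters.
line = ("# " + BLOCK * 170 + "\n").encode("utf-8")
ASSUMED_CHUNK = 511  # assumption: usable bytes per read on Windows

try:
    decode_chunks_naively(line, ASSUMED_CHUNK)
    naive_failed = False
except UnicodeDecodeError as exc:
    naive_failed = True
    print("naive decoding failed:", exc.reason)

print(decode_chunks_incrementally(line, ASSUMED_CHUNK) == line.decode("utf-8"))
```

The naive version fails because the first chunk ends with only two of the three bytes of the last character; the stateful decoder carries those two bytes over to the next chunk.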
|||
| msg391018 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-04-13 23:52 | |
The bpo-14811 issue was fixed in Python 3.10 by bpo-25643, but is not fixed in Python 3.8 and 3.9. |
|||
| msg391019 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-04-13 23:54 | |
In 2012, I wrote detect_truncate.patch in bpo-14811. Does someone want to convert it to a PR for Python 3.9? |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:23 | admin | set | github: 82936 |
| 2021-04-13 23:54:10 | vstinner | set | messages: + msg391019 |
| 2021-04-13 23:52:57 | vstinner | set | nosy: + vstinner messages: + msg391018 |
| 2021-04-13 09:37:59 | eryksun | set | stage: test needed -> needs patch versions: - Python 3.7 |
| 2021-04-13 09:37:26 | eryksun | set | nosy: + eryksun messages: + msg390942 |
| 2021-04-13 07:09:58 | Andrew Ushakov | set | messages: + msg390931 versions: + Python 3.7, Python 3.9 |
| 2019-11-15 20:16:55 | Andrew Ushakov | set | messages: + msg356715 |
| 2019-11-15 19:45:45 | terry.reedy | set | nosy: + terry.reedy messages: + msg356709 type: behavior stage: test needed |
| 2019-11-09 12:43:01 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
| 2019-11-09 12:26:48 | Andrew Ushakov | create | |