Message 390942

| Author | eryksun |
| Recipients | Andrew Ushakov, eryksun, serhiy.storchaka, terry.reedy |
| Date | 2021-04-13 09:37:26 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1618306646.43.0.0976888456209.issue38755@roundup.psfhosted.org> |
| In-reply-to | |

Content:
> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.
The issue is that the line length is limited to BUFSIZ, which ends up splitting the UTF-8 sequence b'\xe2\x96\x91'. BUFSIZ is only 512 bytes on Windows; it's 8192 bytes on Linux, so the line has to be 16 times longer to reproduce the error there. For example:
$ stat -c "%s" test.py
8194
$ python3.9 test.py
SyntaxError: Non-UTF-8 code starting with '\xe2' in file
/home/someone/test.py on line 1, but no encoding declared; see
http://python.org/dev/peps/pep-0263/ for details
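For reference, a script along these lines can generate such a reproducer file. This is only a rough sketch, not part of the original report: the file name test.py and the padding length are illustrative, and since the exact split offset depends on the tokenizer's buffering, the padding may need adjusting by a byte or two.

BUFSIZ = 8192   # Linux C library value; use 512 for the Windows C runtime

with open('test.py', 'wb') as f:
    # ASCII comment padding so that the 3-byte UTF-8 sequence
    # b'\xe2\x96\x91' (U+2591, LIGHT SHADE) straddles the BUFSIZ boundary.
    f.write(b'#' * (BUFSIZ - 2))
    f.write('\u2591'.encode('utf-8'))   # with BUFSIZ = 8192, bytes 8191-8193
    f.write(b'\n')                      # 8194 bytes total, as in the stat output above

On an affected build, running the generated test.py should fail with the SyntaxError shown above.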
This has been fixed in a rewrite of the tokenizer (bpo-25643), for which the PR was recently merged into the main branch for 3.10a7+.
Maybe a minimal backport that keeps reading up to "\n" can be applied to 3.8 and 3.9.
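Conceptually, that backport is just the usual chunked line read: keep appending reads of at most BUFSIZ bytes until a "\n" (or EOF) turns up, and only then hand the complete line to the UTF-8 decoder, so a multi-byte sequence is never cut mid-character. A rough Python illustration of that loop (not the actual C change in tokenizer.c, just the general shape):

import io

BUFSIZ = 512  # illustrative; matches the Windows C runtime value

def read_full_line(f: io.BufferedReader) -> bytes:
    # Keep reading chunks of at most BUFSIZ bytes until the line is
    # complete, instead of stopping after the first chunk.
    chunks = []
    while True:
        chunk = f.readline(BUFSIZ)
        chunks.append(chunk)
        if not chunk or chunk.endswith(b'\n'):
            return b''.join(chunks)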