Message313168
| Author |
steve |
| Recipients |
ezio.melotti, steve, vstinner |
| Date |
2018-03-02.23:32:49 |
| Message-id |
<1520033569.84.0.467229070634.issue32987@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Here is an example involving the Unicode character MIDDLE DOT (·, U+00B7). The line
ab·cd = 7
is valid Python 3 code and is accepted by the CPython interpreter. However, tokenize.py rejects it and reports the middle dot as an error token. Here is an example you can run to see that:
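import tokenize
from io import BytesIO
# tokenize.tokenize() expects a readline callable that returns bytes
test = 'ab·cd = 7'.encode('utf-8')
x = tokenize.tokenize(BytesIO(test).readline)
# Print every token; look at the token type reported for the middle dot
for i in x:
    print(i)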
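The interpreter side of the comparison can be checked in the same way. The following is a minimal sketch, assuming CPython 3; the names `source` and `ns` are introduced here for illustration only and are not part of the original report.

```python
import unicodedata

source = 'ab·cd = 7'

# Confirm which character is being tested.
print(unicodedata.name('·'))

# str.isidentifier() applies the language's identifier rules to the name.
print('ab·cd'.isidentifier())

# Compile and run the line in a scratch namespace (ns is a hypothetical
# helper dict), then read the bound value back out.
ns = {}
exec(compile(source, '<example>', 'exec'), ns)
print(ns['ab·cd'])
```

If the interpreter accepts the identifier, the assignment runs without a SyntaxError and the name is bound in `ns`.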
For reference, the official definition of identifiers is:
https://docs.python.org/3.6/reference/lexical_analysis.html#identifiers
and details about MIDDLE DOT are at
https://www.unicode.org/Public/10.0.0/ucd/PropList.txt
MIDDLE DOT has the "Other_ID_Continue" property, so I think the interpreter is behaving correctly (i.e. consistent with the documented spec), while tokenize.py is wrong. |
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2018-03-02 23:32:49 | steve | set | recipients: + steve, vstinner, ezio.melotti |
| 2018-03-02 23:32:49 | steve | set | messageid: <1520033569.84.0.467229070634.issue32987@psf.upfronthosting.co.za> |
| 2018-03-02 23:32:49 | steve | link | issue32987 messages |
| 2018-03-02 23:32:49 | steve | create | |