Created on 2010-08-30 07:42 by flox, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (5) | | | |
|---|---|---|---|
| msg115201 | Author: Florent Xicluna (flox) * (Python committer) | Date: 2010-08-30 07:42 | |
from io import BytesIO
from tokenize import tokenize, tok_name
sample = 'éléphants = "un éléphant, deux éléphants, ..."\nprint(éléphants)\n'
sampleb = sample.encode('utf-8')
exec(sample)
# output: un éléphant, deux éléphants, ...
exec(sampleb)
# output: un éléphant, deux éléphants, ...
module = BytesIO()
module.write(sampleb)
module.seek(0)
for line in tokenize(module.readline):
    print(tok_name[line.type], line)
# output:
ENCODING TokenInfo(type=57, string='utf-8', start=(0, 0), end=(0, 0), line='')
ERRORTOKEN TokenInfo(type=54, string='é', start=(1, 0), end=(1, 1), line='éléphants = "un éléphant, deux éléphants, ..."\n')
NAME TokenInfo(type=1, string='léphants', start=(1, 1), end=(1, 9), line='éléphants = "un éléphant, deux éléphants, ..."\n')
OP TokenInfo(type=53, string='=', start=(1, 10), end=(1, 11), line='éléphants = "un éléphant, deux éléphants, ..."\n')
STRING TokenInfo(type=3, string='"un éléphant, deux éléphants, ..."', start=(1, 12), end=(1, 46), line='éléphants = "un éléphant, deux éléphants, ..."\n')
NEWLINE TokenInfo(type=4, string='\n', start=(1, 46), end=(1, 47), line='éléphants = "un éléphant, deux éléphants, ..."\n')
NAME TokenInfo(type=1, string='print', start=(2, 0), end=(2, 5), line='print(éléphants)\n')
OP TokenInfo(type=53, string='(', start=(2, 5), end=(2, 6), line='print(éléphants)\n')
ERRORTOKEN TokenInfo(type=54, string='é', start=(2, 6), end=(2, 7), line='print(éléphants)\n')
NAME TokenInfo(type=1, string='léphants', start=(2, 7), end=(2, 15), line='print(éléphants)\n')
OP TokenInfo(type=53, string=')', start=(2, 15), end=(2, 16), line='print(éléphants)\n')
NEWLINE TokenInfo(type=4, string='\n', start=(2, 16), end=(2, 17), line='print(éléphants)\n')
ENDMARKER TokenInfo(type=0, string='', start=(3, 0), end=(3, 0), line='')
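For reference, a minimal sketch of the behavior expected once tokenize accepts non-ASCII identifiers (assuming a Python 3 build that includes the fix referenced in msg115218): the same source should yield a single NAME token for 'éléphants' rather than an ERRORTOKEN followed by a NAME.

from io import BytesIO
from tokenize import tokenize, NAME
sampleb = 'éléphants = "un éléphant"\nprint(éléphants)\n'.encode('utf-8')
# Collect only the NAME tokens from the stream.
names = [tok.string for tok in tokenize(BytesIO(sampleb).readline) if tok.type == NAME]
print(names)
# expected: ['éléphants', 'print', 'éléphants']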
| msg115218 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2010-08-30 14:41 | |
Fixed in r84364.
| msg240544 | Author: Joshua Landau (Joshua.Landau) * | Date: 2015-04-12 06:08 | |
This doesn't seem to be a complete fix; the regex used does not include Other_ID_Start or Other_ID_Continue from https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers. As a result, tokenize does not accept '℘·'. Credit to modchan from http://stackoverflow.com/a/29586366/1763356.
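A minimal sketch of why these characters slip through (assuming, as the message above notes, that the NAME pattern is \w-based): U+2118 '℘' is in Other_ID_Start and U+00B7 '·' is in Other_ID_Continue, so the full identifier grammar accepts them, but neither is alphanumeric, so \w does not match them.

import re
import unicodedata
for ch in ('\u2118', '\u00b7'):  # ℘ (Other_ID_Start), · (Other_ID_Continue)
    ident = 'x' + ch
    print(ch,
          unicodedata.category(ch),           # 'Sm' and 'Po': neither letter nor digit
          ident.isidentifier(),               # True: the identifier grammar accepts them
          bool(re.fullmatch(r'\w+', ident)))  # False: a \w-based pattern rejects them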
| msg313846 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2018-03-15 00:12 | |
Joshua opened #24194 as a duplicate of this issue because he could not reopen this one. I am leaving that issue open as the superseder for this one, since Serhiy has already added two dependencies there, and because this issue seems in turn to be a duplicate of #1693050 (which I will close along with #32987).
| msg313847 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2018-03-15 00:18 | |
Actually, #1693050 and #12731, about \w, are duplicates.
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:05 | admin | set | github: 53921 |
| 2018-03-15 00:18:48 | terry.reedy | set | messages: + msg313847 |
| 2018-03-15 00:12:21 | terry.reedy | set | superseder: Make tokenize recognize Other_ID_Start and Other_ID_Continue chars; messages: + msg313846; nosy: + terry.reedy |
| 2015-04-12 06:08:01 | Joshua.Landau | set | nosy: + Joshua.Landau; messages: + msg240544; versions: + Python 3.4 |
| 2010-08-30 14:41:43 | benjamin.peterson | set | status: open -> closed; nosy: + benjamin.peterson; messages: + msg115218; resolution: fixed |
| 2010-08-30 07:42:55 | flox | create | |