Created on 2016-04-25 01:58 by Joshua.Landau, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Messages (3) | | |
|---|---|---|
| msg264145 | Author: Joshua Landau (Joshua.Landau) | Date: 2016-04-25 01:58 |

This is effectively a continuation of https://bugs.python.org/issue9712. The line `Name = r'\w+'` in Lib/tokenize.py must be changed to a regular expression that accepts Other_ID_Start characters at the start of an identifier and Other_ID_Continue characters elsewhere. Because `\w` matches neither character of '℘·', tokenize currently rejects that valid identifier. See the reference: https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers

I'm unsure whether Unicode normalization (i.e. the `xid` properties) needs to be dealt with as well.

Credit to toriningen for http://stackoverflow.com/a/29586366/1763356.
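A short reproduction makes the mismatch concrete. This is a minimal sketch, assuming a CPython release whose pure-Python tokenizer still uses the `Name = r'\w+'` pattern described above (the report was filed against 3.5): `compile()` accepts the identifier per the language reference, while `tokenize` emits ERRORTOKEN for the characters `\w` does not match.

```python
import io
import token
import tokenize

# '℘' (U+2118) is Other_ID_Start; '·' (U+00B7) is Other_ID_Continue.
source = "℘· = 1\n"

# The compiler follows the language reference and accepts the identifier.
compile(source, "<test>", "exec")

# On affected versions, the pure-Python tokenizer does not: neither
# character matches \w, so ERRORTOKEN appears where NAME is expected.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(token.tok_name[tok.type], repr(tok.string))
```

The direction the message proposes can be sketched the same way. This is a hypothetical illustration, not the fix adopted in CPython; the code point lists are taken from Unicode's PropList.txt and would need to track the Unicode version that CPython ships.

```python
import re

other_id_start = '\u2118\u212e\u309b\u309c'
other_id_continue = '\u00b7\u0387\u1369-\u1371\u19da'

# Hypothetical replacement for Name = r'\w+': Other_ID_Start is allowed
# at the start, Other_ID_Continue only after the first character.
Name = r'[\w{0}][\w{0}{1}]*'.format(other_id_start, other_id_continue)

assert re.fullmatch(Name, '℘·')
```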
| msg264156 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2016-04-25 06:04 |
|---|---|---|

This is a duplicate of issue24194. Yes, there has still been no progress.
| msg264161 | Author: Joshua Landau (Joshua.Landau) | Date: 2016-04-25 08:03 |
|---|---|---|

Sorry, I'd stumbled on my old comment on the closed issue and completely forgot about the *last* time I did the same thing.
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:30 | admin | set | github: 71030 |
| 2016-04-25 08:03:59 | Joshua.Landau | set | messages: + msg264161 |
| 2016-04-25 06:05:00 | serhiy.storchaka | set | status: open -> closed; superseder: Make tokenize recognize Other_ID_Start and Other_ID_Continue chars; nosy: + serhiy.storchaka; messages: + msg264156; resolution: duplicate; stage: resolved |
| 2016-04-25 01:58:44 | Joshua.Landau | create | |