Message 313850
Author: terry.reedy
Recipients: Arfrever, docs@python, ezio.melotti, gvanrossum, mrabarnett, pitrou, tchrist, terry.reedy, vstinner
Date: 2018-03-15 00:33:15
Message-id: <1521073995.86.0.467229070634.issue12731@psf.upfronthosting.co.za>
Content:
Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers.
This is one of two issues about \w being defined too narrowly. I am somewhat arbitrarily closing #1693050 as a duplicate of this one (keeping the issue number with fewer digits ;-).
There are three issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first character is itself a valid identifier (and therefore an identifier start character). In msg313814 of #32987, Serhiy indicates which identifier start and continue characters are matched by \W for re and regex. I am leaving #24194 open as the tokenizer name issue.
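A minimal sketch of the mismatch, assuming Python 3, using two illustrative characters (U+2118 SCRIPT CAPITAL P and U+00B7 MIDDLE DOT) that are valid in identifiers but are not matched by re's \w:

    import re

    # Both strings are valid identifiers per str.isidentifier(), which
    # follows Unicode XID_Start/XID_Continue, yet neither fully matches
    # \w+ because CPython's re implements Unicode \w as str.isalnum()
    # plus '_'.
    # U+2118 SCRIPT CAPITAL P: Other_ID_Start, category Sm (not alnum).
    # U+00B7 MIDDLE DOT: Other_ID_Continue, category Po (not alnum).
    for name in ('\u2118', 'a\u00b7b'):
        print(ascii(name),
              'isidentifier:', name.isidentifier(),
              'matches \\w+:', re.fullmatch(r'\w+', name) is not None)

Both names pass str.isidentifier() but fail the \w+ match, which is the narrowness described above.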