Message141690
| Author | gdr@garethrees.org |
| Recipients | benjamin.peterson, daniel.urban, eric.snow, ezio.melotti, gdr@garethrees.org, r.david.murray, terry.reedy, vladris |
| Date | 2011-08-05 21:11:50 |
| SpamBayes Score | 1.2516488e-11 |
| Marked as misclassified | No |
| Message-id | <1312578711.21.0.396413594125.issue12675@psf.upfronthosting.co.za> |
| In-reply-to | |
| Content |
Terry: agreed. Does anyone actually use this module? Does anyone know what the design goals are for tokenize? If someone can tell me, I'll do my best to make it meet them.
Meanwhile, here's another bug: each character of trailing whitespace at the end of the source is tokenized as a separate ERRORTOKEN.
Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 23351500)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import tokenize,untokenize
>>> from io import BytesIO
>>> list(tokenize(BytesIO('1 '.encode('utf8')).readline))
[TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''), TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1 '), TokenInfo(type=54 (ERRORTOKEN), string=' ', start=(1, 1), end=(1, 2), line='1 '), TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
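
For comparison, here's a small sketch (the exact tokens emitted vary by Python version) that tokenizes the same source with and without the trailing space and prints just the token names, which makes the spurious ERRORTOKEN easy to spot:

from io import BytesIO
from token import tok_name
from tokenize import tokenize

def token_names(source):
    # Tokenize a source string and return the names of the token types.
    readline = BytesIO(source.encode('utf-8')).readline
    return [tok_name[tok.type] for tok in tokenize(readline)]

print(token_names('1'))    # no trailing space: no ERRORTOKEN
print(token_names('1 '))   # trailing space: ERRORTOKEN for the space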