[Python-Dev] Small tweak to tokenize.py?

Phillip J. Eby pje at telecommunity.com
Thu Nov 30 19:22:57 CET 2006


At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
>I've got a small tweak to tokenize.py that I'd like to run by folks here.
>
>I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
>and my approach is to build a full parse tree with annotations that
>show where the whitespace and comments go. I use the tokenize module
>to scan the input. This is nearly perfect (I can render code from the
>parse tree and it will be an exact match of the input) except for
>continuation lines -- while the tokenize gives me pseudo-tokens for
>comments and "ignored" newlines, it doesn't give me the backslashes at
>all (while it does give me the newline following the backslash).
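
The gap described in the quoted text can be seen with a short experiment. This is only a minimal sketch, and it assumes the Python 3 form of the API (`tokenize.generate_tokens` yielding named tuples), not the 2006-era module under discussion; the sample `source` string is made up for illustration.

```python
import io
import tokenize

# Hypothetical sample input: the line "x = 1 + \" continued by "    2".
source = "x = 1 + \\\n    2\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
strings = [tok.string for tok in tokens]

# Show each token with its start and end (row, col) positions.
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

# The backslash is in the input, but no token's text carries it.
print(any("\\" in s for s in strings))
```

The backslash only survives in each token's `line` attribute, which is the information a renderer has to fall back on.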

The following routine will render a token stream, and it automatically 
restores the missing \'s. I don't know if it'll work with your patch, but 
perhaps you could use it instead of changing tokenize. For the 
documentation and examples, see:
http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text

def detokenize(tokens, indent=0):
    """Convert `tokens` iterable back to a string."""
    out = []; add = out.append
    lr,lc,last = 0,0,''
    baseindent = None
    for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
        # Insert trailing line continuation and blanks for skipped lines
        lr = lr or sr   # first line of input is first line of output
        if sr>lr:
            if last:
                if len(last)>lc:
                    add(last[lc:])
                lr+=1
            if sr>lr:
                add(' '*indent + '\\\n'*(sr-lr))    # blank continuation lines
            lc = 0
        # Re-indent first token on line
        if lc==0:
            if tok==INDENT:
                continue  # we want to dedent first actual token
            else:
                curindent = len(line[:sc].expandtabs())
                if baseindent is None and tok not in WHITESPACE:
                    baseindent = curindent
                elif baseindent is not None and curindent>=baseindent:
                    add(' ' * (curindent-baseindent))
                if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                    add(' ' * indent)
        # Not at start of line, handle intraline whitespace by retaining it
        elif sc>lc:
            add(line[lc:sc])
        if val:
            add(val)
        lr,lc,last = er,ec,line
    return ''.join(out)
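
The routine above relies on `flatten_stmt` and `WHITESPACE`, which are not defined in this message, so it will not run on its own. As a rough, much-simplified sketch of the same idea (not the routine above), the following Python 3 function rebuilds the text from token positions and recovers a trailing backslash from the unclaimed tail of the previous physical line. The name `render_tokens` is hypothetical, and the sketch assumes there are no multi-line string tokens and does no re-indenting.

```python
import io
import tokenize

def render_tokens(source):
    """Rebuild `source` from its tokens, restoring backslash continuations.

    Simplified sketch: multi-line string tokens and re-indentation
    are not handled.
    """
    out = []
    last_row, last_col, last_line = 1, 0, ""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # These tokens contribute no text of their own here; indentation
        # is copied from the physical line of the next real token.
        if tok.type in (tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
            continue
        (srow, scol), (erow, ecol) = tok.start, tok.end
        if srow > last_row:
            # Whatever no token claimed at the end of the previous line,
            # e.g. " \\\n" for an explicit continuation.
            out.append(last_line[last_col:])
            last_col = 0
        # Whitespace between the previous token and this one.
        out.append(tok.line[last_col:scol])
        out.append(tok.string)
        last_row, last_col, last_line = erow, ecol, tok.line
    return "".join(out)

# Hypothetical sample input with one explicit line continuation.
sample = "x = 1 + \\\n    2\n"
print(render_tokens(sample) == sample)
```

Copying the tail of the previous line plays the same role as `add(last[lc:])` in `detokenize`.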

