Created on 2010-09-28 17:31 by Brian.Bossé, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files |
|---|---|---|---|
| File name | Uploaded | Description | Edit |
| issue9974-2.txt | akuchling, 2012-11-04 00:53 | | review |
| Messages (13) | |||
|---|---|---|---|
| msg117538 - (view) | Author: Brian Bossé (Brian.Bossé) | Date: 2010-09-28 17:31 | |
Executing the following code against a py file which contains line continuations generates an assert:

import tokenize
foofile = open(filename, "r")
tokenize.untokenize(list(tokenize.generate_tokens(foofile.readline)))

(note, the list() is important due to issue #8478)

The assert triggered is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\tokenize.py", line 262, in untokenize
    return ut.untokenize(iterable)
  File "C:\Python27\lib\tokenize.py", line 198, in untokenize
    self.add_whitespace(start)
  File "C:\Python27\lib\tokenize.py", line 187, in add_whitespace
    assert row <= self.prev_row
AssertionError

I have tested this in 2.6.5, 2.7 and 3.1.2. The line numbers may differ but the stack is otherwise identical between these versions.

Example input code:

foo = \
    3

If the assert is removed, the code generated is still incorrect. For example, the input:

foo = 3
if foo == 5 or \
   foo == 1
    pass

becomes:

foo = 3
if foo == 5 orfoo == 1
    pass

which besides not having the line continuation, is functionally incorrect.

I'm wrapping my head around the functionality of this module and am willing to do the legwork to get a fix in. Ideas on how to go about it are more than welcome.

Ironic aside: this bug is present when tokenize.py itself is used as input.
|
|||
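The report can be reproduced without an input file; a minimal sketch in Python 3 spelling (io.StringIO stands in for the file, so no filename is needed; on a current Python 3 with the eventual fix it prints the reconstructed source instead of asserting):

import io
import tokenize

# Source containing an explicit backslash line continuation.
source = "foo = \\\n    3\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# On the affected versions this raised AssertionError in add_whitespace();
# with the fix, the continuation comes back as backslash+newline.
print(tokenize.untokenize(tokens))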
| msg117811 - (view) | Author: Brian Bossé (Brian.Bossé) | Date: 2010-10-01 16:09 | |
No idea if I'm getting the patch format right here, but tally ho! This is keyed from release27-maint.

Index: Lib/tokenize.py
===================================================================
--- Lib/tokenize.py (revision 85136)
+++ Lib/tokenize.py (working copy)
@@ -184,8 +184,13 @@
     def add_whitespace(self, start):
         row, col = start
-        assert row <= self.prev_row
         col_offset = col - self.prev_col
+        # Nearly all newlines are handled by the NL and NEWLINE tokens,
+        # but explicit line continuations are not, so they're handled here.
+        if row > self.prev_row:
+            row_offset = row - self.prev_row
+            self.tokens.append("\\\n" * row_offset)
+            col_offset = col  # Recalculate the column offset from the start of our new line
         if col_offset:
             self.tokens.append(" " * col_offset)

Two issues remain with this fix; in both cases the output is functional but not exactly the original text:

1) Whitespace leading up to a line continuation is not recreated. The information required to do this is not present in the tokenized data.

2) If EOF happens at the end of a line, the untokenized version will have a line continuation on the end, as the ENDMARKER token is represented on a line which does not exist in the original.

I spent some time trying to get a unit test written that demonstrates the original bug, but it would seem that doctest (which test_tokenize uses) cannot represent a '\' character properly. The existing unit tests involving line continuations pass because the '\' characters are interpreted as ERRORTOKEN, which is not how they are interpreted when read from a file or the interactive prompt.
|
|||
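Issue (1) above is inherent to the token stream: tokens record only row/column positions, so a space before the backslash leaves no trace. A small sketch of the effect on a Python with the eventual fix applied (the expected output is an assumption, shown in the comment):

import io
import tokenize

src = "x = 1 + \\\n    2\n"   # note the space before the backslash
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
print(repr(tokenize.untokenize(toks)))
# 'x = 1 +\\\n    2\n' -- the space before the backslash is gone,
# but retokenizing the result yields the same token sequence.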
| msg117859 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2010-10-02 04:00 | |
Interesting, is that a separate defect of doctest? |
|||
| msg119083 - (view) | Author: nick caruso (nick.caruso) | Date: 2010-10-18 21:22 | |
--------------------------
import StringIO
import tokenize
tokens = []
def fnord(*a):
    tokens.append(a)
tokenize.tokenize(StringIO.StringIO("a = 1").readline, fnord)
tokenize.untokenize(tokens)
----------------------------------
Generates the same assertion failure, for what it's worth. No line continuation needed.
This does not happen in 2.5 on my machine.
|
|||
| msg119084 - (view) | Author: nick caruso (nick.caruso) | Date: 2010-10-18 21:28 | |
Additionally, substituting "a=1\n" for "a=1" results in no assertion and successful "untokenizing" to "a = 1\n" |
|||
| msg119129 - (view) | Author: Brian Bossé (Brian.Bossé) | Date: 2010-10-19 10:11 | |
Yup, that's related to ENDMARKER being tokenized to its own line, even if EOF happens at the end of the last line of actual code. I don't know if anything relies on that behavior so I can't really suggest changing it. My patch handles the described situation, albeit a bit poorly as I mentioned in comment 2. |
|||
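The ENDMARKER placement Brian describes is easy to observe; a sketch for a current Python 3 (token details vary slightly across versions -- older versions did not synthesize the empty NEWLINE):

import io
import tokenize

# No trailing newline, as in msg119083:
for tok in tokenize.generate_tokens(io.StringIO("a = 1").readline):
    print(tokenize.tok_name[tok.type], tok.start, tok.end, repr(tok.string))
# ENDMARKER is reported starting on row 2, one past the only source
# line -- the row jump that add_whitespace's old assert rejected.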
| msg120040 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2010-10-31 08:21 | |
> My patch handles the described situation, albeit a bit poorly ...

Let us know when you've got a cleaned-up patch and have run the round-trip tests on a broad selection of files. For your test case, don't feel compelled to use doctest. It's okay to write a regular unittest and add that to the test suite.
|
|||
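A regular round-trip test along the lines suggested here needs to compare only token types and strings, since exact spacing is not guaranteed to survive. A sketch (illustrative names, not the test that eventually landed in Lib/test/test_tokenize.py):

import io
import tokenize
import unittest

class ContinuationRoundTrip(unittest.TestCase):
    def check_roundtrip(self, source):
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
        rebuilt = tokenize.untokenize(tokens)
        tokens2 = list(tokenize.generate_tokens(io.StringIO(rebuilt).readline))
        # Only token type and string are guaranteed across a round trip.
        self.assertEqual([(t.type, t.string) for t in tokens],
                         [(t.type, t.string) for t in tokens2])

    def test_backslash_continuation(self):
        self.check_roundtrip("foo = \\\n    3\n")

if __name__ == "__main__":
    unittest.main()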
| msg174728 - (view) | Author: A.M. Kuchling (akuchling) * (Python committer) | Date: 2012-11-04 00:53 | |
I looked at this a bit and made a revised version of the patch that doesn't add any line continuations when the token is ENDMARKER. It works on the example program and a few variations I tried, though I'm not convinced that it'll work for all possible permutations of line continuations, whitespace, and ENDMARKER. (I couldn't find one that failed, though.) Is this worth pursuing? I could put together the necessary test cases. |
|||
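The attached issue9974-2.txt holds the actual revision; purely as an illustration of the idea it describes (a hypothetical helper, not the attachment), dropping ENDMARKER before untokenizing avoids emitting a continuation for its artificial final row:

import io
import tokenize

def untokenize_without_endmarker(tokens):
    # Hypothetical helper: ENDMARKER sits on a row past the real
    # source, so filtering it out keeps the continuation-inserting
    # code from seeing a spurious row jump at EOF.
    body = [t for t in tokens if t[0] != tokenize.ENDMARKER]
    return tokenize.untokenize(body)

src = "a = 1"   # the no-trailing-newline case from msg119083
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
print(repr(untokenize_without_endmarker(toks)))   # 'a = 1'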
| msg197229 - (view) | Author: Dwayne Litzenberger (DLitz) | Date: 2013-09-08 07:24 | |
@amk: I'd appreciate it if you did. :) I ran into this bug while writing some code that converts b"..." into "..." in PyCrypto's setup.py script (for backward compatibility with Python 2.5 and below). |
|||
| msg209945 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-02-02 04:31 | |
One could argue that "The guarantee applies only to the token type and token string as the spacing between tokens (column positions) may change." covers merging of lines, but total elimination of needed whitespace is definitely a bug. |
|||
| msg211548 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-02-18 20:34 | |
The \ continuation bug is one of many covered by #12691 and its patch, but this issue came first and is focused on only this bug. With respect to this issue, the code patches are basically the same; I will use tests to choose between them. On #12691, Gareth notes that the 5-tuple mode that uses add_whitespace is under-tested, so care is needed to not break working uses. Adding a new parameter to a function is a new feature. I will check on pydev that no one objects to calling Untokenizer a private implementation detail. |
|||
| msg212058 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2014-02-24 04:40 | |
New changeset 0f0e9b7d4f1d by Terry Jan Reedy in branch '2.7':
Issue #9974: When untokenizing, use row info to insert backslash+newline.
http://hg.python.org/cpython/rev/0f0e9b7d4f1d

New changeset 24b4cd5695d9 by Terry Jan Reedy in branch '3.3':
Issue #9974: When untokenizing, use row info to insert backslash+newline.
http://hg.python.org/cpython/rev/24b4cd5695d9
|
|||
| msg212060 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-02-24 04:55 | |
I added 5-tuple mode to roundtrip() in #20750. I solved the ENDMARKER problem by breaking out of the token loop if and when it appears. Reconstructing trailing whitespace other than \n is hopeless. The roundtrip test currently only tests equality of token sequences. But my own tests show that code with backslash-newline is reconstructed correctly as long as there is no space before it and something other than ENDMARKER after it. I discovered that tokenize will tokenize '\\' but not '\\\n'. So the latter will never appear as tokenizer output. Even if we did use ENDMARKER to create the latter, it would fail the current roundtrip test. |
|||
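A condensed sketch of the round-trip logic described in msg212060 (illustrative, not the actual roundtrip() in Lib/test/test_tokenize.py): collect 5-tuples, break at ENDMARKER, and compare type/string pairs:

import io
import tokenize

def roundtrip_5tuple(source):
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.ENDMARKER:
            break   # stop at ENDMARKER, as described above
        tokens.append(tok)
    rebuilt = tokenize.untokenize(tokens)
    new = [t for t in tokenize.generate_tokens(io.StringIO(rebuilt).readline)
           if t.type != tokenize.ENDMARKER]
    return ([(t.type, t.string) for t in tokens] ==
            [(t.type, t.string) for t in new])

print(roundtrip_5tuple("foo = \\\n    3\n"))   # True: no space before '\'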
| History |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:07 | admin | set | github: 54183 |
| 2014-02-24 04:55:49 | terry.reedy | set | status: open -> closed; resolution: fixed; messages: + msg212060; stage: patch review -> resolved |
| 2014-02-24 04:40:41 | python-dev | set | nosy: + python-dev; messages: + msg212058 |
| 2014-02-18 20:34:43 | terry.reedy | set | priority: low -> normal; messages: + msg211548 |
| 2014-02-18 03:27:06 | terry.reedy | set | assignee: terry.reedy |
| 2014-02-02 04:31:02 | terry.reedy | set | versions: + Python 3.3, Python 3.4, - Python 2.6, Python 3.1; nosy: + terry.reedy; messages: + msg209945; stage: patch review |
| 2013-09-08 07:24:48 | DLitz | set | nosy: + DLitz; messages: + msg197229 |
| 2012-12-11 06:09:15 | meador.inge | set | nosy: + meador.inge |
| 2012-12-03 19:49:50 | sfllaw | set | nosy: + sfllaw |
| 2012-11-10 02:07:14 | eric.snow | set | nosy: + eric.snow |
| 2012-11-04 00:53:26 | akuchling | set | files: + issue9974-2.txt; nosy: + akuchling; messages: + msg174728 |
| 2012-11-04 00:03:15 | akuchling | link | issue14713 superseder |
| 2010-10-31 08:21:50 | rhettinger | set | priority: normal -> low; assignee: rhettinger -> (no value); messages: + msg120040 |
| 2010-10-19 10:11:02 | Brian.Bossé | set | messages: + msg119129 |
| 2010-10-18 21:28:18 | nick.caruso | set | messages: + msg119084 |
| 2010-10-18 21:22:31 | nick.caruso | set | nosy: + nick.caruso; messages: + msg119083 |
| 2010-10-02 04:00:05 | kristjan.jonsson | set | messages: + msg117859 |
| 2010-10-01 20:26:11 | rhettinger | set | assignee: rhettinger; nosy: + rhettinger |
| 2010-10-01 16:09:20 | Brian.Bossé | set | messages: + msg117811 |
| 2010-09-29 01:23:01 | kristjan.jonsson | set | nosy: + kristjan.jonsson |
| 2010-09-28 17:31:04 | Brian.Bossé | create | |