This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2012-10-06 21:09 by nedbat, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| bug16152_v33.patch | nedbat, 2012-10-07 00:47 | Patch for 3.3 |
| bug16152_v27.patch | nedbat, 2012-10-07 00:48 | Patch for 2.7 |
| Pull Requests | ||
|---|---|---|
| URL | Status | Linked |
| PR 6572 | closed | lukasz.langa, 2018-04-23 01:09 |
| Messages (8) | |||
|---|---|---|---|
| msg172244 | Author: Ned Batchelder (nedbat) * (Python triager) | Date: 2012-10-06 21:09 | |
When tokenizing with tokenize.generate_tokens, if the code ends with whitespace (no newline), the tokenizer produces an ERRORTOKEN for each space. Additionally, each regex match that fails to find a token in those spaces takes time linear in the number of spaces, so the overall performance is O(n**2).
I found this while tokenizing code samples uploaded to a public web site. One sample for some reason ended with 40,000 spaces, which was taking two hours to tokenize.
Demonstration:
{{{
import token
import tokenize
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
# One real token followed by 10,000 trailing spaces and no final newline.
code = "@" + " " * 10000
code_reader = StringIO(code).readline
for num, (ttyp, ttok, _, _, _) in enumerate(tokenize.generate_tokens(code_reader)):
    print("%5d %15s %r" % (num, token.tok_name[ttyp], ttok))
}}}
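The quadratic growth described above can be observed with a rough timing sketch like the one below. It is only an illustration for Python 3: `time_tokenize` is a hypothetical helper name, the input sizes are arbitrary, and on an interpreter that already includes the fix the timings are expected to stay small rather than grow quadratically.

```python
import io
import time
import tokenize


def time_tokenize(num_spaces):
    """Return (seconds, token_count) for '@' followed by num_spaces spaces.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    code = "@" + " " * num_spaces  # no trailing newline, as in the report
    readline = io.StringIO(code).readline
    start = time.perf_counter()
    token_count = sum(1 for _ in tokenize.generate_tokens(readline))
    return time.perf_counter() - start, token_count


# Doubling the number of trailing spaces should roughly quadruple the time
# on an affected interpreter, and have little effect on a fixed one.
for n in (1000, 2000, 4000):
    seconds, count = time_tokenize(n)
    print("%6d spaces: %5d tokens in %.4f s" % (n, count, seconds))
```

Comparing the token counts as well as the timings helps separate the two symptoms in the report: one ERRORTOKEN per space, and the time spent producing them.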
|
|||
| msg172246 | Author: Ned Batchelder (nedbat) * (Python triager) | Date: 2012-10-06 21:15 | |
Here's a patch for 3.3. I would like to also fix 2.7... |
|||
| msg172276 | Author: Ned Batchelder (nedbat) * (Python triager) | Date: 2012-10-07 00:49 | |
Updated with new (better) patch, for v2.7 and v3.3. They are the same except for the test. |
|||
| msg172546 | Author: Jesús Cea Avión (jcea) * (Python committer) | Date: 2012-10-10 01:04 | |
Ned, could you possibly send a Contributor Form Agreement? http://www.python.org/psf/contrib/ |
|||
| msg172548 | Author: Ned Batchelder (nedbat) * (Python triager) | Date: 2012-10-10 02:05 | |
Jesús, done! |
|||
| msg174640 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-11-03 15:51 | |
New changeset eb7ea51e658e by Ezio Melotti in branch '2.7': #16152: fix tokenize to ignore whitespace at the end of the code when no newline is found. Patch by Ned Batchelder. http://hg.python.org/cpython/rev/eb7ea51e658e
New changeset 3ffff1798ed5 by Ezio Melotti in branch '3.2': #16152: fix tokenize to ignore whitespace at the end of the code when no newline is found. Patch by Ned Batchelder. http://hg.python.org/cpython/rev/3ffff1798ed5
New changeset 1fdeddabddda by Ezio Melotti in branch '3.3': #16152: merge with 3.2. http://hg.python.org/cpython/rev/1fdeddabddda
New changeset ed091424f230 by Ezio Melotti in branch 'default': #16152: merge with 3.3. http://hg.python.org/cpython/rev/ed091424f230 |
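The behaviour these changesets aim for (trailing whitespace ignored when there is no final newline) can be checked with a small sketch along these lines. It is not the test from the patch: `count_error_tokens` is a hypothetical helper name, and the sketch assumes a Python 3 interpreter that includes the fix.

```python
import io
import token
import tokenize


def count_error_tokens(source):
    """Count ERRORTOKEN entries produced while tokenizing source.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    readline = io.StringIO(source).readline
    return sum(
        1
        for tok in tokenize.generate_tokens(readline)
        if tok[0] == token.ERRORTOKEN
    )


# '@' followed by trailing spaces and no final newline, as in the report.
# With the fix applied, no ERRORTOKEN should be produced for the spaces.
print(count_error_tokens("@" + " " * 1000))
```

On an interpreter without the fix, the same call would be expected to report roughly one ERRORTOKEN per trailing space, as described in the original report.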
|||
| msg174641 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012-11-03 15:53 | |
Fixed, thanks for the report and the patch! |
|||
| msg315641 | Author: Ned Batchelder (nedbat) * (Python triager) | Date: 2018-04-23 01:26 | |
PR 6273 is mentioned, but I think 6573 is the correct number. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:36 | admin | set | github: 60356 |
| 2018-04-23 01:26:42 | nedbat | set | messages: + msg315641 |
| 2018-04-23 01:09:05 | lukasz.langa | set | pull_requests: + pull_request6273 |
| 2012-11-03 15:53:58 | ezio.melotti | set | status: open -> closed type: behavior assignee: ezio.melotti versions: + Python 3.2 nosy: + ezio.melotti messages: + msg174641 resolution: fixed stage: patch review -> resolved |
| 2012-11-03 15:51:38 | python-dev | set | nosy: + python-dev messages: + msg174640 |
| 2012-10-10 02:05:40 | nedbat | set | messages: + msg172548 |
| 2012-10-10 01:07:46 | jcea | set | versions: + Python 3.4 |
| 2012-10-10 01:04:02 | jcea | set | nosy: + jcea messages: + msg172546 |
| 2012-10-07 00:49:02 | nedbat | set | messages: + msg172276 |
| 2012-10-07 00:48:04 | nedbat | set | files: + bug16152_v27.patch |
| 2012-10-07 00:47:40 | nedbat | set | files: + bug16152_v33.patch |
| 2012-10-07 00:44:39 | nedbat | set | files: - bug16152.patch |
| 2012-10-06 21:15:50 | nedbat | set | files: + bug16152.patch keywords: + patch messages: + msg172246 |
| 2012-10-06 21:12:12 | vstinner | set | nosy: + vstinner |
| 2012-10-06 21:09:21 | nedbat | create | |