This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Python tokenizer rewriting
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7

process
Status: closed
Resolution: fixed
Dependencies: 26581
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Jim Fasarakis-Hilliard, brett.cannon, matrixise, pablogsal, python-dev, serhiy.storchaka, vstinner, yselivanov
Priority: normal
Keywords: patch

Created on 2015-11-17 01:27 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name             Uploaded
tokenize_input.patch  serhiy.storchaka, 2015-11-17 01:27

Pull Requests
URL       Status  Linked
PR 25050  merged  pablogsal, 2021-03-28 04:12
PR 25080  merged  pablogsal, 2021-03-29 21:53

Messages (7)
msg254778 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-11-17 01:27
Here is a preliminary patch that refactors the lowest level of the Python tokenizer: reading and decoding. It splits the code into smaller, simpler functions, decreases the source size by 37 lines, and fixes several bugs: issue14811, issue18961, and a number of others. I added tests for most of the fixed bugs (except leaks and others that are hard to reproduce). Fixing the other bugs may be harder, especially for issues with null bytes (issue1105770, issue20115).
Many bugs could be fixed easily if the whole Python file were read into memory instead of being read line by line. I don't know whether that is acceptable.
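To make the trade-off between reading the whole file and reading it line by line concrete, here is a minimal Python-level sketch. It is only an illustration: the patch itself changes the C tokenizer, and the helper names `read_source_whole` and `read_source_by_line` are hypothetical. Both helpers rely on the standard `tokenize.detect_encoding` function to find the source encoding.

```python
import io
import tokenize


def read_source_whole(raw: bytes) -> str:
    """Decode a complete source buffer in one step (hypothetical helper)."""
    # detect_encoding() looks at a BOM and at a coding cookie
    # on the first two lines, and defaults to UTF-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)


def read_source_by_line(raw: bytes):
    """Yield decoded lines one at a time (hypothetical helper)."""
    buffer = io.BytesIO(raw)
    encoding, _ = tokenize.detect_encoding(buffer.readline)
    buffer.seek(0)  # detect_encoding() consumed up to two lines
    # newline="" keeps the original line endings untranslated.
    yield from io.TextIOWrapper(buffer, encoding=encoding, newline="")


raw = "# -*- coding: latin-1 -*-\ns = 'é'\n".encode("latin-1")
whole = read_source_whole(raw)
by_line = "".join(read_source_by_line(raw))
print(whole == by_line)
```

With the whole buffer in memory, the decoder never has to deal with a character or a line that is split across two reads, which is presumably why that approach would make several of the bugs easier to fix. The cost is holding the complete file in memory.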
msg255082 - Author: Stéphane Wirtel (matrixise) (Python committer) Date: 2015-11-22 06:29
Hi Serhiy,
Just for your information (though I think you already know), the tests pass ;-)
[398/399] test_multiprocessing_spawn (138 sec) -- running: test_tools (108 sec)
[399/399] test_tools (121 sec)
385 tests OK.
3 tests altered the execution environment:
 test___all__ test_site test_warnings
11 tests skipped:
 test_devpoll test_kqueue test_msilib test_ossaudiodev
 test_startfile test_tix test_tk test_ttk_guionly test_winreg
 test_winsound test_zipfile64
I am interested in this part of CPython. I am not an expert in
lexing and parsing, but how can I help you? I am a novice in this
domain.
Stephane
msg255355 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-11-25 14:17
"especially for issues with null byte"
I don't think that we should put too much energy into correctly handling NUL bytes. I see NUL bytes in code as bugs in the code, not in the Python parser. We *might* try to give warnings or better error messages to the user, that's all.
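As an illustration of what a "better error message" for NUL bytes could build on, the sketch below locates NUL bytes in a source buffer before it reaches the parser. The function name `find_nul_bytes` is hypothetical and this is not what CPython does internally; it only shows that reporting the position is cheap.

```python
def find_nul_bytes(raw: bytes) -> list[tuple[int, int]]:
    """Return (line, column) pairs for every NUL byte in raw.

    Lines are 1-based and columns are 0-based byte offsets
    (hypothetical helper, not part of CPython).
    """
    positions = []
    for lineno, line in enumerate(raw.splitlines(keepends=True), start=1):
        col = line.find(b"\0")
        while col != -1:
            positions.append((lineno, col))
            col = line.find(b"\0", col + 1)
    return positions


source = b"x = 1\ny = \x002\n"
for lineno, col in find_nul_bytes(source):
    print(f"line {lineno}, column {col}: source contains a NUL byte")
# prints "line 2, column 4: source contains a NUL byte"
```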
msg262091 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-03-20 21:30
New changeset 23a7481eafd4 by Serhiy Storchaka in branch 'default':
Issues #25643, #26581: Added new tests for detecting Python source code encoding.
https://hg.python.org/cpython/rev/23a7481eafd4 
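For readers unfamiliar with source encoding detection, the following sketch shows the kinds of cases such tests exercise. It uses the pure-Python `tokenize.detect_encoding` from the standard library rather than the C tokenizer, and the wrapper name `source_encoding` is hypothetical. It is not a reproduction of the tests added in the changeset.

```python
import io
import tokenize


def source_encoding(raw: bytes) -> str:
    """Return the encoding detect_encoding() reports for raw (hypothetical helper)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding


# A coding cookie on the first line; the name is normalized.
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # prints "iso-8859-1"

# A UTF-8 BOM without a cookie.
print(source_encoding(b"\xef\xbb\xbfx = 1\n"))  # prints "utf-8-sig"

# Neither a BOM nor a cookie: the default applies.
print(source_encoding(b"x = 1\n"))  # prints "utf-8"

# A UTF-8 BOM that disagrees with the cookie is rejected.
try:
    source_encoding(b"\xef\xbb\xbf# coding: latin-1\nx = 1\n")
except SyntaxError as exc:
    print("rejected:", exc)
```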
msg376742 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2020-09-11 22:10
@serhiy: did you still want to commit this?
msg389654 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2021-03-28 22:48
New changeset 261a452a1300eeeae1428ffd6e6623329c085e2c by Pablo Galindo in branch 'master':
bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050)
https://github.com/python/cpython/commit/261a452a1300eeeae1428ffd6e6623329c085e2c
msg389692 - Author: STINNER Victor (vstinner) (Python committer) Date: 2021-03-29 12:28
Oh, 6 years to fix this bug. Better late than never ;-) Thanks for reporting and for fixing it!
History
Date                 User                    Action  Args
2022-04-11 14:58:23  admin                   set     github: 69829
2021-04-13 17:07:04  vstinner                link    issue14811 superseder
2021-03-29 21:53:38  pablogsal               set     pull_requests: + pull_request23830
2021-03-29 12:28:05  vstinner                set     messages: + msg389692
2021-03-28 22:49:06  pablogsal               set     status: open -> closed
                                                     resolution: fixed
                                                     stage: patch review -> resolved
2021-03-28 22:48:13  pablogsal               set     messages: + msg389654
2021-03-28 04:12:55  pablogsal               set     keywords: + patch
                                                     nosy: + pablogsal
                                                     pull_requests: + pull_request23799
                                                     stage: patch review
2020-09-11 22:10:31  brett.cannon            set     messages: + msg376742
2017-03-14 14:57:52  serhiy.storchaka        set     keywords: - patch
                                                     versions: + Python 3.7, - Python 3.6
2017-03-14 14:29:12  Jim Fasarakis-Hilliard  set     nosy: + Jim Fasarakis-Hilliard
2017-03-14 13:52:27  serhiy.storchaka        link    issue3353 dependencies
2016-03-20 21:30:29  python-dev              set     nosy: + python-dev
                                                     messages: + msg262091
2016-03-17 12:04:22  serhiy.storchaka        set     dependencies: + Double coding cookie
2015-11-25 14:17:10  vstinner                set     nosy: + vstinner
                                                     messages: + msg255355
2015-11-22 06:29:50  matrixise               set     messages: + msg255082
2015-11-22 04:47:25  matrixise               set     nosy: + matrixise
2015-11-17 17:42:47  brett.cannon            set     nosy: + brett.cannon
2015-11-17 17:22:56  yselivanov              set     nosy: + yselivanov
2015-11-17 01:27:33  serhiy.storchaka        create