Created on 2016-05-26 13:13 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| wordcode2.patch | serhiy.storchaka, 2016-05-26 13:13 | |
| word-jump-offsets.patch | serhiy.storchaka, 2016-05-28 15:46 | |
| wordcode-refactor.patch | serhiy.storchaka, 2016-06-08 19:42 | |
| wordcode-refactor2.patch | serhiy.storchaka, 2016-06-10 18:54 | |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 25069 | merged | Mark.Shannon, 2021-03-29 15:17 |
| PR 25172 | merged | Dennis Sweeney, 2021-04-04 00:29 |
| Messages (36) | |||
|---|---|---|---|
| msg266431 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-05-26 13:13 | |
This is the second stage of converting to wordcode (issue26647). The proposed patch makes bytecode consist of code units (16-bit words) instead of bytes. It includes the following changes:

* Changes the meaning of jump offsets. They count code units, not bytes. This extends the range addressed by short commands (from 256 bytes to 256 words) and simplifies ceval.c.
* Changes f_lasti, tb_lasti, etc. to count code units instead of bytes.
* Changes the disassembler to show addresses in code units, not bytes.
* Refactors the code.

These changes break compatibility (already broken by switching to 16-bit bytecode). The first one breaks compatibility with compiled bytecode and requires incrementing the magic number.
|||
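The jump-offset change proposed in msg266431 can be illustrated with a small sketch. It is not taken from any of the attached patches: the helper names are made up for illustration, and it assumes that every instruction occupies exactly one 2-byte code unit and that a short (non-extended) argument is 8 bits wide.

```python
CODE_UNIT_SIZE = 2    # bytes per code unit (assumption: 16-bit words)
MAX_SHORT_ARG = 0xFF  # largest value an 8-bit argument can hold


def byte_to_unit(byte_offset):
    """Convert a byte offset into a code-unit offset (hypothetical helper)."""
    if byte_offset % CODE_UNIT_SIZE:
        raise ValueError("offset does not fall on an instruction boundary")
    return byte_offset // CODE_UNIT_SIZE


def unit_to_byte(unit_offset):
    """Convert a code-unit offset back into a byte offset."""
    return unit_offset * CODE_UNIT_SIZE


def short_jump_reach_bytes(offsets_in_units):
    """Farthest byte distance a jump can cover without EXTENDED_ARG."""
    if offsets_in_units:
        return unit_to_byte(MAX_SHORT_ARG)
    return MAX_SHORT_ARG


print(short_jump_reach_bytes(False), short_jump_reach_bytes(True))
```

Counting in code units roughly doubles the distance a short jump can cover, and an offset expressed in units cannot point into the middle of an instruction.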
| msg266435 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016-05-26 13:29 | |
> Changes f_lasti, tb_lasti etc to count code units instead of bytes.

I asked Demur not to break f_lasti.

I don't understand whether this change breaks applications using f_lasti. For example, asyncio/coroutines.py uses:

    if caller.f_code.co_code[caller.f_lasti] != _YIELD_FROM:
        value = value[0]

Does this code still work with your change? Maybe this code is already broken by wordcode, but it doesn't really matter since it should only be used on the exact version 3.4.0. The code works around a bug in Python 3.4.0, fixed in Python 3.4.1 (issue #21209).

Other known users of f_lasti are development tools like debuggers (pdb), profilers, code coverage, etc. We should check these tools.
|||
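The concern in msg266435 can be made concrete with a sketch of the lookup pattern. `opcode_at` and its `lasti_counts_units` flag are hypothetical names, not CPython API, and the sketch assumes co_code stays a `bytes` object made of 2-byte instructions.

```python
def opcode_at(co_code, lasti, lasti_counts_units=False):
    """Return the opcode that `lasti` refers to, or None if lasti is -1.

    co_code is assumed to be a bytes object of 2-byte instructions.
    lasti_counts_units selects whether lasti is a byte offset (False)
    or an index of 16-bit code units (True).
    """
    if lasti < 0:
        return None  # -1 conventionally means nothing has executed yet
    byte_offset = lasti * 2 if lasti_counts_units else lasti
    return co_code[byte_offset]


# Arbitrary opcode/argument bytes, chosen only for the demonstration.
sample = bytes([100, 0, 72, 0, 83, 0])
print(opcode_at(sample, 2), opcode_at(sample, 1, lasti_counts_units=True))
```

Code that indexes co_code directly with f_lasti keeps working only if f_lasti stays a byte offset; if it counted code units instead, the same expression would read the wrong byte.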
| msg266446 | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2016-05-26 18:38 | |
So is avoiding changing f_lasti just to minimize breakage of tools? Aren't they going to have to update to support the wordcode changes anyway? |
|||
| msg266447 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-05-26 18:55 | |
The patch includes a change to Lib/asyncio/coroutines.py. This is the only change in Python code besides the dis module. I can keep f_lasti at twice the number of instructions, but this will complicate the patch. The simplest way is perhaps to convert this read-only attribute to a property that multiplies the internal f_lasti by 2. |
|||
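The property idea from msg266447 might look roughly like the following. This is a Python-level illustration only; the real attribute lives in C, and `FrameSketch` and `_lasti_units` are invented names. Passing -1 through unchanged is an assumption made here so that the "not started" sentinel survives the conversion.

```python
class FrameSketch:
    """Toy stand-in for a frame object (hypothetical, not CPython's)."""

    def __init__(self):
        # Internal index counted in 16-bit code units; -1 = not started.
        self._lasti_units = -1

    @property
    def f_lasti(self):
        """Read-only byte offset derived from the internal unit index."""
        if self._lasti_units < 0:
            return -1  # assumption: keep the sentinel instead of doubling it
        return self._lasti_units * 2


frame = FrameSketch()
frame._lasti_units = 3
print(frame.f_lasti)
```

Callers keep seeing byte offsets while the interpreter is free to count code units internally.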
| msg266489 | Author: Philip Dubé (Demur Rumed) * | Date: 2016-05-27 12:06 | |
https://github.com/search?q=f_lasti&type=Code

Popular uses of f_lasti are checking it for -1, checking the instruction at the byte offset of f_lasti, and checking the argument with code[f_lasti+1]. (Some bad code checks f_lasti+3, which will break with 3.6.)

abarnert discussed how bytecode should be typed to Python code. Ideally it'd be typed as an "(instruction, arg)" tuple. He considered creating a "words" type similar to "bytes" but with 16-bit values. It's a bit niche to introduce a builtin for. So if the co_code object remains a bytes object, it seems intuitive to keep f_lasti as a byte offset. That clashes with jump offsets no longer being byte offsets, even in Python code, though.

In reality, most of the results on GitHub seem to be copies of a few distinct uses, so maybe backwards compatibility isn't so important.

Another search, https://searchcode.com/?q=f_lasti&loc=0&loc2=10000&src=3&src=7&src=1&lan=19, doesn't produce many results either.
|||
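The "(instruction, arg)" view mentioned in msg266489 can be approximated in pure Python without a new builtin type. The generator below is a sketch under the assumption that co_code is a `bytes` object whose length is a multiple of 2; `iter_code_units` is a hypothetical name.

```python
def iter_code_units(co_code):
    """Yield (opcode, arg) pairs from a bytes object of 2-byte instructions."""
    if len(co_code) % 2:
        raise ValueError("co_code length is not a whole number of code units")
    for i in range(0, len(co_code), 2):
        yield co_code[i], co_code[i + 1]


# Arbitrary bytes used only to show the pairing.
for opcode, arg in iter_code_units(bytes([100, 1, 83, 0])):
    print(opcode, arg)
```

With this view, the index of a pair is a code-unit offset, and twice that index is the corresponding byte offset into co_code.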
| msg266490 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016-05-27 12:29 | |
> if the co_code object is remaining a bytes object then it seems intuitive to keep f_lasti as a bytes offset

Right.

> In reality most of the results on github all seem to be copying a few distinct uses. So maybe backwards compatibility isn't so important

Backwards compatibility is important.
|||
| msg266556 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-05-28 14:53 | |
Here is a patch that implements only the first change: it makes jump offsets count 16-bit units, not bytes. This is a minimal change; it doesn't include refactoring. |
|||
| msg266604 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2016-05-29 17:33 | |
I don't see how this is a simplification. The additional /2 and *2 on the affected lines make the code a little harder to reason about, and it loses some of the cleanness achieved by the last patch. To me, it also increases conceptual complexity because INSTR_OFFSET() no longer gives the byte address adjustment. |
|||
| msg266611 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-05-29 18:40 | |
word-jump-offsets.patch doesn't simplify the code but rather complicates it, because it is a minimal patch that doesn't include the refactoring. The main benefit of this patch is that it extends the range addressed by short jump instructions. The other benefit is that the target of a jump instruction now always points at an instruction boundary.

wordcode2.patch includes refactoring and other changes:

* Changes jump offsets. This extends the addressed range and can make co_code more compact.
* Changes co_lnotab. This can make co_lnotab more compact.
* Changes f_lasti.
* Changes the disassembler.
* Refactors the C code (makes it codeunit-oriented instead of byte-oriented). Simplifies the code complicated by the other changes.

I can provide these steps as separate patches (word-jump-offsets.patch is the first of them), but every separate patch can temporarily complicate the code.

Three of these changes (all except the disassembler change and the refactoring) break the compatibility of pyc files and require incrementing the magic number.
|||
| msg267880 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-06-08 19:42 | |
Here is a patch that just refactors the code.

* Renames OPCODE and OPARG to _Py_OPCODE and _Py_OPARG and moves them to code.h for reuse in other files.
* Introduces _Py_CODEUNIT as an alias for unsigned short (if it is 16-bit; otherwise a compile error is raised).
* Makes the compiler and peepholer operate on _Py_CODEUNIT units instead of bytes.
* Replaces or scales magic numbers with sizeof(_Py_CODEUNIT).
* Adds fill_nops() for filling a specified region with NOPs.
* Decreases memory consumption of the peepholer (it doesn't allocate memory for unused odd addresses).
|||
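The helpers listed in msg267880 are C macros and functions; the sketch below only mirrors the idea in Python. It assumes the layout in which the opcode sits in the low byte of a 16-bit code unit and the argument in the high byte (a big-endian build would see the two swapped). The function names are illustrative, and the real `fill_nops()` signature is not shown in this thread.

```python
import dis

NOP = dis.opmap["NOP"]  # opcode number of NOP in the running interpreter


def unit_opcode(word):
    """Opcode of a 16-bit code unit (assumed to be the low byte)."""
    return word & 0xFF


def unit_oparg(word):
    """Argument of a 16-bit code unit (assumed to be the high byte)."""
    return word >> 8


def make_unit(opcode, oparg):
    """Pack an opcode and an 8-bit argument into one code unit."""
    return opcode | (oparg << 8)


def fill_nops(units, start, end):
    """Overwrite units[start:end] in place with NOP instructions."""
    for i in range(start, end):
        units[i] = make_unit(NOP, 0)


# Arbitrary opcode and argument values, used only for the demonstration.
code = [make_unit(100, 7), make_unit(100, 8), make_unit(83, 0)]
fill_nops(code, 0, 2)
print([unit_opcode(w) == NOP for w in code])
```

Working on a list of code units rather than bytes means there are no odd addresses to account for, which is the memory saving the last bullet refers to.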
| msg268132 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2016-06-10 17:43 | |
I really don't think any of this should be done at all. The current code is clean and fast (and to my eyes at least is very readable). |
|||
| msg268142 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-06-10 18:49 | |
This is not just about cleanup (to my eyes the current code is not very readable, and I have read it many times, perhaps more than any other core developer in recent months). There are other benefits. Changing jump offsets allows getting rid of EXTENDED_ARGs for some of the jump opcodes. Changing lnotab makes it more compact and allows the peepholer to optimize code that it rejects now. The refactoring includes a change that decreases the memory consumption of the peepholer (from 4 bytes per bytecode byte to 2 bytes per bytecode byte). Changing jump offsets together with changing f_lasti removes redundant multiplications and divisions by 2.

Separate changes can complicate some parts of the code, but subsequent changes remove this complication. Only all changes together achieve maximal cleanness.

I think that the conversion to wordcode is not complete without these changes. I approved the wordcode patch only with the following changes in mind. It is less painful to make all the changes in one Python release than to break compatibility over several releases.
|||
| msg268143 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-06-10 18:54 | |
The updated patch replaces a few more magic constants. |
|||
| msg268145 | Author: Philip Dubé (Demur Rumed) * | Date: 2016-06-10 19:06 | |
The patches LGTM and seem to implement the follow-up ideas outlined in the first portion. It'd be good to verify that benchmarks are relatively unaffected. |
|||
| msg275771 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016-09-11 10:48 | |
New changeset dd046963bd42 by Serhiy Storchaka in branch 'default': Issue #27129: Replaced wordcode related magic constants with macros. https://hg.python.org/cpython/rev/dd046963bd42 |
|||
| msg275773 | Author: Berker Peksag (berker.peksag) * (Python committer) | Date: 2016-09-11 11:49 | |
From http://buildbot.python.org/all/builders/s390x%20Debian%203.x/builds/1811/steps/compile/logs/stdio (I also saw the same compile error on other Linux boxes):

    _freeze_importlib: Python/peephole.c:524: PyCode_Optimize: Assertion `((codestr[i]) >> 8) == 100' failed.
    Makefile:735: recipe for target 'Python/importlib.h' failed
    make: *** [Python/importlib.h] Aborted
|||
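The thread does not say what the refactoring bug was, so the following does not reproduce it. It only illustrates why the failing expression tests `>> 8`: s390x is big-endian, and when the two bytes of an instruction are read as one 16-bit word in big-endian order, the first byte (the opcode) lands in the high half. The opcode value 100 is copied from the assertion text; the argument byte 3 and the helper name `as_word` are arbitrary.

```python
def as_word(unit_bytes, byteorder):
    """Interpret a 2-byte instruction as a single 16-bit integer."""
    return int.from_bytes(unit_bytes, byteorder)


unit = bytes([100, 3])  # opcode byte first, then the argument byte

big = as_word(unit, "big")
little = as_word(unit, "little")

# Big-endian: the opcode is in the high byte, the argument in the low byte.
print(big >> 8, big & 0xFF)
# Little-endian: the opcode is in the low byte, the argument in the high byte.
print(little & 0xFF, little >> 8)
```

Any code that extracts the opcode from a code unit therefore has to pick the shift or mask that matches the byte order of the build.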
| msg275780 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016-09-11 12:19 | |
New changeset b49a938eaa31 by Serhiy Storchaka in branch 'default': Fixed refactoring bug in dd046963bd42 (issue27129). https://hg.python.org/cpython/rev/b49a938eaa31 |
|||
| msg275781 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-09-11 12:20 | |
Thanks Berker! |
|||
| msg276044 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016-09-12 13:44 | |
This issue can now be closed, no? |
|||
| msg276049 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-09-12 14:00 | |
No, only the simpler and safer part was committed. I divided the original patch into four parts, but since the code has evolved significantly, they no longer apply cleanly. |
|||
| msg276055 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016-09-12 14:55 | |
Serhiy Storchaka added the comment:
> No, only the simpler and safer part was committed. I divided the original patch into four parts, but since the code has evolved significantly, they no longer apply cleanly.

Oh ok. I saw that a change was pushed, I didn't read the history of the issue. I should take a look after the beta1 release.
|||
| msg315266 | Author: Kirill Balunov (godaygo) | Date: 2018-04-13 21:08 | |
Hello, what is the future of this patch? It feels as if the transition to wordcode is still in a half-finished state. |
|||
| msg389707 | Author: Mark Shannon (Mark.Shannon) * (Python committer) | Date: 2021-03-29 15:03 | |
frame.f_lasti and traceback.tb_lasti are best left as byte offsets. There is no guarantee that we won't go back to variable length instructions. For example, a "LONG_JUMP" instruction which is 4 bytes long and takes a 3 byte offset might well be a worthwhile extension. However, changing bytecode offsets and the internal representation of frame.f_lasti will reduce the number of "EXTENDED_ARG"s by 60% or more and make interpreter dispatch a tad more efficient. |
|||
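Why instruction offsets need fewer EXTENDED_ARG prefixes can be sketched as follows. This does not reproduce the 60% figure from msg389707, which depends on real code; it only shows the mechanism, assuming each instruction carries 8 bits of argument and each EXTENDED_ARG prefix contributes 8 more. `extended_args_needed` is a hypothetical helper.

```python
def extended_args_needed(arg):
    """Number of EXTENDED_ARG prefixes required to encode `arg`."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    count = 0
    while arg > 0xFF:  # anything above 8 bits spills into a prefix
        arg >>= 8
        count += 1
    return count


target_in_bytes = 400                   # arbitrary example jump target
target_in_units = target_in_bytes // 2  # same target in 2-byte instructions

print(extended_args_needed(target_in_bytes),
      extended_args_needed(target_in_units))
```

A target that needs one prefix when expressed in bytes may need none when expressed in instructions, because halving the value can bring it back under the 8-bit limit.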
| msg389991 | Author: Mark Shannon (Mark.Shannon) * (Python committer) | Date: 2021-04-01 15:00 | |
New changeset fcb55c0037baab6f98f91ee38ce84b6f874f034a by Mark Shannon in branch 'master': bpo-27129: Use instruction offsets, not byte offsets, in bytecode and internally. (GH-25069) https://github.com/python/cpython/commit/fcb55c0037baab6f98f91ee38ce84b6f874f034a |
|||
| msg390017 | Author: David Bolen (db3l) * | Date: 2021-04-01 23:17 | |
Note that this commit appears to be causing exceptions for the Win10 buildbot, failing the PyCode_Addr2Line assertion in codeobject.c line 1252. The assertion seems to pop up at differing points during each test run, but the builder has yet to complete a full test run successfully. |
|||
| msg390024 | Author: Mark Shannon (Mark.Shannon) * (Python committer) | Date: 2021-04-02 00:18 | |
That assertion is correct, and hasn't changed. Do you have a traceback? The buildbot just shows the assertion message with no context. |
|||
| msg390025 | Author: David Bolen (db3l) * | Date: 2021-04-02 00:58 | |
Unfortunately, not at the moment - what's in the buildbot log is what's available. The RTL assertion aborts the process. The tests involved (such as test_clinic) do seem reproducible in a few separate tries, though again, all they do is terminate. As the assertion should be correct, I'm guessing it's reflecting an earlier corruption. There's some other oddities, such as the "Leaf" related failures in test_peg_generator that showed up at the same time, in case that offers any hint. Since Leaf and StringLeaf are almost next to each other in grammar.py I can't see how it can be undefined. The worker only has the core build tools version of VS so can't directly debug this further locally. I can look into using a different machine to try to get some details, but I'm not sure as to timing. |
|||
| msg390082 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-04-02 16:42 | |
I get two crashes on Windows with Python built in debug mode:

* I got a crash. I wasn't sure if it was an issue with the incremental build, so I rebuilt Python.
* On a fresh build, Python crashed on the CALL_FUNCTION_KW opcode. When loading names, names was equal to 0xFFFFFFFFFFFFFFFF:

      names = POP();
      assert(PyTuple_Check(names)); <=== HERE

  Moreover, f->f_code was equal to 0xCBCBCBCBCBCBCBCB. But it was really weird: I added an assertion to ensure that f->f_code was not equal to 0xCBCBCBCBCBCBCBCB, and the assertion didn't fail.
* I ran "git clean -fdx" and built Python again. This time it went fine; moreover, the whole test suite passed cleanly!? "== Tests result: SUCCESS ==" and "386 tests OK."

I build Python with: PCbuild\build.bat -d -p x64 -e
|||
| msg390085 | Author: David Bolen (db3l) * | Date: 2021-04-02 18:13 | |
Ah, Victor, that helps. I was having trouble reproducing the problem on a different system. I was suspecting a small difference in compiler version, but I hadn't considered it being because I started fresh. From what I can see, a particular build tree can be successful if it remains on either side of the instruction commit, but you can't cross the boundary - in either direction. There's clearly some build artifact not being reset properly that gets out of sync. I've been able to create odd name errors even in older commits as long as the first one I build is at or after the instruction commit. But a pristine checkout on the buildbot of the latest master works, as does a git clean on the tree. I always use the buildbot scripts for building and they invoke a full clean first. So either the clean process on Windows is missing something, or there's an artifact that is supposed to be kept in sync in the source tree itself that isn't. I'm not familiar enough with the internals to guess at which yet. I suppose given that it's not a problem on other buildbots argues for the clean issue, although I suppose it could also be something that is only kept in the tree to benefit a subset of systems, like Windows. (While resetting the checkouts on the buildbot should therefore fix the current exceptions, I'm going to leave that alone for the moment, since that leaves it positioned to confirm any subsequent fix) |
|||
| msg390088 | Author: David Bolen (db3l) * | Date: 2021-04-02 18:53 | |
I'm out of time for a bit, but it appears that the root issue is old pyc files in Tools/clinic/__pycache__ that aren't removed during a clean process, and appear to be the source of all of the errors. Manually pruning that folder fixes things. I believe the regular (non-Windows) makefile automatically prunes all __pycache__ folders in the tree during clean which is probably why that's not an issue on other systems. |
|||
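The manual pruning described in msg390088 could be scripted along these lines. This is a generic sketch, not the fix that was applied to the Windows clean script; `prune_pycache` is a hypothetical helper that removes every `__pycache__` directory below a given root.

```python
import shutil
from pathlib import Path


def prune_pycache(root):
    """Delete all __pycache__ directories under root; return what was removed."""
    removed = []
    # Materialise the matches first so the tree is not modified while walking.
    for cache_dir in sorted(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

For example, `prune_pycache("Tools")` run from the top of a checkout would clear stale pyc files such as those in Tools/clinic/__pycache__, forcing them to be recompiled with the current bytecode format.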
| msg390099 | Author: David Bolen (db3l) * | Date: 2021-04-02 20:59 | |
I've opened issue #43709 for fixing the buildbot clean script under Windows. It needs to clean the Tools and Parser trees, not just Lib (and there are a few other folders involved besides clinic) |
|||
| msg390151 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-04-03 23:15 | |
> New changeset fcb55c0037baab6f98f91ee38ce84b6f874f034a by Mark Shannon in branch 'master':
> bpo-27129: Use instruction offsets, not byte offsets, in bytecode and internally. (GH-25069)

This change broke buildbots, please revert it to repair buildbots. More and more people are affected: https://bugs.python.org/issue43719
|||
| msg390152 | Author: David Bolen (db3l) * | Date: 2021-04-03 23:24 | |
I don't think reverting the commit at this point would necessarily be helpful. While it might fix some systems, it could newly break anyone who happened to do their first build since the commit was in place. I didn't want to bug anyone over the weekend, but I've got a PR as part of issue #43709 that I believe would fix this going forward, if anyone with access might have an opportunity to review it. |
|||
| msg390157 | Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) | Date: 2021-04-03 23:58 | |
I notice this in _bootstrap_external.py: the magic number did not get changed, only the comment:

    # Python 3.10a2 3433 (RERAISE restores f_lasti if oparg != 0)
    # Python 3.10a6 3434 (PEP 634: Structural Pattern Matching)
    # Python 3.10a7 3435 Use instruction offsets (as opposed to byte offsets).
    #
    # MAGIC must change whenever the bytecode emitted by the compiler may no
    # longer be understood by older implementations of the eval loop (usually
    # due to the addition of new opcodes).
    #
    # Whenever MAGIC_NUMBER is changed, the ranges in the magic_values array
    # in PC/launcher.c must also be updated.

    MAGIC_NUMBER = (3434).to_bytes(2, 'little') + b'\r\n'
    _RAW_MAGIC_NUMBER = int.from_bytes(MAGIC_NUMBER, 'little')  # For import.c
|||
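The effect of the unchanged constant can be seen by building the magic bytes the same way the quoted snippet does. `magic_bytes` and `pyc_matches_running_interpreter` are illustrative helpers; `importlib.util.MAGIC_NUMBER` is the value the running interpreter uses.

```python
import importlib.util


def magic_bytes(number):
    """Build the 4-byte pyc magic the way the quoted snippet does."""
    return number.to_bytes(2, "little") + b"\r\n"


def pyc_matches_running_interpreter(header):
    """True if a pyc header starts with the running interpreter's magic."""
    return header[:4] == importlib.util.MAGIC_NUMBER


old = magic_bytes(3434)
new = magic_bytes(3435)
print(old, new, old == new)
```

Because the constant stayed at 3434 while the bytecode format changed, pyc files written before the change still carried a matching magic and were loaded rather than recompiled, which fits the stale `__pycache__` symptoms reported earlier in the thread.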
| msg390178 | Author: Mark Shannon (Mark.Shannon) * (Python committer) | Date: 2021-04-04 08:33 | |
New changeset c368ce74d2c9bcbf1ec320466819c2d4768252f7 by Dennis Sweeney in branch 'master': bpo-27129: Update magic numbers and bootstrapping for GH-25069 (GH-25172) https://github.com/python/cpython/commit/c368ce74d2c9bcbf1ec320466819c2d4768252f7 |
|||
| msg390242 | Author: Pablo Galindo Salgado (pablogsal) * (Python committer) | Date: 2021-04-05 16:25 | |
Closing as this is fixed. Feel free to reopen if there is something missing |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:31 | admin | set | github: 71316 |
| 2021-04-05 16:25:09 | pablogsal | set | status: open -> closed nosy: + pablogsal messages: + msg390242 resolution: fixed stage: patch review -> resolved |
| 2021-04-04 08:33:31 | Mark.Shannon | set | messages: + msg390178 |
| 2021-04-04 02:34:59 | ned.deily | set | priority: normal -> release blocker versions: + Python 3.10, - Python 3.6 |
| 2021-04-04 00:29:35 | Dennis Sweeney | set | pull_requests: + pull_request23913 |
| 2021-04-03 23:58:02 | Dennis Sweeney | set | nosy: + Dennis Sweeney messages: + msg390157 |
| 2021-04-03 23:24:12 | db3l | set | messages: + msg390152 |
| 2021-04-03 23:15:31 | vstinner | set | messages: + msg390151 |
| 2021-04-02 20:59:25 | db3l | set | messages: + msg390099 |
| 2021-04-02 18:53:37 | db3l | set | messages: + msg390088 |
| 2021-04-02 18:13:52 | db3l | set | messages: + msg390085 |
| 2021-04-02 16:42:31 | vstinner | set | messages: + msg390082 |
| 2021-04-02 00:58:17 | db3l | set | messages: + msg390025 |
| 2021-04-02 00:18:12 | Mark.Shannon | set | messages: + msg390024 |
| 2021-04-01 23:17:45 | db3l | set | nosy: + db3l messages: + msg390017 |
| 2021-04-01 15:00:41 | Mark.Shannon | set | messages: + msg389991 |
| 2021-03-29 15:17:29 | Mark.Shannon | set | pull_requests: + pull_request23819 |
| 2021-03-29 15:03:37 | Mark.Shannon | set | messages: + msg389707 |
| 2020-11-06 19:37:26 | brett.cannon | set | nosy: - brett.cannon |
| 2018-04-13 21:08:16 | godaygo | set | nosy: + godaygo messages: + msg315266 |
| 2016-11-24 22:20:31 | vstinner | unlink | issue26647 dependencies |
| 2016-09-12 14:55:45 | vstinner | set | messages: + msg276055 |
| 2016-09-12 14:00:22 | serhiy.storchaka | set | messages: + msg276049 |
| 2016-09-12 13:44:00 | vstinner | set | messages: + msg276044 |
| 2016-09-11 12:20:42 | serhiy.storchaka | set | messages: + msg275781 |
| 2016-09-11 12:19:39 | python-dev | set | messages: + msg275780 |
| 2016-09-11 11:49:30 | berker.peksag | set | nosy: + berker.peksag messages: + msg275773 |
| 2016-09-11 10:48:51 | python-dev | set | nosy: + python-dev messages: + msg275771 |
| 2016-06-10 19:06:44 | Demur Rumed | set | messages: + msg268145 |
| 2016-06-10 18:54:37 | serhiy.storchaka | set | files: + wordcode-refactor2.patch messages: + msg268143 |
| 2016-06-10 18:49:32 | serhiy.storchaka | set | messages: + msg268142 |
| 2016-06-10 17:43:39 | rhettinger | set | messages: + msg268132 |
| 2016-06-08 19:42:24 | serhiy.storchaka | set | files: + wordcode-refactor.patch messages: + msg267880 |
| 2016-06-04 21:05:25 | serhiy.storchaka | set | nosy: + Mark.Shannon |
| 2016-06-04 17:55:43 | eric.fahlgren | set | nosy: + eric.fahlgren |
| 2016-05-29 18:40:47 | serhiy.storchaka | set | messages: + msg266611 |
| 2016-05-29 18:25:53 | abarry | set | messages: - msg266607 |
| 2016-05-29 18:25:41 | abarry | set | messages: - msg266608 |
| 2016-05-29 18:24:18 | Demur Rumed | set | files: - forbegin.patch |
| 2016-05-29 18:24:07 | Demur Rumed | set | messages: + msg266608 |
| 2016-05-29 18:23:14 | Demur Rumed | set | files: + forbegin.patch messages: + msg266607 |
| 2016-05-29 17:33:51 | rhettinger | set | nosy: + rhettinger messages: + msg266604 |
| 2016-05-28 15:46:10 | serhiy.storchaka | set | files: + word-jump-offsets.patch |
| 2016-05-28 15:45:53 | serhiy.storchaka | set | files: - word-jump-offsets.patch |
| 2016-05-28 14:53:26 | serhiy.storchaka | set | files: + word-jump-offsets.patch messages: + msg266556 |
| 2016-05-27 12:29:50 | vstinner | set | messages: + msg266490 |
| 2016-05-27 12:06:00 | Demur Rumed | set | nosy: + Demur Rumed messages: + msg266489 |
| 2016-05-26 18:55:33 | serhiy.storchaka | set | messages: + msg266447 |
| 2016-05-26 18:38:31 | brett.cannon | set | messages: + msg266446 |
| 2016-05-26 18:07:14 | brett.cannon | set | nosy: + brett.cannon |
| 2016-05-26 13:29:29 | vstinner | set | messages: + msg266435 |
| 2016-05-26 13:14:27 | serhiy.storchaka | link | issue26647 dependencies |
| 2016-05-26 13:13:07 | serhiy.storchaka | create | |