homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: support caching for 2to3
Type: enhancement Stage: test needed
Components: 2to3 (2.x to 3.x conversion tool) Versions: Python 3.2
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: benjamin.peterson Nosy List: benjamin.peterson, eric.araujo, ferringb, loewis
Priority: low Keywords: patch

Created on 2010年02月01日 00:59 by ferringb, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
allow_alternate_output_tools.patch ferringb, 2010年02月01日 00:59 allow alternate RefactoringTools w/in main.main
2to3-caching.patch ferringb, 2010年02月01日 09:35 2to3-caching.patch
Messages (14)
msg98639 - (view) Author: Ferringb (ferringb) * Date: 2010年02月01日 00:59
Bit like unittest, right now it's rather hard to extend 2to3 for caching support (and other outputs) w/out duplicating the main function.
Attached is a patch that allows the refactoring tool class to be passed in- at the very least, this patch is enough to remove some of the nastier monkey patching tricks I had to level to inline caching support into 2to3.
Actual caching patch will follow shortly; roughly what it does is track the md5 of the original source and use that as a lookup to the transformed version.
That caching support isn't useful to the majority of users, but for developers w/ buildslaves it's a very useful reduction in runtime- for example, for 6 buildslaves I run for pkgcore (a py2k source that we translate upon install to py3k if the target is py3k), if the cache is primed this is a reduction of 20s to 1.8s. Two minutes cpu time across the slaves brought down to ~11s.
As said, I understand it's a corner case, so getting the caching into 2to3 directly may not be possible.
This initial patch however makes it *way* easier to do inline the caching into 2to3 without having to copy/paste main (or do some heinous monkeypatching).
Patch is against svn trunk; doesn't apply cleanly to 3.1/2.6, but that's just a minor modification to make it so.
msg98641 - (view) Author: Ferringb (ferringb) * Date: 2010年02月01日 01:24
Attached is a derivative of http://pkgcore.org/trac/pkgcore/browser/ferringb/snakeoil-dev/snakeoil/caching_2to3.py
As you can see in that function, some nastyness is required right now to slip it in w/out duplicating code (hence the first patch).
Patch attached here inlines the support directly into a new module lib2to3.caching, and derives on the fly a caching version of the RefactoringTool.
The on the fly bit is somewhat fancy I admit, but was done so that anyone using an alternate RefactoringTool can get the same benefits.
The sole con to this functionality is that it assumes the transformation will always be the same; in other words, it doesn't account for --fix/--nofix, nor changes in the transformation algorithms that would result in differing output.
Personally for my uses this was completely valid- when changing the python version (even a minor upgrade) on my buildslaves I'd wipe the caches to be safe anyways.
That said, if there *is* interest in getting caching into stdlib for this, I can poke through the code and incorporate awareness of nofix/fix/python version into the cache key. I skipped doing that however since I didn't think there was a chance in hell of caching going into mainline however ;)
Either way, for folks targeting py2k and using buildslaves to validate their code and the translation of said code for py3k, either of these patches really is needed to support caching and stop burning cycles.
msg98642 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010年02月01日 01:28
Not sure what your use case is, but I always call the refactor method of my subclassed RefactoringTool, instead of calling main. See distutils.util.run_2to3 for an example (and distutils.command.build_py.build_py_2to3 for an application of that).
msg98646 - (view) Author: Ferringb (ferringb) * Date: 2010年02月01日 03:22
@martin:
Yeah... that's probably the better approach, although seperation of processes (in terms of setup.py triggering conversion) has some benefits.
The reason I used process seperation and invocation of main was to protect my distutils extensions against having to know the innards of lib2to3- either it could invoke the normal 2to3 or invoke the caching version for the buildslaves. Essentially I wanted the caching version to behave the same as normal 2to3 invocations.
Not great I'll admit, but so it goes. I'd still like to see the caching integrated in some form w/in stdlib since it has definite benefits- and folk are more likely to use something from stdlib than go raiding from a backwater pkg like snakeoil. That said, we'll see what folks think ;)
msg98649 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010年02月01日 06:42
If you meant to propose a patch that does caching, you should include the actual caching code in the patch.
msg98651 - (view) Author: Ferringb (ferringb) * Date: 2010年02月01日 09:31
*Cough*.
Ya, going to blame the bugzie on that one. right, it's at fault rather than me being a complete freaking moron.
Patch attached, hopefully with far less idiocy than I've demonstrated thus far....
Pardon.
msg98652 - (view) Author: Ferringb (ferringb) * Date: 2010年02月01日 09:35
Related note, don't be drunk when posting the missing patch- sorry for the noise, here is the caching version,, daftly presumed the early patch contained lib2to3.caching
Now I'm going to go crawl in a corner, if you need me, I'll be there.
Doubly pardon.
As for the algo, sha1 might be wiser- as said, integrating fix/nofix is definitely needed, let alone writing tests. This is more a proof of concept I was using for speed reasons- assuming folk are game, I'm game to write the tests for it.
msg98653 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010年02月01日 09:53
Please submit a contributor agreement, see http://www.python.org/psf/contrib/
I'm still not clear on what the specific use case is: how *exactly* are you going to run 2to3 when you have that patch integrated? In particular, where do you put the output (3.x) files? I'd rather recommend to create a --outdir option (assuming that would also fit your needs).
I'm -0 on using hashing. The whole point of this caching is speed, so wasting time in recomputing hash codes is, well, wasted.
If you do think you need hashing, please read the source file in binary, rather than first decoding and then encoding it.
Don't use print(). If you must output progress, use the same approach as the rest of 2to3 (i.e. logging).
msg99070 - (view) Author: Ferringb (ferringb) * Date: 2010年02月08日 22:16
Contributor agreement form is enroute via snail mail... kind of wish y'all accepted pdf's via email though ;)
As for the print("cache hit"), that should be punted- not very useful in it's current form (doesn't indicate what 'hit') and slipped in accidentally.
msg99081 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010年02月09日 01:25
I'm apprehensive about accepting this patch for several reasons. First of all, it has no tests. Secondly, and more importantly, I dislike the hacks that it adds. I would prefer 2to3 to grow a more general plugin API, which people could use to add new fixer etc as well and customizes behavior like this.
msg99246 - (view) Author: Ferringb (ferringb) * Date: 2010年02月12日 01:05
@benjamin:
Tests can be written; the reason this patch doesn't bundle tests up front is that I wasn't going to burn the time till I knew they were needed since I expected the concept to require some debate.
As for the hacks angle, there isn't anything hackish there- hackish is what you have to do w/out either of these patches (http://www.pkgcore.org/trac/pkgcore/browser/snakeoil/snakeoil/caching_2to3.py). I understand not everyone likes classes mixed on the fly, but it's a perfectly valid technique w/ many, many valid uses.
Converting the innards of 2to3 over to a generalized plugin interface frankly is overkill from where I'm standing- being able to plugin new fixers is one thing (very useful in my opinion but orthogonal to what this ticke tis about), being able to plugin on the fly different refactoring outputers is another thing.
Personally what's in place w/in allow_alternate_output_tools.patch is more than enough for the crazy ideas I've got kicking around.
If you've got a specific complaint in the patch, an area that is dodgy/hacky, state it please- I'm more than open to actionable criticisms.
msg99333 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010年02月13日 23:05
2010年2月11日 Brian Harring <report@bugs.python.org>:
>
> Brian Harring <ferringb@gmail.com> added the comment:
>
> @benjamin:
>
> Tests can be written; the reason this patch doesn't bundle tests up front is that I wasn't going to burn the time till I knew they were needed since I expected the concept to require some debate.
>
> As for the hacks angle, there isn't anything hackish there- hackish is what you have to do w/out either of these patches (http://www.pkgcore.org/trac/pkgcore/browser/snakeoil/snakeoil/caching_2to3.py). I understand not everyone likes classes mixed on the fly, but it's a perfectly valid technique w/ many, many valid uses.
While it may be a valid pattern, I still think there are more elegant solutions.
>
> Converting the innards of 2to3 over to a generalized plugin interface frankly is overkill from where I'm standing- being able to plugin new fixers is one thing (very useful in my opinion but orthogonal to what this ticke tis about), being able to plugin on the fly different refactoring outputers is another thing.
I plan to work on this this spring or summer if I have time.
>
> Personally what's in place w/in allow_alternate_output_tools.patch is more than enough for the crazy ideas I've got kicking around.
>
> If you've got a specific complaint in the patch, an area that is dodgy/hacky, state it please- I'm more than open to actionable criticisms.
I think delegation would be more appropriate than adding mixin classes
at runtime.
msg120222 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010年11月02日 13:17
I'm going to reject this for now.
msg137845 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011年06月07日 16:00
FTR: https://github.com/pv/lib2to3cache 
History
Date User Action Args
2022年04月11日 14:56:57adminsetgithub: 52074
2011年06月07日 16:00:49eric.araujosetnosy: + eric.araujo
messages: + msg137845
2010年11月02日 13:17:34benjamin.petersonsetstatus: open -> closed
resolution: rejected
messages: + msg120222
2010年02月13日 23:05:15benjamin.petersonsetmessages: + msg99333
2010年02月12日 01:05:58ferringbsetmessages: + msg99246
2010年02月09日 01:25:46benjamin.petersonsetpriority: normal -> low
keywords: - needs review
messages: + msg99081
2010年02月08日 22:16:32ferringbsetmessages: + msg99070
2010年02月07日 09:37:34georg.brandlsetassignee: benjamin.peterson

nosy: + benjamin.peterson
2010年02月01日 09:53:11loewissetmessages: + msg98653
2010年02月01日 09:35:44ferringbsetfiles: - 2to3-caching.patch
2010年02月01日 09:35:38ferringbsetfiles: - 2to3-caching.patch
2010年02月01日 09:35:03ferringbsetfiles: + 2to3-caching.patch

messages: + msg98652
2010年02月01日 09:31:30ferringbsetfiles: + 2to3-caching.patch

messages: + msg98651
2010年02月01日 06:42:00loewissetmessages: + msg98649
2010年02月01日 03:30:38brian.curtinsetpriority: normal
keywords: + needs review
stage: test needed
2010年02月01日 03:22:51ferringbsetmessages: + msg98646
2010年02月01日 01:28:47loewissetnosy: + loewis
messages: + msg98642
2010年02月01日 01:24:32ferringbsetfiles: + 2to3-caching.patch

messages: + msg98641
2010年02月01日 00:59:09ferringbcreate

AltStyle によって変換されたページ (->オリジナル) /