This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年03月13日 16:10 by kristjan.jonsson, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| pickling.patch | kristjan.jonsson, 2012年03月13日 16:10 | review | ||
| pickling2.patch | kristjan.jonsson, 2012年03月27日 14:28 | review | ||
| Messages (25) | |||
|---|---|---|---|
| msg155626 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月13日 16:10 | |
A common theme in many talks last year about cloud computing was the need to suspend execution, pickle state, and resume it on a different node. This patch is the result of last year's stackless sprint at pycon, finally completed and submitted for review. Python does not currently support pickling of many run-time structures, but pickling for things like iterators is trivial. A large piece of Stackless' branch is to make sure that various run-time constructs are pickleable, including function objects. While this patch does not do that, it does add pickling for dictiter, and the lot. This makes it possible to have compilcated data sets, iterate through them, and pickle them in a semi-consumed state. Please note that a slight hack is needed to pickle some iterators. Many of these classes are namely "hidden" and there is no way to access their constructors by name. instead, an unpickling trick is to invoke "iter" on an object of the target type instead. Not the most elegant solution but I didn't want to complicate matters by adding iterator classes into namespaces. Where should stringiter live for example? Be a builtin like str? We also didn't aim to make all iterators copy.copy-able using the __reduce__ protocol. Some iterators actually use internal iterators themselves, and if a (non-deep) copy were to happen, we would have to shallow copy those internal objects. Instead, we just return the internal iterator object directly from __reduce__ and allow recursive pickling to proceed. |
|||
| msg155669 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2012年03月13日 21:24 | |
ISTM that a discussion on python-dev would be of value here. Iterators are a protocol, not a class; hence, the effort to make them all picklable is potentially endless. The effort would also always be incomplete because some iterators are difficult or inconvenient to pickle (esp. those with inputs from local or temporary resources). Experience with SQL hasn't shown a need to save partially consumed cursors and my experience with iterators indicates that the need for pickling would be rare or that is would distract from better solutions (perhaps message based or somesuch). The size of this patch is a hint that the idea is not a minor change and that it would add a maintenance burden. What is far from clear is whether it would have value for any real-world problems. |
|||
| msg155683 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月13日 22:27 | |
Sure, I'll start a discussion there, but at least I've gotten the patch in. The patch is smaller than it looks, most of it is tests. |
|||
| msg155685 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2012年03月13日 22:38 | |
The patch looks fine. If python-dev thinks that making-iterators-picklable is worth doing, I would support this going into Python 3.3 (no extra benefit will come from waiting). |
|||
| msg155704 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月14日 01:19 | |
Btw, is there some way I can make this patch easier to review? I haven't contributed much since the Hg switchover, can I make it so that people can do visual diff? |
|||
| msg155735 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2012年03月14日 07:27 | |
The "review" link next to the the patch file entry should already work and provide a nice visual diff + commenting interface. |
|||
| msg155775 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年03月14日 17:25 | |
I think PyAPI_FUNC(PyObject *) _PyIter_GetIter(const char *iter); has a confusing name for a convenience function which retrieves an attribute from the builtin module by name. Not sure what would be better. Maybe _PyIter_GetBuiltin(). |
|||
| msg155776 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月14日 17:30 | |
another trick has been suggested. For hidden iterator objects, such as stringiter, to actually put them in the "types" module.
in there, we could do something like:
#types.py
stringiter = iter('').__class__
and we would then change the name of the iterator in c to be "types.stringiter".
How does that sound? It _does_ make it necessary for the types module to be there to help with pickling. The _proper_ fix would be for e.g. stringiter to live in builtins, next to 'str' that it is iterating over. Any thoughts?
|
|||
| msg156688 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月24日 10:19 | |
sbt, I will fix the api name. Any other objections then? Leave it as it is with the "iter()" trick? |
|||
| msg156720 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2012年03月24日 22:40 | |
Has python-dev discussion been launched? It is far from clear that this is worth doing. Pickling runtime structures may be a normal use case for Stackless but isn't a normal use case for regular Python. Also, it seems pointless to start down this path because it will always be incomplete (i.e. pickling running generators, or socket streams, etc). It also seems to be at odds with the normal use case for passing around partially consumed iterators -- the laziness and memory friendliness is a desired feature; however, the patch pickles some iterators (such as dicts) by running them to completion and storing *all* of the results that would have been returned. |
|||
| msg156722 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年03月24日 23:51 | |
I think the "worth doing" argument doesn't really hold, given that it's done. The question at hand really is a) is the patch correct? b) can we commit to maintaining it, even as things around it may change? I'm not bothered with the patch being potentially incomplete: anybody wishing to pickle more things should contribute patches for that. As for a), I think we should give people some time to review, and then wait for beta releases to discover issues. If this is a rarely-used feature, the world won't end if it has bugs. As for b), I think the main issue is forward compatibility: will pickles created by 3.3 still be readable by future versions? |
|||
| msg156723 - (view) | Author: Michael Foord (michael.foord) * (Python committer) | Date: 2012年03月24日 23:55 | |
Yes there was a discussion on python-dev. Various people spoke in favour, no-one against: http://mail.python.org/pipermail/python-dev/2012-March/117566.html |
|||
| msg156731 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2012年03月25日 02:38 | |
Michael, thanks for the link. The email was clearer about its rationale than was listed here. When this patch gets applied, any discussion of it in the docs should be clear that generators aren't included and that pickling things like dict iterators entail running the iterator to completion and storing all of the results in a list. |
|||
| msg156749 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年03月25日 11:40 | |
> ... and that pickling things like dict iterators entail running the > iterator to completion and storing all of the results in a list. The thing to emphasise here is that pickling an iterator is "destructive": afterwards the original iterator will be "empty". I can't think of any other examples where pickling an object causes non-trivial mutation of that object. Come to think of it, doesn't copy.copy() delegate to __reduce__()/__reduce_ex__(). It would be a bit surprising if copy.copy(myiterator) were to consume myiterator. I expect copy.copy() to return an independent copy without mutating the original object. |
|||
| msg156751 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年03月25日 12:02 | |
> The thing to emphasise here is that pickling an iterator is > "destructive": afterwards the original iterator will be "empty". If you look at the patch it isn't (or shouldn't be). I agree with Raymond that accumulating dict and set iterators in a list is a bit weird. That said, with hash randomization, perhaps we can't do any better (the order of elements in the internal table depends on the process-wide hash seed). |
|||
| msg156756 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年03月25日 12:38 | |
> If you look at the patch it isn't (or shouldn't be). Sorry. I misunderstood when Raymond said "running the iterator to completion". |
|||
| msg156902 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月27日 10:57 | |
Okay, I'll go ahead, fix the 'iter()' trick api name and apply the patch. Then we'll see what happens :). Any suggestion towards what documentation changes are needed? I don't think the list of pickleable objects is made explicit anywhere, it is largely a trial and error thing. But obviously Misc/News is the prime candidate. Thanks |
|||
| msg156903 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年03月27日 11:03 | |
> Okay, I'll go ahead, fix the 'iter()' trick api name and apply the > patch. Then we'll see what happens :). Please wait for reviews. |
|||
| msg156904 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月27日 11:13 | |
Raymond had already reviewed it, and sbt. I wasn't aware of any more pending reviews, but I'll wait for yours, of course. |
|||
| msg156906 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年03月27日 11:15 | |
Well, please take a look at the review link. There are already some comments there. |
|||
| msg156915 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月27日 12:09 | |
Btw, regarding compatibility: The docs say " The pickle serialization format is guaranteed to be backwards compatible across Python releases." I take this to mean the serialization format itself. I don't think there is a broader guarantee that pickles generated by one version can be read by another, since objects can change their internal representation, types can even disappear or change. In that sense, pickles made by 2.7 can be read by 3.2, in the sense that they will correctly return an error when they can't construct a 'str' object. If I misunderstand things, then at least I think that the pickle documentation should be made clearer, that not only is the protocol supposed to be read, but also that entire pickles should always be readable _and_ instantiatable, by future versions |
|||
| msg156928 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年03月27日 14:28 | |
I've incorporated antoine's comments and the proposed internal function name into a new patch. A lot of the changes concerned thecking the type() of the unpickled iterator. Now, it wasn't a specific design goal to get the exact same objects back, only _equivalent_ objects. In particular, dicts and sets have problems, so a dictiter to a partially consumed dict cannot be pickled as it is. so, I've added type cases everywhere, but for those cases. Now, how important do you think type consistency is? when using iterators, does one ever look at it and test its type? if this is important, I _could_ take another look at dicts and seta and create fresh iterators to the dicts and sets made out of the remainder of the items, rather than iterators to lists. Any thoughs? |
|||
| msg156933 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年03月27日 15:10 | |
> Now, how important do you think type consistency is? when using > iterators, does one ever look at it and test its type? if this is > important, I _could_ take another look at dicts and seta and create > fresh iterators to the dicts and sets made out of the remainder of the > items, rather than iterators to lists. I think type consistency is important if it can be achieved reasonably simply. In the dict and set case, I'm not sure you can recreate the internal table in the same order (even accross interpreter restarts). In this case, you should just check that the unpickled data is a subclass of collections.abc.Iterator. |
|||
| msg157405 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) | Date: 2012年04月03日 09:56 | |
Good idea Antoine. So, I'll with your suggested fix to the unittests I'll commit this and then look on while Rome burns. |
|||
| msg157408 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年04月03日 11:12 | |
New changeset 4ff234337e24 by Kristján Valur Jónsson in branch 'default': Issue #14288: Serialization support for builtin iterators. http://hg.python.org/cpython/rev/4ff234337e24 New changeset 51c88d51aa4a by Kristján Valur Jónsson in branch 'default': Issue #14288: Modify Misc/NEWS http://hg.python.org/cpython/rev/51c88d51aa4a |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:27 | admin | set | github: 58496 |
| 2012年04月08日 17:07:43 | kristjan.jonsson | set | status: open -> closed resolution: fixed |
| 2012年04月03日 11:12:04 | python-dev | set | nosy:
+ python-dev messages: + msg157408 |
| 2012年04月03日 09:56:32 | kristjan.jonsson | set | messages: + msg157405 |
| 2012年03月27日 15:10:06 | pitrou | set | messages: + msg156933 |
| 2012年03月27日 14:29:08 | kristjan.jonsson | set | files:
+ pickling2.patch messages: + msg156928 |
| 2012年03月27日 12:09:01 | kristjan.jonsson | set | messages: + msg156915 |
| 2012年03月27日 11:15:29 | pitrou | set | messages: + msg156906 |
| 2012年03月27日 11:13:29 | kristjan.jonsson | set | messages: + msg156904 |
| 2012年03月27日 11:03:41 | pitrou | set | messages: + msg156903 |
| 2012年03月27日 10:57:31 | kristjan.jonsson | set | messages: + msg156902 |
| 2012年03月25日 21:31:46 | rhettinger | set | assignee: rhettinger -> |
| 2012年03月25日 12:38:21 | sbt | set | messages: + msg156756 |
| 2012年03月25日 12:02:37 | pitrou | set | messages: + msg156751 |
| 2012年03月25日 11:40:22 | sbt | set | messages: + msg156749 |
| 2012年03月25日 02:38:28 | rhettinger | set | messages: + msg156731 |
| 2012年03月24日 23:55:28 | michael.foord | set | messages: + msg156723 |
| 2012年03月24日 23:51:39 | loewis | set | messages: + msg156722 |
| 2012年03月24日 22:40:37 | rhettinger | set | messages: + msg156720 |
| 2012年03月24日 13:53:49 | pitrou | set | nosy:
+ pitrou |
| 2012年03月24日 10:19:21 | kristjan.jonsson | set | messages:
+ msg156688 versions: + Python 3.3, - Python 3.4 |
| 2012年03月14日 19:37:07 | rhettinger | set | assignee: rhettinger |
| 2012年03月14日 17:30:59 | kristjan.jonsson | set | messages: + msg155776 |
| 2012年03月14日 17:25:37 | sbt | set | nosy:
+ sbt messages: + msg155775 |
| 2012年03月14日 16:17:10 | rhettinger | set | messages: - msg155759 |
| 2012年03月14日 15:44:39 | rhettinger | set | messages: + msg155759 |
| 2012年03月14日 07:27:41 | georg.brandl | set | nosy:
+ georg.brandl messages: + msg155735 |
| 2012年03月14日 01:19:59 | kristjan.jonsson | set | messages: + msg155704 |
| 2012年03月13日 22:38:05 | rhettinger | set | messages: + msg155685 |
| 2012年03月13日 22:27:29 | kristjan.jonsson | set | messages: + msg155683 |
| 2012年03月13日 21:24:31 | rhettinger | set | nosy:
+ rhettinger messages: + msg155669 |
| 2012年03月13日 18:30:40 | jcea | set | nosy:
+ jcea |
| 2012年03月13日 16:10:19 | kristjan.jonsson | create | |