
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Pickle breakage with reduction of recursive structures
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder: Cannot pickle self-referencing sets (issue 9269)
Assigned To: Nosy List: ajaksu2, alex, alexandre.vassalotti, bcroq, belopolsky, ddorfman, eltoder, herring, mstefanro, pitrou, rhettinger, zzzeek
Priority: normal Keywords: patch

Created on 2004年11月08日 08:06 by ddorfman, last changed 2022年04月11日 14:56 by admin.

Files
File name Uploaded Description
redcycle.diff ddorfman, 2004年11月08日 08:06 Diff
issue1062277-test-py3k.diff belopolsky, 2010年07月15日 06:34 review
Messages (10)
msg47267 - (view) Author: Dima Dorfman (ddorfman) Date: 2004年11月08日 08:06
Fix problems related to reduce cycles during pickling. A
"reduce cycle" is what happens when a __reduce__
implementation returns an args tuple that cycles back through the
object it tried to reduce. This can't work because the
unpickler has to call the constructor with those args, but
it doesn't yet have that object to place inside
the args. There are two problems related to this:
 1. The standard reduce implementation for proto < 2 in
    copy_reg (_reduce_ex) doesn't correctly deal with
    recursive structures. reduce_2 in typeobject.c does the
    right thing by using listitems and dictitems (see the
    sketch after this list). Fix _reduce_ex by making it do
    the same thing. This is okay for proto < 2 because those
    arguments are a pickler-side feature. Tested in
    test_stdreducecycle.
 2. Our pickle implementations don't check for reduce
    cycles. This is somewhat cosmetic except that stack
    overflow protection is imperfect (for cPickle), causing
    crashes, and some kinds of cycles trigger asserts (in
    pickle). Fixed in pickle and cPickle by introducing a
    reducing_now set; on entering save_reduce, the object
    id being saved must not be in that set or we've been
    called recursively while saving the callable or
    arguments. Tested in test_reduce_cycle.
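(Editorial sketch, not part of the original message: what "using listitems" in item 1 means in practice. MyList is a made-up name, and list subclasses already get equivalent behavior from the default reduce; this only spells the mechanism out.)

import pickle

class MyList(list):
    def __reduce__(self):
        # Items are returned through the listitems slot (4th element) rather
        # than through the constructor args, so a self-referencing MyList can
        # be created empty, memoized, and filled in afterwards.
        return MyList, (), None, iter(self)

x = MyList()
x.append(x)
y = pickle.loads(pickle.dumps(x))
print(y[0] is y)  # True: the cycle survives the round trip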
This shouldn't change any semantics. That reduce shouldn't
introduce cycles through args isn't documented, but it can't
work any other way.
Possible improvement: If we want to support reducing of real
immutable containers we might have to relax the reduction
cycle test to give the cycle a chance of resolving itself
normally (by constructing a partial object to pass to the
constructor and filling it in later). I'm not sure if this
trouble is worth it just to avoid writing a
frozenset.__setstate__.
See also: http://mail.python.org/pipermail/python-dev/2004-October/049714.html
It would be very good if someone familiar with pickle
reviewed this; I am still not very confident that I
completely understand all the issues.
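(Editorial illustration, not part of the original report: a minimal self-contained reproduction of such a reduce cycle. Node is a made-up name; on current Python 3 the failure surfaces as RecursionError, while the older transcripts below show RuntimeError.)

import pickle

class Node:
    def __init__(self, peer=None):
        # With no peer given, the node refers to itself.
        self.peer = peer if peer is not None else self
    def __reduce__(self):
        # The args tuple contains self.peer, which may be this very object,
        # so the pickler recurses into the object it is still reducing.
        return Node, (self.peer,)

try:
    pickle.dumps(Node())
except RecursionError as e:
    print("reduce cycle:", e)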
msg47268 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004年11月09日 07:31
IMO, this is more of a feature request than a bug. Also, it
needs to be fully thought-out and discussed. Putting this
sort of thing in just before the Py2.4 release candidate is
probably not wise.
msg47269 - (view) Author: Dima Dorfman (ddorfman) Date: 2004年11月09日 09:06
Triggering assertions (pickle) and producing incorrect output
(cPickle) are certainly bugs, and being able to pickle a
recursive structure is not a feature request. The copy_reg
part of the patch is the real fix, and as it's almost a
translation of what reduce_2 already does, it should be safe
for 2.4. I agree that the rest of the patch--to check for
cycles during pickling--should wait until 2.5.
msg82105 - (view) Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009年02月14日 18:45
Patch has test, can someone try it? I think it might be fixed already.
msg110350 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010年07月15日 06:34
The issue is still present in py3k. Attaching an updated patch with tests only. Is this the same as issue998998?
msg110868 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010年07月20日 05:38
As I explained in msg110617 under issue9269, it is possible that we can do better than simply detect reduce cycles and bail out.
I am torn between two options:
1. Reject this patch and wait until a proper solution is found.
2. Admit that better is the enemy of good and start with proper error reporting on reduce cycles by accepting this patch.
My only concern about this patch is that it is likely to affect performance because pickling will have to maintain the "reducing_now" dictionary that is parallel to the memo dictionary.
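(Editorial sketch, not the actual patch or the pickle internals: a toy version of the reducing_now check discussed above. The function save, the class Selfish, and the error text are invented for illustration; real pickle emits opcodes where this sketch only tracks the memo.)

def save(obj, memo, reducing_now):
    # Toy recursive "pickler": memo maps id(obj) -> memo slot, as in pickle;
    # reducing_now holds ids of objects whose reduce output is being written.
    if id(obj) in memo:
        return                             # already saved; real pickle emits GET
    if isinstance(obj, (int, float, str, bytes, type(None))):
        return                             # leaf value
    if isinstance(obj, (list, tuple)):
        memo[id(obj)] = len(memo)          # memoize up front (simplified)
        for item in obj:
            save(item, memo, reducing_now)
        return
    # Everything else goes through a reduce-style (callable, args) pair.
    if id(obj) in reducing_now:
        raise ValueError("reduce cycle: the reduce args of %r refer back to it" % obj)
    func, args = obj.__reduce__()
    reducing_now.add(id(obj))
    try:
        # func would be saved by name (save_global); skipped in this sketch.
        save(args, memo, reducing_now)     # re-entering obj trips the check
    finally:
        reducing_now.discard(id(obj))
    memo[id(obj)] = len(memo)              # memoized only after REDUCE

class Selfish:
    def __reduce__(self):
        return Selfish, (self,)            # args cycle straight back to self

try:
    save(Selfish(), {}, set())
except ValueError as e:
    print(e)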
msg178449 - (view) Author: Eugene Toder (eltoder) * Date: 2012年12月29日 00:44
To recap, the issue is that pickle doesn't handle recursion via reduce arguments (i.e. the arguments to the constructor function, as returned in the 2nd element of the tuple from __reduce__). This leads to two kinds of effects:
class C:
    def __init__(self, x=None):
        self.x = x if x is not None else self
    def __reduce__(self):
        return C, (self.x,)
A. Recursion error:
>>> pickle.dumps(C())
Traceback (most recent call last):
 File "<pyshell#5>", line 1, in <module>
 pickle.dumps(C())
RuntimeError: maximum recursion depth exceeded while calling a Python object
This cannot be helped with the current reduce protocol. The error may be improved, but that's about it.
B. Duplication of object when unpickling:
>>> c = C([])
>>> c.x.append(c)
>>> c.x[0] is c
True
>>> c2 = pickle.loads(pickle.dumps(c))
>>> c2.x[0] is c2
False
This happens because the list (or another recursion-friendly type) inside the problematic object handles the recursion, but the outer object still gets constructed separately, as a duplicate of the inner one.
This can be solved the same way as for tuple:
>>> t=([],1,2)
>>> t[0].append(t)
>>> t2 = pickle.loads(pickle.dumps(t))
>>> t2[0][0] is t2
True
>>> pickletools.dis(pickle.dumps(t))
 0: \x80 PROTO 3
 2: ] EMPTY_LIST
 3: q BINPUT 0
 5: h BINGET 0
 7: K BININT1 1
 9: K BININT1 2
 11: \x87 TUPLE3
 12: q BINPUT 1
 14: a APPEND
 15: K BININT1 1
 17: K BININT1 2
 19: 0 POP
 20: 0 POP
 21: 0 POP
 22: h BINGET 1
 24: . STOP
After pickling its elements, the tuple checks whether it got into the memo. If it did, this means it was pickled by one of its elements, so it POPs all elements from the stack and fetches itself via GET. This is somewhat inefficient, but probably the best it can do.
I suggest we do 3 things:
1. Improve the documentation for the __reduce__ function. It should mention that any state that a) can potentially point back to the object and b) is not strictly necessary in the constructor call should be passed via the 3rd element of the __reduce__ tuple (aka state) instead of the 2nd element, and applied by __setstate__. This handles recursion in a robust and optimal way (see the sketch after this list).
2. Check that all built-in/standard types follow this advice. I see that Stefan Mihaila already fixed sets.
3. To fix case B above, add the memo check from save_tuple to save_reduce. While at it, we can consider checking after pickling every element instead of after pickling all the elements, so we reduce the number of POPs and the waste they cause.
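(Editorial sketch illustrating point 1, not part of the original message: the class C from above, rewritten so the potentially recursive attribute travels through the state element and __setstate__; the name C2 is made up.)

import pickle

class C2:
    def __init__(self, x=None):
        self.x = x if x is not None else self
    def __reduce__(self):
        # Constructor args stay cycle-free; the back-reference goes into the
        # state (3rd element) and is applied after the object exists.
        return C2, (), {"x": self.x}
    def __setstate__(self, state):
        self.__dict__.update(state)

c = C2()                            # c.x is c
c2 = pickle.loads(pickle.dumps(c))
print(c2.x is c2)                   # True: the cycle round-trips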
msg221898 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014年06月29日 20:25
Problem "B" has been resolved, but problem "A" is still there.
Python 3.4.1 (default, Jun 29 2014, 15:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
...     def __init__(self, x=None):
...         self.x = x if x is not None else self
...     def __reduce__(self):
...         return C, (self.x,)
...
>>> import pickle
>>> pickle.dumps(C())
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded while calling a Python object
>>> c = C([])
>>> c.x.append(c)
>>> c.x[0] is c
True
>>> c2 = pickle.loads(pickle.dumps(c))
>>> c2.x[0] is c2
True
msg255046 - (view) Author: Davis Herring (herring) Date: 2015年11月21日 08:17
Re msg110868: it's impossible to resolve a __reduce__ loop that involves only immutable intermediate objects (including none at all):
class Direct:
    def __reduce__(self): return id, (self,)  # obviously impossible
class Indirect:
    # Can't create either the new object or the tuple first!
    def __reduce__(self): return id, ((self,),)
With pre-existing mutable objects, the same trick as for tuples certainly could be applied:
class Mutable:
    def __init__(self): self.bracketed = [self]
    # Create list, REDUCE, then populate list
    def __reduce__(self): return id, (self.bracketed,)
The way an analog to the tuple implementation would deal with this would be something like
id # the dummy "constructor" in this example
[] # empty list, memoized immediately as, say, #5
id # try again to store the Mutable
@5 # fetch from memo
T1 # make singleton tuple
R # reduce (have now succeeded in "making" the Mutable as, say, #20)
APP # to the list
T1 # finish the first __reduce__ attempt
POP # abandon it, since the object has been created
POP # abandon the "id" as well
@20 # fetch the previous answer
If the list were created directly in __reduce__, you'd still recurse infinitely trying to create each new list object. On the other hand, even complicated cases like this should work:
class Switch:
    def __reduce__(self): return id, (self.list,)
a = Switch(); b = Switch()
a.list = [b, a]; b.list = [a, b]
where the pickling of, say, a would look something like
id
[] # -> #17
id # trying b now
[] # -> #42
id # trying a again
@17
T1
R # a done (#88)
APP
id # trying b again
@42
T1
R # b done (#101)
APP # b's list now populated
POP
POP # b abandoned
@101 # b fetched
APP # finally building a's list
@88 # a is long done
APP # a's list now populated
POP
POP # a abandoned
@88 # final value: a
Perhaps a technique for distinguishing these cases is to look for new objects having been created since the last time we visited an object. If there are none, we're in the immutable case and we lose. If there are yet more created between the second and third visits, on the other hand, we're generating more objects from __reduce__ every time and should again abort.
msg255067 - (view) Author: Eugene Toder (eltoder) * Date: 2015年11月21日 19:54
Davis, the possible cases (i.e. recursion via appropriate mutable objects) are already supported, and the impossible cases are, well, impossible. The only open issue is whether to implement better error handling for infinite recursion problems, instead of relying on built-in maximum recursion depth.
History
Date User Action Args
2022年04月11日 14:56:08 admin set github: 41150
2021年12月16日 23:07:00 ajaksu2 set versions: + Python 3.11, - Python 3.5
2015年11月21日 19:54:36 eltoder set messages: + msg255067
2015年11月21日 08:17:59 herring set nosy: + herring
    messages: + msg255046
2014年06月29日 20:25:17 belopolsky set assignee: belopolsky ->
    messages: + msg221898
    versions: + Python 3.5, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2012年12月29日 00:44:37 eltoder set versions: + Python 3.3, Python 3.4, - Python 3.1
2012年12月29日 00:44:17 eltoder set nosy: + pitrou, eltoder
    messages: + msg178449
2012年07月27日 14:37:38 pitrou unlink issue9269 dependencies
2012年07月27日 12:56:12 mstefanro set nosy: + mstefanro
2012年07月27日 01:21:51 zzzeek set nosy: + zzzeek
2011年04月24日 18:34:59 alex set nosy: + alex
2011年03月08日 12:59:22 bcroq set nosy: + bcroq
2010年08月19日 18:26:57 BreamoreBoy set type: enhancement -> behavior
    versions: + Python 3.1
2010年07月20日 05:38:29 belopolsky set superseder: Cannot pickle self-referencing sets
    messages: + msg110868
2010年07月20日 05:11:35 terry.reedy set nosy: rhettinger, belopolsky, ddorfman, ajaksu2, alexandre.vassalotti
    components: + Library (Lib), - Interpreter Core
2010年07月16日 00:16:58 belopolsky link issue9269 dependencies
2010年07月15日 21:55:54 belopolsky link issue1581183 superseder
2010年07月15日 06:34:45 belopolsky set files: + issue1062277-test-py3k.diff
    assignee: belopolsky
    versions: + Python 3.2
    nosy: + belopolsky, alexandre.vassalotti
    messages: + msg110350
    stage: needs patch
2009年02月14日 18:45:34 ajaksu2 set type: enhancement
    messages: + msg82105
    nosy: + ajaksu2
    versions: + Python 2.7
2004年11月08日 08:06:54 ddorfman create
