<br><br><div class="gmail_quote">On Sat, Jun 23, 2012 at 3:19 AM, M Stefan <span dir="ltr"><<a href="mailto:mstefanro@gmail.com" target="_blank">mstefanro@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* UNION_FROZENSET: like UPDATE_SET, but create a new frozenset<br>
stack before: ... pyfrozenset mark stackslice<br>
stack after : ... pyfrozenset.union(stackslice)<br></blockquote><div><br></div><div>Since frozensets are immutable, could you explain how adding the UNION_FROZENSET opcode helps in pickling self-referential frozensets? Or are you only adding this one to follow the current style used for pickling dicts and lists in protocol 1 and onward? </div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
While this design allows pickling of self-referential sets, self-referential<br>
frozensets are still problematic. For instance, trying to pickle `fs':<br>
a=A(); fs=frozenset([a]); a.fs = fs<br>
(when unpickling, the object a has to be initialized before it is added to<br>
the frozenset)<br>
<br>
The only way I can think of to make this work is to postpone<br>
the initialization of all the objects inside the frozenset until after UNION_FROZENSET.<br>
I believe this is doable, but there might be memory penalties if the approach<br>
is to simply store all the initialization opcodes in memory until pickling the frozenset is finished.<br>
<br></blockquote><div><br></div><div>I don't think that's the only way. You could also emit a POP opcode to discard the frozenset from the stack and then emit a GET to fetch it back from the memo. This is how we currently handle self-referential tuples. Check out the save_tuple method in pickle.py to see how it is done. Personally, I would prefer that approach because it is already well-tested and proven to work.</div>
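<div><br></div><div>As a minimal sketch of the POP/GET technique for tuples, the snippet below pickles a tuple that contains itself through a list and lists the opcodes that were emitted. The variable names are illustrative only, and protocol 2 is an arbitrary choice; the exact opcode sequence may differ between protocols and Python versions.</div>

```python
import pickle
import pickletools

# Build a tuple that refers to itself through a mutable list.
inner = []
t = (inner,)
inner.append(t)

# Protocol 2 is chosen only for illustration.
data = pickle.dumps(t, protocol=2)

# Collect the opcode names so the POP and the memo fetch are visible.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]
print(opcode_names)

# The cycle should survive a round trip.
restored = pickle.loads(data)
print(restored[0][0] is restored)
```

<div>By the time the outer tuple's elements have been written, the tuple has already been memoized by the nested reference, so the pickler discards the element from the stack and fetches the memoized tuple instead of building a second one.</div>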
<div><br></div><div>That said, your approach sounds good too. The memory trade-off could lead to smaller pickles and more efficient decoding (though these self-referential objects are rare enough that I don't think that any improvements there would matter much).</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">While self-referential frozensets are uncommon, a far more problematic<br>
situation is with the self-referential objects created with REDUCE. While<br>
pickle uses the idea of creating empty collections and then filling them,<br>
reduce typically creates already-filled objects. For instance:<br>
cnt = collections.Counter(); cnt[a]=3; a.cnt=cnt; cnt.__reduce__()<br>
(<class 'collections.Counter'>, ({<__main__.A object at 0x0286E8F8>: 3},))<br>
where the A object contains a reference to the counter. Unpickling an<br>
object pickled with this reduce function is not possible, because the reduce<br>
function, which "explains" how to create the object, is asking for the object<br>
to exist before being created.<br></blockquote><div><br></div><div>Your example seems to work on Python 3. I am not sure if I follow what you are trying to say. Can you provide a working example?</div><div><br></div><div>
$ python3</div><div><div>Python 3.1.2 (r312:79147, Dec 9 2011, 20:47:34) </div><div>[GCC 4.4.3] on linux2</div><div>Type "help", "copyright", "credits" or "license" for more information.</div>
<div>>>> import pickle, collections</div><div>>>> c = collections.Counter()</div><div>>>> class A: pass</div><div>... </div><div>>>> a = A()</div><div>>>> c[a] = 3</div><div>>>> a.cnt = c</div>
</div><div><div>>>> b =pickle.loads(pickle.dumps(a))</div></div><div>>>> b in b.cnt</div><div>True</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Pickle could try to fix this by detecting when reduce returns a class type as the first tuple arg and move the dict ctor parameter to the state, but this may not always be intended. It's also a bit strange that __getstate__ is never used anywhere in pickle directly.<br>
</blockquote><div><br></div><div>I would advise against any such change. The reduce protocol is already fairly complex. Further, I don't think changing it this way would give us any extra flexibility.</div><div><br></div>
<div>The documentation has a good explanation of how __getstate__ works under the hood: </div><div><a href="http://docs.python.org/py3k/library/pickle.html#pickling-class-instances">http://docs.python.org/py3k/library/pickle.html#pickling-class-instances</a></div>
<div><br></div><div>And if you need more, PEP 307 (<a href="http://www.python.org/dev/peps/pep-0307/">http://www.python.org/dev/peps/pep-0307/</a>) provides some of the design rationale behind the API.</div></div>
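<div><br></div><div>To illustrate why __getstate__ does not appear as a direct call in pickle, here is a small sketch on Python 3. The Point class and its cache attribute are hypothetical, made up for this example. The default __reduce_ex__ inherited from object is what consults __getstate__, and pickle only sees the resulting reduce tuple.</div>

```python
class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.cache = {}  # made-up attribute that should not be pickled

    def __getstate__(self):
        # Return a copy of the instance dict without the cache.
        state = self.__dict__.copy()
        del state["cache"]
        return state


p = Point(1, 2)

# object.__reduce_ex__ calls __getstate__ and places the result in the
# third slot of the reduce tuple, which is what pickle then serializes.
reduced = p.__reduce_ex__(2)
args, state = reduced[1], reduced[2]
print(args)
print(state)
```

<div>The third item of the reduce tuple is the value returned by __getstate__, so the state reaches pickle indirectly through the reduce protocol.</div>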