This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: The protocol > 0 of cPickle does not give stable dictionary values
Type: behavior
Stage: resolved
Components: Interpreter Core, Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: jcea, kayhayen, pitrou
Priority: normal
Keywords:

Created on 2012-01-08 03:38 by kayhayen, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name   Uploaded                     Description
stream.py   kayhayen, 2012-01-08 03:38
stream.py   kayhayen, 2012-01-08 18:41
Messages (9)
msg150841 - Author: Kay Hayen (kayhayen) Date: 2012-01-08 03:38
Hello,
I am implementing a Python compiler (Nuitka) that tests whether it gives the same output when it compiles itself.
I have been using "protocol = 0" with the "pickle" module all along for historical reasons (a gcc bug with raw strings led me to believe it was safer), and lately I changed to "protocol = 2" and cPickle. But then I noticed that my compile-itself test now fails to give the same code, because of the pickling of dictionary constants.
I managed to isolate the issue; it's a Python 2.7 regression, Python 2.6 is fine:
Observe this output from "cPickle.dumps" for a constant dictionary with one element:
Protocol 0 :
Dumping read const const stream "(dp1\nS'modules'\np2\nNs."
Dumping load const const stream "(dp1\nS'modules'\np2\nNs."
Dumping load const const stream "(dp1\nS'modules'\np2\nNs."
Protocol 1 :
Dumping read const const stream '}q\x01U\x07modulesq\x02Ns.'
Dumping load const const stream '}q\x01U\x07modulesNs.'
Dumping load const const stream '}q\x01U\x07modulesNs.'
Protocol 2 :
Dumping read const const stream '\x80\x02}q\x01U\x07modulesq\x02Ns.'
Dumping load const const stream '\x80\x02}q\x01U\x07modulesNs.'
Dumping load const const stream '\x80\x02}q\x01U\x07modulesNs.'
It seems that cPickle as of CPython 2.7 gives a better stream for dictionaries that it produced itself. With CPython 2.6 I observe no difference.
My work-around for the time being is to "re-stream", i.e. "dumps" -> "loads" -> "dumps", with CPython 2.7.
Can you either fix cPickle to treat the dictionaries the same, or enhance the core to produce the same dict as cPickle does? It appears that at least some kind of efficiency might be missed out on for marshal as well.
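The "re-stream" work-around described above can be sketched roughly as follows. This is only an illustration, not the code from the attached stream.py: it uses the Python 3 "pickle" module as a stand-in for the 2.7 "cPickle" module, and the helper name "restream" is made up for this example.

```python
import pickle  # Python 3 stand-in for the 2.x cPickle module (an assumption)


def restream(value, protocol=2):
    """Hypothetical helper: dumps -> loads -> dumps.

    The second dumps() sees an object that the unpickler built itself,
    which is the case the report above found to give a stable stream.
    """
    first = pickle.dumps(value, protocol=protocol)
    rebuilt = pickle.loads(first)
    return pickle.dumps(rebuilt, protocol=protocol)


constant = {"modules": None}
stream = restream(constant)
print(repr(stream))
```

The extra round trip costs one additional load and dump per constant, which is presumably acceptable for a compile-time step.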
msg150871 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-01-08 15:28
- Can you use pickletools.dis to examine what differs in the pickles' bytecode?
- Does it also apply to 3.3?
> It appears at least some kind of efficiency might be missed out for
> marshal as well.
marshal doesn't use the pickle protocols, so it's unlikely to be affected.
msg150891 - Author: Kay Hayen (kayhayen) Date: 2012-01-08 18:41
It seems that there is an extra "BINPUT 2", whatever it does. I am attaching a variant that does pickletools.dis on the 3 dumps.
Protocol 2 :
Dumping read const const stream '\x80\x02}q\x01U\x07modulesq\x02Ns.'
 0: \x80 PROTO 2
 2: } EMPTY_DICT
 3: q BINPUT 1
 5: U SHORT_BINSTRING 'modules'
 14: q BINPUT 2
 16: N NONE
 17: s SETITEM
 18: . STOP
highest protocol among opcodes = 2
Dumping load const const stream '\x80\x02}q\x01U\x07modulesNs.'
 0: \x80 PROTO 2
 2: } EMPTY_DICT
 3: q BINPUT 1
 5: U SHORT_BINSTRING 'modules'
 14: N NONE
 15: s SETITEM
 16: . STOP
highest protocol among opcodes = 2
Dumping load const const stream '\x80\x02}q\x01U\x07modulesNs.'
 0: \x80 PROTO 2
 2: } EMPTY_DICT
 3: q BINPUT 1
 5: U SHORT_BINSTRING 'modules'
 14: N NONE
 15: s SETITEM
 16: . STOP
highest protocol among opcodes = 2
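For readers who want to produce a listing like the ones above, here is a minimal sketch. It assumes Python 3 and the "pickle" module rather than 2.7's "cPickle", so the exact opcodes and offsets will differ from the output shown in this message; the helper name "disassemble" is invented for the example.

```python
import io
import pickle
import pickletools


def disassemble(stream):
    """Return the pickletools.dis() listing of a pickle as a string."""
    out = io.StringIO()
    pickletools.dis(stream, out=out)
    return out.getvalue()


listing = disassemble(pickle.dumps({"modules": None}, protocol=2))
print(listing)
```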
msg150892 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-01-08 18:46
Ah, right. BINPUT is used to memoize objects in case a recursive structure happens. You can use pickletools.optimize() to remove the useless BINPUTs from a pickle stream.
Note that 3.x is more consistent and always emits the BINPUTs. It seems 2.x's cPickle (which is really a weird piece of code) may do some premature optimization here.
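A small sketch of what pickletools.optimize() does to the memoization opcodes, under the assumption that Python 3's "pickle" is an acceptable stand-in for 2.7's "cPickle". The helper "count_puts" and the set "PUT_OPCODES" are illustrative names, not part of any library.

```python
import pickle
import pickletools

# Opcode names that store an object in the unpickler's memo.
PUT_OPCODES = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}


def count_puts(stream):
    """Count memo-storing opcodes in a pickle stream."""
    return sum(
        1
        for opcode, _arg, _pos in pickletools.genops(stream)
        if opcode.name in PUT_OPCODES
    )


raw = pickle.dumps({"modules": None}, protocol=2)
optimized = pickletools.optimize(raw)
print(count_puts(raw), count_puts(optimized))

# A memo entry that is actually referenced later must survive optimization,
# otherwise the shared object would be duplicated on load.
shared = []
shared_stream = pickletools.optimize(pickle.dumps([shared, shared], protocol=2))
restored = pickle.loads(shared_stream)
print(restored[0] is restored[1])
```

Because optimize() only drops memo entries that are never read back, the optimized stream should not depend on whether the pickler chose to emit them in the first place.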
msg150893 - Author: Kay Hayen (kayhayen) Date: 2012-01-08 18:46
Sending my attached file "stream.py" through "2to3.py" shows that CPython 3.2 doesn't exhibit the issue for either protocol, which may be because the key is now "unicode", but as it's the only value I tried, I can't tell.
Hope this helps.
Regarding "marshal", I presume that the dictionary, when first created via "marshal" (or compile, if no ".pyc" is involved?), is somehow less efficient to determine/stream than the one "cPickle" created.
My assumption is that when the dictionary is created by cPickle, it is somehow different (it has to be), and that's interesting. And of course the dictionary that is easier to stream may be the nicer one at runtime as well. But that is just my speculation.
Yours,
Kay
msg150894 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-01-08 18:49
I think the "premature" optimization comes down to this code:
"""
static int
put(Picklerobject *self, PyObject *ob)
{
    if (Py_REFCNT(ob) < 2 || self->fast)
        return 0;
    return put2(self, ob);
}
"""
(put2() being the function which emits BINPUT)
When you unpickle a dict, its contents are created anew and therefore the refcount is 1 => next pickling avoids emitting a BINPUT.
msg150895 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-01-08 19:02
> Regarding the "marshal", I presume, that somehow the dictionary when
> created via "marshal" (or compile, if no ".pyc" is involved?) first
> time is somehow less efficient to determine/stream that the one
> "cPickle" created.
I don't think so. marshal uses a much simpler and less general
serialization protocol, so it should theoretically (!) be faster as
well.
For example, I don't think marshal supports recursive structures.
msg150896 - Author: Kay Hayen (kayhayen) Date: 2012-01-08 19:03
I see. If it's refcount-dependent, that explains why the result differs between an interpreter-provided dictionary and a self-created one.
So I take it I should always call "pickletools.optimize( cPickle.dumps( value ))" then.
Thanks,
Kay
msg150897 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-01-08 19:04
Closing the issue as it's really a quirk rather than a bug. Thanks for reporting, though.
History
Date                 User      Action  Args
2022-04-11 14:57:25  admin     set     github: 57944
2012-01-09 14:37:14  jcea      set     nosy: + jcea
2012-01-08 19:04:29  pitrou    set     status: open -> closed; resolution: wont fix; messages: + msg150897; stage: resolved
2012-01-08 19:03:16  kayhayen  set     messages: + msg150896
2012-01-08 19:02:54  pitrou    set     messages: + msg150895
2012-01-08 18:49:24  pitrou    set     messages: + msg150894
2012-01-08 18:46:58  kayhayen  set     messages: + msg150893
2012-01-08 18:46:52  pitrou    set     messages: + msg150892
2012-01-08 18:41:41  kayhayen  set     files: + stream.py; messages: + msg150891
2012-01-08 15:28:30  pitrou    set     nosy: + pitrou; messages: + msg150871
2012-01-08 03:38:30  kayhayen  create
