This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: cPickle - stored data differ for same dictionary
Type: behavior
Stage: resolved
Components: Extension Modules
Versions: Python 3.3, Python 3.4, Python 2.7

process
Status: closed
Resolution: works for me
Dependencies:
Superseder:
Assigned To:
Nosy List: Philipp.Mölders, Ramchandra Apte, alexandre.vassalotti, pitrou, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2011-07-20 16:20 by Philipp.Mölders, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name  Uploaded  Description
cPickletest.py  Philipp.Mölders, 2011-07-20 16:20  Sample script to show the bug
cPickletest2.py  serhiy.storchaka, 2013-02-17 22:27
Messages (11)
msg140750 - Author: Philipp Mölders (Philipp.Mölders) Date: 2011-07-20 16:20
I think there is a problem in cPickle. I stored a dictionary with only one entry using cPickle.dump(); this works fine, and the data can be loaded with cPickle.load(). But if you store the loaded data with cPickle.dump() again, the bytes written differ from those written the first time. Loading still works fine; only the data written to disk differs. I've written a sample script that demonstrates the problem.
This problem occurs only in Python 2.7 and only with dictionaries that have one entry.
msg140751 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-07-20 16:25
If the load produces the same result, why does it matter that what is on disk differs?
msg140752 - Author: Philipp Mölders (Philipp.Mölders) Date: 2011-07-20 16:34
The file on disk matters for a replication service: a file that is touched but not changed is not replicated. In this special case, though, the data on disk changes even though the structure has not. If this happens often, it could cause a lot of unnecessary replication.
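One way to keep the file untouched when nothing has logically changed is to compare the loaded object rather than the bytes. The following is only a minimal sketch under stated assumptions: it targets Python 3's `pickle` rather than 2.7's `cPickle`, it assumes the stored objects compare meaningfully with `==`, and `dump_if_changed` is a hypothetical helper name, not part of any library.

```python
import os
import pickle
import tempfile


def dump_if_changed(obj, path, protocol=2):
    """Write obj to path only when it differs from what is already stored.

    Hypothetical helper: compares the loaded object, not the bytes on disk.
    Returns True if the file was (re)written, False if it was left alone.
    """
    try:
        with open(path, "rb") as f:
            if pickle.load(f) == obj:
                return False  # logically unchanged: do not touch the file
    except (OSError, EOFError, pickle.UnpicklingError):
        pass  # missing or unreadable file: fall through and write it

    # Write to a temporary file first, then rename it into place, so that
    # readers never see a half-written pickle.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f, protocol)
    os.replace(tmp_path, path)
    return True
```

This trades an extra read and unpickle on every save for a file that only changes when the data does, which may or may not be acceptable for large pickles.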
msg181586 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-02-07 09:42
As soon as hash randomization is turned on (and it's the default starting with Python 3.3), the pickled representation of dicts will also vary from run to run:
$ python -R -c "import pickle; print pickle.dumps({'a':1, 'b':2})" |md5sum
c0ae6b7f62b9c0839be883dd1efee84e -
$ python -R -c "import pickle; print pickle.dumps({'a':1, 'b':2})" |md5sum
b03bf608516f3e0244a96d740139b050 -
msg181594 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-02-07 12:01
It is surprising that the pickled representation of a 1-element dict varies from run to run.
msg181595 - Author: Ramchandra Apte (Ramchandra Apte) Date: 2013-02-07 12:26
Try `./python -R -c "import pickle; print(pickle.dumps({'a':1, 'v':1}))" |md5sum`. The output will differ on subsequent runs, whereas with `./python -R -c "import pickle; print(pickle.dumps({'a':1}))" |md5sum` the output is always the same. I suspect because the order of dicts are different on every run (try repr).
msg181597 - Author: Ramchandra Apte (Ramchandra Apte) Date: 2013-02-07 12:27
Darn, last sentence has some mistakes.
I suspect this issue is happening because the order of a dictionary is different on every run (try repr).
msg181598 - Author: Ramchandra Apte (Ramchandra Apte) Date: 2013-02-07 12:30
Further proof:
here are the results of two invocations of `./python -R -c "import pickle; print(pickle.dumps({'a':1, 'v':1}))"`
b'\x80\x03}q\x00(X\x01\x00\x00\x00vq\x01K\x01X\x01\x00\x00\x00aq\x02K\x01u.'
b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01K\x01X\x01\x00\x00\x00vq\x02K\x01u.'
Notice that in the second pickle, the entries for 'a' and 'v' have exchanged places: 'v' comes first in the first output, and 'a' comes first in the second.
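If a stable checksum is what is needed, one option is to hash a canonical form instead of the raw dict pickle. This is a sketch only, written for Python 3: `canonical_digest` is a hypothetical name, it swaps SHA-256 in for the md5sum used above, and it assumes the keys are mutually sortable and the values pickle deterministically.

```python
import hashlib
import pickle


def canonical_digest(mapping, protocol=2):
    """Return a hex digest that ignores the dict's iteration order.

    Assumes the keys are mutually sortable (for example, all strings) and
    that the values themselves pickle deterministically.
    """
    items = sorted(mapping.items())  # fixed order regardless of hashing
    return hashlib.sha256(pickle.dumps(items, protocol)).hexdigest()


first = {'a': 1, 'v': 1}
second = {'v': 1, 'a': 1}  # equal to `first`, but built in another order

assert first == second
assert canonical_digest(first) == canonical_digest(second)
```

Sorting only removes the ordering effect shown above; it does not address other sources of variation in the pickle stream.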
msg181603 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-02-07 13:13
Most probably the difference is caused by string interning.
msg182289 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-02-17 22:27
Here is a minimal reproducer. Results:
pickle.dumps('spam', 2)
 0: \x80 PROTO 2
 2: U SHORT_BINSTRING 'spam'
 8: q BINPUT 0
 10: . STOP
highest protocol among opcodes = 2
pickle.dumps('spam1'[:-1], 2)
 0: \x80 PROTO 2
 2: U SHORT_BINSTRING 'spam'
 8: q BINPUT 0
 10: . STOP
highest protocol among opcodes = 2
cPickle.dumps('spam', 2)
 0: \x80 PROTO 2
 2: U SHORT_BINSTRING 'spam'
 8: q BINPUT 1
 10: . STOP
highest protocol among opcodes = 2
cPickle.dumps('spam1'[:-1], 2)
 0: \x80 PROTO 2
 2: U SHORT_BINSTRING 'spam'
 8: . STOP
highest protocol among opcodes = 2
The difference between the 3rd and 4th examples is "BINPUT 1". In the last case the string has refcount=1, so BINPUT is not emitted, as an optimization. Note that the Python implementation emits BINPUT with a different number.
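Since the varying part is a PUT opcode that nothing reads back, `pickletools.optimize()` (which removes unused PUT opcodes) may reduce this kind of difference. The sketch below assumes Python 3's `pickle`; `normalized_dumps` and `opcode_names` are hypothetical helper names. It is not a guarantee of byte-identical output in general, and it has not been checked against 2.7's `cPickle`.

```python
import pickle
import pickletools


def normalized_dumps(obj, protocol=2):
    """Pickle obj, then strip PUT opcodes that nothing refers back to."""
    return pickletools.optimize(pickle.dumps(obj, protocol))


def opcode_names(data):
    """List the opcode names in a pickle, in stream order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


literal = 'spam'
sliced = 'spam1'[:-1]  # equal value, possibly a separately created object

a = normalized_dumps(literal)
b = normalized_dumps(sliced)

assert a == b
assert 'BINPUT' not in opcode_names(a)
assert pickle.loads(a) == 'spam'
```

The optimization pass costs extra time on every dump, which is the kind of performance trade-off that matters when deciding whether this is worth doing.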
msg188287 - Author: Alexandre Vassalotti (alexandre.vassalotti) (Python committer) Date: 2013-05-03 01:17
There is no guarantee that the binary representation of pickled data will be the same between different runs. We try to make it mostly consistent when we can, but there are cases, like this one, where we cannot ensure consistency without hurting performance significantly.
History
Date  User  Action  Args
2022-04-11 14:57:19  admin  set  github: 56805
2013-05-03 01:17:15  alexandre.vassalotti  set  status: open -> closed; nosy: + alexandre.vassalotti; messages: + msg188287; resolution: works for me; stage: needs patch -> resolved
2013-02-17 22:27:33  serhiy.storchaka  set  files: + cPickletest2.py; messages: + msg182289
2013-02-07 13:13:38  serhiy.storchaka  set  messages: + msg181603
2013-02-07 12:30:32  Ramchandra Apte  set  messages: + msg181598
2013-02-07 12:27:24  Ramchandra Apte  set  messages: + msg181597
2013-02-07 12:26:04  Ramchandra Apte  set  nosy: + Ramchandra Apte; messages: + msg181595; versions: + Python 3.3, Python 3.4
2013-02-07 12:01:20  serhiy.storchaka  set  messages: + msg181594; components: + Extension Modules, - None
2013-02-07 09:42:38  pitrou  set  nosy: + pitrou; messages: + msg181586
2013-02-06 10:32:25  serhiy.storchaka  set  nosy: + serhiy.storchaka; stage: needs patch
2011-07-20 16:34:27  Philipp.Mölders  set  messages: + msg140752
2011-07-20 16:25:51  r.david.murray  set  nosy: + r.david.murray; messages: + msg140751
2011-07-20 16:24:12  Philipp.Mölders  set  type: behavior
2011-07-20 16:20:51  Philipp.Mölders  create
