Message80670
| Author |
jakemcguire |
| Recipients |
jakemcguire |
| Date |
2009-01-27.21:52:16 |
| Message-id |
<1233093140.55.0.0122548653801.issue5084@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Instance attribute names are normally interned; this is done in
PyObject_SetAttr (among other places). Unpickling (in both pickle and
cPickle) updates __dict__ on the instance object directly, which
bypasses the interning. You end up with many separate copies of the
strings representing your attribute names, which wastes a lot of space,
both in RAM and in pickles of sequences of objects that were themselves
created from pickles. Note that the native Python memcached client uses
pickle to serialize objects.
>>> import pickle
>>> class C(object):
...     def __init__(self, x):
...         self.long_attribute_name = x
...
>>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None),
...     pickle.HIGHEST_PROTOCOL)) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
3658
>>> len(pickle.dumps([C(None) for i in range(100)],
...     pickle.HIGHEST_PROTOCOL))
1441
>>>
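The size difference in the session above is consistent with pickle's memo being keyed by object identity rather than by value: equal strings that are distinct objects are each written out in full, while a shared string is written once and then referenced. The following is a minimal sketch of that effect in isolation, assuming Python 3 (where the builtin `intern` became `sys.intern`). The helper `make_name` is hypothetical and only exists to build a fresh string object on every call; plain dicts stand in for instance `__dict__`s.

```python
import pickle
import sys


def make_name():
    # Hypothetical helper: the string is built at runtime, so every call
    # returns a new str object that is equal to, but not the same object
    # as, the previous ones.
    return "".join(["long_attribute", "_name"])


# 100 dicts whose single key is a separate (equal) string object each time.
distinct = [{make_name(): None} for _ in range(100)]

# 100 dicts whose single key is the one interned string object.
shared = [{sys.intern(make_name()): None} for _ in range(100)]

distinct_size = len(pickle.dumps(distinct, pickle.HIGHEST_PROTOCOL))
shared_size = len(pickle.dumps(shared, pickle.HIGHEST_PROTOCOL))

print(distinct_size, shared_size)
```

The two lists compare equal, yet the first pickle is larger because the key is serialized once per dict. The exact byte counts depend on the Python version and pickle protocol, so none are quoted here.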
Interning the strings on unpickling makes the pickles smaller, and at
least for cPickle actually makes unpickling sequences of many objects
slightly faster. I have included proposed patches to cPickle.c and
pickle.py, and would appreciate any feedback. |
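As an illustration of the idea only (this is not the attached patch to cPickle.c or pickle.py), interning on unpickling can be approximated at the class level by defining `__setstate__`, which pickle calls instead of updating `__dict__` directly. The sketch below assumes Python 3 and `sys.intern`; the mixin name `InternedState` is hypothetical.

```python
import pickle
import sys


class InternedState(object):
    """Hypothetical mixin that interns attribute names on unpickling."""

    def __setstate__(self, state):
        # state is the pickled instance __dict__; re-insert every entry
        # under the interned version of its attribute name.
        for name, value in state.items():
            self.__dict__[sys.intern(name)] = value


class C(InternedState):
    def __init__(self, x):
        self.long_attribute_name = x


# Round-trip 100 separate instances through pickle, as in the report.
restored = [
    pickle.loads(pickle.dumps(C(None), pickle.HIGHEST_PROTOCOL))
    for _ in range(100)
]

# Collect the attribute-name object held by each restored instance.
names = [next(iter(vars(obj))) for obj in restored]
all_same = all(name is names[0] for name in names)
print(all_same)
```

With the mixin in place, every restored instance holds the same string object as its attribute name, so a later pickle of the whole list can reference it through the memo instead of repeating it. A fix inside the unpickler itself, as proposed here, would give the same result without requiring classes to opt in.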