This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009年01月27日 21:52 by jakemcguire, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| pickle.py.diff | jakemcguire, 2009年01月27日 21:52 | diff to pickle to intern attribute names | ||
| cPickle.c.diff | jakemcguire, 2009年02月25日 21:17 | take three. | ||
| pickletester.diff | jakemcguire, 2009年05月02日 02:38 | add a unit test to verify attribute names are interned | ||
| Messages (16) | |||
|---|---|---|---|
| msg80670 - (view) | Author: Jake McGuire (jakemcguire) | Date: 2009年01月27日 21:52 | |
Instance attribute names are normally interned - this is done in PyObject_SetAttr (among other places). Unpickling (in pickle and cPickle) directly updates __dict__ on the instance object. This bypasses the interning so you end up with many copies of the strings representing your attribute names, which wastes a lot of space, both in RAM and in pickles of sequences of objects created from pickles. Note that the native python memcached client uses pickle to serialize objects. >>> import pickle >>> class C(object): ... def __init__(self, x): ... self.long_attribute_name = x ... >>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None), pickle.HIGHEST_PROTOCOL)) for i in range(100)], pickle.HIGHEST_PROTOCOL)) 3658 >>> len(pickle.dumps([C(None) for i in range(100)], pickle.HIGHEST_PROTOCOL)) 1441 >>> Interning the strings on unpickling makes the pickles smaller, and at least for cPickle actually makes unpickling sequences of many objects slightly faster. I have included proposed patches to cPickle.c and pickle.py, and would appreciate any feedback. |
|||
| msg80678 - (view) | Author: Gabriel Genellina (ggenellina) | Date: 2009年01月27日 23:30 | |
Either my browser got crazy, or you uploaded the same patch (.py) twice. |
|||
| msg80688 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2009年01月28日 02:54 | |
The patch for cPickle doesn't do the same thing as the pickle one. In the cPickle one, you are only interning slot attributes, which, I believe, is not what you intended. :-) |
|||
| msg80689 - (view) | Author: Jake McGuire (jakemcguire) | Date: 2009年01月28日 03:00 | |
Are you sure? I may not have enough context in my diff, which I should have done against an anonymous svn checkout, but it looks like the slot attributes get set several lines after my diff. "while (PyDict_Next(slotstate, ...))" as opposed to the "while (PyDict_Next(state, ...))" in my change... -jake On Jan 27, 2009, at 6:54 PM, Alexandre Vassalotti wrote: > > Alexandre Vassalotti <alexandre@peadrop.com> added the comment: > > The patch for cPickle doesn't do the same thing as the pickle one. In > the cPickle one, you are only interning slot attributes, which, I > believe, is not what you intended. :-) > > ---------- > assignee: -> alexandre.vassalotti > nosy: +alexandre.vassalotti > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue5084> > _______________________________________ |
|||
| msg80690 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2009年01月28日 03:03 | |
Oh, you are right. I was looking at py3k's version of pickle, which uses PyDict_Update instead of a while loop; that is why I got confused. |
|||
| msg80918 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月01日 20:53 | |
Why do you call PyString_AsString() followed by PyString_FromString()? Strings are immutable so you shouldn't neek to take a copy. Besides, it would be nice to add a test. |
|||
| msg82686 - (view) | Author: Jake McGuire (jakemcguire) | Date: 2009年02月24日 23:32 | |
The fromstring/asstring dance was due to my incomplete understanding of refcounting. PyDict_Next returns a borrowed reference but PyString_InternInPlace expects an owned reference. Thanks to Kirk McDonald, I have a new patch that does the refcounting correctly. What sort of test did you have in mind? |
|||
| msg82690 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月25日 02:03 | |
The test should check that the unpickled strings have been interned. |
|||
| msg82691 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月25日 02:07 | |
In your patch, I'm not sure where the `name` variable is coming from. Have you checked it works? |
|||
| msg82692 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月25日 02:11 | |
To give an example of what the test could check:
>>> class C(object):
... def __init__(self):
... self.some_long_attribute_name = 5
...
>>> c = C()
>>> c.__dict__
{'some_long_attribute_name': 5}
>>> sorted(map(id, c.__dict__))
[140371243499696]
>>> import pickle
>>> d = pickle.loads(pickle.dumps(c, -1))
>>> d
<__main__.C object at 0x7faaba1b0390>
>>> d.__dict__
{'some_long_attribute_name': 5}
>>> sorted(map(id, d.__dict__))
[140371243501232]
The `sorted(map(id, d.__dict__))` should have been the same before and
after pickling.
|
|||
| msg82720 - (view) | Author: Jake McGuire (jakemcguire) | Date: 2009年02月25日 21:17 | |
Ugh. Clearly I didn't check to see if it worked or not, as it didn't even compile. A new diff, tested and verified to work, is attached. I'll work on creating a test. |
|||
| msg86919 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月01日 21:54 | |
Jake, are you string working on a test? |
|||
| msg86920 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月01日 21:55 | |
(s/string/still/, of course...) |
|||
| msg86930 - (view) | Author: Jake McGuire (jakemcguire) | Date: 2009年05月02日 02:38 | |
This fell through the cracks. But the following unit test seems like it does the right thing (fails with the old module, works with the new ones). |
|||
| msg86939 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月02日 11:41 | |
Thanks, I'll take a look very soon. |
|||
| msg86982 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年05月02日 21:42 | |
Committed in r72223, r72224. Thanks! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:44 | admin | set | github: 49334 |
| 2009年05月02日 21:42:49 | pitrou | set | status: open -> closed resolution: fixed messages: + msg86982 |
| 2009年05月02日 11:41:39 | pitrou | set | assignee: alexandre.vassalotti -> pitrou messages: + msg86939 |
| 2009年05月02日 11:40:40 | pitrou | set | stage: needs patch -> patch review |
| 2009年05月02日 02:38:05 | jakemcguire | set | files:
+ pickletester.diff messages: + msg86930 |
| 2009年05月01日 21:55:32 | pitrou | set | messages: + msg86920 |
| 2009年05月01日 21:54:58 | pitrou | set | priority: normal stage: needs patch messages: + msg86919 versions: + Python 3.1, Python 2.7 |
| 2009年02月25日 21:19:13 | jakemcguire | set | files: - cPickle.c.diff |
| 2009年02月25日 21:19:08 | jakemcguire | set | files: - cPickle.c.diff |
| 2009年02月25日 21:17:28 | jakemcguire | set | files:
+ cPickle.c.diff messages: + msg82720 |
| 2009年02月25日 02:11:36 | pitrou | set | messages: + msg82692 |
| 2009年02月25日 02:07:01 | pitrou | set | messages: + msg82691 |
| 2009年02月25日 02:03:09 | pitrou | set | messages: + msg82690 |
| 2009年02月24日 23:32:52 | jakemcguire | set | files:
+ cPickle.c.diff messages: + msg82686 |
| 2009年02月03日 02:20:50 | collinwinter | set | nosy: + collinwinter, jyasskin |
| 2009年02月01日 20:53:38 | pitrou | set | nosy:
+ pitrou messages: + msg80918 |
| 2009年01月29日 10:56:14 | jcea | set | nosy: + jcea |
| 2009年01月28日 03:03:10 | alexandre.vassalotti | set | messages: + msg80690 |
| 2009年01月28日 03:00:02 | jakemcguire | set | messages: + msg80689 |
| 2009年01月28日 02:54:53 | alexandre.vassalotti | set | assignee: alexandre.vassalotti messages: + msg80688 nosy: + alexandre.vassalotti |
| 2009年01月27日 23:39:02 | jakemcguire | set | files: + cPickle.c.diff |
| 2009年01月27日 23:38:46 | jakemcguire | set | files: - cPickle.c.diff |
| 2009年01月27日 23:30:05 | ggenellina | set | nosy:
+ ggenellina messages: + msg80678 |
| 2009年01月27日 21:52:58 | jakemcguire | set | files: + pickle.py.diff |
| 2009年01月27日 21:52:19 | jakemcguire | create | |