
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Unpickling is really slow
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, alexandre.vassalotti, amaury.forgeotdarc, belopolsky, collinwinter, hagen, jcea, pitrou, vstinner
Priority: normal
Keywords: patch

Created on 2008-09-15 16:37 by hagen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                 Uploaded                    Description
pickletst.py              hagen, 2009-01-18 14:17
pickle_read_buffer.patch  vstinner, 2009-04-06 11:52
pickletest.py             vstinner, 2009-04-06 11:56
unpickleprefetch.patch    pitrou, 2010-09-23 22:17    review
unpickleprefetch2.patch   pitrou, 2010-09-24 18:24    review
bench_pickle.py           pitrou, 2010-09-27 22:11
unpickleprefetch3.patch   pitrou, 2010-09-27 22:23    review
Messages (38)
msg73267 - Author: Hagen Fürstenau (hagen) Date: 2008-09-15 16:37
Unpickling e.g. a large list seems to be really slow in Python 3.0.
The attached test script gives the following results for pickling and
unpickling a list of 1M zeros, showing that although the C
implementation seems to be used in Python 3.0, unpickling is even slower
than with the "pickle" module of Python 2.6!
[hagenf@cluster-06 hagenf]$ python pickletst.py
2.71067500114
2.77484893799
[hagenf@cluster-06 hagenf]$ python3.0 pickletst.py
0.0925059318542
5.76121616364 
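The attached pickletst.py is not reproduced in the tracker; a minimal equivalent of the benchmark it describes (timing dump and load of a list of 1M zeros through a real file) might look like this sketch:

```python
import pickle
import tempfile
import time

data = [0] * 1000000  # a list of 1M zeros, as in the report

with tempfile.TemporaryFile() as f:
    t0 = time.time()
    pickle.dump(data, f)
    print(time.time() - t0)  # dump time in seconds

    f.seek(0)
    t0 = time.time()
    loaded = pickle.load(f)
    print(time.time() - t0)  # load time in seconds
```

On 3.0 this goes through the (then mostly Python-level) io stack on every read, which is what the discussion below turns on.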
msg73269 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-15 16:56
Do the numbers vary if you read the whole file at once and then unpickle
the resulting bytes string?
Large parts of the IO library are written in Python in 3.0, which might
explain the discrepancy.
msg73271 - Author: Hagen Fürstenau (hagen) Date: 2008-09-15 17:17
Yes, it gets much better, but even so (first reading the file and timing
only "loads") unpickling takes four times as long in Python 3.0 as with
the old cPickle module:
[hagenf@cluster-06 hagenf]$ python pickletst2.py
0.0744678974152
0.0514161586761
[hagenf@cluster-06 hagenf]$ python3.0 pickletst3.py
0.0911619663239
0.208593845367
But I guess this can still be blamed on the BytesIO implementation...
msg73273 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-15 17:24
Indeed. If I replace the file with 
 f = io.BytesIO(open("tst", "rb").read())
timings are divided by 20...
After quick profiling, it seems that PyLong_New would benefit from a
free list. len(bytearray) is called very often.
To stay simple, it would be enough to only store longs of length 1 (<2**15).
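The workaround Amaury describes, slurping the whole file once and unpickling from an in-memory BytesIO, can be sketched as follows (a throwaway temp file stands in for the "tst" file used above):

```python
import io
import pickle
import tempfile

with tempfile.TemporaryFile() as f:
    pickle.dump([0] * 100000, f)
    f.seek(0)
    # Read the whole file in one call, then unpickle from memory:
    # this bypasses the per-read overhead of the buffered file object.
    obj = pickle.load(io.BytesIO(f.read()))
```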
msg73275 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-15 17:37
Gregory had patches for a freelist of long objects in #2013.
msg74487 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-10-07 21:12
The solution is to add a read buffer to Unpickler (Pickler already has a
write buffer, which is why it is unaffected). I believe this would
mitigate much of the (quite large) Python function call overhead.
cPickle has a performance hack that makes it use cStringIO and PyFile
directly (via C function calls). In Python 3, the hack was removed since
cStringIO.h and fileobject.h aren't public anymore. This explains the
0.02s deviation you are getting for dump speed.
msg76988 - Author: Hagen Fürstenau (hagen) Date: 2008-12-05 09:21
I think a read buffer is not possible without being able to unread bytes
from the stream. pickle shouldn't consume bytes after the end of a
pickled object!
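The constraint Hagen states, that load() must leave the stream positioned exactly after the pickled object so any trailing data survives, can be checked directly:

```python
import io
import pickle

buf = io.BytesIO()
pickle.dump([1, 2, 3], buf)
buf.write(b"TRAILER")  # data that follows the pickled object
buf.seek(0)

obj = pickle.load(buf)  # must stop exactly at the STOP opcode
rest = buf.read()       # the trailing bytes must still be readable
```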
msg80093 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-18 12:13
Making this a duplicate of #4565 ("Rewrite the IO stack in C").
If anyone disagrees, please reopen!
msg80097 - Author: Hagen Fürstenau (hagen) Date: 2009-01-18 13:40
With the io-c branch I see much better unpickling performance than
before. But it still seems to be around 2 or 3 times slower than with
cPickle in 2.6.
Is this expected at this point of io-c development? Otherwise perhaps
this issue should stay open until it can be verified that nothing else
can be done to get closer to the old cPickle performance.
msg80101 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-18 14:01
Hello,
> With the io-c branch I see much better unpickling performance than
> before. But it still seems to be around 2 or 3 times slower than with
> cPickle in 2.6.
It's much closer here.
With 2.7 (trunk) and cPickle:
0.439934968948
0.669679880142
With the io-c branch:
0.136367082596
0.798221111298
> Is this expected at this point of io-c development?
20-30% slower is expected for binary I/O (which is what I get above),
not 2x or 3x slower.
What is your system?
msg80103 - Author: Hagen Fürstenau (hagen) Date: 2009-01-18 14:17
I uploaded a new pickletst.py which specifies protocol 2, otherwise
we're comparing apples with oranges. With this I get:
0.211881160736
0.322369813919
for Python 2.6 and
0.158488035202
1.21621990204
on the io-c branch. Can you confirm this?
msg80104 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-18 14:25
Nice catch! I can confirm your figures with protocol=2 (and protocol=-1
as well).
msg85458 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-05 01:23
alexandre.vassalotti wrote:
> The solution is to add a read buffer to Unpickler (...) 
> would mitigate much of the (quite large) Python function 
> call overhead. (...) cPickle has a performance hack to make it 
> use cStringIO and PyFile directly (via C function calls). In 
> Python 3, the hack was removed (...)
Yes, unpickler_read() calls Buffered_read() through 
PyObject_Call+PyCFunction_Call, which is expensive. And the unpickle main 
loop starts with unpickler_read(self, &s, 1): just read *one* byte 
(the "opcode"). If the Buffered_read() call is expensive, a solution is to 
avoid calling it (e.g. read more bytes and... go backward at the end?).
msg85459 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-05 01:25
Unladen Swallow has a project to optimize pickle. Currently, it uses 3 
benchmarks: 
 pickle - use the cPickle module to pickle a variety of datasets. 
 pickle_dict - microbenchmark; use the cPickle module to pickle a lot 
of dicts. 
 pickle_list - microbenchmark; use the cPickle module to pickle a lot 
of lists.
msg85462 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-05 02:02
gprof (--enable-profiler) results:
Each sample counts as 0.01 seconds.
  %    cumulative    self               self     total
 time    seconds    seconds    calls   ms/call  ms/call  name
18.18       0.16      0.16   2011055      0.00     0.00  vgetargs1
 7.95       0.23      0.07   2017033      0.00     0.00  PyObject_GenericGetAttr
 6.82       0.29      0.06   2002006      0.00     0.00  _BufferedReader_read_unlocked
 5.68       0.34      0.05   2012123      0.00     0.00  convertsimple
 4.55       0.38      0.04   2013284      0.00     0.00  PyCFunction_Call
 4.55       0.42      0.04   2004543      0.00     0.00  PyLong_AsSsize_t
 3.41       0.45      0.03   2004565      0.00     0.00  PyNumber_Index
 3.41       0.48      0.03   2002177      0.00     0.00  float_argument_error
 3.41       0.51      0.03   1000497      0.00     0.00  PyLong_AsLongAndOverflow
 2.27       0.53      0.02   2012123      0.00     0.00  convertitem
It looks like PyArg_ParseTuple() should be optimized ;-)
msg85636 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-06 11:52
Create a read buffer (4096 bytes) in the unpickler class. Using [0]*10**7 or
[1000]*10**7, load() is 6 to 8 times faster.
I removed the last_string attribute because it wasn't used.
If there are "tail" bytes, seek backward.
msg85637 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-06 11:53
I don't know why, but python-trunk is *much* slower than py3k (eg. with
dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
msg85638 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-06 11:56
My version of pickletest.py:
 - make sure that file position is correct after the load()
 - some benchmark. most interesting numbers:
without the patch:

 version |    data     | dump ms | load ms |
  py3k   |    0, 10^6  |     230 |    1500 |
  py3k   | 1000, 10^6  |     255 |    1780 |
  py3k   |    0, 10^7  |    2360 |   16044 |
  py3k   | 1000, 10^7  |    2615 |   19380 |

with the patch:

 version |    data     | dump ms | load ms | speedup
  py3k   |    0, 10^6  |     237 |     183 |      x8
  py3k   | 1000, 10^6  |     241 |     248 |      x7
  py3k   |    0, 10^7  |    2420 |    1860 |      x8
  py3k   | 1000, 10^7  |    2850 |    3100 |      x6

(data "0, 10^6" means [0]*10^6)
msg85639 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-06 12:02
Note about my patch: the buffer should be truncated after
PyBytes_Concat(&self->buffer.pybytes, data) to avoid a very long buffer.
Something like: self->buffer.pybytes += data; self->buffer.pybytes =
self->buffer.pybytes[index:]; self->buffer.index = 0;
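The truncation scheme described above can be sketched in Python (the real patch is C code in _pickle.c; the class and attribute names here are illustrative only): before appending freshly read data, the already-consumed prefix is dropped so the buffer never grows without bound.

```python
import io

class ReadBuffer:
    """Illustrative sketch of a chunked read buffer with truncation."""

    def __init__(self, f, chunk=4096):
        self.f = f
        self.buf = b""   # buffered but not yet consumed bytes (plus prefix)
        self.idx = 0     # start of the unconsumed region in self.buf
        self.chunk = chunk

    def read(self, n):
        # Refill until we have n unconsumed bytes (or hit EOF).
        while len(self.buf) - self.idx < n:
            data = self.f.read(self.chunk)
            if not data:
                break
            # Drop the consumed prefix before concatenating, so the
            # buffer stays bounded (the truncation Victor describes).
            self.buf = self.buf[self.idx:] + data
            self.idx = 0
        out = self.buf[self.idx:self.idx + n]
        self.idx += n
        return out
```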
msg85640 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-06 12:09
> I don't know why, but python-trunk is *much* slower than py3k (eg. with
> dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
Perhaps you tried with the pure Python version (pickle) rather than the
C one (cPickle)?
msg85780 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-04-08 17:28
Victor, Unpickler shouldn't raise an error if the given file object does
not support seek(); it should gracefully fall back to using only read() and
readline(). Also, I think you could get a greater performance
improvement by using peek() and avoiding backward seeks. As far as I
know, a backward seek may invalidate the internal buffer of the file
object, thus forcing a read from disk.
Incidentally, I think it would be nice to add to Python 3 some kind of
file protocol as this would provide a way to mitigate the function call
overhead for I/O operations.
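The peek() approach suggested above can be illustrated with io.BufferedReader: peek() returns buffered bytes without advancing the stream position, so no backward seek is ever needed. A minimal sketch:

```python
import io

buf = io.BufferedReader(io.BytesIO(b"hello world"))

# peek() returns data from the internal buffer without consuming it.
# Note it may return more bytes than requested (up to the buffer size).
head = buf.peek(5)

# The stream position is unchanged, so the next read() still starts
# at the beginning.
pos = buf.tell()
data = buf.read(5)
```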
msg86047 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-16 21:34
By the way, the patch won't work with unseekable files, which is
probably bad.
msg110687 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-18 21:27
Has this slipped under the radar? I believe that one way or the other any performance issue should be resolved if at all possible.
msg111867 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 22:58
New version of my patch:
 - add "used" attribute to UnpicklerBuffer structure: disable the read buffer for not seekable file and for protocol 0 (at the first call to unpickle_readline)
 - check if PyObject_GetAttrString(file, "seek") is NULL or not
 - unpickle_readline() flushs also the buffer
 - add a new patch specific to the read buffer: ensure that unpickler doesn't eat data at the end of the file
test_pickle pass without any error.
Disable read buffer at the first call to unpickle_readline() because unpickle_readline() have to flush the buffer. I will be very difficult to optimize protocol 0, but I hope that nobody uses it nowadays.
===========
Benchmark with [0]*10**6, Python compiled with pydebug.
Without the patch
-----------------
Protocol 0:
- dump: 598.0 ms
- load (seekable=False): 3337.3 ms
- load (seekable=True): 3309.6 ms
Protocol 1:
- dump: 217.8 ms
- load (seekable=False): 864.2 ms
- load (seekable=True): 873.3 ms
Protocol 2:
- dump: 226.5 ms
- load (seekable=False): 867.8 ms
- load (seekable=True): 854.6 ms
With the patch
--------------
Protocol 0
- dump: 615.5 ms
- load (seekable=False): 3201.3 ms
- load (seekable=True): 3223.4 ms
Protocol 1
- dump: 219.8 ms
- load (seekable=False): 942.1 ms
- load (seekable=True): 175.2 ms
Protocol 2
- dump: 221.1 ms
- load (seekable=False): 943.9 ms
- load (seekable=True): 175.5 ms
msg111870 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 23:10
Same benchmark with Python 2.6.5+, so without the patch, but compiled with maximum compiler optimization (whereas pydebug means no optimization):
Protocol 0
- dump: 517.3 ms
- load: 876.6 ms <= because of the new I/O library, python3 is much slower
Protocol 1
- dump: 141.8 ms
- load: 255.2 ms <= with my patch, python3 is 1.5x faster
Protocol 2
- dump: 142.7 ms
- load: 262.1 ms <= with my patch, python3 is 1.5x faster
It would be better to redo all tests with the same compiler options, but the most important point is that Python3 is *faster* than Python2 with my patch ;-)
msg111871 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 23:11
bench_pickle.py: script used to produce last benchmarks.
msg111892 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2010-07-29 05:48
Victor, have you tried using peek() instead of seek()? I mentioned this previously in msg85780.
msg111964 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-29 16:38
> Victor, have you tried using peek() instead of seek()? 
> I mentioned this previously in msg85780.
In a file encoded with protocol 0, backward seeks are needed for each call to unpickler_readline... and this function is called to read a number, a boolean, etc. (to read most, or all, data values). I chose to disable the read buffer because it's slower with it.
For protocols 1 and 2, there is only *one* seek at the end (but a lot of reads: file size / 4096). So I don't think it really matters whether we use peek or seek.
seek() is more natural (to me) and more common than peek(), so I prefer to keep seek().
msg117239 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-23 22:10
Here is a fixed version of Victor's bench (didn't work on 2.x).
msg117241 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-23 22:17
And here is a new performance patch (Victor's patch was outdated because of heavy changes incorporated from Unladen Swallow). Results of bench_pickle.py are as follows:
* Python 2.7 (cPickle):
Protocol 0
- dump: 189.8 ms
- load (seekable=False): 318.1 ms
- load (seekable=True): 321.2 ms
Protocol 1
- dump: 58.2 ms
- load (seekable=False): 89.5 ms
- load (seekable=True): 89.4 ms
Protocol 2
- dump: 58.0 ms
- load (seekable=False): 89.1 ms
- load (seekable=True): 89.1 ms
* Python 3.2 (unpatched):
Protocol 0
- dump: 144.5 ms
- load (seekable=False): 530.7 ms
- load (seekable=True): 530.5 ms
Protocol 1
- dump: 27.8 ms
- load (seekable=False): 280.4 ms
- load (seekable=True): 279.9 ms
Protocol 2
- dump: 27.8 ms
- load (seekable=False): 280.6 ms
- load (seekable=True): 281.9 ms
* Python 3.2 + patch:
Protocol 0
- dump: 146.4 ms
- load (seekable=False): 611.1 ms
- load (seekable=True): 194.2 ms
Protocol 1
- dump: 27.3 ms
- load (seekable=False): 254.7 ms
- load (seekable=True): 31.2 ms
Protocol 2
- dump: 27.3 ms
- load (seekable=False): 254.6 ms
- load (seekable=True): 31.2 ms
msg117320 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2010-09-24 18:12
I get this error with the patch:
python: /home/alex/src/python.org/py3k/Modules/_pickle.c:908: _Unpickler_ReadFromFile: Assertion `self->next_read_idx == 0' failed.
Aborted
msg117323 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-24 18:24
Ah, thank you. I hadn't tested in debug mode and there was a wrong assert from the previous code.
Here is a patch with the assert removed.
msg117471 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-27 20:52
One problem with the seek() approach is that some file-like objects have expensive seeks. One example is GzipFile, where seek(n) is O(n) (it first rewinds to the start of the file, then reads n decompressed bytes). In the end, unpickling from a GzipFile becomes O(n**2).
I will try to use peek() instead.
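The GzipFile case is easy to set up in miniature: GzipFile is seekable, but a backward seek is emulated by rewinding and re-decompressing from the start, which is exactly what makes seek-based prefetching expensive. Unpickling from a gzip stream still has to keep working, e.g.:

```python
import gzip
import io
import pickle

raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as f:
    pickle.dump(list(range(1000)), f)

raw.seek(0)
with gzip.GzipFile(fileobj=raw, mode="rb") as f:
    # A GzipFile seek() to an earlier offset rewinds and re-reads the
    # compressed stream, which is why a peek()-based unpickler is cheaper.
    obj = pickle.load(f)
```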
msg117472 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2010-09-27 20:58
Didn't Victor say that only one seek at the end is necessary per
pickle? If this is the case, I don't think expensive seeks will be an
issue.
msg117480 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-27 22:06
> Didn't Victor say that only one seek at the end is necessary per
> pickle? If this is the case, I don't think expensive seeks will be an
> issue.
If you are unpickling from a multi-megabyte gzip file and the seek at
the end makes you uncompress all the gzip file again, it can certainly
be a problem.
Another issue with seeking only at the end is that it would make
readline() quite a bit more complicated. And, apparently, readline() is not
only used on protocol 0 but also on binary protocols.
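The protocol distinction behind the readline() concern can be seen directly: protocol 0 pickles are text-based, with newline-terminated tokens (hence the readline() calls), while protocol 2 is a binary format opened by a PROTO opcode. A quick illustrative check:

```python
import pickle

# Protocol 0 is text-based: scalar values are newline-terminated
# tokens, which is why the unpickler calls readline() so often on it.
p0 = pickle.dumps(300, protocol=0)

# Protocol 2 is binary and starts with the PROTO opcode (\x80\x02).
p2 = pickle.dumps(300, protocol=2)
```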
msg117481 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-27 22:11
Here is an updated bench_pickle which also makes the file unpeekable.
msg117483 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-27 22:23
Here is a patch using peek() rather than seek(). There are some inefficiencies around (such as using read() to skip the consumed prefetched bytes), but the benchmark results are still as good as with seek():
Protocol 0
- dump: 142.5 ms
- load (seekable=False): 598.8 ms
- load (seekable=True): 240.2 ms
Protocol 1
- dump: 28.9 ms
- load (seekable=False): 253.2 ms
- load (seekable=True): 25.7 ms
Protocol 2
- dump: 28.9 ms
- load (seekable=False): 253.3 ms
- load (seekable=True): 25.7 ms
msg118472 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-12 20:52
Patch committed in r85384.
History
Date                 User                  Action  Args
2022-04-11 14:56:39  admin                 set     github: 48123
2011-03-19 03:00:12  jcea                  set     nosy: + jcea
2010-10-12 20:52:02  pitrou                set     status: open -> closed; resolution: fixed; messages: + msg118472; stage: patch review -> resolved
2010-09-27 22:23:14  pitrou                set     files: + unpickleprefetch3.patch; messages: + msg117483
2010-09-27 22:11:48  pitrou                set     files: - bench_pickle.py
2010-09-27 22:11:43  pitrou                set     files: - bench_pickle.py
2010-09-27 22:11:36  pitrou                set     files: + bench_pickle.py; messages: + msg117481
2010-09-27 22:06:35  pitrou                set     messages: + msg117480
2010-09-27 20:58:06  alexandre.vassalotti  set     messages: + msg117472
2010-09-27 20:52:14  pitrou                set     messages: + msg117471
2010-09-24 18:25:00  pitrou                set     files: + unpickleprefetch2.patch; messages: + msg117323; versions: + Python 3.2, - Python 3.1
2010-09-24 18:12:44  alexandre.vassalotti  set     messages: + msg117320
2010-09-23 22:17:44  pitrou                set     files: + unpickleprefetch.patch; messages: + msg117241
2010-09-23 22:10:23  pitrou                set     files: + bench_pickle.py; messages: + msg117239
2010-07-29 16:46:20  belopolsky            set     nosy: + belopolsky
2010-07-29 16:38:51  vstinner              set     messages: + msg111964
2010-07-29 05:48:16  alexandre.vassalotti  set     messages: + msg111892
2010-07-28 23:11:19  vstinner              set     files: + bench_pickle.py; messages: + msg111871
2010-07-28 23:10:29  vstinner              set     messages: + msg111870
2010-07-28 22:58:02  vstinner              set     messages: + msg111867
2010-07-18 21:27:28  BreamoreBoy           set     nosy: + BreamoreBoy; messages: + msg110687
2009-04-16 21:34:41  pitrou                set     messages: + msg86047
2009-04-16 21:33:44  pitrou                set     priority: normal; stage: patch review; versions: + Python 3.1, - Python 3.0
2009-04-08 17:28:05  alexandre.vassalotti  set     messages: + msg85780
2009-04-06 16:37:42  collinwinter          set     nosy: + collinwinter
2009-04-06 12:09:02  pitrou                set     messages: + msg85640
2009-04-06 12:02:56  vstinner              set     messages: + msg85639
2009-04-06 11:56:49  vstinner              set     files: + pickletest.py; messages: + msg85638
2009-04-06 11:53:40  vstinner              set     messages: + msg85637
2009-04-06 11:52:25  vstinner              set     files: + pickle_read_buffer.patch; keywords: + patch; messages: + msg85636
2009-04-05 02:02:12  vstinner              set     messages: + msg85462
2009-04-05 01:25:19  vstinner              set     messages: + msg85459
2009-04-05 01:23:08  vstinner              set     nosy: + vstinner; messages: + msg85458
2009-01-18 14:25:43  pitrou                set     status: closed -> open; resolution: duplicate -> (no value); superseder: Rewrite the IO stack in C -> ; messages: + msg80104
2009-01-18 14:17:54  hagen                 set     files: + pickletst.py; messages: + msg80103
2009-01-18 14:14:12  hagen                 set     files: - pickletst.py
2009-01-18 14:01:19  pitrou                set     messages: + msg80101
2009-01-18 13:40:36  hagen                 set     messages: + msg80097
2009-01-18 12:13:00  pitrou                set     status: open -> closed; resolution: duplicate; superseder: Rewrite the IO stack in C; messages: + msg80093
2008-12-05 09:21:17  hagen                 set     messages: + msg76988
2008-10-07 21:12:43  alexandre.vassalotti  set     nosy: + alexandre.vassalotti; messages: + msg74487
2008-09-15 17:37:29  pitrou                set     messages: + msg73275
2008-09-15 17:24:06  amaury.forgeotdarc    set     nosy: + amaury.forgeotdarc; messages: + msg73273
2008-09-15 17:17:23  hagen                 set     messages: + msg73271
2008-09-15 16:56:46  pitrou                set     nosy: + pitrou; messages: + msg73269
2008-09-15 16:37:16  hagen                 create
