Is there a way in Python 2.7 to improve reading/writing speed (or memory consumption of the file) compared to this version?
import gc
import gzip
import cPickle
import io

# save zipped and pickled file
def save_zipped_pickle(obj, filename):
    # disable garbage collector (hack for faster reading/writing)
    gc.disable()
    with gzip.open(filename, 'wb') as f:
        cPickle.dump(obj, io.BufferedWriter(f), -1)
    # enable garbage collector again
    gc.enable()

# load zipped and pickled file
def load_zipped_pickle(filename):
    # disable garbage collector (hack for faster reading/writing)
    gc.disable()
    with gzip.open(filename, 'rb') as f:
        loaded_object = cPickle.load(io.BufferedReader(f))
    # enable garbage collector again
    gc.enable()
    return loaded_object
1 Answer
This looks really efficient and potentially the best you can do, but here are some thoughts:
- Try specifying the highest protocol explicitly (for Python 2.7 that is protocol 2). The -1 in your code already selects it, so this only makes the intent clearer:
    with gzip.open(filename, 'wb') as f:
        cPickle.dump(obj, io.BufferedWriter(f), protocol=cPickle.HIGHEST_PROTOCOL)
- Reconsider the serialization format itself: if JSON is an option, ujson might win the performance battle against cPickle.
- See if PyPy is an option.
- You can also bring Cython into play here: there are famous stories of how just specifying types with Cython leads to serious performance gains.
- There is also the undocumented .fast option/flag, but I am not at all sure it would actually help to speed things up. Too good to be true :)
- As far as improving on memory, you can try chunking with shutil.
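Before changing anything, it may help to measure what the protocol choice does for your data. The following is only a rough sketch: pickle_sizes and the sample object are made up for illustration, and the import fallback is there so the snippet also runs outside Python 2.7.

```python
import gzip
import io

try:
    import cPickle as pickle  # Python 2.7
except ImportError:
    import pickle  # fallback so the sketch also runs on Python 3


def pickle_sizes(obj, protocol):
    """Return (raw_bytes, gzipped_bytes) for obj pickled with the given protocol."""
    raw = pickle.dumps(obj, protocol)
    buf = io.BytesIO()
    # GzipFile writes the compressed stream into the in-memory buffer.
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(raw)
    return len(raw), len(buf.getvalue())


# Made-up sample data; substitute an object that resembles your real payload.
sample = {"values": list(range(10000)), "label": "x" * 1000}

for proto in (0, pickle.HIGHEST_PROTOCOL):
    raw_size, gz_size = pickle_sizes(sample, proto)
    print("protocol %d: %d bytes raw, %d bytes gzipped" % (proto, raw_size, gz_size))
```

The binary protocol usually gives a smaller raw pickle, but whether the gzipped size or the timing improves as well depends on the data, so it is worth running this against your own objects.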
FYI, the gc.disable()/gc.enable() pair was proposed as a context manager for Python 3.7. You can borrow a Python sample from the issue tracker:
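import gc
from contextlib import contextmanager

@contextmanager
def gc_disabled():
    # only touch the collector if it is currently enabled
    if gc.isenabled():
        gc.disable()
        yield
        gc.enable()
    else:
        yield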
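Here is one way such a context manager could be wired into the save/load functions from the question. This is a sketch, not the version from the issue tracker: it adds a try/finally so the collector is restored even if pickling raises, it drops the io.BufferedWriter/io.BufferedReader wrappers for brevity, and the import fallback is only there so it also runs outside Python 2.7.

```python
import gc
import gzip
import os
import tempfile
from contextlib import contextmanager

try:
    import cPickle as pickle  # Python 2.7
except ImportError:
    import pickle  # fallback so the sketch also runs on Python 3


@contextmanager
def gc_disabled():
    """Turn the collector off for the block; re-enable it only if it was on."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def save_zipped_pickle(obj, filename):
    with gc_disabled():
        with gzip.open(filename, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_zipped_pickle(filename):
    with gc_disabled():
        with gzip.open(filename, "rb") as f:
            return pickle.load(f)
```

With this in place, save_zipped_pickle(obj, 'data.pkl.gz') and load_zipped_pickle('data.pkl.gz') behave like the originals (the file name is just an example), but an exception during dump or load no longer leaves the garbage collector switched off.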