
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Add write buffering to gzip
Type: enhancement
Stage: needs patch
Components: Extension Modules
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: ajaksu2, ebfe, neologix, pitrou, rhettinger
Priority: normal
Keywords:

Created on 2006-06-05 16:40 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: gztest.py
Uploaded: rhettinger, 2006-06-05 16:40
Description: Script to generate comparative timings
Messages (6)
msg54815 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2006-06-05 16:40
A series of write() calls is dog slow compared to building up a pool of data and then writing it in larger batches.
The attached script demonstrates the speed-up potential. It compares a series of GzipFile.write() calls to an alternate approach using cStringIO.write() calls followed by a GzipFile.write(sio.getvalue()). On my box, there is a three-fold speed-up.
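
A minimal, self-contained sketch of the comparison described here (this is not the attached gztest.py; the chunk count, chunk size, and file name are illustrative assumptions, and io.BytesIO stands in for cStringIO on Python 3):

import gzip
import io
import time

snips = [b"x" * 100] * 50000          # many small chunks (illustrative sizes)

def write_individually(path):
    # One GzipFile.write() call per chunk.
    with gzip.open(path, "wb") as f:
        for chunk in snips:
            f.write(chunk)

def write_batched(path):
    # Pool the data in memory first, then hand one large block to GzipFile.
    buf = io.BytesIO()
    for chunk in snips:
        buf.write(chunk)
    with gzip.open(path, "wb") as f:
        f.write(buf.getvalue())

for fn in (write_individually, write_batched):
    t0 = time.perf_counter()
    fn("demo.gz")
    print(fn.__name__, time.perf_counter() - t0)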
msg83968 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-03-22 11:14
Although the script does not work as-is (missing import of "string", typo between "frags" and "wfrags"), I can confirm the 3x ratio.
msg83984 - Author: Lukas Lueg (ebfe) Date: 2009-03-22 21:21
This is true for all objects whose input can be concatenated.
For example with hashlib:

import hashlib

data = [b'foobar'] * 100000

# Many small update() calls...
mdX = hashlib.sha1()
for d in data:
    mdX.update(d)

# ...versus one large update() call on the concatenated input.
mdY = hashlib.sha1()
mdY.update(b''.join(data))

assert mdX.digest() == mdY.digest()

the second version is multiple times faster...
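
A rough way to reproduce that timing difference (this timeit snippet is an illustrative assumption, not part of the original message):

import timeit

setup = "import hashlib; data = [b'foobar'] * 100000"
many = "h = hashlib.sha1()\nfor d in data:\n    h.update(d)"
one = "h = hashlib.sha1(); h.update(b''.join(data))"

print("per-chunk:", timeit.timeit(many, setup, number=10))
print("joined:   ", timeit.timeit(one, setup, number=10))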
msg102233 - Author: Charles-François Natali (neologix) (Python committer) Date: 2010-04-03 10:20
In the test script, simply changing

def emit(f, data=snips):
    for datum in data:
        f.write(datum)

to

def gemit(f, data=snips):
    datas = ''.join(data)
    f.write(datas)

improves direct gzip performance from
[1.1799781322479248, 0.50524115562438965, 0.2713780403137207]
[1.183434009552002, 0.50997591018676758, 0.26801109313964844]
[1.173914909362793, 0.51325297355651855, 0.26233196258544922]
to
[0.43065404891967773, 0.50007486343383789, 0.26698708534240723]
[0.43662095069885254, 0.49983596801757812, 0.2686460018157959]
[0.43778109550476074, 0.50057196617126465, 0.2687230110168457]
which means that you're better off letting the application handle buffering issues. Furthermore, the problem with gzip-level buffering is the choice of the default buffer size.
Time to close?
msg102249 - Author: Lukas Lueg (ebfe) Date: 2010-04-03 12:33
agreed
msg102253 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-04-03 12:45
Additionally, since issue7471 was fixed, you should be able to wrap a GzipFile in a Buffered{Reader,Writer} object for faster buffering.
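
A minimal sketch of that wrapper approach, assuming a Python 3 version with the issue7471 fix (the file name, chunk size, and buffer size are illustrative):

import gzip
import io

raw = gzip.GzipFile("demo.gz", "wb")
with io.BufferedWriter(raw, buffer_size=64 * 1024) as f:
    for _ in range(50000):
        f.write(b"x" * 100)   # small writes are coalesced before reaching GzipFile

Closing the BufferedWriter flushes the buffered data and closes the underlying GzipFile.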
History
Date User Action Args
2022-04-11 14:56:17 admin set github: 43459
2010-04-03 12:45:17 pitrou set status: open -> closed
 resolution: out of date
 messages: + msg102253
2010-04-03 12:33:36 ebfe set messages: + msg102249
2010-04-03 10:20:09 neologix set nosy: + neologix
 messages: + msg102233
2009-03-22 21:21:27 ebfe set nosy: + ebfe
 messages: + msg83984
2009-03-22 11:14:32 pitrou set nosy: + pitrou
 messages: + msg83968
2009-03-21 03:39:38 ajaksu2 set nosy: + ajaksu2
 versions: + Python 3.1, Python 2.7
 stage: needs patch
2006-06-05 16:40:29 rhettinger create
