
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Add write buffering to gzip
Type: enhancement
Stage: needs patch
Components: Extension Modules
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: ajaksu2, ebfe, neologix, pitrou, rhettinger
Priority: normal
Keywords:

Created on 2006-06-05 16:40 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: gztest.py
Uploaded: rhettinger, 2006-06-05 16:40
Description: Script to generate comparative timings
Messages (6)
msg54815 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2006-06-05 16:40
A series of write() calls is dog slow compared to building up a pool of data and then writing it in larger batches.
The attached script demonstrates the speed-up potential. It compares a series of GzipFile.write() calls to an alternate approach using cStringIO.write() calls followed by a GzipFile.write(sio.getvalue()). On my box, there is a three-fold speed-up.
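
A minimal, self-contained sketch of the comparison described here (this is not the attached gztest.py; the chunk count, chunk size, and file name are illustrative assumptions, and io.BytesIO stands in for cStringIO on Python 3):

import gzip
import io
import time

snips = [b"x" * 100] * 50000          # many small chunks (illustrative sizes)

def write_individually(path):
    # One GzipFile.write() call per chunk.
    with gzip.open(path, "wb") as f:
        for chunk in snips:
            f.write(chunk)

def write_batched(path):
    # Pool the data in memory first, then hand one large block to GzipFile.
    buf = io.BytesIO()
    for chunk in snips:
        buf.write(chunk)
    with gzip.open(path, "wb") as f:
        f.write(buf.getvalue())

for fn in (write_individually, write_batched):
    t0 = time.perf_counter()
    fn("demo.gz")
    print(fn.__name__, time.perf_counter() - t0)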
msg83968 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-03-22 11:14
Although the script does not work as-is (missing import of "string", typo between "frags" and "wfrags"), I can confirm the 3x ratio.
msg83984 - Author: Lukas Lueg (ebfe) Date: 2009-03-22 21:21
This is true for all objects whose input can be concatenated.
For example with hashlib:

import hashlib

data = [b'foobar'] * 100000

# Many small update() calls...
mdX = hashlib.sha1()
for d in data:
    mdX.update(d)

# ...versus one large update() call on the concatenated input.
mdY = hashlib.sha1()
mdY.update(b''.join(data))

assert mdX.digest() == mdY.digest()

the second version is multiple times faster...
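
A rough way to reproduce that timing difference (this timeit snippet is an illustrative assumption, not part of the original message):

import timeit

setup = "import hashlib; data = [b'foobar'] * 100000"
many = "h = hashlib.sha1()\nfor d in data:\n    h.update(d)"
one = "h = hashlib.sha1(); h.update(b''.join(data))"

print("per-chunk:", timeit.timeit(many, setup, number=10))
print("joined:   ", timeit.timeit(one, setup, number=10))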
msg102233 - Author: Charles-François Natali (neologix) (Python committer) Date: 2010-04-03 10:20
In the test script, simply changing

def emit(f, data=snips):
    for datum in data:
        f.write(datum)

to

def gemit(f, data=snips):
    datas = ''.join(data)
    f.write(datas)

improves direct gzip performance from
[1.1799781322479248, 0.50524115562438965, 0.2713780403137207]
[1.183434009552002, 0.50997591018676758, 0.26801109313964844]
[1.173914909362793, 0.51325297355651855, 0.26233196258544922]
to
[0.43065404891967773, 0.50007486343383789, 0.26698708534240723]
[0.43662095069885254, 0.49983596801757812, 0.2686460018157959]
[0.43778109550476074, 0.50057196617126465, 0.2687230110168457]
which means that you're better off letting the application handle buffering issues. Furthermore, the problem with gzip-level buffering is the choice of the default buffer size.
Time to close?
msg102249 - Author: Lukas Lueg (ebfe) Date: 2010-04-03 12:33
agreed
msg102253 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-04-03 12:45
Additionally, since issue7471 was fixed, you should be able to wrap a GzipFile in a Buffered{Reader,Writer} object for faster buffering.
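
A minimal sketch of that wrapper approach, assuming a Python 3 version with the issue7471 fix (the file name, chunk size, and buffer size are illustrative):

import gzip
import io

raw = gzip.GzipFile("demo.gz", "wb")
with io.BufferedWriter(raw, buffer_size=64 * 1024) as f:
    for _ in range(50000):
        f.write(b"x" * 100)   # small writes are coalesced before reaching GzipFile

Closing the BufferedWriter flushes the buffered data and closes the underlying GzipFile.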
History
Date User Action Args
2022-04-11 14:56:17 admin set github: 43459
2010-04-03 12:45:17 pitrou set status: open -> closed
 resolution: out of date
 messages: + msg102253
2010-04-03 12:33:36 ebfe set messages: + msg102249
2010-04-03 10:20:09 neologix set nosy: + neologix
 messages: + msg102233
2009-03-22 21:21:27 ebfe set nosy: + ebfe
 messages: + msg83984
2009-03-22 11:14:32 pitrou set nosy: + pitrou
 messages: + msg83968
2009-03-21 03:39:38 ajaksu2 set nosy: + ajaksu2
 versions: + Python 3.1, Python 2.7
 stage: needs patch
2006-06-05 16:40:29 rhettinger create
