Created on 2006-06-05 16:40 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.
**Files**

| File name | Uploaded | Description |
|---|---|---|
| gztest.py | rhettinger, 2006-06-05 16:40 | Script to generate comparative timings |
**Messages (6)**

**msg54815** - Raymond Hettinger (rhettinger), Python committer - 2006-06-05 16:40

A series of write() calls is dog slow compared to building up a pool of data and then writing it in larger batches. The attached script demonstrates the speed-up potential. It compares a series of GzipFile.write() calls to an alternate approach using cStringIO.write() calls followed by a single GzipFile.write(sio.getvalue()). On my box, there is a three-fold speed-up.
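The attached gztest.py is not reproduced here, but a minimal Python 3 sketch of the comparison it makes (io.BytesIO standing in for cStringIO; the `snips` data and file names are illustrative) could look like this:

```python
import gzip
import io
import timeit

# Illustrative stand-in for the script's data: many small fragments.
snips = [b"some line of text\n"] * 100_000

def write_per_call(path):
    # One GzipFile.write() per fragment; every call pays per-call overhead.
    with gzip.GzipFile(path, "wb") as f:
        for datum in snips:
            f.write(datum)

def write_batched(path):
    # Pool the fragments first, then hand the compressor one large buffer.
    with gzip.GzipFile(path, "wb") as f:
        buf = io.BytesIO()
        for datum in snips:
            buf.write(datum)
        f.write(buf.getvalue())

print(timeit.timeit(lambda: write_per_call("per_call.gz"), number=1))
print(timeit.timeit(lambda: write_batched("batched.gz"), number=1))
```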
**msg83968** - Antoine Pitrou (pitrou), Python committer - 2009-03-22 11:14

Although the script does not work as-is (missing import of "string", typo between "frags" and "wfrags"), I can confirm the 3x ratio.
**msg83984** - Lukas Lueg (ebfe) - 2009-03-22 21:21

This is true for all objects whose input can be concatenated. For example, with hashlib:

```python
import hashlib

data = ['foobar'] * 100000

# Feed the hash one small fragment at a time.
mdX = hashlib.sha1()
for d in data:
    mdX.update(d)

# Feed the hash the concatenated input in a single call.
mdY = hashlib.sha1()
mdY.update("".join(data))

mdX.digest() == mdY.digest()
```

The second version is multiple times faster.
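The snippet above is Python 2 (str passed to update()). A rough Python 3 version of the same comparison, with a simple timeit measurement added, might be:

```python
import hashlib
import timeit

data = [b"foobar"] * 100000

def many_updates():
    # Many small update() calls.
    md = hashlib.sha1()
    for d in data:
        md.update(d)
    return md.digest()

def one_update():
    # A single update() on the joined input.
    md = hashlib.sha1()
    md.update(b"".join(data))
    return md.digest()

assert many_updates() == one_update()
print(timeit.timeit(many_updates, number=10))
print(timeit.timeit(one_update, number=10))
```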
**msg102233** - Charles-François Natali (neologix), Python committer - 2010-04-03 10:20

In the test script, simply changing

```python
def emit(f, data=snips):
    for datum in data:
        f.write(datum)
```

to

```python
def gemit(f, data=snips):
    datas = ''.join(data)
    f.write(datas)
```

improves direct gzip performance from

```
[1.1799781322479248, 0.50524115562438965, 0.2713780403137207]
[1.183434009552002, 0.50997591018676758, 0.26801109313964844]
[1.173914909362793, 0.51325297355651855, 0.26233196258544922]
```

to

```
[0.43065404891967773, 0.50007486343383789, 0.26698708534240723]
[0.43662095069885254, 0.49983596801757812, 0.2686460018157959]
[0.43778109550476074, 0.50057196617126465, 0.2687230110168457]
```

which means that you're better off letting the application handle buffering issues. Furthermore, the problem with gzip-level buffering is the choice of the default buffer size. Time to close?
**msg102249** - Lukas Lueg (ebfe) - 2010-04-03 12:33

Agreed.
**msg102253** - Antoine Pitrou (pitrou), Python committer - 2010-04-03 12:45

Additionally, since issue7471 was fixed, you should be able to wrap a GzipFile in a Buffered{Reader,Writer} object for faster buffering.
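A minimal Python 3 sketch of the suggested wrapping (file name, payload, and buffer size are illustrative):

```python
import gzip
import io

# Wrap the GzipFile in a BufferedWriter so that many small write() calls
# are coalesced into large chunks before they reach the compressor.
# Closing the BufferedWriter flushes it and closes the underlying GzipFile.
with io.BufferedWriter(gzip.GzipFile("out.gz", "wb"),
                       buffer_size=64 * 1024) as f:
    for _ in range(100_000):
        f.write(b"some line of text\n")
```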
**History**

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:17 | admin | set | github: 43459 |
| 2010-04-03 12:45:17 | pitrou | set | status: open -> closed; resolution: out of date; messages: + msg102253 |
| 2010-04-03 12:33:36 | ebfe | set | messages: + msg102249 |
| 2010-04-03 10:20:09 | neologix | set | nosy: + neologix; messages: + msg102233 |
| 2009-03-22 21:21:27 | ebfe | set | nosy: + ebfe; messages: + msg83984 |
| 2009-03-22 11:14:32 | pitrou | set | nosy: + pitrou; messages: + msg83968 |
| 2009-03-21 03:39:38 | ajaksu2 | set | nosy: + ajaksu2; versions: + Python 3.1, Python 2.7; stage: needs patch |
| 2006-06-05 16:40:29 | rhettinger | create | |