This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2010-11-09 15:51 by Jimbofbx, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| zipfiletest.py | Jimbofbx, 2012-05-01 20:26 | |
| zipfile_optimize_read.patch | serhiy.storchaka, 2012-05-10 22:24 | |
| zipfile_optimize_read.patch | loewis, 2012-05-13 18:32 | regenerate patch for review (without manually deleted chunks) |
| zipfile_optimize_read_2.patch | serhiy.storchaka, 2012-05-31 07:44 | |

Messages (12)
**msg120871** · Author: James Hutchison (Jimbofbx) · Date: 2010-11-09 15:51

The zipfile module is always unbuffered (tested with Python 3.1.2 on Windows XP, 32-bit). This means that doing many small reads is much slower than reading a chunk of data into a buffer and then reading from that buffer. It seems logical that the zipfile module should default to buffered reading and/or accept a buffered argument. Likewise, the documentation should state that read() does no buffering, which is contrary to the default behavior of a normal read.

| Test | Reads | Time (s) |
|---|---|---|
| Zipfile read | 27432 | 0.859 |
| buffered Zipfile read | 27432 | 0.072 |
| normal read (default buffer) | 27432 | 0.139 |
| buffered normal read | 27432 | 0.137 |

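The "buffered" idiom the report compares against (one large read, then many cheap slices) can be sketched as follows. `read_small_pieces` is a hypothetical helper, not part of `zipfile`; the 5-byte piece size and 4000-byte buffer are assumptions borrowed from the later test script, and the archive is built in memory only to keep the example self-contained.

```python
import io
import zipfile


def read_small_pieces(f, piece_size=5, buffer_size=4000):
    """Yield piece_size-byte pieces of f while calling f.read() only once
    per buffer_size bytes. Hypothetical helper, not a zipfile API."""
    leftover = b""
    while True:
        chunk = f.read(buffer_size)  # one comparatively expensive call
        if not chunk:
            break
        data = leftover + chunk
        usable = len(data) - len(data) % piece_size
        # Slicing an in-memory bytes object is cheap compared with a read() call.
        for start in range(0, usable, piece_size):
            yield data[start:start + piece_size]
        leftover = data[usable:]  # carry a partial piece into the next chunk
    if leftover:
        yield leftover  # only the very last piece may be shorter


# Build a small archive in memory so the example needs no files on disk.
payload = bytes(range(256)) * 100  # 25,600 bytes
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(archive) as zf, zf.open("data.bin") as member:
    pieces = list(read_small_pieces(member))

print(len(pieces), b"".join(pieces) == payload)  # prints: 5120 True
```

Carrying the leftover bytes forward keeps every piece exactly `piece_size` bytes long even when `buffer_size` is not a multiple of it.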
**msg120873** · Author: James Hutchison (Jimbofbx) · Date: 2010-11-09 15:55

I should clarify that this is the zipfile constructor I am using: `zipfile.ZipFile(filename, mode='r', allowZip64=True)`
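For readers reproducing the report, here is a minimal sketch that opens an archive with that exact constructor call and reads from a member. The temporary path and the member name `member.txt` are illustrative assumptions. Note that `allowZip64=True` has been the default since Python 3.4, so on current versions passing it explicitly changes nothing.

```python
import io
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "test.zip")  # arbitrary example path

    # Create a small archive so there is something to open.
    with zipfile.ZipFile(filename, mode="w") as zf:
        zf.writestr("member.txt", b"hello, zip\n" * 3)

    # The constructor call from the report.
    with zipfile.ZipFile(filename, mode="r", allowZip64=True) as zf:
        name = zf.namelist()[0]
        with zf.open(name) as member:
            # zf.open() returns a file-like ZipExtFile; in CPython 3 it
            # subclasses io.BufferedIOBase.
            is_buffered_base = isinstance(member, io.BufferedIOBase)
            first_line = member.readline()

print(name, is_buffered_base, first_line)
```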
**msg159603** · Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) · Date: 2012-04-29 11:56

Actually, reading from the zip file is buffered (at least 4 KiB of uncompressed data at a time). Can you provide tests, scripts, and data that show the problem?
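One way to observe this buffering is to count how often the member object calls `read()` on the underlying file while the caller reads one byte at a time. This is a sketch assuming a stored (uncompressed) member; `CountingBytesIO` is a hypothetical helper, and the exact number of underlying reads depends on the CPython version's internal minimum read size, so only a loose upper bound is checked.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """BytesIO that counts read() calls (hypothetical helper)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


payload = b"x" * 10_000
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:  # ZIP_STORED is the default
    zf.writestr("data.bin", payload)

raw = CountingBytesIO(buf.getvalue())
with zipfile.ZipFile(raw) as zf, zf.open("data.bin") as member:
    before = raw.read_calls  # ignore reads done while parsing the archive
    small_reads = 0
    while member.read(1):  # 1-byte reads, as in the microbenchmark
        small_reads += 1
    underlying_reads = raw.read_calls - before

print(small_reads, underlying_reads)
```

Ten thousand 1-byte reads translate into only a handful of reads on the underlying file, which supports the point that the cost lies in the per-call overhead rather than in missing buffering.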
**msg159767** · Author: James Hutchison (Jimbofbx) · Date: 2012-05-01 20:26

See the attached script, which opens a zip file containing one file and reads it many times using unbuffered and buffered idioms. This was tested on Windows using Python 3.2. You will need to supply your own file to test it on. Sorry.

Example output (after entering `test.zip` at the "Enter filename:" prompt):

| Test (5 bytes at a time, 10 loops) | Time (s) |
|---|---|
| unbuffered read | 6.671999931335449 |
| buffered read (4000-byte buffer) | 0.7350001335144043 |

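The attached `zipfiletest.py` is not reproduced in this thread. The following is a hypothetical reconstruction of the kind of comparison it describes (5-byte reads, a 4000-byte buffer, 10 loops), using an in-memory archive instead of prompting for a file name. Absolute timings will vary by machine and Python version.

```python
import io
import time
import zipfile


def time_unbuffered(zf, name, chunk=5):
    """Seconds to read one member chunk bytes at a time via zf.open()."""
    start = time.perf_counter()
    with zf.open(name) as f:
        while f.read(chunk):  # one ZipExtFile.read() call per 5 bytes
            pass
    return time.perf_counter() - start


def time_buffered(zf, name, chunk=5, buffer_size=4000):
    """Seconds to read buffer_size bytes at a time, then slice chunk-byte pieces."""
    start = time.perf_counter()
    with zf.open(name) as f:
        while True:
            block = f.read(buffer_size)  # one read() call per 4000 bytes
            if not block:
                break
            for i in range(0, len(block), chunk):
                _ = block[i:i + chunk]  # cheap bytes slicing
    return time.perf_counter() - start


# Sample archive built in memory (the original script asked for a file name).
payload = bytes(range(256)) * 400  # 102,400 bytes
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.bin", payload)

loops = 10
with zipfile.ZipFile(archive) as zf:
    unbuffered = sum(time_unbuffered(zf, "sample.bin") for _ in range(loops))
    buffered = sum(time_buffered(zf, "sample.bin") for _ in range(loops))

print(f"Timing unbuffered read, 5 bytes at a time. {loops} loops took {unbuffered}")
print(f"Timing buffered read, 5 bytes at a time (4000 byte buffer). {loops} loops took {buffered}")
```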
**msg160377** · Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) · Date: 2012-05-10 22:24

This is not because the zipfile module is unbuffered; it is the difference between an expensive function call and cheap bytes slicing. Replace `zf.open(namelist[0])` with `io.BufferedReader(zf.open(namelist[0]))` to see the effect of good buffering. In 3.2, zipfile's read() is not optimally implemented, so it is about twice as slow, but in 3.3 it will be almost as fast as using io.BufferedReader. It is still several times slower than bytes slicing, but there is nothing that can be done about that.
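A minimal sketch of the suggested wrapping, assuming an in-memory archive with a single member named `digits.txt` (an illustrative name). `io.BufferedReader` refills its own buffer (`io.DEFAULT_BUFFER_SIZE` bytes by default) through the member's `readinto()`, so most `read(5)` calls are served from that buffer instead of going through `ZipExtFile.read()`.

```python
import io
import zipfile

payload = b"0123456789" * 1000  # 10,000 bytes
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("digits.txt", payload)

with zipfile.ZipFile(archive) as zf:
    # Closing the BufferedReader also closes the wrapped ZipExtFile.
    with io.BufferedReader(zf.open("digits.txt")) as f:
        pieces = []
        while True:
            piece = f.read(5)  # usually served from BufferedReader's buffer
            if not piece:
                break
            pieces.append(piece)

print(len(pieces), b"".join(pieces) == payload)  # prints: 2000 True
```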
Here is a patch that speeds up reading from a zip file in small chunks by about 20%. Microbenchmark:

`./python -m zipfile -c test.zip python`

`./python -m timeit -n 1 -s "import zipfile;zf=zipfile.ZipFile('test.zip')" "with zf.open('python') as f:" " while f.read(1):pass"`
| Build | Result (1 loop, best of 3) |
|---|---|
| Python 3.3 (vanilla) | 36.4 sec per loop |
| Python 3.3 (patched) | 30.1 sec per loop |
| Python 3.3 (with io.BufferedReader) | 30.2 sec per loop |
| Python 3.2 (for comparison) | 74.5 sec per loop |

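The relative figures in this message follow from the timings: the patch gives roughly 21% more throughput (about 17% less time) than vanilla 3.3, `io.BufferedReader` lands within 0.1 s of the patched build, and 3.2 takes about twice as long as vanilla 3.3.

```latex
\frac{36.4}{30.1} \approx 1.21, \qquad
\frac{36.4 - 30.1}{36.4} = \frac{6.3}{36.4} \approx 0.17, \qquad
\frac{74.5}{36.4} \approx 2.05
```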
**msg160542** · Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) · Date: 2012-05-13 18:36

Thank you, Martin; now I understand why the Rietveld review did not work.
**msg161985** · Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) · Date: 2012-05-31 07:44

I have updated the patch to reflect Martin's stylistic comments. Sorry for the delay, Martin. I did not receive an email with your review from 2012-05-13 and only discovered your comments in Rietveld today by accident. It seems to have been a bug in Rietveld.
**msg162831** · Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) · Date: 2012-06-14 21:26

Martin, is the patch good now?
**msg163582** · Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) · Date: 2012-06-23 11:28

Any chance of committing the patch before the final feature freeze?
**msg163603** · Author: Nadeem Vawda (nadeem.vawda) (Python committer) · Date: 2012-06-23 13:15

Patch looks fine to me. Antoine, can you commit this? I'm currently away from the computer that has my SSH key on it.
**msg163616** · Author: Roundup Robot (python-dev) (Python triager) · Date: 2012-06-23 14:48

New changeset 0e8285321659 by Antoine Pitrou in branch 'default': On behalf of Nadeem Vawda: issue #10376: micro-optimize reading from a Zipfile. http://hg.python.org/cpython/rev/0e8285321659
**msg163618** · Author: Antoine Pitrou (pitrou) (Python committer) · Date: 2012-06-23 14:51

> Antoine, can you commit this?

Ok, done.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:08 | admin | set | github: 54585 |
| 2012-06-23 14:51:21 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg163618; stage: patch review -> resolved |
| 2012-06-23 14:48:24 | python-dev | set | nosy: + python-dev; messages: + msg163616 |
| 2012-06-23 13:15:59 | nadeem.vawda | set | messages: + msg163603 |
| 2012-06-23 11:34:20 | pitrou | set | assignee: docs@python -> ; nosy: + nadeem.vawda; stage: patch review |
| 2012-06-23 11:28:37 | serhiy.storchaka | set | messages: + msg163582 |
| 2012-06-14 21:26:34 | serhiy.storchaka | set | messages: + msg162831 |
| 2012-05-31 07:44:31 | serhiy.storchaka | set | files: + zipfile_optimize_read_2.patch; messages: + msg161985 |
| 2012-05-13 18:36:01 | serhiy.storchaka | set | messages: + msg160542 |
| 2012-05-13 18:32:04 | loewis | set | files: + zipfile_optimize_read.patch |
| 2012-05-10 22:26:06 | vstinner | set | nosy: + pitrou |
| 2012-05-10 22:24:06 | serhiy.storchaka | set | files: + zipfile_optimize_read.patch; versions: - Python 2.7, Python 3.2; messages: + msg160377; components: - Documentation; keywords: + patch |
| 2012-05-01 20:26:59 | Jimbofbx | set | files: + zipfiletest.py; messages: + msg159767 |
| 2012-04-29 11:56:57 | serhiy.storchaka | set | messages: + msg159603 |
| 2012-04-07 17:59:01 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
| 2011-06-06 11:27:14 | xuanji | set | nosy: + xuanji |
| 2011-06-01 06:23:11 | terry.reedy | set | versions: + Python 3.2, Python 3.3, - Python 2.6, Python 2.5, Python 3.1 |
| 2010-11-09 15:55:12 | Jimbofbx | set | messages: + msg120873 |
| 2010-11-09 15:51:48 | Jimbofbx | create | |