Issue 30468: Propagate zipfile.py pypy issue #905 patch to CPython 3.7

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74653

classification

Title:	Propagate zipfile.py pypy issue #905 patch to CPython 3.7
Type:	performance	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	not a bug
Assigned To:	serhiy.storchaka	Nosy List:	alanmcintyre, alecsandru.patrascu, serhiy.storchaka, shubhar, twouters
Priority:	normal	Keywords:	patch

Created on 2017年05月24日 23:08 by shubhar, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue905.diff	shubhar, 2017年05月24日 23:08

Messages (10)
msg294413 - (view)	Author: Shubha Ramani (shubhar) *	Date: 2017年05月24日 23:08
PyPy had a longstanding issue : ZipFile.extractall is very slow compared to CPython 2.6 https://bitbucket.org/pypy/pypy/issues/905/zipfileextractall-is-very-slow-compared-to which has been fixed in the PyPy code base. The changes were entirely in zipfile.py (see the attached patch for PyPy) The patch fixed a significant performance bottleneck in PyPy.
msg294466 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2017年05月25日 11:10
Could you provide a benchmarkin script that demonstrates a performance bottleneck? The patch is against 2.7. Please provide a patch against the master branch.
msg294487 - (view)	Author: Shubha Ramani (shubhar) *	Date: 2017年05月25日 15:20
Please assign this bug to me. I will submit a patch
msg294488 - (view)	Author: Shubha Ramani (shubhar) *	Date: 2017年05月25日 15:21
serhiy sure I will attach proof of the performance bottle-neck on 2.7 and 3.7 before I submit a patch. Please assign this bug to me.
msg294495 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2017年05月25日 16:20
I assigned this issue to me as the maintainer of the zipfile module. This means that I undertake to examine the results of benchmarking, make the review of the proposed patch, and merge it if it improves the performance and doesn't have negative side effects. This shouldn't stop you from writing your patch. Actually there was a reason why the code is written in that way. But maybe that reason no longer actual or there are stronger arguments for other ways. It is hard to say until I reproduce benchmarking results.
msg294496 - (view)	Author: Shubha Ramani (shubhar) *	Date: 2017年05月25日 16:42
Serhiy yes what you said makes sense. Thanks for clarifying. Updates (benchmarking results) shortly...stay tuned.
msg294503 - (view)	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2017年05月25日 17:41
Serhiy, I am curious why did you have chosen to compute the CRC32 table everytime? It is standard (the generator polynomial does not change) and always will output the same values. And it is also less computational intensive to loading a precomputed array vs calculating it each time a zip archive is extracted.
msg294505 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2017年05月25日 18:17
I'm not the initial author of zipfile. That initial code for decrypting was added in issue698833. I like your idea about replacing the generating code with the precomputed table. Please open a separate issue for this.
msg294571 - (view)	Author: Shubha Ramani (shubhar) *	Date: 2017年05月26日 20:57
Upon comparing the PyPy changes from attached diff to the latest cpython3 github, I don't find a need for improvement. Looks like cpython3 zipfile.py has the same changes and the read() method in class ZipExtFile(io.BufferedIOBase) is vastly improved compared to what pypy had. In fact it has been completely re-written. The CPython 2.7 version of this bug was promptly closed as a duplicate: http://bugs.python.org/issue30467 Perhaps the changes from CPython3 are not being backported to CPython2 ? This bug can be closed because CPython3 has no issues.
msg294579 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2017年05月27日 05:11
The reason of using bytes concatenating rather than accumulating in the list, is that in most cases one of arguments is an empty bytes object (appending to the empty buffer or uncompressing a file with large compression block), and this case is optimized in CPython. In mos cases there is at most one nontrivial bytes concatenation per read operation, and using b''.join() is slower in that case.

History
Date	User	Action	Args
2022年04月11日 14:58:46	admin	set	github: 74653
2017年05月27日 05:11:26	serhiy.storchaka	set	messages: + msg294579
2017年05月27日 05:04:18	serhiy.storchaka	set	status: open -> closed resolution: not a bug stage: resolved
2017年05月26日 20:57:55	shubhar	set	messages: + msg294571
2017年05月25日 18:17:05	serhiy.storchaka	set	messages: + msg294505
2017年05月25日 17:41:06	alecsandru.patrascu	set	nosy: + alecsandru.patrascu messages: + msg294503
2017年05月25日 16:42:14	shubhar	set	messages: + msg294496
2017年05月25日 16:20:36	serhiy.storchaka	set	messages: + msg294495
2017年05月25日 15:21:06	shubhar	set	messages: + msg294488
2017年05月25日 15:20:03	shubhar	set	messages: + msg294487
2017年05月25日 11:10:26	serhiy.storchaka	set	messages: + msg294466
2017年05月25日 10:48:42	serhiy.storchaka	link	issue30467 superseder
2017年05月25日 10:46:53	serhiy.storchaka	set	assignee: serhiy.storchaka components: + Library (Lib) nosy: + twouters, alanmcintyre, serhiy.storchaka
2017年05月24日 23:08:12	shubhar	create

homepage