This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010年10月05日 20:08 by shashank, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| zipdecrypt.patch | shashank, 2010年10月05日 20:08 | |||
| zipdecrypt.patch | shashank, 2010年10月07日 13:50 | Updated patch with conditional use of C implementation | ||
| zipdecrypt.patch | shashank, 2010年10月12日 07:17 | modified tests to test both C and Py impl | ||
| zipdecrypt.patch | shashank, 2010年10月12日 07:56 | replacing tabs with spaces in the patch | ||
| zipdecrypt.patch | shashank, 2010年10月23日 21:22 | updated patch | ||
| zipdecrypt.patch | shashank, 2010年11月04日 16:04 | corrected patch | ||
| zipdecrypt-3.patch | rhdv, 2012年11月03日 21:27 | Patch for python3 | review | |
| zipdecrypt-2.7.patch | rhdv, 2012年11月03日 21:28 | Patch for python 2.7 | review | |
| doc.patch | rhdv, 2012年11月04日 13:20 | |||
| zipfile_decryptor_speedup.patch | serhiy.storchaka, 2012年11月04日 16:00 | Speedup of Python implementation | review | |
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 550 | merged | serhiy.storchaka, 2017年03月07日 15:34 | |
| Messages (22) | |||
|---|---|---|---|
| msg118030 - (view) | Author: Shashank (shashank) | Date: 2010年10月05日 20:08 | |
As promised in this thread http://mail.python.org/pipermail/python-dev/2009-August/091450.html (a year ago!), attached is a patch that replaces simple zip decryption logic written in pure python with that in C. As reported in the link above, this can result in speedups up to couple of orders of magnitude. There doesn't seem to be any need to add any new tests as this patch doesn't change any public API |
|||
| msg118039 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年10月05日 22:35 | |
It would be nice to retain the pure python version as a fallback for non CPython implementations, that will require tweaking the tests to make sure both are tested. |
|||
| msg118110 - (view) | Author: Shashank (shashank) | Date: 2010年10月07日 13:50 | |
I have updated the patch with a check for the availability of C impl and to use pure-py impl as a fallback. How do you suggest would the tests change? As I had mentioned before, in my understanding since there is no change in the API the already existing tests should work. One can simulate the absence of C impl in a system where it is present but AFAIU this is not what it is usually done (e.g, in the case of optional zlib dependency in the same module) |
|||
| msg118112 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年10月07日 15:24 | |
It is what is normally done *now* when there is both a C and a python implementation (see, for example, test_datetime.py and test_io.py for two different approaches to that). Not all tests have been updated to this practice. Thanks for working on this. |
|||
| msg118405 - (view) | Author: Shashank (shashank) | Date: 2010年10月12日 07:17 | |
Attached is a patch with changes in Lib/test/test_zipfile.py to test both C and pure-py impls (on systems where the C impl is present). Admittedly, this approach to emulating the absence of C impl is a bit hacky. This is primarily because the changed class is not a part of the public API and hence not being tested directly. David, could you verify that the approach is ok? |
|||
| msg118420 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年10月12日 12:34 | |
Hello, Some quick comments: - the C module should be private and therefore called _zipdecrypt - if you want to avoid API mismatch, you could give a tp_call to your C decrypter object, rather than a "decrypt" method - you can put all initialization code in zipdecrypt_new and avoid the need for zipdecrypt_init - it's better to use the "y*" code in PyArg_ParseTuple, rather than "s#" - you should define your module as PY_SSIZE_T_CLEAN and use Py_ssize_t as length variables (rather than int) - you *mustn't* change the contents of the buffer which is given you by "s#" or "y*", since that buffer is read-only (it can be a bytes object); instead, create a new bytes object using PyBytes_FromStringAndSize(NULL, length) and write into that; or, if you want a read-write buffer, use the "w*" code |
|||
| msg119476 - (view) | Author: Shashank (shashank) | Date: 2010年10月23日 21:22 | |
>the C module should be private and therefore called _zipdecrypt done >if you want to avoid API mismatch, you could give a tp_call to your C >decrypter object, rather than a "decrypt" method done >- you can put all initialization code in zipdecrypt_new and avoid the >need for zipdecrypt_init keeping this similar to the existing _ZipDecrypter class in ZipFile (all initialization in init rather than new), which was probably to allow re-init and re-use of one instance >it's better to use the "y*" code in PyArg_ParseTuple, rather than "s#" y* does not seem to be available in 2.7, using s* instead >you should define your module as PY_SSIZE_T_CLEAN and use Py_ssize_t >as length variables (rather than int) done >you *mustn't* change the contents of the buffer which is given you by >"s#" or "y*", since that buffer is read-only (it can be a bytes >object); instead, create a new bytes object using >PyBytes_FromStringAndSize(NULL, length) and write into that; or, if >you want a read-write buffer, use the "w*" code corrected, not altering the input buffer, reading input buffer as s* |
|||
| msg120417 - (view) | Author: Shashank (shashank) | Date: 2010年11月04日 15:54 | |
I had uploaded an incorrect patch. New corrected patch against trunk (on Mac OS uploaded). |
|||
| msg174699 - (view) | Author: Robert de Vries (rhdv) | Date: 2012年11月03日 21:27 | |
Attached you will find the updated patch for the python 3 tree as of now. I have measured a speed-up of more than a factor 100. |
|||
| msg174700 - (view) | Author: Robert de Vries (rhdv) | Date: 2012年11月03日 21:28 | |
Patch for python 2.7 Same patch as for python 3, backported to python 2.7 Tested on Linux only. |
|||
| msg174789 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月04日 12:17 | |
I quote from Gregory P. Smith (msg91897): """ The decryption provided by the zipfile module is for the worthless 32-bit crc based "encryption" of zipfiles. I think promoting the use of that is a bad idea. zipfile can be used by people to get their data out of such files. We should not encourage them to put it and/or their code into such a stupid format. """ I think that the effort required for speedup of this almost useless feature is excessive. If someone want to implement the "strong encryption" for zip files - welcome. |
|||
| msg174791 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月04日 12:27 | |
See also criticism in the original discussion: http://mail.python.org/pipermail/python-dev/2009-August/091450.html . |
|||
| msg174793 - (view) | Author: Robert de Vries (rhdv) | Date: 2012年11月04日 13:20 | |
My use case is decrypting files of 100's of megabytes. This is so slow that it is quite useless. About an hour or so. I do agree that the encryption is worthless, but that is not important for my use case where I want to discourage people from reverse engineering the contents. If it is so dangerous as some people have pointed out, it should be removed at the cost of not supporting a standard feature of ZIP files. In my opinion you either support a feature and you support it good (efficient) or you don't. As it stands now, users will be disappointed in using a supported feature. Some people argue that adding C code to Python is dangerous as it will lead to bugs, vulnerabilities etc. You could dismiss every addition with C code to Python with this argument, so there must be some positive aspects to outweigh the negative side. The negative side is fairly small (50 lines of very simple C code), plus some standard Python glue code. The benefit is a 100 fold increase in performance and the removal of 1 line of documentation telling that this feature is extremely slow. (patch attached) |
|||
| msg174794 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年11月04日 13:43 | |
We aren't particularly interested in helping people make their files slightly harder to reverse engineer, either, so I don't think that is a good enough reason for accepting this. There might be other reasons that are good enough, but I don't think that is one of them. |
|||
| msg174795 - (view) | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2012年11月04日 13:48 | |
From the zlib FAQ: 38. How can I encrypt/decrypt zip files with zlib? zlib doesn't support encryption. The original PKZIP encryption is very weak and can be broken with freely available programs. To get strong encryption, use GnuPG, http://www.gnupg.org/ , which already includes zlib compression. For PKZIP compatible "encryption", look at http://www.info-zip.org/ I don't see the point of a weak and easily breakable encryption. |
|||
| msg174803 - (view) | Author: Robert de Vries (rhdv) | Date: 2012年11月04日 14:28 | |
If the encryption is so horrible why is there any support (with bad performance) at all in Python? It would be better to remove it altogether. This prevents users from building software using this feature only to find out later how bad the performance is. (This is the main reason why I have submitted this patch.) If the support had not been in Python I would not have used this feature to begin with. To reiterate my previous point. Either support something and do it well, or don't. Don't do a half job. Please note that there are more attempts to fix this, so I am not completely alone here. http://pypi.python.org/pypi/czipfile |
|||
| msg174806 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年11月04日 14:45 | |
> If the encryption is so horrible why is there any support (with bad > performance) at all in Python? > It would be better to remove it altogether. We don't remove it as it would break existing programs which rely on this feature. However adding a bunch of C code to improve performance of such a questionable feature is controversial. I wouldn't be against the acceleration myself, however I am not interested in reviewing or accepting it, sorry. |
|||
| msg174816 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年11月04日 16:00 | |
Here is a patch which optimize (speed up 2x) Python implementation of ZIP decryptor. It is almost the maximum of what can be achieved without significant degradation of maintainability. Of course, 2x is less then 100x, but it more portable and costs almost nothing. If that's not enough, I suggest to use an external unzip. You can even run multiple unzips at a time for the parallel extraction of multiple files. If in the future someone will implement the strong encryption for ZIP files, it is possible it will required a C accelerator module and it is possible there will be a place for PKWARE's traditional encryption. |
|||
| msg174823 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年11月04日 18:03 | |
> If the encryption is so horrible why is there any support (with bad > performance) at all in Python? I would say it there so that people can use python to "decrypt" an "encrypted" zip archive they have been sent that was generated by some other tool. I would say this is not a particularly strong use case since there are other tools that can be used for this, but as Antoine said removing the "feature" could break existing working code, so we are unlikely to do it. |
|||
| msg174830 - (view) | Author: Robert de Vries (rhdv) | Date: 2012年11月04日 20:36 | |
The current situation is now that the decryption is part of Python. It is well known to be computationally intensive and should therefore be implemented in C. This patch provides that support. The discussion if Python should support the decryption is behind us, as the support is already in. The only discussion should be about if there are enough users wanting this performance improvement to add it. From the download statistics of czipfile I would say that there are roughly 2500 users interested. |
|||
| msg290497 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年03月25日 20:54 | |
Brian, you requested tests, but PR 550 doesn't add new API and doesn't change the behavior of public API. No new tests are needed. |
|||
| msg290857 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年03月30日 16:09 | |
New changeset 06e522521c06671b4559eecf9e2a185c2d62c141 by Serhiy Storchaka in branch 'master': bpo-10030: Sped up reading encrypted ZIP files by 2 times. (#550) https://github.com/python/cpython/commit/06e522521c06671b4559eecf9e2a185c2d62c141 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:07 | admin | set | github: 54239 |
| 2017年03月30日 16:10:19 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017年03月30日 16:09:11 | serhiy.storchaka | set | messages: + msg290857 |
| 2017年03月25日 20:54:01 | serhiy.storchaka | set | nosy:
+ brian.curtin messages: + msg290497 versions: + Python 3.7, - Python 3.4 |
| 2017年03月07日 15:34:15 | serhiy.storchaka | set | pull_requests: + pull_request452 |
| 2012年11月04日 20:36:25 | rhdv | set | messages: + msg174830 |
| 2012年11月04日 18:03:34 | r.david.murray | set | messages: + msg174823 |
| 2012年11月04日 16:00:58 | serhiy.storchaka | set | files:
+ zipfile_decryptor_speedup.patch messages: + msg174816 |
| 2012年11月04日 14:45:54 | pitrou | set | messages: + msg174806 |
| 2012年11月04日 14:28:46 | rhdv | set | messages: + msg174803 |
| 2012年11月04日 13:48:57 | christian.heimes | set | nosy:
+ christian.heimes messages: + msg174795 |
| 2012年11月04日 13:43:22 | r.david.murray | set | messages: + msg174794 |
| 2012年11月04日 13:20:20 | rhdv | set | files:
+ doc.patch messages: + msg174793 |
| 2012年11月04日 12:27:51 | serhiy.storchaka | set | messages: + msg174791 |
| 2012年11月04日 12:17:00 | serhiy.storchaka | set | messages: + msg174789 |
| 2012年11月03日 22:06:11 | pitrou | set | nosy:
+ serhiy.storchaka versions: + Python 3.4, - Python 3.2 |
| 2012年11月03日 21:28:33 | rhdv | set | files:
+ zipdecrypt-2.7.patch messages: + msg174700 |
| 2012年11月03日 21:27:17 | rhdv | set | files:
+ zipdecrypt-3.patch nosy: + rhdv messages: + msg174699 |
| 2010年11月04日 16:04:00 | shashank | set | files: + zipdecrypt.patch |
| 2010年11月04日 16:03:29 | shashank | set | files: - zipdecrypt.patch |
| 2010年11月04日 15:54:18 | shashank | set | files:
+ zipdecrypt.patch messages: + msg120417 |
| 2010年10月23日 21:22:10 | shashank | set | files:
+ zipdecrypt.patch messages: + msg119476 |
| 2010年10月12日 12:34:12 | pitrou | set | versions:
+ Python 3.2, - Python 2.7 nosy: + pitrou messages: + msg118420 stage: patch review |
| 2010年10月12日 07:56:36 | shashank | set | files: + zipdecrypt.patch |
| 2010年10月12日 07:17:51 | shashank | set | files:
+ zipdecrypt.patch messages: + msg118405 |
| 2010年10月07日 15:24:43 | r.david.murray | set | messages: + msg118112 |
| 2010年10月07日 13:50:19 | shashank | set | files:
+ zipdecrypt.patch messages: + msg118110 |
| 2010年10月05日 22:35:56 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg118039 |
| 2010年10月05日 20:08:50 | shashank | create | |