This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Expand zipimport to include other compression methods
Type: enhancement
Stage: needs patch
Components: Library (Lib)
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brian.curtin, eric.snow, gregory.p.smith, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, superluser
Priority: normal
Keywords:

Created on 2013-01-20 18:41 by rhettinger, last changed 2022-04-11 14:57 by admin.

Messages (11)
msg180307 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-20 18:41
Only a little of the existing logic is tied to the zipfile format. Consider adding support for xz, tar, tar.gz, tar.bz2, etc.
In particular, xz has better compression, resulting in both space savings and faster load times.
msg180310 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 20:19
tar.* is not a good choice because it doesn't allow random access. Bare tar is better than zip only when you need to save additional file attributes (Unix file access mode, times, owner, group, links). The ZIP format supports all of this too, but the zipfile module does not yet.
Adding bz2 or lzma compression to a ZIP file shouldn't be too hard.
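For illustration, here is a minimal sketch of what bzip2 and LZMA members in a ZIP archive look like from Python. It uses the zipfile module, which has offered the ZIP_BZIP2 and ZIP_LZMA constants since Python 3.3; it says nothing about zipimport itself, which is the subject of this issue. The payload and the member name sample.py are made up for the example.

```python
import io
import zipfile

# Made-up, highly repetitive payload standing in for a Python source file.
payload = b"def f(x):\n    return x + 1\n" * 2000

methods = {
    "store": zipfile.ZIP_STORED,
    "deflate": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}

sizes = {}
for label, method in methods.items():
    buf = io.BytesIO()
    # Write one member using the chosen compression method.
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("sample.py", payload)
    # Read it back to confirm the round trip is lossless.
    with zipfile.ZipFile(buf) as zf:
        assert zf.read("sample.py") == payload
    sizes[label] = len(buf.getvalue())

for label, size in sizes.items():
    print(f"{label:8} {size} bytes")
```

Each member keeps its own compression method, so random access to individual files is preserved whichever method is chosen.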
msg180311 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 20:32
Here are some tests.
time 7z a -tzip -mx=0 python-0.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip python.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mx=9 python-9.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=bzip2 python-bzip2.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=bzip2 -mx=9 python-bzip2-9.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=lzma python-lzma.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=lzma -mx=9 python-lzma-9.zip $(find Lib -type f -name '*.py') >/dev/null
time 7z t python-0.zip >/dev/null
time 7z t python.zip >/dev/null
time 7z t python-9.zip >/dev/null
time 7z t python-bzip2.zip >/dev/null
time 7z t python-bzip2-9.zip >/dev/null
time 7z t python-lzma.zip >/dev/null
time 7z t python-lzma-9.zip >/dev/null
wc -c python*.zip
Results:
method      pack time* (s)  unpack time (s)  size (MB)
store                  0.5              0.2      19.42
deflate                  6              0.4       4.59
deflate-max             40              0.4       4.52
bzip2                    6              2.1       4.45
bzip2-max               79              2.0       4.39
lzma                    37              0.7       4.42
lzma-max                62              0.7       4.39
*) For pack time I use the user time, because 7-zip parallelizes deflate and bzip2 compression well.
As you can see, the size difference between maximal compression with the different methods is only 3%. lzma decompresses almost twice as slowly as deflate, and bzip2 decompresses 5 times more slowly. Python files are too small to benefit from advanced compression.
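A comparable measurement can be sketched in pure Python with the zipfile module instead of 7z. This is only an approximation of the test above: the generated module sources, the count of 200 members, and the helper names make_sources and bench are assumptions made for the example, and absolute timings will differ from the 7-zip figures.

```python
import io
import time
import zipfile


def make_sources(count=200):
    """Generate many small, made-up module sources (name -> bytes)."""
    return {
        f"mod_{i}.py": (f"def func_{i}(x):\n    return x * {i}\n" * 20).encode()
        for i in range(count)
    }


def bench(method, sources):
    """Return (archive size in bytes, seconds needed to read every member)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, data in sources.items():
            zf.writestr(name, data)
    start = time.perf_counter()
    with zipfile.ZipFile(buf) as zf:
        for name in zf.namelist():
            zf.read(name)
    return len(buf.getvalue()), time.perf_counter() - start


sources = make_sources()
results = {
    label: bench(method, sources)
    for label, method in [
        ("store", zipfile.ZIP_STORED),
        ("deflate", zipfile.ZIP_DEFLATED),
        ("bzip2", zipfile.ZIP_BZIP2),
        ("lzma", zipfile.ZIP_LZMA),
    ]
}

for label, (size, seconds) in results.items():
    print(f"{label:8} {size:8d} bytes  {seconds * 1000:8.2f} ms unpack")
```

Because every member is compressed independently, small files give the stronger algorithms little context to work with, which is the effect the message describes.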
msg180313 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-20 20:54
> Here are some tests.
I think you want to put pyc files in the zip file as well.
msg180314 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-20 21:09
xz will likely be the best win -- it is purported to compress smaller than bz2 while retaining the decompression speed of zip.
As Antoine says, the usual practice is to add py, pyc, and pyo files to the compressed library; otherwise, there is an added cost when Python tries to write a missing pyc/pyo file.
msg180323 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 21:55
Well.
./python -m compileall $(find Lib -type f -name '*.py')
./python -O -m compileall $(find Lib -type f -name '*.py')
Tests:
FILES="$(find Lib -name '*.py' -o -name '*.py[co]')"
time 7z a -tzip -mx=0 python-0.zip $FILES >/dev/null
time 7z a -tzip python.zip $FILES >/dev/null
time 7z a -tzip -mx=9 python-9.zip $FILES >/dev/null
time 7z a -tzip -mm=bzip2 python-bzip2.zip $FILES >/dev/null
time 7z a -tzip -mm=bzip2 -mx=9 python-bzip2-9.zip $FILES >/dev/null
time 7z a -tzip -mm=lzma python-lzma.zip $FILES >/dev/null
time 7z a -tzip -mm=lzma -mx=9 python-lzma-9.zip $FILES >/dev/null
time 7z t python-0.zip >/dev/null
time 7z t python.zip >/dev/null
time 7z t python-9.zip >/dev/null
time 7z t python-bzip2.zip >/dev/null
time 7z t python-bzip2-9.zip >/dev/null
time 7z t python-lzma.zip >/dev/null
time 7z t python-lzma-9.zip >/dev/null
wc -c python*.zip
Results:
method       pack time (s)  unpack time (s)  size (MB)
store                  1.6              0.5       65.4
deflate                 19              0.9       17.5
deflate-max            134              0.9       17.2
bzip2                   21              4.2       16.5
bzip2-max              294              4.1       16.3
lzma                   120              2.3       15.9
lzma-max               204              2.3       15.8
All numbers are about 3x larger. lzma-max is 8% smaller than deflate-max but unpacks 2.5 times slower. Bzip2 is out of the running.
msg180324 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-20 21:58
Agreed it doesn't look very promising.
msg180347 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-01-21 18:00
So this seems like a confluence of two things: supporting compressed files for loading source code, and supporting new archive formats (e.g. xz vs. tar); zip just happens to do both implicitly. There is also the question of whether you plan to do this in C code or in pure Python, as I plan to introduce a pure Python version of zipimport into importlib for 3.4 so that it can use zipfile directly and thus get full support for everything zipfile can do.
And there doesn't have to be any performance cost in trying to write bytecode files; it's very simple to have a loader which simply skips that step entirely.
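As a rough sketch of such a loader (not code from this issue): a subclass of importlib.abc.SourceLoader that leaves set_data() and path_stats() unimplemented compiles the source on every import and never attempts to write a bytecode file. The class name InMemorySourceLoader, the module name demo_mod, and the synthetic "<memory>" path are all hypothetical.

```python
import importlib.abc
import importlib.util


class InMemorySourceLoader(importlib.abc.SourceLoader):
    """Hypothetical loader that compiles source and never caches bytecode."""

    def __init__(self, source):
        self._source = source.encode("utf-8")

    def get_filename(self, fullname):
        # Synthetic path, used only for __file__ and tracebacks.
        return f"<memory>/{fullname}.py"

    def get_data(self, path):
        # Always hand back the source bytes held in memory.
        return self._source

    # set_data() and path_stats() are deliberately not overridden:
    # without source stats, SourceLoader.get_code() skips both the
    # bytecode lookup and the bytecode write.


loader = InMemorySourceLoader("ANSWER = 42\n")
spec = importlib.util.spec_from_loader("demo_mod", loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)
print(module.ANSWER)
```

A coarser alternative, if no custom loader is wanted, is setting sys.dont_write_bytecode to True, which disables bytecode writing for the whole process.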
msg220589 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-06-14 22:19
related: issue #17630 and issue #5950
msg267527 - (view) Author: (yan12125) * Date: 2016-06-06 12:58
+1 for that. I would like XZ support so that our application size can be reduced.
msg325729 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-19 07:53
zipimport has been rewritten in pure Python (issue25711). It is now easier to add support for other compression methods, although I don't think that reducing the size by 3-8% is worth complicating the code.
If you still need this, I think the simplest way is to import the zipfile module and monkey patch the simple ZIP file implementation in the zipimport module with a zipfile-based implementation. This can be done only after importing zipfile itself, i.e. when zipping the stdlib, the zipfile module and its dependencies should be stored uncompressed or with deflate compression.
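The sketch below shows the general idea of a zipfile-backed import, under stated assumptions. It does not monkey patch zipimport, whose internals are private; instead it installs a separate meta path finder, which is a different technique from the one suggested above. The classes ZipfileSourceLoader and ZipfileFinder, the archive name, and the module lzma_demo_mod are hypothetical, and only top-level modules stored as NAME.py are handled.

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile
import zipfile


class ZipfileSourceLoader(importlib.abc.SourceLoader):
    """Hypothetical loader that reads module source through zipfile."""

    def __init__(self, archive, member):
        self.archive = archive
        self.member = member

    def get_filename(self, fullname):
        return os.path.join(self.archive, self.member)

    def get_data(self, path):
        # zipfile handles whichever compression method the member uses.
        with zipfile.ZipFile(self.archive) as zf:
            return zf.read(self.member)


class ZipfileFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder for top-level modules stored as NAME.py."""

    def __init__(self, archive):
        self.archive = archive
        with zipfile.ZipFile(archive) as zf:
            self.members = set(zf.namelist())

    def find_spec(self, fullname, path=None, target=None):
        member = fullname + ".py"
        if member not in self.members:
            return None
        loader = ZipfileSourceLoader(self.archive, member)
        return importlib.util.spec_from_loader(fullname, loader)


# Build a small LZMA-compressed archive to import from.
archive = os.path.join(tempfile.mkdtemp(), "demo_lzma.zip")
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_LZMA) as zf:
    zf.writestr("lzma_demo_mod.py", "ANSWER = 42\n")

sys.meta_path.append(ZipfileFinder(archive))
lzma_demo_mod = importlib.import_module("lzma_demo_mod")
print(lzma_demo_mod.ANSWER)
```

As the message points out, zipfile and its dependencies (including the lzma module here) must already be importable before such a finder can work, so they cannot themselves live only in an LZMA-compressed archive.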
History
Date                 User              Action  Args
2022-04-11 14:57:40  admin             set     github: 61206
2022-04-06 03:00:34  yan12125          set     nosy: - yan12125
2022-04-05 16:53:26  christian.heimes  set     versions: + Python 3.11, - Python 3.8
2020-03-06 20:01:35  brett.cannon      set     nosy: - brett.cannon
2018-09-19 07:53:50  serhiy.storchaka  set     messages: + msg325729
                                               versions: + Python 3.8, - Python 3.6
2016-06-06 12:58:29  yan12125          set     nosy: + yan12125
                                               messages: + msg267527
2015-08-05 15:58:39  eric.snow         set     nosy: + gregory.p.smith, superluser
                                               versions: + Python 3.6, - Python 3.4
2014-06-14 22:19:35  eric.snow         set     nosy: + eric.snow
                                               messages: + msg220589
2014-06-14 08:47:51  serhiy.storchaka  link    issue21751 superseder
2013-01-21 18:00:53  brett.cannon      set     nosy: + brett.cannon
                                               messages: + msg180347
2013-01-20 21:58:08  pitrou            set     messages: + msg180324
2013-01-20 21:55:39  serhiy.storchaka  set     messages: + msg180323
2013-01-20 21:09:12  rhettinger        set     messages: + msg180314
2013-01-20 20:54:26  pitrou            set     nosy: + pitrou
                                               messages: + msg180313
2013-01-20 20:32:22  serhiy.storchaka  set     messages: + msg180311
2013-01-20 20:19:58  serhiy.storchaka  set     nosy: + serhiy.storchaka, nadeem.vawda
                                               messages: + msg180310
                                               stage: needs patch
2013-01-20 18:45:44  brian.curtin      set     nosy: + brian.curtin
                                               components: + Library (Lib)
2013-01-20 18:41:42  rhettinger        create