
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Support xz compression in tarfile module
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: 6715 Superseder:
Assigned To: lars.gustaebel Nosy List: amaury.forgeotdarc, doko, eric.araujo, eysispeisi, georg.brandl, itkach, koen, lars.gustaebel, loewis, nadeem.vawda, paul.moore, pitrou, proyvind, python-dev, v+python, vstinner
Priority: normal Keywords: patch

Created on 2009-04-04 16:08 by doko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                       Uploaded
2011-09-15-tarfile-lzma.diff    lars.gustaebel, 2011-09-15 16:13
2011-12-08-tarfile-lzma.diff    lars.gustaebel, 2011-12-08 11:47
lzma-preset.diff                lars.gustaebel, 2011-12-23 15:22
Messages (38)
msg85403 - Author: Matthias Klose (doko) * (Python committer) Date: 2009-04-04 16:08
GNU tar now supports lzma compression as a compression method. Please
consider adding lzma support to the tarfile module (either by using the
external lzma program or by adding a lzma extension to the standard
library).
lzma extension at http://svn.fancycode.com/repos/python/pylzma/trunk/
lzma is used in many tools (7zip, dpkg, rpm), offers faster
decompression than bzip2, slower compression than gzip and bzip2.
msg85424 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-04 18:36
As for an lzma module - I would prefer one that isn't LGPL'ed. Instead,
it should link against a system-provided lzma library (which then might
or might not be licensed under the LGPL). I would probably exclude the lzma
module from Windows, as distributing the lzma sources along with the
Python binaries is too painful.
msg85468 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-04-05 08:58
If we support LZMA, we should do so on all platforms; it kind of
restricts usefulness to only have it on some. Maybe the LZMA code in
one of the many archival tools in existence that supports it is not LGPL'd?
msg86197 - Author: Koen van de Sande (koen) Date: 2009-04-20 14:42
The LZMA implementation from 7-zip has been released as public domain
(since version 4.62 / Nov 2008) in the LZMA SDK:
http://www.7-zip.org/sdk.html
So, there shouldn't be a license issue for Windows. I am not sure if 
there are already system-provided LZMA libraries on Linux at this time.
msg86205 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-20 19:51
> The LZMA implementation from 7-zip has been released as public domain
> (since version 4.62 / Nov 2008) in the LZMA SDK:
> http://www.7-zip.org/sdk.html
That's good news. Now, if somebody could contribute a Python wrapper for
these...
> So, there shouldn't be a license issue for Windows. I am not sure if 
> there are already system-provided LZMA libraries on Linux at this time.
There are. The Linux version apparently originates from the same
sources, so they might be API compatible. However, I wouldn't mind
if we extracted the entire lzma library from 7zip, and put it into
the source distribution.
msg106425 - Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-25 11:07
I'm the author of the pyliblzma module, and if desired, I'd be happy to help out adapting pyliblzma for inclusion with python.
Most of its code is based on bz2module.c, so it shouldn't be very far away from being good 'nuff.
What I see as required is:
* clean out use of C99 types etc.
* clean up the LZMAOptions class (this is the biggest difference from the bz2 module: as the filter supports a wide range of options, everything related, such as parsing, API documentation etc., was placed in its own class. I've yet to receive any feedback on this decision or find any remote equivalents out there to draw inspiration from;)
* While most of the liblzma API has been implemented, support for multiple/alternate filters still remains to be implemented. When done it will also cause some breakage with the current pyliblzma API.
I plan on doing these things sooner or later anyways, it's pretty much just a matter of motivation and priorities standing in the way, actual interest from others would certainly have a positive effect on this. ;)
As for alternatives to the LGPL liblzma, you really don't have any. Keep in mind that LZMA is "merely" the algorithm, while xz (and LZMA_alone, used for '.lzma', now obsolete but still supported) are the actual formats you want support for. The LZMA SDK does not provide any compatibility with these.
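The distinction between the LZMA algorithm and its container formats can be illustrated with the `lzma` module that later landed in the standard library (Python 3.3+). This is only a sketch written after the fact, not code from pyliblzma or from any patch in this thread; the payload and variable names are made up for illustration.

```python
import lzma

data = b"example payload " * 64

# The .xz container, as written by the xz tool.
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

# The legacy .lzma container ("LZMA_alone"): same algorithm family,
# different framing.
alone_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# The .xz container starts with a fixed magic sequence; the legacy
# container has no magic and starts with a properties byte instead.
print(xz_blob[:6])
print(alone_blob[:1])

# Decompression auto-detects either container by default.
assert lzma.decompress(xz_blob) == data
assert lzma.decompress(alone_blob) == data
```

Both blobs carry LZMA-family compressed data, but a reader has to understand the container to get at it, which is why a library that only implements the raw algorithm is not enough for `.tar.xz` support.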
msg106426 - Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-25 11:14
ps: pylzma uses the LZMA SDK, which is not what you want.
pyliblzma (not the same module;) OTOH uses liblzma, which is the library used by xz/lzma utils
You'll find it available at http://launchpad.net/pyliblzma 
msg106469 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-25 19:24
> For other alternatives to the LGPL liblzma, you really don't have
> any,
If that's really the case (which I don't believe it is), then this
project stops right here. If the underlying library is LGPL, it would
require us to distribute its sources along with the Windows binaries,
which I'm not willing to do.
msg106482 - Author: Koen van de Sande (koen) Date: 2010-05-25 21:51
The XZ Utils website ( http://tukaani.org/xz/ ) states the following:
"The most interesting parts of XZ Utils (e.g. liblzma) are in the public domain. You can do whatever you want with the public domain parts. 
Some parts of XZ Utils (e.g. build system and some utilities) are under different free software licenses such as GNU LGPLv2.1, GNU GPLv2, or GNU GPLv3."
So, liblzma is not the problem. But the license of PylibLZMA is LGPL3.
msg106517 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-26 08:51
> If the underlying library is LGPL, it would
> require us to distribute its sources along with the Windows binaries,
> which I'm not willing to do.
Martin, this is wrong, you don't have to bundle the source *in* the object code package. Making it available on some HTTP or FTP site is sufficient.
(actually, if we don't modify the library source, we can even point at the original download site)
msg106535 - Author: Matthias Klose (doko) * (Python committer) Date: 2010-05-26 15:31
Per, on 2010-03-17, I asked you via email:
"I was looking at
 http://bugs.python.org/issue5689
 http://bugs.python.org/issue6715
and Martin's comments about the licensing of the bindings; is there a special reason for the lgpl3 license of the bindings, given that both python and xz-utils are not gpl'ed?"
Does pyliblzma need to be licensed under the lgpl3?
msg106553 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-26 17:44
On 26.05.2010 10:51, Antoine Pitrou wrote:
>
> Antoine Pitrou<pitrou@free.fr> added the comment:
>
>> If the underlying library is LGPL, it would
>> require us to distribute its sources along with the Windows binaries,
>> which I'm not willing to do.
>
> Martin, this is wrong, you don't have to bundle the source *in* the object code package.
That's why I said "along". I'm still not willing to do that: making the 
source available is still inconvenient. More importantly, anybody 
redistributing Python binaries would have to comply also (e.g. on 
CD-ROMs or py2exe binaries); this is a burden I don't want to impose
on our users. Fortunately, we don't have to, as the LZMA compression 
itself is in the public domain. For the Python wrapper, I hope that 
somebody contributes such a module under a PSF contributor agreement.
If nobody else does, I may write one from scratch one day.
msg106566 - Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-26 18:29
if you're already looking at issue6715, then I don't get why you're asking.. ;)
quoting from msg106433:
"For my code, feel free to use your own/any other license you'd like or even public domain (if the license of bz2module.c that much of it's derived from permits of course)!"
The reason why I picked LGPLv3 in the past was simply just because liblzma at the time was licensed under it, so I just picked the same for simplicity.
I've actually already dual-licensed it under the python license in addition on the project page though, but I just forgot updating the module's metadata as well before I released 0.5.3 last month..
Martin: For LGPL (or even GPL for that matter, disregarding linking restrictions) libraries you don't have to distribute the sources of those libraries at all (they're already made available by others, so that would be quite overly redundant, uh?;). LGPL actually doesn't even care at all about the license of your software as long as you only dynamically link against it.
I don't really get what the issue would be even if liblzma were still LGPL, it doesn't prohibit you from distributing a dynamically linked library along with python either if necessary (which of course would be of convenience on win32..)..
tsktsk, discussions about python module for xz compression should anyways be kept at issue6715 as this one is about the tarfile module ;p
msg106573 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-26 19:54
> tsktsk, discussions about python module for xz compression should
> anyways be kept at issue6715 as this one is about the tarfile module
> ;p
Ok, following up there.
msg144084 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-09-15 16:13
Attached is a patch with the current state of my work on lzma integration into tarfile (17 test errors).
msg148661 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-30 15:12
Python now has an lzma module. Lars, do you have the time to update your patch or should I do it?
msg148716 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-12-01 10:38
I will be happy to, but my spare time is limited right now, so this could take about a week. If this is a problem, please go ahead.
msg148746 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-12-01 23:19
There is plenty of time until 3.3. OTOH, if Eric wants to work on it now: you got a week :-) Do recognize that there is a patch to start from already.
msg148755 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-02 16:12
I’m perfectly happy to let Lars do it next week or next month, there is no rush. The existing patch may even require very little or no change, as Nadeem’s module (in the stdlib) provides the same classes as the other lzma module which was used by the patch.
msg149020 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-12-08 11:47
For those who want to test it first, I post the current state of the patch here. It is ready for commit, there are no failing tests. If nobody objects, I will apply it this weekend.
msg149021 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-08 12:00
Some comments about 2011-12-08-tarfile-lzma.diff:
> elif self.buf.startswith(b"\x5d\x00\x00\x80") or self.buf.startswith(b"...
Micro-optimization: you can use self.buf.startswith((b"\x5d\x00\x00\x80", b"\xfd7zXZ")) here.
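The tuple form of `startswith` suggested here can be sketched as follows. The two byte prefixes are the ones quoted in this comment; the helper name `looks_like_lzma` and the constant names are hypothetical and do not come from the patch.

```python
# Prefixes quoted in the review comment above.
LZMA_ALONE_PREFIX = b"\x5d\x00\x00\x80"
XZ_MAGIC = b"\xfd7zXZ"


def looks_like_lzma(buf: bytes) -> bool:
    """Return True if buf starts with either prefix (hypothetical helper)."""
    # bytes.startswith accepts a tuple and returns True on the first match,
    # so one call replaces two calls joined with "or".
    return buf.startswith((LZMA_ALONE_PREFIX, XZ_MAGIC))


print(looks_like_lzma(b"\xfd7zXZ\x00rest of stream"))
print(looks_like_lzma(b"\x1f\x8b\x08gzip data"))
```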
> raise ValueError("mode must be 'r' or 'w'.")
Error messages usually don't end with a dot (or am I wrong?).
It would be better to use a skip instead of just return here:
def test_no_name_argument(self):
    if self.mode.endswith("bz2") or self.mode.endswith("xz"):
        # BZ2File and LZMAFile have no name attribute.
        return
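A skip, as suggested for the test above, might look roughly like this. The class name, the `mode` attribute value and the placeholder assertion are assumptions for illustration; the real test in test_tarfile.py is structured differently.

```python
import unittest


class NameArgumentTest(unittest.TestCase):
    # Hypothetical stand-in for the mode attribute used by the real tests.
    mode = "w:xz"

    def test_no_name_argument(self):
        if self.mode.endswith(("bz2", "xz")):
            # Reported as "skipped" instead of silently passing.
            self.skipTest("BZ2File and LZMAFile have no name attribute")
        # Placeholder for the real assertions of the test.
        self.assertTrue(self.mode)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NameArgumentTest)
result = unittest.TestResult()
suite.run(result)
print("skipped:", len(result.skipped))
```

With a bare `return` the test is counted as passed, so the test report gives no hint that nothing was checked for these modes.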
In _Stream.__init__, for zlib:
> self.exception = zlib.error
Could you add a test for this change?
msg149040 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-08 16:42
Patch looks great. I did a review on Rietveld.
msg149180 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-10 19:40
New changeset 899a8c7b2310 by Lars Gustäbel in branch 'default':
Issue #5689: Add support for lzma compression to the tarfile module.
http://hg.python.org/cpython/rev/899a8c7b2310 
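For readers who want to try the feature this changeset added, here is a minimal round trip through an in-memory `.tar.xz` archive, assuming Python 3.3 or later. The member name and payload are invented for the example.

```python
import io
import tarfile

payload = b"hello from a .tar.xz archive\n"

# Write a one-member archive into memory using the "w:xz" mode.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:xz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read it back with the matching "r:xz" mode.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:xz") as tar:
    names = tar.getnames()
    extracted = tar.extractfile("hello.txt").read()

print(names, extracted)
```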
msg149182 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-12-10 19:48
Thanks for the review, guys! I can't close this issue yet because it depends on #6715.
msg149184 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-12-10 20:08
Great stuff! I'll close this issue along with issue 6715 once the buildbot
stuff is all sorted out.
msg149325 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-12 16:11
Lars, as part of a small doc patch I want to change this in tarfile.rst:
 The :mod:`tarfile` module makes it possible to read and write tar
 archives, including those using gzip or bz2 compression.
 -(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)
 +Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
 +higher-level functions in :ref:`shutil <archiving-operations>`.
Any objection?
msg149331 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-12-12 16:56
Please, go ahead!
msg149807 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-19 00:28
There is a failure on an XP buildbot. I don't know if it is a sporadic issue or not.
http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3921/steps/test/logs/stdio
======================================================================
ERROR: test_append_lzma (test.test_tarfile.AppendTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Buildslave\3.x.moore-windows\build\lib\test\test_tarfile.py", line 1539, in test_append_lzma
    self._create_testtar("w:xz")
  File "D:\Buildslave\3.x.moore-windows\build\lib\test\test_tarfile.py", line 1486, in _create_testtar
    with tarfile.open(self.tarname, mode) as tar:
  File "D:\Buildslave\3.x.moore-windows\build\lib\tarfile.py", line 1721, in open
    return func(name, filemode, fileobj, **kwargs)
  File "D:\Buildslave\3.x.moore-windows\build\lib\tarfile.py", line 1826, in xzopen
    mode=mode, fileobj=fileobj, preset=preset)
  File "D:\Buildslave\3.x.moore-windows\build\lib\lzma.py", line 117, in __init__
    preset=preset, filters=filters)
MemoryError
msg150085 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-12-22 10:44
This failure seems to crop up often, but not on every run:
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3941/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3940/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3937/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3929/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3921/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3916/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3914/steps/test/logs/stdio
 http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3906/steps/test/logs/stdio
I've been able to reproduce the failure on my own XP machine;
I'll investigate it over the weekend.
msg150088 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-22 10:54
Perhaps Paul can try to reproduce and diagnose the issue directly on the buildbot?
msg150111 - Author: Paul Moore (paul.moore) * (Python committer) Date: 2011-12-22 21:12
A simple rebuild and test run of that test in debug mode didn't fail...
I'll run the full test suite as a check, but that could take some time - that buildslave isn't the fastest in the world...
msg150119 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-12-22 22:48
Not to worry - as I said in my previous message, I can reproduce the error
on my own XP machine.
I also noticed that running test_tarfile alone doesn't trigger the errors,
which leads me to suspect that the failure is due to some interaction with
another test getting run before test_tarfile. I'm currently trying to
determine what this test is.
I suspect that the problem is at least partially caused by the fact that
tarfile uses a default compresslevel of 9 for .tar.xz archives (rather
than the recommended value of 6). According to the man page for the xz
tool <http://manpages.ubuntu.com/manpages/lucid/man1/xz.1.html>, using a
compresslevel of 9 can result in memory usage of up to 800MB during
compression, which is a significant fraction of the bot's 2GB of RAM.
(I suppose it would be a good idea to mention this in the documentation
for the lzma module, so users won't get bitten by this...)
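On a memory-constrained machine, one way to stay clear of this is to pass an explicit, lower preset when writing. The sketch below assumes the `preset` keyword that `tarfile.open()` accepts for `"w:xz"` in released Python versions; the helper name and payload are hypothetical, and the snippet deliberately avoids preset 9 so that it runs comfortably on small machines.

```python
import io
import tarfile

payload = b"x" * 100_000


def make_tar_xz(preset):
    """Build a one-member .tar.xz in memory with the given preset (sketch)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz", preset=preset) as tar:
        info = tarfile.TarInfo(name="data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Lower presets use smaller dictionaries and therefore less memory.
low_memory_archive = make_tar_xz(preset=1)
default_level_archive = make_tar_xz(preset=6)

# Either archive reads back the same way; the preset only matters on write.
with tarfile.open(fileobj=io.BytesIO(low_memory_archive), mode="r:xz") as tar:
    restored = tar.extractfile("data.bin").read()

print(len(low_memory_archive), len(default_level_archive), restored == payload)
```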
msg150161 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-12-23 13:56
Wouldn't it be better then to use a default compresslevel of 6 in tarfile? I used level 9 in my patch without a particular reason, just because I thought 9 must be better than 6 ;-)
msg150165 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-12-23 14:51
Yes, that's a good idea. I've been testing a similar change, and it seems
to drop the peak memory usage for test_tarfile from around 810MB down to
under 200MB. It looks like 2GB genuinely isn't enough to reliably use LZMA
compression with preset=9.
You might want to use preset=None instead of explicitly saying preset=6,
though. This tells LZMAFile to use the default preset, and will allow you
to get rid of the if-statement on lines 1821-1823.
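The effect of `preset=None` can be checked with a small sketch against the stdlib `lzma` module. It assumes that `lzma.PRESET_DEFAULT` is the level that `None` falls back to, which matches the module's documentation; the helper name and sample data are made up.

```python
import io
import lzma

data = b"some repetitive sample data " * 1000


def compress_via_lzmafile(preset):
    """Compress data through LZMAFile with the given preset (sketch)."""
    buf = io.BytesIO()
    with lzma.LZMAFile(buf, "w", preset=preset) as f:
        f.write(data)
    return buf.getvalue()


# preset=None lets LZMAFile pick its default level.
implicit_default = compress_via_lzmafile(None)
explicit_default = compress_via_lzmafile(lzma.PRESET_DEFAULT)

print(implicit_default == explicit_default)
```

Passing `None` through unchanged means the calling code needs no special case for "no preset given".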
Something unrelated that I noticed in the surrounding code: gzopen and
bz2open validate the mode by testing 'len(mode) > 1 or mode not in "rw"'.
This would be simpler as 'mode not in ("r", "w")' (like you've done in
xzopen), and it would accept only "r" and "w" (but not "" or "rw").
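The difference between the two mode checks can be seen by running both over a few inputs. The function names are hypothetical; the expressions are the ones quoted above. Note that the old check already rejects "rw" through its length test, so the empty string is the input that slips through.

```python
def old_is_invalid(mode):
    # Check quoted from gzopen/bz2open: substring test plus length guard.
    return len(mode) > 1 or mode not in "rw"


def new_is_invalid(mode):
    # Check used in xzopen: membership in a tuple of allowed modes.
    return mode not in ("r", "w")


for mode in ("r", "w", "", "rw", "a"):
    print(repr(mode), old_is_invalid(mode), new_is_invalid(mode))
```

The empty string passes the old check because `"" in "rw"` is true for substring membership, while the tuple form compares whole strings.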
msg150167 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-12-23 15:22
Yes, that's much better. Thanks for the tip.
msg150184 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-12-23 17:15
Patch looks good to me.
msg151485 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-01-17 20:36
Ping. Windows buildbots are still failing with MemoryError because of this preset=9.
The patch looks good to me as well.
msg151535 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-01-18 13:04
New changeset b86b54fcb5c2 by Lars Gustäbel in branch 'default':
Issue #5689: Avoid excessive memory usage by using the default lzma preset.
http://hg.python.org/cpython/rev/b86b54fcb5c2 
History
Date User Action Args
2022-04-11 14:56:47 admin set github: 49939
2012-03-06 11:32:17 nadeem.vawda set status: open -> closed
2012-01-18 13:04:28 python-dev set messages: + msg151535
2012-01-17 20:36:51 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg151485
2011-12-28 16:15:40 nikratio set nosy: - nikratio
2011-12-23 17:15:05 nadeem.vawda set messages: + msg150184
2011-12-23 15:22:47 lars.gustaebel set files: + lzma-preset.diff
    messages: + msg150167
2011-12-23 15:21:13 lars.gustaebel set files: - lzma-preset.diff
2011-12-23 14:51:18 nadeem.vawda set messages: + msg150165
2011-12-23 13:56:40 lars.gustaebel set files: + lzma-preset.diff
    messages: + msg150161
2011-12-22 22:48:44 nadeem.vawda set messages: + msg150119
2011-12-22 21:12:20 paul.moore set messages: + msg150111
2011-12-22 10:54:24 pitrou set nosy: + paul.moore
    messages: + msg150088
2011-12-22 10:44:31 nadeem.vawda set messages: + msg150085
2011-12-19 00:28:50 vstinner set messages: + msg149807
2011-12-12 16:56:29 lars.gustaebel set messages: + msg149331
2011-12-12 16:11:53 eric.araujo set messages: + msg149325
2011-12-10 20:08:06 nadeem.vawda set messages: + msg149184
2011-12-10 19:48:55 lars.gustaebel set resolution: fixed
    messages: + msg149182
    stage: needs patch -> resolved
2011-12-10 19:40:24 python-dev set nosy: + python-dev
    messages: + msg149180
2011-12-08 16:42:08 eric.araujo set messages: + msg149040
2011-12-08 12:00:53 vstinner set nosy: + vstinner
    messages: + msg149021
2011-12-08 11:47:06 lars.gustaebel set files: + 2011-12-08-tarfile-lzma.diff
    messages: + msg149020
2011-12-02 16:12:29 eric.araujo set messages: + msg148755
2011-12-01 23:19:18 loewis set messages: + msg148746
2011-12-01 10:38:22 lars.gustaebel set messages: + msg148716
2011-11-30 15:12:24 eric.araujo set messages: + msg148661
2011-09-15 16:13:23 lars.gustaebel set files: + 2011-09-15-tarfile-lzma.diff
    assignee: lars.gustaebel
    messages: + msg144084
    keywords: + patch
2011-09-15 16:01:06 eric.araujo set nosy: + nadeem.vawda
    title: please support lzma compression as an extension and in the tarfile module -> Support xz compression in tarfile module
    components: + Library (Lib), - Extension Modules
    versions: + Python 3.3, - Python 3.2
2010-08-29 11:43:58 eric.araujo link issue8266 superseder
2010-08-19 15:52:02 eysispeisi set nosy: + eysispeisi
2010-07-21 12:06:33 eric.araujo link issue5411 dependencies
2010-07-21 12:05:47 eric.araujo set stage: needs patch
    versions: + Python 3.2, - Python 3.1, Python 2.7
2010-05-26 19:54:32 loewis set messages: + msg106573
2010-05-26 18:29:50 proyvind set messages: + msg106566
2010-05-26 17:44:04 loewis set messages: + msg106553
2010-05-26 15:31:39 doko set messages: + msg106535
2010-05-26 08:51:07 pitrou set nosy: + pitrou
    messages: + msg106517
2010-05-26 05:10:15 lars.gustaebel set nosy: + lars.gustaebel
2010-05-25 21:51:54 koen set messages: + msg106482
2010-05-25 19:24:44 loewis set messages: + msg106469
2010-05-25 11:14:22 proyvind set messages: + msg106426
2010-05-25 11:07:04 proyvind set nosy: + proyvind
    messages: + msg106425
2010-02-05 19:39:23 eric.araujo set nosy: + eric.araujo
2010-01-27 15:58:57 pitrou set dependencies: + xz compressor support
2010-01-27 15:58:41 pitrou unlink issue6715 dependencies
2010-01-26 08:18:43 v+python set nosy: + v+python
2009-09-01 15:47:20 itkach set nosy: + itkach
2009-08-17 11:37:26 amaury.forgeotdarc link issue6715 dependencies
2009-08-14 03:26:32 nikratio set nosy: + nikratio
2009-04-20 19:51:50 loewis set messages: + msg86205
    title: please support lzma compression as an extension and in the tarfile module -> please support lzma compression as an extension and in the tarfile module
2009-04-20 14:42:37 koen set nosy: + koen
    messages: + msg86197
2009-04-05 08:58:34 georg.brandl set nosy: + georg.brandl
    messages: + msg85468
2009-04-04 18:36:37 loewis set nosy: + loewis
    messages: + msg85424
2009-04-04 16:10:35 pitrou set priority: normal
    versions: + Python 3.1, Python 2.7
2009-04-04 16:08:34 doko create
