This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lars.gustaebel Nosy List: lars.gustaebel, pombredanne, python-dev, taleinat
Priority: low Keywords: patch

Created on 2015-06-26 09:18 by pombredanne, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                         Uploaded                          Description
commons-logging-1.1.2-src.tar.gz  pombredanne, 2015-06-26 09:18     Problematic archive from http://archive.apache.org/dist/commons/logging/source/commons-logging-1.1.2-src.tar.gz
issue24514.diff                   lars.gustaebel, 2015-06-26 10:00  Patch for 3.4
issue24514.diff                   lars.gustaebel, 2015-06-29 13:32  New version of the patch for 3.4
Messages (10)
msg245839 - Author: Philippe Ombredanne (pombredanne) Date: 2015-06-26 09:18
The extraction fails when calling tarfile.open using this archive: http://archive.apache.org/dist/commons/logging/source/commons-logging-1.1.2-src.tar.gz
After some investigation: the file extracts fine with GNU tar and bsdtar, and the gzip compression is not the issue. If I gunzip the tar.gz to a plain tar and call tarfile on that, the problem is the same.
Also, this archive was most likely created on Windows (based on the `file` command output) using some Java tools, per http://commons.apache.org/proper/commons-logging/building.html from these original files: http://svn.apache.org/repos/asf/commons/proper/logging/tags/LOGGING_1_1_2/ ... That's all I could find out.
The error trace differs slightly between 2.7 and 3.4, but is similar.
The problem has been verified on 64-bit Linux with Python 2.7 and 3.4, and on Windows with Python 2.7.
On 2.7:
>>> TarFile.taropen(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python2.7/tarfile.py", line 2335, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header
On 3.4:
>>> TarFile.taropen(name)
Traceback (most recent call last):
  File "/usr/lib/python3.4/tarfile.py", line 180, in nti
    n = int(nts(s, "ascii", "strict") or "0", 8)
ValueError: invalid literal for int() with base 8: ' '
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.4/tarfile.py", line 2248, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.4/tarfile.py", line 1083, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/lib/python3.4/tarfile.py", line 1032, in frombuf
    obj.uid = nti(buf[108:116])
  File "/usr/lib/python3.4/tarfile.py", line 182, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.4/tarfile.py", line 1469, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.4/tarfile.py", line 2260, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header
msg245840 - Author: Philippe Ombredanne (pombredanne) Date: 2015-06-26 09:21
Note: the tracebacks above are from calling taropen on the gunzipped tar.gz.
The errors are similar but a tad less informative when using the tgz and open.
msg245844 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2015-06-26 10:00
The problem is that the tar archive has empty uid and gid fields, i.e. 7 spaces terminated with a null-byte.
I attached a patch that solves the problem.
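To make the diagnosis concrete, here is a minimal sketch of how a whitespace-only number field trips an octal parser. The helpers `parse_octal_strict` and `parse_octal_tolerant` are hypothetical names used for illustration only. They loosely follow the `int(nts(...) or "0", 8)` line visible in the 3.4 traceback and are not the actual tarfile code or the attached patch.

```python
def _cut_at_nul(field: bytes) -> str:
    # Keep only the bytes before the first NUL, then decode as ASCII.
    return field.split(b"\0", 1)[0].decode("ascii", "strict")


def parse_octal_strict(field: bytes) -> int:
    # Hypothetical stand-in for the unpatched behaviour: an empty string
    # falls back to "0", but a string of spaces is passed to int() as-is.
    return int(_cut_at_nul(field) or "0", 8)


def parse_octal_tolerant(field: bytes) -> int:
    # Hypothetical stand-in for the idea behind the fix: strip trailing
    # whitespace first, so a blank field is treated like an empty one.
    return int(_cut_at_nul(field).rstrip() or "0", 8)


# A uid/gid field as found in the problematic archive:
# 7 spaces terminated with a null byte.
blank_field = b" " * 7 + b"\0"

try:
    parse_octal_strict(blank_field)
except ValueError as exc:
    print("strict parser rejects the field:", exc)

print("tolerant parser returns:", parse_octal_tolerant(blank_field))
```

The only difference between the two helpers is the `.rstrip()` call, which is why a small change is enough to accept such archives.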
msg245845 - Author: Philippe Ombredanne (pombredanne) Date: 2015-06-26 10:03
lars: you are my hero! you rock. I picture you being able to read through tar binary headers while you sleep. I am in awe.
msg245846 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2015-06-26 10:10
You're welcome :-D
msg245847 - Author: Philippe Ombredanne (pombredanne) Date: 2015-06-26 10:17
I verified that the patch issue24514.diff (adding .rstrip()) works on both Python 2.7 and Python 3.4.
I ran it on 2.7 against a fairly large test suite of tar files without problems.
This is a +1 for me.
Lars: Do you think you could apply it to 2.7 too?
msg245848 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2015-06-26 10:35
Yes, Python 2.7 still gets bugfixes.
However, there's still some work to do on the patch (maybe clean the code, write a test, add a NEWS entry).
msg245934 - Author: Tal Einat (taleinat) (Python committer) Date: 2015-06-29 12:56
The patch is very simple, but this needs tests. At the very least, a simple tar file which reproduces this issue could be added to the tests.
Taking this a step further would be writing some unit tests for the internal nti() and itn() functions, and perhaps also stn() and nts().
msg245936 - Author: Lars Gustäbel (lars.gustaebel) (Python committer) Date: 2015-06-29 13:32
I think a simple addition to the existing unittest for nti() will be enough. itn() seems well-tested, and nts() and stn() are not affected, because they don't operate on numbers.
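As a rough idea of what such a test addition could look like, here is a sketch against the internal `tarfile.nti()` helper. The class name `BlankNumberFieldTest` is made up for this example and the committed test may differ. Note that `nti()` is an undocumented internal function, so relying on it outside the test suite is an assumption that may not hold across versions.

```python
import tarfile
import unittest


class BlankNumberFieldTest(unittest.TestCase):
    # Hypothetical test case; not the test that was committed.

    def test_regular_octal_field(self):
        # An ordinary NUL-terminated octal field still parses as before.
        self.assertEqual(tarfile.nti(b"0000644\x00"), 0o644)

    def test_whitespace_only_field(self):
        # On a Python version that includes the fix, a field of
        # 7 spaces plus a null byte is read as zero.
        self.assertEqual(tarfile.nti(b"       \x00"), 0)


# Run the tests without calling sys.exit(), so the snippet can be
# pasted into an interactive session.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BlankNumberFieldTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On an unpatched interpreter the second test would be expected to fail with `InvalidHeaderError`, matching the traceback in the original report.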
msg246090 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-07-02 17:44
New changeset 301d7efac3de by Lars Gustäbel in branch '2.7':
Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/301d7efac3de
New changeset 140b4b7b84bd by Lars Gustäbel in branch '3.4':
Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/140b4b7b84bd
New changeset 1692065524cc by Lars Gustäbel in branch '3.5':
Merge with 3.4: Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/1692065524cc
New changeset 08fad9037206 by Lars Gustäbel in branch 'default':
Merge with 3.5: Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/08fad9037206 
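The behaviour described in these changesets can be checked without the original Apache archive. The sketch below builds a small ustar archive in memory, blanks its uid and gid fields (offsets 108-116 and 116-124 of the header, per the `buf[108:116]` slice in the traceback), recomputes the header checksum, and reads it back. The function name `blank_uid_gid_archive` and the member name `hello.txt` are invented for this example, and it assumes a Python version that contains the fix.

```python
import io
import tarfile


def blank_uid_gid_archive() -> bytes:
    # Write a one-member ustar archive into memory.
    payload = b"hi\n"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(payload))
    data = bytearray(buf.getvalue())

    # Overwrite uid and gid with 7 spaces terminated by a null byte.
    data[108:116] = b" " * 7 + b"\0"
    data[116:124] = b" " * 7 + b"\0"

    # The checksum covers the 512-byte header with the checksum field
    # itself (offsets 148-156) counted as 8 spaces.
    data[148:156] = b" " * 8
    chksum = sum(data[:512])
    data[148:156] = ("%06o\0 " % chksum).encode("ascii")
    return bytes(data)


with tarfile.open(fileobj=io.BytesIO(blank_uid_gid_archive())) as tar:
    members = tar.getmembers()

names = [m.name for m in members]
uids = [m.uid for m in members]
gids = [m.gid for m in members]
print(names, uids, gids)
```

Recomputing the checksum matters here: without it the header would be rejected for a bad checksum, which would hide the whitespace handling being demonstrated.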
History
Date                 User            Action  Args
2022-04-11 14:58:18  admin           set     github: 68702
2015-12-08 21:40:46  martin.panter   link    issue15858 superseder
2015-07-02 17:45:30  lars.gustaebel  set     status: open -> closed
                                             resolution: fixed
                                             stage: patch review -> resolved
2015-07-02 17:44:38  python-dev      set     nosy: + python-dev
                                             messages: + msg246090
2015-06-29 13:32:23  lars.gustaebel  set     files: + issue24514.diff
                                             messages: + msg245936
2015-06-29 12:56:45  taleinat        set     nosy: + taleinat
                                             messages: + msg245934
2015-06-26 10:35:47  lars.gustaebel  set     messages: + msg245848
2015-06-26 10:17:17  pombredanne     set     messages: + msg245847
2015-06-26 10:10:34  lars.gustaebel  set     priority: normal -> low
                                             versions: + Python 3.5, Python 3.6
                                             messages: + msg245846
                                             assignee: lars.gustaebel
                                             type: behavior
                                             stage: patch review
2015-06-26 10:03:13  pombredanne     set     messages: + msg245845
2015-06-26 10:00:10  lars.gustaebel  set     files: + issue24514.diff
                                             keywords: + patch
                                             messages: + msg245844
2015-06-26 09:21:27  pombredanne     set     messages: + msg245840
2015-06-26 09:18:57  pombredanne     create
