This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2015-06-26 09:18 by pombredanne, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| commons-logging-1.1.2-src.tar.gz | pombredanne, 2015-06-26 09:18 | Problematic archive from http://archive.apache.org/dist/commons/logging/source/commons-logging-1.1.2-src.tar.gz |
| issue24514.diff | lars.gustaebel, 2015-06-26 10:00 | Patch for 3.4 |
| issue24514.diff | lars.gustaebel, 2015-06-29 13:32 | New version of the patch for 3.4 |
| Messages (10) | |||
|---|---|---|---|
| msg245839 | Author: Philippe Ombredanne (pombredanne) | Date: 2015-06-26 09:18 | |
The extraction fails when calling tarfile.open using this archive: http://archive.apache.org/dist/commons/logging/source/commons-logging-1.1.2-src.tar.gz

After some investigation, the file can be extracted with gnu tar and bsdtar and the gzip compression is not the issue: if I gunzip the tar.gz to a tar and call tarfile on plain tar, the problem is the same.

Also this archive was created most likely on Windows (based on the `file` command output) using some Java tools per http://commons.apache.org/proper/commons-logging/building.html from these original files: http://svn.apache.org/repos/asf/commons/proper/logging/tags/LOGGING_1_1_2/ ... that's all I could find out.

The error trace is slightly different on 2.7 and 3.4 but similar. The problem has been verified on Linux 64 with Python 2.7 and 3.4 and on Windows with Python 2.7.

On 2.7:

    >>> TarFile.taropen(name)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
        return cls(name, mode, fileobj, **kwargs)
      File "/usr/lib/python2.7/tarfile.py", line 1574, in __init__
        self.firstmember = self.next()
      File "/usr/lib/python2.7/tarfile.py", line 2335, in next
        raise ReadError(str(e))
    tarfile.ReadError: invalid header

On 3.4:

    >>> TarFile.taropen(name)
    Traceback (most recent call last):
      File "/usr/lib/python3.4/tarfile.py", line 180, in nti
        n = int(nts(s, "ascii", "strict") or "0", 8)
    ValueError: invalid literal for int() with base 8: ' '

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3.4/tarfile.py", line 2248, in next
        tarinfo = self.tarinfo.fromtarfile(self)
      File "/usr/lib/python3.4/tarfile.py", line 1083, in fromtarfile
        obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
      File "/usr/lib/python3.4/tarfile.py", line 1032, in frombuf
        obj.uid = nti(buf[108:116])
      File "/usr/lib/python3.4/tarfile.py", line 182, in nti
        raise InvalidHeaderError("invalid header")
    tarfile.InvalidHeaderError: invalid header

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
        return cls(name, mode, fileobj, **kwargs)
      File "/usr/lib/python3.4/tarfile.py", line 1469, in __init__
        self.firstmember = self.next()
      File "/usr/lib/python3.4/tarfile.py", line 2260, in next
        raise ReadError(str(e))
    tarfile.ReadError: invalid header
|||
| msg245840 | Author: Philippe Ombredanne (pombredanne) | Date: 2015-06-26 09:21 | |
Note: the tracebacks above are from calling taropen on the gunzipped tar.gz. The errors are similar but a tad less informative when using the tgz and open. |
|||
| msg245844 | Author: Lars Gustäbel (lars.gustaebel) (Python committer) | Date: 2015-06-26 10:00 | |
The problem is that the tar archive has empty uid and gid fields, i.e. 7 spaces terminated with a null-byte. I attached a patch that solves the problem. |
|||
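To make the diagnosis above concrete, here is a minimal sketch of why a number field of 7 spaces plus a null byte fails to parse as octal, and how stripping whitespace first avoids the failure. Both helper functions are hypothetical simplifications written for illustration; they are not the code in tarfile.py or in the attached patch.

```python
def parse_octal_strict(field: bytes) -> int:
    """Hypothetical strict parser: cut at the first NUL, then parse as octal."""
    text = field.split(b"\0", 1)[0].decode("ascii")
    # A whitespace-only string is truthy, so the "0" fallback is not used
    # and int() raises ValueError.
    return int(text or "0", 8)


def parse_octal_tolerant(field: bytes) -> int:
    """Hypothetical tolerant parser: strip whitespace before parsing."""
    text = field.split(b"\0", 1)[0].decode("ascii")
    # After strip() a blank field becomes "", so the "0" fallback applies.
    return int(text.strip() or "0", 8)


blank_uid = b" " * 7 + b"\0"   # the empty uid/gid field described above
normal_uid = b"0001750\0"      # an ordinary octal field (decimal 1000)

try:
    parse_octal_strict(blank_uid)
except ValueError as exc:
    print("strict parser failed:", exc)

print("tolerant parser, blank field:", parse_octal_tolerant(blank_uid))
print("tolerant parser, normal field:", parse_octal_tolerant(normal_uid))
```

The tolerant variant treats a blank field as zero, which matches the behaviour the eventual changeset message describes ("tolerates number fields consisting of only whitespace").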
| msg245845 | Author: Philippe Ombredanne (pombredanne) | Date: 2015-06-26 10:03 | |
lars: you are my hero! you rock. I picture you being able to read through tar binary headers while you sleep. I am in awe. |
|||
| msg245846 | Author: Lars Gustäbel (lars.gustaebel) (Python committer) | Date: 2015-06-26 10:10 | |
You're welcome :-D |
|||
| msg245847 | Author: Philippe Ombredanne (pombredanne) | Date: 2015-06-26 10:17 | |
I verified that the patch issue24514.diff (adding .rstrip()) also works on Python 2.7. I verified it also works on Python 3.4. I ran it on 2.7 against a fairly large test suite of tar files without problems. This is a +1 for me. Lars: Do you think you could apply it to 2.7 too? |
|||
| msg245848 | Author: Lars Gustäbel (lars.gustaebel) (Python committer) | Date: 2015-06-26 10:35 | |
Yes, Python 2.7 still gets bugfixes. However, there's still some work to do on the patch (maybe clean the code, write a test, add a NEWS entry). |
|||
| msg245934 | Author: Tal Einat (taleinat) (Python committer) | Date: 2015-06-29 12:56 | |
The patch is very simple, but this needs tests. At the very least, a simple tar file which reproduces this issue could be added to the tests. Taking this a step further would be writing some unit tests for the internal nti() and itn() functions, and perhaps also stn() and nts(). |
|||
| msg245936 | Author: Lars Gustäbel (lars.gustaebel) (Python committer) | Date: 2015-06-29 13:32 | |
I think a simple addition to the existing unittest for nti() will be enough. itn() seems well-tested, and nts() and stn() are not affected, because they don't operate on numbers. |
|||
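A test along the lines discussed above might look like the following sketch. It assumes a Python version that already contains the fix, and it calls the internal, undocumented tarfile.nti() helper, which may change without notice. The class and function names are made up for illustration; this is not the test that was committed.

```python
import tarfile
import unittest


class BlankNumberFieldTest(unittest.TestCase):
    """Illustrative checks for tarfile's internal nti() helper."""

    def test_whitespace_only_field(self):
        # 7 spaces terminated by a null byte, as in the problematic archive.
        self.assertEqual(tarfile.nti(b" " * 7 + b"\0"), 0)

    def test_regular_octal_field(self):
        # An ordinary octal field must keep parsing as before.
        self.assertEqual(tarfile.nti(b"0001750\0"), 0o1750)


def run_blank_field_tests() -> unittest.TestResult:
    """Run the illustrative test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BlankNumberFieldTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_blank_field_tests()
```

On an interpreter without the fix, the first test would be expected to fail with InvalidHeaderError, which is the behaviour reported in this issue.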
| msg246090 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-07-02 17:44 | |
New changeset 301d7efac3de by Lars Gustäbel in branch '2.7':
Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/301d7efac3de

New changeset 140b4b7b84bd by Lars Gustäbel in branch '3.4':
Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/140b4b7b84bd

New changeset 1692065524cc by Lars Gustäbel in branch '3.5':
Merge with 3.4: Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/1692065524cc

New changeset 08fad9037206 by Lars Gustäbel in branch 'default':
Merge with 3.5: Issue #24514: tarfile now tolerates number fields consisting of only whitespace.
https://hg.python.org/cpython/rev/08fad9037206
|||
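For anyone who wants to check the fixed behaviour without downloading the original archive, the sketch below builds a small in-memory tar file, blanks its uid and gid fields to 7 spaces plus a null byte, recomputes the header checksum, and reads it back. It assumes a Python version that includes the changesets above. The function names, the member name hello.txt, and the payload are invented for illustration, and the synthetic archive only mimics the header quirk; it is not a copy of the commons-logging file.

```python
import io
import tarfile

UID_FIELD = slice(108, 116)     # same offsets as buf[108:116] in the traceback
GID_FIELD = slice(116, 124)
CHKSUM_FIELD = slice(148, 156)
BLANK = b" " * 7 + b"\0"        # 7 spaces terminated by a null byte


def build_archive_with_blank_ids(payload: bytes = b"hello\n") -> bytes:
    """Return a ustar archive whose first header has blank uid/gid fields."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

    raw = bytearray(buf.getvalue())
    raw[UID_FIELD] = BLANK
    raw[GID_FIELD] = BLANK
    # The checksum is the byte sum of the 512-byte header with the checksum
    # field itself counted as eight spaces.
    raw[CHKSUM_FIELD] = b" " * 8
    checksum = sum(raw[:512])
    raw[CHKSUM_FIELD] = b"%06o\0 " % checksum
    return bytes(raw)


def read_first_member(archive: bytes):
    """Open the archive uncompressed and return (TarInfo, file contents)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        member = tf.next()
        data = tf.extractfile(member).read()
    return member, data


if __name__ == "__main__":
    member, data = read_first_member(build_archive_with_blank_ids())
    print(member.name, member.uid, member.gid, data)
```

On an unpatched interpreter the same call to tarfile.open would be expected to raise ReadError, as in the original report.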
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:18 | admin | set | github: 68702 |
| 2015-12-08 21:40:46 | martin.panter | link | issue15858 superseder |
| 2015-07-02 17:45:30 | lars.gustaebel | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2015-07-02 17:44:38 | python-dev | set | nosy: + python-dev; messages: + msg246090 |
| 2015-06-29 13:32:23 | lars.gustaebel | set | files: + issue24514.diff; messages: + msg245936 |
| 2015-06-29 12:56:45 | taleinat | set | nosy: + taleinat; messages: + msg245934 |
| 2015-06-26 10:35:47 | lars.gustaebel | set | messages: + msg245848 |
| 2015-06-26 10:17:17 | pombredanne | set | messages: + msg245847 |
| 2015-06-26 10:10:34 | lars.gustaebel | set | priority: normal -> low; versions: + Python 3.5, Python 3.6; messages: + msg245846; assignee: lars.gustaebel; type: behavior; stage: patch review |
| 2015-06-26 10:03:13 | pombredanne | set | messages: + msg245845 |
| 2015-06-26 10:00:10 | lars.gustaebel | set | files: + issue24514.diff; keywords: + patch; messages: + msg245844 |
| 2015-06-26 09:21:27 | pombredanne | set | messages: + msg245840 |
| 2015-06-26 09:18:57 | pombredanne | create | |