This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年09月03日 19:47 by tlynn, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| bad.tar | tlynn, 2012年09月05日 19:13 | Example tarfile with one dir and one unlisted file | ||
| patch.py | tlynn, 2012年09月05日 19:16 | |||
| tarfile-patch.txt | akuchling, 2013年11月11日 19:10 | review | ||
| test_tarfile.patch | efloehr, 2014年02月24日 03:10 | New test to test_tarfile.py to test this issue and its solution | review | |
| Messages (11) | |||
|---|---|---|---|
| msg169799 - (view) | Author: Tom Lynn (tlynn) | Date: 2012年09月03日 19:47 | |
The tarfile module silently truncates the list of entries when reading a tar file if it sees an entry with a uid/gid field containing only spaces/NULs. I got such a tarball from Java Maven/plexus-archiver. I don't know whether they write such fields deliberately, but it seems reasonable to me, especially since they were providing the user/group names textually. I'd like to see two fixes - a None/-1/0 value for the uid/gid and not silently swallowing HeaderErrors in TarFile.next() (or at least documenting why it's being done). 0 would be consistent with the default value when writing, but None seems more honest. -1 seems hard to defend. Only tested on silly Python versions (2.6, PyPy-1.8), sorry. It's what I've got to hand, but I think this issue also applies to recent Python too going by looking at the hg trunk. |
|||
| msg169822 - (view) | Author: Tom Lynn (tlynn) | Date: 2012年09月04日 10:58 | |
I think the default has to be 0 for consistency with how other empty numeric fields are handled.
In theory spaces and NULs are supposed to be equivalent terminators in numeric fields, but I've just noticed that plexus-archiver is also using leading spaces rather than leading zeros (against the spec), e.g. ' 10422\x00 '. tarfile currently supports this, which I think is good, so I think the right approach is to lstrip(' ') fields and then treat space as a terminator.
|
|||
| msg169868 - (view) | Author: Lars Gustäbel (lars.gustaebel) * (Python committer) | Date: 2012年09月05日 12:18 | |
Could you provide some sample data and code? I see the problem, but I cannot quite reproduce the behaviour you describe. In all of my testcases tarfile either throws an exception or successfully reads the archive, but never silently stops. |
|||
| msg169887 - (view) | Author: Tom Lynn (tlynn) | Date: 2012年09月05日 19:13 | |
See attached bad.tar.
$ less bad.tar | cat
drwxr-xr-x 0/0 0 2012年09月05日 20:04 foo/
-rw-rw-r-- uname/gname 0 2012年09月05日 20:04 foo/a
$ python -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
['foo']
$ python -c 'import tarfile, patch; patch.patch_tarfile(); print (tarfile.open("bad.tar").getnames())'
['foo', 'foo/a']
I'm only allowed to attach one file via the tracker web UI, so patch.py will follow.
Creation code for bad.tar, largely for my benefit:
import java.io.FileOutputStream;
import java.io.IOException;
import org.codehaus.plexus.archiver.tar.TarOutputStream;
import org.codehaus.plexus.archiver.tar.TarEntry;
class TarTest {
public static void main(String[] args) throws IOException {
FileOutputStream fos = new FileOutputStream("bad.tar");
TarOutputStream tos = new TarOutputStream(fos);
TarEntry entry = new TarEntry("foo/");
entry.setMode(16877); // 0o40755
entry.setUserId(0);
entry.setGroupId(0);
entry.setUserName("");
entry.setGroupName("");
tos.putNextEntry(entry);
TarEntry entry2 = new TarEntry("foo/a");
entry2.setMode(33204); // 0o100664
entry2.setUserId(-1); // XXX: dodgy
entry2.setGroupId(-1); // XXX: dodgy
entry2.setUserName("uname");
entry2.setGroupName("gname");
tos.putNextEntry(entry2);
tos.close();
fos.close();
}
}
|
|||
| msg169888 - (view) | Author: Tom Lynn (tlynn) | Date: 2012年09月05日 19:16 | |
patch.py attached - what I'm using as a workaround at the moment. |
|||
| msg202641 - (view) | Author: A.M. Kuchling (akuchling) * (Python committer) | Date: 2013年11月11日 19:10 | |
Here's the changes from patch.py, put into patch format. I took out the inlining for ntb() and support for different tarfile APIs, and also replaced the use of .split(,1)[0] by .partition(). |
|||
| msg208225 - (view) | Author: Tom Lynn (tlynn) | Date: 2014年01月16日 00:21 | |
The secondary issue, which the patch doesn't address, is that TarFile.next() seems unpythonic; it treats any {Invalid,Empty,Truncated}HeaderError after offset 0 as EOF rather than propagating the exception. It looks deliberate, but I'm not sure why it would be done like that or if it should be changed.
|
|||
| msg212057 - (view) | Author: Eric Floehr (efloehr) | Date: 2014年02月24日 03:10 | |
I have attached a patch file for test_tarfile.py that uses the attached 'bad.tar' to test this issue. After applying this test patch, put 'bad.tar' in Lib/test with the name 'issue15858.tar'. This tests Tarfile.next(), but the issue can be seen via the command line interface, which uses .next() under the covers: Pre-patch: $ ./python -m tarfile -l Lib/test/issue15858.tar foo/ Post-patch: $ ./python -m tarfile -l Lib/test/issue15858.tar foo/ foo/a |
|||
| msg224013 - (view) | Author: Adam Tauno Williams (whitemice) | Date: 2014年07月25日 23:50 | |
test fails for me with provided bad.tar [as described in comment] but test passed after applying patch to tarfile. |
|||
| msg256115 - (view) | Author: Tom Lynn (tlynn) | Date: 2015年12月08日 15:21 | |
I think issue24514 (fixed in Py2.7.11) is a duplicate of this issue. |
|||
| msg256128 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015年12月08日 21:40 | |
Yes I think you are right. With Python 3.5.0:
$ python3 -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
['foo', 'foo/a']
$ python3 -m tarfile --list bad.tar
foo/
foo/a
The proposed fix here is slightly different: truncate from the first space, rather than trailing whitespace.
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:35 | admin | set | github: 60062 |
| 2015年12月08日 21:40:46 | martin.panter | set | status: open -> closed superseder: tarfile fails to extract archive (handled fine by gnu tar and bsdtar) nosy: + martin.panter messages: + msg256128 resolution: out of date stage: test needed -> resolved |
| 2015年12月08日 15:21:12 | tlynn | set | messages: + msg256115 |
| 2014年12月31日 16:21:35 | akuchling | set | nosy:
- akuchling |
| 2014年07月25日 23:50:18 | whitemice | set | nosy:
+ whitemice messages: + msg224013 |
| 2014年02月24日 03:10:48 | efloehr | set | files:
+ test_tarfile.patch nosy: + efloehr messages: + msg212057 keywords: + patch |
| 2014年01月16日 00:21:47 | tlynn | set | messages: + msg208225 |
| 2014年01月15日 16:56:49 | serhiy.storchaka | set | stage: test needed versions: + Python 2.7, Python 3.3, Python 3.4, - Python 2.6, 3rd party |
| 2013年11月11日 19:10:36 | akuchling | set | files:
+ tarfile-patch.txt nosy: + akuchling messages: + msg202641 |
| 2012年09月05日 19:16:29 | tlynn | set | files:
+ patch.py messages: + msg169888 |
| 2012年09月05日 19:13:34 | tlynn | set | files:
+ bad.tar messages: + msg169887 |
| 2012年09月05日 12:18:50 | lars.gustaebel | set | assignee: lars.gustaebel messages: + msg169868 |
| 2012年09月04日 10:58:44 | tlynn | set | messages: + msg169822 |
| 2012年09月03日 23:25:05 | ned.deily | set | nosy:
+ lars.gustaebel |
| 2012年09月03日 19:47:40 | tlynn | create | |