This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Tarfile using fstat on GZip file object
Type: behavior
Stage: resolved
Components: Documentation
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: BreamoreBoy, bartolsthoorn, docs@python, lars.gustaebel, martin.panter, python-dev
Priority: normal
Keywords: patch

Created on 2014-09-23 08:49 by bartolsthoorn, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: gettarinfo.patch
Uploaded: martin.panter, 2015-04-20 01:25
Messages (6)
msg227328 - Author: Bart Olsthoorn (bartolsthoorn) Date: 2014-09-23 08:49
CPython's tarfile `gettarinfo` method uses fstat to determine the size of a file (via its file object). When that file object was created with `gzip.open` (so it is a `GzipFile`), fstat returns the compressed size of the file. The `addfile` method then reads the uncompressed data of the gzipped file, but reads too few bytes, resulting in a tar of incomplete files.
I suggest checking the file object class before using fstat to determine the size, and raising a warning if it's a gzip file.
To clarify, this only happens when adding a gzip file object to a tar. I know that it's not a very common scenario, and the real problem is that the size of a gzip file's contents can only be determined properly by decompressing and reading it entirely, but I think it would be nice not to fail without warning.
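The size mismatch described above can be observed directly. The following is a minimal sketch, not part of the original report: the helper name `compare_sizes` and the file name `sample.txt.gz` are made up for illustration, and it assumes the payload is compressible enough that the file on disk is smaller than the data.

```python
import gzip
import os
import tempfile


def compare_sizes(payload):
    """Return (size reported by fstat, uncompressed size) for a gzip file.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")  # hypothetical file name
        with gzip.open(path, "wb") as out:
            out.write(payload)
        with gzip.open(path, "rb") as f:
            # fileno() is the descriptor of the compressed file on disk,
            # so fstat reports the compressed size.
            on_disk = os.fstat(f.fileno()).st_size
            # Reading through the GzipFile yields the decompressed data.
            uncompressed = len(f.read())
    return on_disk, uncompressed


on_disk, uncompressed = compare_sizes(b"hello world\n" * 1000)
print("fstat size:", on_disk, "uncompressed size:", uncompressed)
```

For highly repetitive input like this, the fstat size is much smaller than the uncompressed size, which is why `addfile` stops reading early.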
So this is an example that is failing:
```
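import gzip
import io
import tarfile

def build_tar():
    c = io.BytesIO()
    with tarfile.open(mode='w', fileobj=c) as tar:
        for textfile in ['1.txt.gz', '2.txt.gz']:
            with gzip.open(textfile) as f:
                # gettarinfo() uses fstat on the underlying file, so
                # tarinfo.size is the compressed size.
                tarinfo = tar.gettarinfo(fileobj=f)
                # addfile() copies only tarinfo.size bytes: truncated member.
                tar.addfile(tarinfo=tarinfo, fileobj=f)
    # Read the buffer after the archive has been closed.
    data = c.getvalue()
    return data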
```
Instead this reads the proper filesize and writes the files to a tar:
```
import gzip
import io
import tarfile

def build_tar():
    c = io.BytesIO()
    with tarfile.open(mode='w', fileobj=c) as tar:
        for textfile in ['1.txt.gz', '2.txt.gz']:
            with gzip.open(textfile) as f:
                # Decompress fully so the real size is known.
                buff = f.read()
                tarinfo = tarfile.TarInfo(name=f.name)
                tarinfo.size = len(buff)
                tar.addfile(tarinfo=tarinfo, fileobj=io.BytesIO(buff))
    # Read the buffer after the archive has been closed.
    data = c.getvalue()
    return data
```
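The workaround above holds each decompressed file in memory. A possible variation, sketched here under the assumption that the gzip file can be read twice, counts the uncompressed bytes in chunks, rewinds the `GzipFile`, and lets `addfile` stream from it. The helper names `add_gzip_member` and `roundtrip`, the chunk size, and the file names are hypothetical.

```python
import gzip
import io
import os
import tarfile
import tempfile


def add_gzip_member(tar, path, arcname, chunk_size=64 * 1024):
    """Add the decompressed contents of a gzip file to an open tar archive."""
    with gzip.open(path, "rb") as f:
        # First pass: count the uncompressed bytes without keeping them.
        size = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
        # Rewind; GzipFile restarts decompression from the beginning.
        f.seek(0)
        tarinfo = tarfile.TarInfo(name=arcname)
        tarinfo.size = size
        # Second pass: addfile() copies exactly tarinfo.size bytes.
        tar.addfile(tarinfo, fileobj=f)


def roundtrip(payload):
    """Write payload to a gzip file, tar it, and return the stored bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")  # hypothetical file name
        with gzip.open(path, "wb") as out:
            out.write(payload)
        buf = io.BytesIO()
        with tarfile.open(mode="w", fileobj=buf) as tar:
            add_gzip_member(tar, path, "sample.txt")
    with tarfile.open(fileobj=io.BytesIO(buf.getvalue())) as tar:
        return tar.extractfile("sample.txt").read()
```

This trades memory for a second decompression pass, which may or may not be worthwhile depending on file sizes.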
msg238961 - Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-23 00:12
msg227328 states "it's not a really common scenario", but I believe we must still allow for it. What do others think?
msg238967 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-23 01:15
I think a warning in the documentation might be helpful.
However, a special check in the code doesn’t seem right. Would you check for LZMAFile and BZ2File as well? Some of the other attributes (modification time, owner, etc.) may be useful even for a GzipFile, and the programmer can just overwrite the file size attribute if necessary.
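Overwriting the size attribute might look like the following. This is a minimal sketch rather than code from the patch: the helper names `add_gzip_with_metadata` and `demo` and the file names are hypothetical, and it assumes the gzip file is a regular file on disk so that `gettarinfo()` can call fstat on it.

```python
import gzip
import io
import os
import tarfile
import tempfile


def add_gzip_with_metadata(tar, path, arcname):
    """Keep the fstat metadata from gettarinfo() but correct the size."""
    with gzip.open(path, "rb") as f:
        # Modification time, owner, and mode come from the file on disk;
        # the size at this point is the compressed size.
        tarinfo = tar.gettarinfo(fileobj=f, arcname=arcname)
        data = f.read()
        # Overwrite the size with the length of the decompressed data.
        tarinfo.size = len(data)
        tar.addfile(tarinfo, fileobj=io.BytesIO(data))


def demo(payload):
    """Return (stored size, stored bytes) for a gzip file added to a tar."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")  # hypothetical file name
        with gzip.open(path, "wb") as out:
            out.write(payload)
        buf = io.BytesIO()
        with tarfile.open(mode="w", fileobj=buf) as tar:
            add_gzip_with_metadata(tar, path, "sample.txt")
    with tarfile.open(fileobj=io.BytesIO(buf.getvalue())) as tar:
        member = tar.getmember("sample.txt")
        return member.size, tar.extractfile(member).read()
```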
msg241582 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-20 01:25
I am posting a documentation patch which I hope should clarify that objects like GzipFile won’t work automatically with gettarinfo(). It also has other modifications to address Issue 21996 (name must be text) and help with Issue 22208 (clarify non-OS files won’t work).
msg260537 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-02-20 00:18
New changeset 94a94deaf06a by Martin Panter in branch '3.5':
Issues #22468, #21996, #22208: Clarify gettarinfo() and TarInfo usage
https://hg.python.org/cpython/rev/94a94deaf06a
New changeset e66c476b25ec by Martin Panter in branch 'default':
Issue #22468: Merge gettarinfo() doc from 3.5
https://hg.python.org/cpython/rev/e66c476b25ec
New changeset 9d5217aaea13 by Martin Panter in branch '2.7':
Issues #22468, #21996, #22208: Clarify gettarinfo() and TarInfo usage
https://hg.python.org/cpython/rev/9d5217aaea13 
msg260541 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-20 00:26
Hoping my clarification in the documentation is enough to call this fixed.
History
Date User Action Args
2022-04-11 14:58:08 admin set github: 66658
2016-02-20 00:26:53 martin.panter set status: open -> closed; versions: + Python 2.7, Python 3.6, - Python 3.4; messages: + msg260541; resolution: fixed; stage: patch review -> resolved
2016-02-20 00:18:57 python-dev set nosy: + python-dev; messages: + msg260537
2016-02-09 23:04:35 martin.panter link issue21996 dependencies
2015-04-20 01:25:10 martin.panter set files: + gettarinfo.patch; assignee: docs@python; components: + Documentation; versions: + Python 3.5; keywords: + patch; nosy: + docs@python; messages: + msg241582; stage: patch review
2015-03-23 01:15:34 martin.panter set nosy: + martin.panter; messages: + msg238967
2015-03-23 00:12:57 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg238961
2014-09-23 18:22:37 ned.deily set nosy: + lars.gustaebel
2014-09-23 08:49:52 bartolsthoorn create
