This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: gzip module does the wrong thing with an os.fdopen()'ed fileobj
Type: behavior
Stage: resolved
Components:
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: nadeem.vawda
Nosy List: antoine.pietri, gregory.p.smith, jld, nadeem.vawda, python-dev
Priority: normal
Keywords: patch

Created on 2012-01-13 22:31 by gregory.p.smith, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name            Uploaded                            Description
gzip_fdopen_prob.py  gregory.p.smith, 2012-01-13 22:31
gzip-fdopen.diff     nadeem.vawda, 2012-01-17 11:41
Messages (9)
msg151203 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-13 22:31
gzip.GzipFile accepts a fileobj parameter with an open file object.
Unfortunately gzip requires a filename be embedded in the gzip file and the gzip module code uses fileobj.name to get that.
This results in the fake "<fdopen>" name from posixmodule.c being embedded in the output gzipped file when using Python 2.x. This causes problems when ungzipping these files with gzip -d or with ungzip implementations that always rely on the embedded filename when writing their output file, rather than stripping a suffix from the input filename: either they cannot open a file called "<fdopen>", or, if they can, each successive ungzip overwrites the previous one.
On Python 3.x the problem is different: the gzip module fails entirely when given an os.fdopen()'ed file object:
$ ./python gzip_fdopen_prob.py 
out_file <_io.BufferedWriter name='FOO.gz'>
out_fd 3
fd_out_file <_io.BufferedWriter name=3>
fd_out_file.name 3
Traceback (most recent call last):
  File "gzip_fdopen_prob.py", line 13, in <module>
    gz_out_file = gzip.GzipFile(fileobj=fd_out_file)
  File "/home/gps/oss/cpython/default/Lib/gzip.py", line 184, in __init__
    self._write_gzip_header()
  File "/home/gps/oss/cpython/default/Lib/gzip.py", line 221, in _write_gzip_header
    fname = os.path.basename(self.name)
  File "/home/gps/oss/cpython/default/Lib/posixpath.py", line 132, in basename
    i = p.rfind(sep) + 1
AttributeError: 'int' object has no attribute 'rfind'
(code attached)
The os.fdopen()'ed file object is kindly using the integer file descriptor as its .name attribute. That might or might not be an issue, but regardless of that:
1) GzipFile should not fail in this case.
2) GzipFile should never embed a fake made up filename in its output.
Fixing the gzip module to catch errors and use an empty b'' filename for the gzip code in the above error is easy.
What should be done about the .name attribute on fake file objects? I don't think it should exist at all.
(another quick test shows that gzip in python 3.x can't output to a BytesIO fileobj at all, it thinks it is readonly)
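The root cause shown in the traceback can be reproduced in isolation. The following is a minimal sketch, assuming a current Python 3; the temporary file and the variable names are illustrative and not taken from the attached gzip_fdopen_prob.py. It only checks that the .name of an os.fdopen()'ed object is the integer descriptor and that os.path.basename() rejects it (the exact exception type differs between Python versions, so both are caught).

```python
import os
import tempfile

# Illustrative setup: a throwaway file whose descriptor we wrap with os.fdopen().
fd, path = tempfile.mkstemp(suffix=".gz")
try:
    with os.fdopen(fd, "wb") as fd_out_file:
        name = fd_out_file.name
        # On Python 3 the name is the integer file descriptor, not a path.
        name_is_fd = isinstance(name, int) and name == fd
        try:
            os.path.basename(name)
            basename_failed = False
        except (TypeError, AttributeError):
            # AttributeError in the traceback above; TypeError on newer versions.
            basename_failed = True
finally:
    os.remove(path)

print(name_is_fd, basename_failed)
```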
msg151414 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-01-16 22:37
For 3.x, I think that ignoring non-string names is a reasonable fix. The docs
for io.FileIO specify that its name attribute can be either a path or an integer
file descriptor, and changing this doesn't seem to serve any purpose.
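As a rough illustration of "ignoring non-string names", the check could look like the sketch below. header_filename is a hypothetical helper written for this example; it is not the code in gzip.py or in the attached patch.

```python
import io
import os
import tempfile


def header_filename(fileobj):
    """Hypothetical helper: choose the name to store in the gzip header."""
    name = getattr(fileobj, "name", "")
    # An os.fdopen()'ed object reports its integer descriptor here,
    # so anything that is not str/bytes is treated as "no filename".
    if not isinstance(name, (str, bytes)):
        return ""
    return os.path.basename(name)


fd, path = tempfile.mkstemp(suffix=".gz")
try:
    with os.fdopen(fd, "wb") as fd_out_file:
        from_fd = header_filename(fd_out_file)
    with open(path, "wb") as out_file:
        from_path = header_filename(out_file)
finally:
    os.remove(path)

# BytesIO has no .name attribute at all, so the default "" is used.
from_bytesio = header_filename(io.BytesIO())
print(repr(from_fd), repr(from_path), repr(from_bytesio))
```

Checking the type rather than catching the exception keeps the behaviour explicit: only real path-like names ever reach os.path.basename().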
As for the case of 2.7's bogus "<fdopen>" name attribute, I'm not sure what the
best course of action is. I agree that ideally we would want to get rid of the
attribute altogether (for objects returned by fdopen), or change the semantics
to those used by FileIO in 3.x, but making that sort of change in a bugfix
release seems unwise.
One alternative would be for GzipFile to specifically check whether a file
object was returned by fdopen(), and if so ignore the fake name. I'm not sure
how this could be accomplished, though - just checking for name == "<fdopen>" is
too fragile for my liking, and I can't see any other obvious way of
distinguishing objects created by fdopen() from those created by open().
> (another quick test shows that gzip in python 3.x can't output to a BytesIO
> fileobj at all, it thinks it is readonly)
Are you sure about this? I can't reproduce the problem. Running this script:
    import gzip, io
    b = io.BytesIO()
    with gzip.GzipFile(fileobj=b, mode="w") as g:
        g.write(b"asdf ghjk")
    print(b.getvalue())
    b.seek(0)
    with gzip.GzipFile(fileobj=b, mode="r") as g:
        print(g.read())
I get the following output:
 b'\x1f\x8b\x08\x00\xe1\xa4\x14O\x02\xffK,NISH\xcf\xc8\xca\x06\x00P\xd2\x1cJ\t\x00\x00\x00'
 b'asdf ghjk'
msg151446 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-01-17 11:41
Attached is a fix for 3.x.
msg151511 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-18 02:26
Thanks, that looks good.
As far as fixing this for 2.7 goes, i don't like the _sound_ of it because it is gross... But i'm actually okay with having special case code in the gzip module that rejects '<fdopen>' as an actual filename and uses '' instead in that case. It is VERY unlikely that anyone ever intentionally wants to use that as a filename.
Anything more than that (changing the actual '<fdopen>' string, for example) seems too invasive: it might break someone's doctests, and it would genuinely make it more difficult to tell from its repr that a file object came from fdopen.
msg151520 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-01-18 07:32
New changeset 7d405058e458 by Nadeem Vawda in branch '3.2':
Issue #13781: Fix GzipFile to work with os.fdopen()'d file objects.
http://hg.python.org/cpython/rev/7d405058e458
New changeset fe36edf3a341 by Nadeem Vawda in branch 'default':
Merge: #13781: Fix GzipFile to work with os.fdopen()'d file objects.
http://hg.python.org/cpython/rev/fe36edf3a341 
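One way to check the resulting behaviour on a current Python 3 is to inspect the gzip header directly. The sketch below assumes the header layout from RFC 1952 (FLG is the fourth byte, and bit 0x08 marks an embedded file name); the file names FOO.gz and BAR.gz and the helper functions are illustrative only.

```python
import gzip
import os
import tempfile

FNAME = 0x08  # FLG bit that marks an embedded original file name (RFC 1952)
PAYLOAD = b"asdf ghjk"


def gzip_to(fileobj):
    """Write PAYLOAD through GzipFile into an already-open binary file."""
    with gzip.GzipFile(fileobj=fileobj, mode="wb") as gz:
        gz.write(PAYLOAD)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: a regular open() file object, whose .name is a path.
    named_path = os.path.join(tmp, "FOO.gz")
    with open(named_path, "wb") as out_file:
        gzip_to(out_file)

    # Case 2: an os.fdopen()'ed file object, whose .name is an integer.
    fd_path = os.path.join(tmp, "BAR.gz")
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    with os.fdopen(os.open(fd_path, flags), "wb") as fd_out_file:
        gzip_to(fd_out_file)

    named_bytes = read_bytes(named_path)
    fd_bytes = read_bytes(fd_path)

named_has_name = bool(named_bytes[3] & FNAME)
fd_has_name = bool(fd_bytes[3] & FNAME)
# The fixed header is 10 bytes; with no FEXTRA field written, a
# NUL-terminated name follows immediately when FNAME is set.
embedded = named_bytes[10:named_bytes.index(b"\x00", 10)] if named_has_name else b""

print(named_has_name, embedded, fd_has_name)
print(gzip.decompress(fd_bytes))
```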
msg151521 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-01-18 07:55
> As far as fixing this for 2.7 goes, i don't like the _sound_ of it
> because it is gross... But i'm actually okay with having special case
> code in the gzip module that rejects '<fdopen>' as an actual filename
> and uses '' instead in that case. It is VERY unlikely that anyone ever
> intentionally wants to use that as a filename.
I agree - it sounds ugly, but pragmatically it seems like the best option.
Given that the output will still be a valid gzip file even in this rare
case, it seems unlikely to cause trouble even then.
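The special case discussed for 2.7 can be sketched as follows. This is written for Python 3 purely for illustration, so the "<fdopen>" name is passed in explicitly rather than coming from os.fdopen(); sanitize_name and compress_with_name are hypothetical helpers, not the code that was committed. The sketch also checks the point made above that the output is a valid gzip file either way.

```python
import gzip
import io

FDOPEN_PLACEHOLDER = "<fdopen>"  # the fake name Python 2's os.fdopen() reports
FNAME = 0x08                     # FLG bit for an embedded file name (RFC 1952)


def sanitize_name(name):
    """Hypothetical helper: treat the fdopen placeholder as 'no filename'."""
    return "" if name == FDOPEN_PLACEHOLDER else name


def compress_with_name(data, name):
    """Compress data into memory, embedding the given name in the header."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf) as gz:
        gz.write(data)
    return buf.getvalue()


# Without the special case, the placeholder ends up in the header.
raw = compress_with_name(b"payload", FDOPEN_PLACEHOLDER)
# With the special case, no name is embedded at all.
clean = compress_with_name(b"payload", sanitize_name(FDOPEN_PLACEHOLDER))

raw_has_name = bool(raw[3] & FNAME)
clean_has_name = bool(clean[3] & FNAME)
both_valid = gzip.decompress(raw) == gzip.decompress(clean) == b"payload"

print(raw_has_name, clean_has_name, both_valid)
```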
msg151568 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-18 21:19
Looks like you've got commit privs (yay), so i'm assigning this to you to take care of it that way for 2.7 as well.
I'd add a comment to the fdopen C code where the "<fdopen>" constant lives as well as to the gzip.py module around the special case for this mentioning that they should be kept in sync. (not that either is _ever_ likely to be changed in 2.7)
msg151572 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-01-18 22:41
New changeset a08e9e84f33f by Nadeem Vawda in branch '2.7':
Issue #13781: Fix GzipFile to work with os.fdopen()'d file objects.
http://hg.python.org/cpython/rev/a08e9e84f33f 
msg151581 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-01-18 23:16
Done.
History
Date                 User             Action  Args
2022-04-11 14:57:25  admin            set     github: 57990
2014-04-05 20:49:17  ezio.melotti     set     files: - test_tarfile_fdopen.diff
2014-04-05 20:45:58  ezio.melotti     set     messages: - msg215627
2014-04-05 20:43:34  antoine.pietri   set     files: + test_tarfile_fdopen.diff; nosy: + antoine.pietri; messages: + msg215627
2012-01-18 23:16:40  nadeem.vawda     set     status: open -> closed; type: behavior; messages: + msg151581; resolution: fixed; stage: resolved
2012-01-18 22:41:57  python-dev       set     messages: + msg151572
2012-01-18 21:19:26  gregory.p.smith  set     assignee: gregory.p.smith -> nadeem.vawda; messages: + msg151568; versions: - Python 3.2, Python 3.3
2012-01-18 07:55:58  nadeem.vawda     set     messages: + msg151521
2012-01-18 07:32:32  python-dev       set     nosy: + python-dev; messages: + msg151520
2012-01-18 02:26:08  gregory.p.smith  set     assignee: gregory.p.smith; messages: + msg151511
2012-01-17 11:41:58  nadeem.vawda     set     files: + gzip-fdopen.diff; keywords: + patch; messages: + msg151446
2012-01-16 22:37:57  nadeem.vawda     set     messages: + msg151414
2012-01-15 10:04:40  jld              set     nosy: + jld
2012-01-14 00:07:06  pitrou           set     nosy: + nadeem.vawda; versions: - Python 2.6, Python 3.1
2012-01-13 22:31:55  gregory.p.smith  create
