This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: gzip.open breaks with 'U' flag
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Chris.Barker, agillesp, jackdied, nadeem.vawda, python-dev, radek768
Priority: normal Keywords: easy, needs review, patch

Created on 2009-02-04 01:05 by Chris.Barker, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
gzipU.diff skip.montanaro, 2009-02-05 02:36
Messages (8)
msg81121 - Author: Christopher Barker (Chris.Barker) Date: 2009-02-04 01:05
If you pass the 'U' (universal newlines) flag into gzip.open(), the flag
gets passed into the file open command used to open the gzip file
itself. As the 'U' flag can cause changes in the data (linefeed
translation), when it is used with a binary file open, the data is
corrupted, and all can go to heck.
In virtually all of my code that reads text files, I use the 'U' flag to
open files; it really helps not having to deal with newline issues. Yes,
they are fewer now that the Macintosh uses \n, but they can still be a pain.
Anyway, we added such support to some matplotlib methods, and found that
gzip file reading was broken. We were passing the flags through into either
file() or gzip.open(), and passing 'U' into gzip.open() turns out to be
fatal.
1) It would be nice if the gzip module (and the zip lib module)
supported Universal newlines -- you could read a compressed text file
with "wrong" newlines, and have them handled properly. However, that may
be hard to do, so at least:
2) Passing a 'U' flag in to gzip.open shouldn't break it -- it should be
ignored or raise an exception.
I took a look at the Python SVN (2.5.4 and 2.6.1) for the gzip lib. I
see this:
    # guarantee the file is opened in binary mode on platforms
    # that care about that sort of thing
    if mode and 'b' not in mode:
        mode += 'b'
    if fileobj is None:
        fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
This is going to break for 'U' -- you'll get 'rUb'. I tested
file(filename, 'rUb'), and it looks like it does universal newline
translation.
So:
* Either gzip should be a bit smarter, and remove the 'U' flag (that's
what we did in the MPL code), or force 'rb' or 'wb'.
* Or: file opening should be a bit smarter -- what does 'rUb' mean? A
file can't be both binary and universal text. Should it raise an
exception? Somehow I think it would be better to ignore the 'U', but
maybe that's only because of the issue I happen to be looking at now.
The latter seems a better idea -- this issue could certainly come up in
other places than the gzip module, but maybe it would break a bunch of
code -- who knows?
I haven't touched py3 yet, so I have no idea if this issue is different
there.
NOTE: passing in the 'U' flag doesn't guarantee that gzip will break. The
right combination of bytes needs to be there. In fact, when I first
tested this with a small test file, it worked just fine -- I thought gzip
was ignoring the flag. However, when tested with a larger (real) gz
file, it did break.
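The dependence on the "right combination of bytes" can be illustrated without Python 2's 'U' mode. The following is only a sketch under stated assumptions, not a reproduction of the original failure: it targets Python 3.8+, the helper names (translate_newlines, survives_translation, find_fragile_payload) are hypothetical, and the newline translation is a hand-written stand-in for what a 'U'-mode read would do to the raw compressed bytes. A compressed stream is only damaged if it happens to contain a carriage return byte.

```python
import gzip
import random
import zlib


def translate_newlines(data):
    """Stand-in for universal-newline translation: CR LF and CR become LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def survives_translation(raw):
    """Return True if the gzip stream for `raw` still round-trips after translation."""
    compressed = gzip.compress(raw, mtime=0)
    try:
        return gzip.decompress(translate_newlines(compressed)) == raw
    except (OSError, EOFError, zlib.error):
        # Bad CRC or length, truncated stream, or invalid deflate data.
        return False


def find_fragile_payload(max_seeds=100, size=2048):
    """Search for a payload whose compressed form contains a CR byte."""
    for seed in range(max_seeds):
        rng = random.Random(seed)
        raw = bytes(rng.getrandbits(8) for _ in range(size))
        if b"\r" in gzip.compress(raw, mtime=0):
            return raw
    raise RuntimeError("no payload with a carriage return found")


fragile = find_fragile_payload()
print("round-trips after translation:", survives_translation(fragile))
```

A small file may compress to a stream with no carriage return at all, which would match the observation that the first small test appeared to work.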
Very simple patch:
Add:
    mode = mode.replace('U', '')
to the above code before opening the file.
But we may want to do something smarter...
see the (limited) discussion at:
http://mail.python.org/pipermail/python-dev/2009-January/085662.html 
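The "very simple patch" above amounts to normalizing the mode string before the underlying file is opened. A minimal sketch of that idea follows; sanitize_gzip_mode is a hypothetical helper name used for illustration, and this is not claimed to be the code that was eventually committed.

```python
def sanitize_gzip_mode(mode=None):
    """Drop 'U' from a mode string and force binary mode.

    Hypothetical helper illustrating the workaround discussed in this
    issue; it is not part of the gzip module.
    """
    # str.replace returns a new string, so the result must be reassigned.
    mode = (mode or "rb").replace("U", "")
    # A bare 'U' historically implied reading, so default to 'r'.
    if not mode or mode[0] not in "rwa":
        mode = "r" + mode
    # Guarantee binary mode on platforms that care.
    if "b" not in mode:
        mode += "b"
    return mode


for requested in (None, "rU", "rUb", "U", "w", "rb"):
    print(repr(requested), "->", repr(sanitize_gzip_mode(requested)))
```

This only avoids the corruption; it silently ignores the request for newline translation rather than honoring it.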
msg81158 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-02-04 20:21
Seems like this should be fairly easy to do right. 'U' needs to be
removed from the flags but then applied to the lines read from the
stream.
msg81185 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-02-05 02:36
Here's a patch against trunk. Extra test case and minor doc tweak
included.
msg81954 - Author: Radek (radek768) Date: 2009-02-13 17:21
Same bug in 2.5. I don't know if the patch applies to 2.5.
msg84207 - Author: Jack Diederich (jackdied) * (Python committer) Date: 2009-03-26 19:51
Unfortunately universal newlines are more complicated than replace() can
handle. See io.py; you may be able to use one of those classes to get
the universal newline handling on the cheap (or at least easily).
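One way to read the suggestion above is to keep the gzip stream binary and layer newline handling on top of the decompressed data with io.TextIOWrapper. The sketch below assumes Python 3 and an ASCII payload chosen purely for illustration; it is not the patch attached to this issue.

```python
import gzip
import io

# Build a small gzip stream in memory with mixed line endings.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(b"one\r\ntwo\rthree\n")
buffer.seek(0)

# Decompress in binary mode, then decode and translate newlines on the
# decompressed data only. newline=None enables universal newlines.
binary_stream = gzip.GzipFile(fileobj=buffer, mode="rb")
with io.TextIOWrapper(binary_stream, encoding="ascii", newline=None) as text:
    lines = text.readlines()

print(lines)  # ['one\n', 'two\n', 'three\n']
```

Because the translation is applied after decompression, the compressed bytes themselves are never altered.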
msg92007 - Author: Art Gillespie (agillesp) Date: 2009-08-27 15:42
The problem appears to be that the gzip module simply doesn't support
universal newlines yet.
I'm currently working on the zipfile module's universal newline support
(issue6759) so if nobody else is working on this, I'll do it.
I'm not sure whether the file object's open() behavior when presented with
'rUb' is correct.
>>> f = open("test.txt", "w").write("blah\r\nblah\rblah\nblah\r\n")
>>> f = open("test.txt", "rUb")
>>> f.read()
'blah\nblah\nblah\nblah\n'
Since 'U' and 'b' are conceptually mutually exclusive on platforms where
'b' matters, I can see this being confusing.
msg173460 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-21 16:30
New changeset e647229c422b by Nadeem Vawda in branch '2.7':
Issue #5148: Ignore 'U' in mode given to gzip.open() and gzip.GzipFile().
http://hg.python.org/cpython/rev/e647229c422b 
msg173462 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-21 16:34
The data corruption issue is now fixed in the 2.7 branch.
In 3.x, using a mode containing 'U' results in an exception rather than silent data corruption. Additionally, gzip.open() has supported text modes ("rt"/"wt"/"at") and newline translation since 3.3 [issue 13989].
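The 3.x behavior described above can be checked with a short script. This is a sketch assuming Python 3.3 or later; the file name and contents are made up for illustration.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"one\r\ntwo\rthree\n")

    # A mode containing 'U' is rejected instead of silently corrupting data.
    try:
        gzip.open(path, "rU")
        u_mode_rejected = False
    except ValueError:
        u_mode_rejected = True

    # Text mode decodes the decompressed bytes; the default newline=None
    # applies universal newline translation.
    with gzip.open(path, "rt", encoding="ascii") as f:
        text = f.read()

print(u_mode_rejected, repr(text))
```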
History
Date User Action Args
2022-04-11 14:56:45 admin set github: 49398
2012-10-21 16:34:00 nadeem.vawda set status: open -> closed
versions: + Python 2.7, - Python 2.6
messages: + msg173462
resolution: fixed
stage: patch review -> resolved
2012-10-21 16:30:13 python-dev set nosy: + python-dev
messages: + msg173460
2012-02-11 13:51:38 nadeem.vawda set nosy: + nadeem.vawda
2010-05-20 20:31:56 skip.montanaro set nosy: - skip.montanaro
2009-08-27 15:42:10 agillesp set nosy: + agillesp
messages: + msg92007
2009-03-26 19:51:35 jackdied set nosy: + jackdied
messages: + msg84207
2009-02-13 17:21:29 radek768 set nosy: + radek768
messages: + msg81954
2009-02-05 03:36:15 skip.montanaro set keywords: + needs review
stage: needs patch -> patch review
2009-02-05 02:36:17 skip.montanaro set files: + gzipU.diff
keywords: + patch
messages: + msg81185
2009-02-04 20:21:46 skip.montanaro set keywords: + easy
nosy: + skip.montanaro
messages: + msg81158
stage: needs patch
2009-02-04 01:05:41 Chris.Barker create