This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010年12月29日 19:53 by dabeaz, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| gzipfile_read1.diff | nadeem.vawda, 2011年04月04日 10:56 | Add GzipFile.read1() plus tests | review | |
| gzipfile_read1.diff | nadeem.vawda, 2011年04月04日 17:01 | Fix bug in previous patch | review | |
| Messages (30) | |||
|---|---|---|---|
| msg124870 - (view) | Author: David Beazley (dabeaz) | Date: 2010年12月29日 19:53 | |
Is something like this supposed to work:
>>> import gzip
>>> import io
>>> f = io.TextIOWrapper(gzip.open("foo.gz"),encoding='ascii'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: readable
In a nutshell--reading a .gz file as text.
|
|||
| msg124875 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年12月29日 21:59 | |
Since GZipFile inherits from BufferedIOBase, and TextIOWrapper is supposed to be designed to wrap a BufferedIOBase object, I would say yes it ought to work. On the other hand there may also be a doc error there, since it may be that TextIOWrapper actually needs to wrap one of the subclasses of BufferedIOBase. |
|||
| msg124876 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年12月29日 22:00 | |
Oops. It only has that inheritance in 3.2. |
|||
| msg124877 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年12月29日 22:01 | |
Heh, and 2.7. Fixing versions yet again. |
|||
| msg124878 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年12月29日 22:12 | |
This should be easy to fix, if only the "readable" and "writable" methods are needed. Do you want to try writing a patch? |
|||
| msg124879 - (view) | Author: David Beazley (dabeaz) | Date: 2010年12月29日 22:49 | |
It goes without saying that this also needs to be checked with the bz2 module. A quick check seems to indicate that it has the same problem. While you're at it, maybe someone could add an 'open' function to bz2 to make it symmetrical with gzip as well :-). |
|||
| msg124880 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年12月29日 22:55 | |
bz2 is a pure C module, so that's a very different situation. |
|||
| msg124881 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年12月29日 22:56 | |
> While you're at it, maybe someone could add an 'open' function to bz2 > to make it symmetrical with gzip as well :-). That's a nice idea, but quite orthogonal to this issue. |
|||
| msg124882 - (view) | Author: David Beazley (dabeaz) | Date: 2010年12月29日 22:58 | |
C or not, wrapping a BZ2File instance with a TextIOWrapper to get text still seems like something that someone might want to do. I doubt it would take much modification to give BZ2File instances the required set of methods. |
|||
| msg124884 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年12月29日 23:05 | |
Right, but in the bz2 case I think it is a feature request rather than a bugfix. In any case it should be a separate issue. |
|||
| msg124885 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年12月29日 23:10 | |
> C or not, wrapping a BZ2File instance with a TextIOWrapper to get text > still seems like something that someone might want to do. I doubt it > would take much modification to give BZ2File instances the required > set of methods. BZ2File uses FILE pointers internally so it may be more complicated than it looks to be (because the methods may not have the right semantics). |
|||
| msg124886 - (view) | Author: David Beazley (dabeaz) | Date: 2010年12月29日 23:12 | |
Do Python devs really view gzip and bz2 as two totally completely different animals? They both have the same functionality and would be used for the same kinds of things. Maybe I'm missing something. |
|||
| msg124887 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年12月29日 23:14 | |
> Do Python devs really view gzip and bz2 as two totally completely > different animals? They both have the same functionality and would be > used for the same kinds of things. Maybe I'm missing something. Well, the reality of divergent implementation strategies trumps the theory of API compatibility :) The approach taken by bz2 is IMO regrettable, but it's not a ten minutes job to write it again from scratch. |
|||
| msg124888 - (view) | Author: David Beazley (dabeaz) | Date: 2010年12月29日 23:17 | |
Hmmm. Interesting. In the big picture, it might be an interesting project for someone (not necessarily the core devs) to sit down and refactor both of these modules so that they play nice with Python 3 I/O system. Obviously that's a project outside the scope of this bug or the 3.2 release for that matter. |
|||
| msg129187 - (view) | Author: David Beazley (dabeaz) | Date: 2011年02月23日 13:14 | |
Bump. This is still broken in Python 3.2. |
|||
| msg129198 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年02月23日 14:29 | |
If a patch had been proposed it probably would have gotten in to 3.2. Maybe someone (perhaps you?) will find the time before 3.2.1. Someone has decided to work on the bz2 rewrite, by the way (issue 5863). |
|||
| msg129199 - (view) | Author: David Beazley (dabeaz) | Date: 2011年02月23日 14:37 | |
If I can find some time, I may took a look at this. I just noticed that similar problems arise trying to wrap TextIOWrapper around the file-like objects returned by urllib.request.urlopen as well. In the big picture, some discussion of what it means to be "file-like" might be in order. If something is "file-like" and binary, should that always imply that I be able to wrap a TextIOWrapper object around it in order to encode/decode text? I would argue "yes", but I'd be curious to know what others think. |
|||
| msg129200 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011年02月23日 14:42 | |
What is the problem with Python 3.2? It works correctly here:
$ cat bla.txt
bli
blo
bla
$ gzip bla.txt
$ ./python
Python 3.3a0 (unknown, Feb 23 2011, 13:03:50)
>>> import gzip, io
>>> f = io.TextIOWrapper(gzip.open("bla.txt.gz"),encoding='ascii')
>>> f.read()
'bli\nblo\nbla\n'
If someone added Python 3.2 in the Versions field because of an issue with bz2: please open a new issue instead.
|
|||
| msg129201 - (view) | Author: David Beazley (dabeaz) | Date: 2011年02月23日 14:46 | |
Python 3.2 (r32:88445, Feb 20 2011, 21:51:21)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> import io
>>> f = io.TextIOWrapper(gzip.open("file.gz"),encoding='latin-1')
>>> f.readline()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: read1
>>>
|
|||
| msg129203 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年02月23日 14:56 | |
Yes, a clear definition of the minimum requirements for being wrapped by TextIOWrapper sounds like a necessary thing to have (and I'd be inclined to agree with your assertion, but I didn't work on the IO library :). It would be best to open a new issue for that. |
|||
| msg129205 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011年02月23日 15:04 | |
> Yes, a clear definition of the minimum requirements for being wrapped > by TextIOWrapper sounds like a necessary thing to have About that: is read1() argument mandatory or not? In _pyio, BufferedIOBase.read1() argument is optional (default: None); BytesIO.read1(), BufferedReader.read1(), BufferedRWPair.read1(), BufferedRandom.read1() argument is mandatory. In _io, BufferedIOBase.read1() raises directly an exception, without checking the arguments; BufferedReader.read1() argument is mandatory. In the io doc, BufferedIOBase.read1() argument is optional (default: -1), BytesIO.read1() has no argument (!) and BufferedReader.read1() argument is mandatory. |
|||
| msg129215 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月23日 16:52 | |
It would probably be ok to fallback on read() when read1() isn't implemented. read1() is supposed to be implemented by all BufferedIO-compliant classes, but in all honesty I don't think it's very useful in practice. It's supposed to be an optimization, and I think it's a misguided one; the generalized prefetch() primitive I proposed last year would certainly be more useful: see http://mail.python.org/pipermail/python-dev/2010-September/104194.html |
|||
| msg132757 - (view) | Author: Filip Gruszczyński (gruszczy) | Date: 2011年04月01日 16:52 | |
Is following change in GzipFile class enough: def read1(self, n): return self.read(n) ? This satisfies TextIOWrapper to run readline correctly. |
|||
| msg132899 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011年04月03日 22:28 | |
> Is following change in GzipFile class enough: > > def read1(self, n): > return self.read(n) > > ? This satisfies TextIOWrapper to run readline correctly. Looks good to me. By the way, BZ2File now works correctly - the fix for issue5863 adds read1(). |
|||
| msg132902 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年04月03日 22:41 | |
> Nadeem Vawda <nadeem.vawda@gmail.com> added the comment: > > > Is following change in GzipFile class enough: > > > > def read1(self, n): > > return self.read(n) > > > > ? This satisfies TextIOWrapper to run readline correctly. > > Looks good to me. Well, ideally, read1() should satisfy the condition stated in the BufferedIOBase documentation - namely, that it issues at most one read() call on the underlying stream. |
|||
| msg132931 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011年04月04日 10:56 | |
Here's an implementation of read1() that satisfies that condition, along with some relevant unit tests. |
|||
| msg132948 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年04月04日 14:18 | |
> Here's an implementation of read1() that satisfies that condition, along with > some relevant unit tests. Something looks fishy: what happens if size is -1 and EOFError is not raised? |
|||
| msg132953 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011年04月04日 17:01 | |
> Something looks fishy: what happens if size is -1 and EOFError is not raised? You're right - I missed that possibility. In that case, extrasize and offset get updated incorrectly, which will break subsequent calls to seek() and tell(). However, it seems that subsequent reads work fine, because slicing a bytes object with a too-large upper bound doesn't raise an exception. The attached patch fixes this bug, and updates test_read1() to catch regressions. |
|||
| msg132959 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011年04月04日 19:01 | |
New changeset 9775d67c9af9 by Antoine Pitrou in branch 'default': Issue #10791: Implement missing method GzipFile.read1(), allowing GzipFile http://hg.python.org/cpython/rev/9775d67c9af9 |
|||
| msg132960 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年04月04日 19:02 | |
Patch now committed, thank you! Since the patch adds a new API (GzipFile.read1()), I think it's better not to backport it. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:10 | admin | set | github: 55000 |
| 2011年04月07日 13:20:42 | jcea | set | nosy:
+ jcea |
| 2011年04月04日 19:02:53 | pitrou | set | status: open -> closed versions: - Python 2.7, Python 3.2 type: behavior -> enhancement messages: + msg132960 resolution: fixed stage: needs patch -> resolved |
| 2011年04月04日 19:01:58 | python-dev | set | nosy:
+ python-dev messages: + msg132959 |
| 2011年04月04日 17:01:25 | nadeem.vawda | set | files:
+ gzipfile_read1.diff messages: + msg132953 |
| 2011年04月04日 14:18:21 | pitrou | set | messages: + msg132948 |
| 2011年04月04日 10:56:01 | nadeem.vawda | set | files:
+ gzipfile_read1.diff keywords: + patch messages: + msg132931 |
| 2011年04月03日 22:41:02 | pitrou | set | messages: + msg132902 |
| 2011年04月03日 22:28:48 | nadeem.vawda | set | messages: + msg132899 |
| 2011年04月02日 10:27:46 | nadeem.vawda | set | nosy:
+ nadeem.vawda |
| 2011年04月01日 16:52:06 | gruszczy | set | nosy:
+ gruszczy messages: + msg132757 |
| 2011年02月23日 18:26:10 | alex | set | nosy:
+ alex |
| 2011年02月23日 16:52:33 | pitrou | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages: + msg129215 |
| 2011年02月23日 15:04:04 | vstinner | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages: + msg129205 |
| 2011年02月23日 14:56:00 | r.david.murray | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages: + msg129203 versions: + Python 3.2, Python 3.3 |
| 2011年02月23日 14:46:41 | dabeaz | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages: + msg129201 |
| 2011年02月23日 14:42:58 | vstinner | set | nosy:
+ vstinner messages: + msg129200 versions: - Python 3.2 |
| 2011年02月23日 14:37:08 | dabeaz | set | nosy:
pitrou, eric.araujo, r.david.murray, dabeaz messages: + msg129199 |
| 2011年02月23日 14:29:18 | r.david.murray | set | nosy:
pitrou, eric.araujo, r.david.murray, dabeaz messages: + msg129198 |
| 2011年02月23日 13:14:41 | dabeaz | set | nosy:
pitrou, eric.araujo, r.david.murray, dabeaz messages: + msg129187 |
| 2011年01月02日 18:34:37 | eric.araujo | set | nosy:
+ eric.araujo |
| 2010年12月29日 23:17:37 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124888 |
| 2010年12月29日 23:14:21 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124887 |
| 2010年12月29日 23:12:11 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124886 |
| 2010年12月29日 23:10:48 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124885 |
| 2010年12月29日 23:05:18 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124884 |
| 2010年12月29日 22:58:09 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124882 |
| 2010年12月29日 22:56:33 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124881 |
| 2010年12月29日 22:55:00 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124880 |
| 2010年12月29日 22:49:32 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124879 |
| 2010年12月29日 22:12:09 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124878 |
| 2010年12月29日 22:01:39 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124877 versions: + Python 2.7 |
| 2010年12月29日 22:00:18 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages: + msg124876 versions: - Python 3.1, Python 2.7 |
| 2010年12月29日 21:59:11 | r.david.murray | set | versions:
+ Python 3.1, Python 2.7 nosy: + r.david.murray, pitrou messages: + msg124875 stage: needs patch |
| 2010年12月29日 19:53:51 | dabeaz | create | |