homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: fileinput.hook_compressed returning bytes from gz file
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.araujo Nosy List: amaury.forgeotdarc, eric.araujo, jaraco, methane, mnewman, r.david.murray, tati_alchueyr
Priority: normal Keywords: patch

Created on 2009年04月14日 23:48 by mnewman, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
example.zip mnewman, 2009年04月14日 23:47 ZIP file containing test example
issue5758_fileinput_gzip_with_encoding.patch tati_alchueyr, 2012年03月13日 00:10 Fixed issue #5758. fileinput.hook_compressed returns string (and not bytes) from gz file again. review
issue5758_fileinput_gzip_with_encoding_v2.patch tati_alchueyr, 2012年03月13日 07:36 New version of the patch after reviewer's considerations review
Pull Requests
URL Status Linked Edit
PR 25272 merged methane, 2021年04月08日 03:31
Messages (21)
msg85978 - (view) Author: Michael Newman (mnewman) Date: 2009年04月14日 23:47
The attached ZIP file contains "test.bat" which runs "test.py" with
Python 2.6 and Python 3.0.
Python 2.6 behaves as expected (see "py26.out"), since it returns
strings from both "mike.txt" and "mike.txt.gz". However, the same test
with Python 3.0 returns bytes from "mike.txt.gz", as shown in "py30.out":
Output: Hello from Mike.
Output: This is the second line.
Output: Why did the robot cross the road?
Output: b'Hello from Mike.'
Output: b'This is the second line.'
Output: b'Why did the robot cross the road?'
For reference, I tested this on Python versions:
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit
(Intel)] on win32
msg111063 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010年07月21日 14:23
gzip.open() only implements the "rb" mode, and returns bytes.
fileinput could certainly wrap it with a io.TextIOWrapper.
Then the encoding issue arises.
fileinput.FileInput should grow an "encoding" parameter instead of always relying on the default encoding.
msg155471 - (view) Author: Tatiana Al-Chueyr (tati_alchueyr) * Date: 2012年03月12日 20:09
Working on this
msg155532 - (view) Author: Tatiana Al-Chueyr (tati_alchueyr) * Date: 2012年03月13日 00:10
I had a lot of help of R. David Murray, could someone review our patch, please?
We are specially concerned about backward compatibility.
msg155538 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年03月13日 00:30
Thanks for the patch. I commented on not-so-important things on the code review site (which should have sent an email to Tatiana) and wanted to raise an issue here.
Is it best that the encoding used defaults to Python’s default encoding (UTF-8) or the I/O default encoding (depends on the OS and locale), like what open does?
msg155540 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年03月13日 00:34
I think it should use the same default as open, but frankly I couldn't remember the right way to achieve that. Suggestions welcome :)
msg155543 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年03月13日 00:40
We can use locale.getpreferredencoding, or be smarter and just pass down encoding=encoding to TextIOWrapper and let *it* call getpreferredencoding if it’s given a None :)
msg155548 - (view) Author: Tatiana Al-Chueyr (tati_alchueyr) * Date: 2012年03月13日 01:03
Excellent! Thanks for all remarks, Éric! I'll follow your suggestions and will submit a new patch.
msg155553 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年03月13日 01:26
Duh. I should have remembered that the io module used None to represent the default encoding. Thanks, Éric.
msg155554 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年03月13日 01:27
Nobody can remember everything: I tried in a shell and looked at the docs to find it :)
msg155577 - (view) Author: Tatiana Al-Chueyr (tati_alchueyr) * Date: 2012年03月13日 07:36
Improved patch, according to eric.araujo's suggestions and mnewman's guidance.
msg155595 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年03月13日 12:11
Thanks. Unless another core dev wants to do a complementary review I will slightly tweak the patch and commit it. I need to finish waking up and eat some food before I do that :)
Technically adding a new argument means that this is a new feature and cannot be applied to the stable 3.2 version, but something needs to be done for this bug in 3.2 too, like a recipe in the docs for a hook_compressed that returns strings (i.e. a function that calls fileinput.hook_compressed and wraps it in a TextIOWrapper), or at least a note to warn about this bug.
msg155618 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年03月13日 14:41
For 3.2 could we use the same fix, but without exposing the ability to *change* the encoding? That is, we use TextIOWrapper but always with the default None for encoding.
It also occurs to me that this really exposes a weakness in the design. What if the user wants to specify other open parameters? I wonder if we should say that for better future-proofing openhooks should always take **kw. You could even envision fileinput accepting **kw and passing them along to the openhook. I think charset is the most important open paramenter in this context, though, so I don't think we have to solve the general problem in this fix.
msg155622 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年03月13日 15:11
It also occurs to me that this fix makes the charset hook look rather odd. We could render it redundant by passing charset to open in the non-openhook case, and mark it deprecated.
There is also a bug in the hook_encoding docs. It says the file is opened with codecs.open, but that is not the case, regular open is used.
msg155624 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012年03月13日 15:20
> For 3.2 could we use the same fix, but without exposing the ability to *change* the encoding?
> That is, we use TextIOWrapper but always with the default None for encoding.
Yes!
> It also occurs to me that this really exposes a weakness in the design. What if the user wants to
> specify other open parameters? I wonder if we should say that for better future-proofing openhooks
> should always take **kw. You could even envision fileinput accepting **kw and passing them along
> to the openhook. I think charset is the most important open paramenter in this context, though, so
> I don't think we have to solve the general problem in this fix.
I concur. I’ve never had to care about buffering for example, but mode is another parameter of open that people may want to give. I’ll commit the minimal fix to 3.2 and merge in 3.3, and then we can discuss on a new RFE bug about adding encoding vs. **kwargs for 3.3.
Agreed on deprecating the charset hook when it becomes redundant.
Will fix the doc bug about codecs.open too.
msg348642 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年07月29日 11:54
This issue is not newcomer friendly, I remove the easy keyword.
msg394286 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021年05月25日 02:32
The patch for this change has broken code that relied on the old behavior. For example:
```
import fileinput
import bz2
import pathlib
target = pathlib.Path('data.bz2')
target.write_bytes(bz2.compress(b'Foo\nBar\nBiz'))
inp = fileinput.FileInput([str(target)], openhook=fileinput.hook_compressed)
for line in inp:
 print(line.decode().strip())
```
That code passes on Python3.10a6 but on 3.10b1 fails with:
```
Traceback (most recent call last):
 File "/Users/jaraco/code/main/cmdix/text.py", line 10, in <module>
 print(line.decode().strip())
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
```
I encountered this issue in [this function and its test](https://github.com/jaraco/cmdix/blob/dc5fac3817ff9815b2f8d9a1dfad4258c14b1693/cmdix/lib.py#L165-L193).
I'm struggling with how to adapt the code to provide a uniform interface on Python 3.6+. Perhaps a backport of hook_compressed is in order.
msg394287 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021年05月25日 02:59
A backport now exists (https://pypi.org/project/backports.hook_compressed) and addresses the issue (https://github.com/jaraco/cmdix/actions/runs/873404846).
msg394290 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021年05月25日 04:04
> I'm struggling with how to adapt the code to provide a uniform interface on Python 3.6+.
It's easy. Just passing `mode="rb"`.
```
FileInput(filelist, mode="rb", openhook=fileinput.hook_compressed)
```
This is better than `backports.hook_compressed`, because it returns bytes always regardless file is compressed or uncompressed.
msg394338 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021年05月25日 12:37
I did consider and confirm that mode="rb" does also provide a uniform solution, but it also regresses the behavior of uncompressed inputs, making them bytes where they were text. This approach feels like the "Python 1" compatibility approach.
At least in my case, the preference was to migrate to the future behavior, where all content is decoded to Unicode text, without having to wait for Python 3.9 to sunset.
msg394413 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021年05月26日 03:46
> I did consider and confirm that mode="rb" does also provide a uniform solution, but it also regresses the behavior of uncompressed inputs, making them bytes where they were text.
Of course, I suggested to use "rb" when you want to read bytes.
* When reading binary from uncompressed file, mode="rb" just works for 3.6+.
* When reading text from uncompressed file, mode="r" just works for 3.6+.
* When reading binary from compressed file, mode="rb" just works for 3.6+.
* Reading text from compressed file is not supported until Python 3.10.
History
Date User Action Args
2022年04月11日 14:56:47adminsetgithub: 50008
2021年05月26日 03:46:18methanesetmessages: + msg394413
2021年05月25日 12:37:27jaracosetmessages: + msg394338
2021年05月25日 04:04:56methanesetmessages: + msg394290
2021年05月25日 02:59:20jaracosetmessages: + msg394287
2021年05月25日 02:32:44jaracosetnosy: + jaraco
messages: + msg394286
2021年04月14日 05:13:39methanesetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021年04月08日 03:31:16methanesetnosy: + methane

pull_requests: + pull_request24008
stage: needs patch -> patch review
2021年04月08日 01:34:40methanelinkissue36865 superseder
2021年04月07日 10:01:51vstinnersetnosy: - vstinner
2021年04月07日 07:54:20methanesetversions: + Python 3.10, - Python 3.2, Python 3.3
2019年07月29日 11:54:09vstinnersetkeywords: - easy

messages: + msg348642
2012年03月13日 15:20:46eric.araujosetmessages: + msg155624
2012年03月13日 15:11:24r.david.murraysetmessages: + msg155622
2012年03月13日 14:41:45r.david.murraysetmessages: + msg155618
2012年03月13日 12:11:43eric.araujosetassignee: eric.araujo
messages: + msg155595
versions: + Python 3.3, - Python 3.1
2012年03月13日 07:36:44tati_alchueyrsetfiles: + issue5758_fileinput_gzip_with_encoding_v2.patch

messages: + msg155577
2012年03月13日 01:27:17eric.araujosetmessages: + msg155554
2012年03月13日 01:26:25r.david.murraysetmessages: + msg155553
2012年03月13日 01:03:13tati_alchueyrsetmessages: + msg155548
2012年03月13日 00:40:43eric.araujosetmessages: + msg155543
2012年03月13日 00:34:20r.david.murraysetnosy: + r.david.murray
messages: + msg155540
2012年03月13日 00:30:35eric.araujosetnosy: + eric.araujo, vstinner
messages: + msg155538
2012年03月13日 00:10:05tati_alchueyrsetfiles: + issue5758_fileinput_gzip_with_encoding.patch
keywords: + patch
messages: + msg155532
2012年03月12日 20:09:15tati_alchueyrsetnosy: + tati_alchueyr
messages: + msg155471
2010年07月21日 14:23:43amaury.forgeotdarcsetversions: - Python 2.7
nosy: + amaury.forgeotdarc

messages: + msg111063

keywords: + easy
2010年07月10日 16:20:54BreamoreBoysetstage: needs patch
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 3.0
2009年04月14日 23:48:01mnewmancreate

AltStyle によって変換されたページ (->オリジナル) /