This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009年04月14日 23:48 by mnewman, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| example.zip | mnewman, 2009年04月14日 23:47 | ZIP file containing test example | ||
| issue5758_fileinput_gzip_with_encoding.patch | tati_alchueyr, 2012年03月13日 00:10 | Fixed issue #5758. fileinput.hook_compressed returns string (and not bytes) from gz file again. | review | |
| issue5758_fileinput_gzip_with_encoding_v2.patch | tati_alchueyr, 2012年03月13日 07:36 | New version of the patch after reviewer's considerations | review | |
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 25272 | merged | methane, 2021年04月08日 03:31 | |
| Messages (21) | |||
|---|---|---|---|
| msg85978 - (view) | Author: Michael Newman (mnewman) | Date: 2009年04月14日 23:47 | |
The attached ZIP file contains "test.bat" which runs "test.py" with Python 2.6 and Python 3.0. Python 2.6 behaves as expected (see "py26.out"), since it returns strings from both "mike.txt" and "mike.txt.gz". However, the same test with Python 3.0 returns bytes from "mike.txt.gz", as shown in "py30.out": Output: Hello from Mike. Output: This is the second line. Output: Why did the robot cross the road? Output: b'Hello from Mike.' Output: b'This is the second line.' Output: b'Why did the robot cross the road?' For reference, I tested this on Python versions: Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32 Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) [MSC v.1500 32 bit (Intel)] on win32 |
|||
| msg111063 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2010年07月21日 14:23 | |
gzip.open() only implements the "rb" mode, and returns bytes. fileinput could certainly wrap it with a io.TextIOWrapper. Then the encoding issue arises. fileinput.FileInput should grow an "encoding" parameter instead of always relying on the default encoding. |
|||
| msg155471 - (view) | Author: Tatiana Al-Chueyr (tati_alchueyr) * | Date: 2012年03月12日 20:09 | |
Working on this |
|||
| msg155532 - (view) | Author: Tatiana Al-Chueyr (tati_alchueyr) * | Date: 2012年03月13日 00:10 | |
I had a lot of help of R. David Murray, could someone review our patch, please? We are specially concerned about backward compatibility. |
|||
| msg155538 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012年03月13日 00:30 | |
Thanks for the patch. I commented on not-so-important things on the code review site (which should have sent an email to Tatiana) and wanted to raise an issue here. Is it best that the encoding used defaults to Python’s default encoding (UTF-8) or the I/O default encoding (depends on the OS and locale), like what open does? |
|||
| msg155540 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年03月13日 00:34 | |
I think it should use the same default as open, but frankly I couldn't remember the right way to achieve that. Suggestions welcome :) |
|||
| msg155543 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012年03月13日 00:40 | |
We can use locale.getpreferredencoding, or be smarter and just pass down encoding=encoding to TextIOWrapper and let *it* call getpreferredencoding if it’s given a None :) |
|||
| msg155548 - (view) | Author: Tatiana Al-Chueyr (tati_alchueyr) * | Date: 2012年03月13日 01:03 | |
Excellent! Thanks for all remarks, Éric! I'll follow your suggestions and will submit a new patch. |
|||
| msg155553 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年03月13日 01:26 | |
Duh. I should have remembered that the io module used None to represent the default encoding. Thanks, Éric. |
|||
| msg155554 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012年03月13日 01:27 | |
Nobody can remember everything: I tried in a shell and looked at the docs to find it :) |
|||
| msg155577 - (view) | Author: Tatiana Al-Chueyr (tati_alchueyr) * | Date: 2012年03月13日 07:36 | |
Improved patch, according to eric.araujo's suggestions and mnewman's guidance. |
|||
| msg155595 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012年03月13日 12:11 | |
Thanks. Unless another core dev wants to do a complementary review I will slightly tweak the patch and commit it. I need to finish waking up and eat some food before I do that :) Technically adding a new argument means that this is a new feature and cannot be applied to the stable 3.2 version, but something needs to be done for this bug in 3.2 too, like a recipe in the docs for a hook_compressed that returns strings (i.e. a function that calls fileinput.hook_compressed and wraps it in a TextIOWrapper), or at least a note to warn about this bug. |
|||
| msg155618 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年03月13日 14:41 | |
For 3.2 could we use the same fix, but without exposing the ability to *change* the encoding? That is, we use TextIOWrapper but always with the default None for encoding. It also occurs to me that this really exposes a weakness in the design. What if the user wants to specify other open parameters? I wonder if we should say that for better future-proofing openhooks should always take **kw. You could even envision fileinput accepting **kw and passing them along to the openhook. I think charset is the most important open paramenter in this context, though, so I don't think we have to solve the general problem in this fix. |
|||
| msg155622 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年03月13日 15:11 | |
It also occurs to me that this fix makes the charset hook look rather odd. We could render it redundant by passing charset to open in the non-openhook case, and mark it deprecated. There is also a bug in the hook_encoding docs. It says the file is opened with codecs.open, but that is not the case, regular open is used. |
|||
| msg155624 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012年03月13日 15:20 | |
> For 3.2 could we use the same fix, but without exposing the ability to *change* the encoding? > That is, we use TextIOWrapper but always with the default None for encoding. Yes! > It also occurs to me that this really exposes a weakness in the design. What if the user wants to > specify other open parameters? I wonder if we should say that for better future-proofing openhooks > should always take **kw. You could even envision fileinput accepting **kw and passing them along > to the openhook. I think charset is the most important open paramenter in this context, though, so > I don't think we have to solve the general problem in this fix. I concur. I’ve never had to care about buffering for example, but mode is another parameter of open that people may want to give. I’ll commit the minimal fix to 3.2 and merge in 3.3, and then we can discuss on a new RFE bug about adding encoding vs. **kwargs for 3.3. Agreed on deprecating the charset hook when it becomes redundant. Will fix the doc bug about codecs.open too. |
|||
| msg348642 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019年07月29日 11:54 | |
This issue is not newcomer friendly, I remove the easy keyword. |
|||
| msg394286 - (view) | Author: Jason R. Coombs (jaraco) * (Python committer) | Date: 2021年05月25日 02:32 | |
The patch for this change has broken code that relied on the old behavior. For example:
```
import fileinput
import bz2
import pathlib
target = pathlib.Path('data.bz2')
target.write_bytes(bz2.compress(b'Foo\nBar\nBiz'))
inp = fileinput.FileInput([str(target)], openhook=fileinput.hook_compressed)
for line in inp:
print(line.decode().strip())
```
That code passes on Python3.10a6 but on 3.10b1 fails with:
```
Traceback (most recent call last):
File "/Users/jaraco/code/main/cmdix/text.py", line 10, in <module>
print(line.decode().strip())
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
```
I encountered this issue in [this function and its test](https://github.com/jaraco/cmdix/blob/dc5fac3817ff9815b2f8d9a1dfad4258c14b1693/cmdix/lib.py#L165-L193).
I'm struggling with how to adapt the code to provide a uniform interface on Python 3.6+. Perhaps a backport of hook_compressed is in order.
|
|||
| msg394287 - (view) | Author: Jason R. Coombs (jaraco) * (Python committer) | Date: 2021年05月25日 02:59 | |
A backport now exists (https://pypi.org/project/backports.hook_compressed) and addresses the issue (https://github.com/jaraco/cmdix/actions/runs/873404846). |
|||
| msg394290 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2021年05月25日 04:04 | |
> I'm struggling with how to adapt the code to provide a uniform interface on Python 3.6+. It's easy. Just passing `mode="rb"`. ``` FileInput(filelist, mode="rb", openhook=fileinput.hook_compressed) ``` This is better than `backports.hook_compressed`, because it returns bytes always regardless file is compressed or uncompressed. |
|||
| msg394338 - (view) | Author: Jason R. Coombs (jaraco) * (Python committer) | Date: 2021年05月25日 12:37 | |
I did consider and confirm that mode="rb" does also provide a uniform solution, but it also regresses the behavior of uncompressed inputs, making them bytes where they were text. This approach feels like the "Python 1" compatibility approach. At least in my case, the preference was to migrate to the future behavior, where all content is decoded to Unicode text, without having to wait for Python 3.9 to sunset. |
|||
| msg394413 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2021年05月26日 03:46 | |
> I did consider and confirm that mode="rb" does also provide a uniform solution, but it also regresses the behavior of uncompressed inputs, making them bytes where they were text. Of course, I suggested to use "rb" when you want to read bytes. * When reading binary from uncompressed file, mode="rb" just works for 3.6+. * When reading text from uncompressed file, mode="r" just works for 3.6+. * When reading binary from compressed file, mode="rb" just works for 3.6+. * Reading text from compressed file is not supported until Python 3.10. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:47 | admin | set | github: 50008 |
| 2021年05月26日 03:46:18 | methane | set | messages: + msg394413 |
| 2021年05月25日 12:37:27 | jaraco | set | messages: + msg394338 |
| 2021年05月25日 04:04:56 | methane | set | messages: + msg394290 |
| 2021年05月25日 02:59:20 | jaraco | set | messages: + msg394287 |
| 2021年05月25日 02:32:44 | jaraco | set | nosy:
+ jaraco messages: + msg394286 |
| 2021年04月14日 05:13:39 | methane | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2021年04月08日 03:31:16 | methane | set | nosy:
+ methane pull_requests: + pull_request24008 stage: needs patch -> patch review |
| 2021年04月08日 01:34:40 | methane | link | issue36865 superseder |
| 2021年04月07日 10:01:51 | vstinner | set | nosy:
- vstinner |
| 2021年04月07日 07:54:20 | methane | set | versions: + Python 3.10, - Python 3.2, Python 3.3 |
| 2019年07月29日 11:54:09 | vstinner | set | keywords:
- easy messages: + msg348642 |
| 2012年03月13日 15:20:46 | eric.araujo | set | messages: + msg155624 |
| 2012年03月13日 15:11:24 | r.david.murray | set | messages: + msg155622 |
| 2012年03月13日 14:41:45 | r.david.murray | set | messages: + msg155618 |
| 2012年03月13日 12:11:43 | eric.araujo | set | assignee: eric.araujo messages: + msg155595 versions: + Python 3.3, - Python 3.1 |
| 2012年03月13日 07:36:44 | tati_alchueyr | set | files:
+ issue5758_fileinput_gzip_with_encoding_v2.patch messages: + msg155577 |
| 2012年03月13日 01:27:17 | eric.araujo | set | messages: + msg155554 |
| 2012年03月13日 01:26:25 | r.david.murray | set | messages: + msg155553 |
| 2012年03月13日 01:03:13 | tati_alchueyr | set | messages: + msg155548 |
| 2012年03月13日 00:40:43 | eric.araujo | set | messages: + msg155543 |
| 2012年03月13日 00:34:20 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg155540 |
| 2012年03月13日 00:30:35 | eric.araujo | set | nosy:
+ eric.araujo, vstinner messages: + msg155538 |
| 2012年03月13日 00:10:05 | tati_alchueyr | set | files:
+ issue5758_fileinput_gzip_with_encoding.patch keywords: + patch messages: + msg155532 |
| 2012年03月12日 20:09:15 | tati_alchueyr | set | nosy:
+ tati_alchueyr messages: + msg155471 |
| 2010年07月21日 14:23:43 | amaury.forgeotdarc | set | versions:
- Python 2.7 nosy: + amaury.forgeotdarc messages: + msg111063 keywords: + easy |
| 2010年07月10日 16:20:54 | BreamoreBoy | set | stage: needs patch versions: + Python 3.1, Python 2.7, Python 3.2, - Python 3.0 |
| 2009年04月14日 23:48:01 | mnewman | create | |