
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: restore accepting detached stdin in fileinput binary mode
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4, Python 3.5

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: akira, python-dev, r.david.murray, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-10-23 07:16 by akira, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: fileinput-detached-stdin.diff
Uploaded: akira, 2014-10-23 07:16
Messages (8)
msg229859 - Author: Akira Li (akira) Date: 2014-10-23 07:16
The patch for Issue #21075 ("fileinput.FileInput now reads bytes from standard stream if binary mode is specified") broke code that used
sys.stdin = sys.stdin.detach() with FileInput(mode='rb') in Python 3.3.
I've attached a patch that makes FileInput accept a detached sys.stdin
(without a 'buffer' attribute) in binary mode.
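The usage pattern described above can be sketched as follows. This is an illustrative reproduction, not code from the attached patch: the helper name `read_binary_lines_from_stdin` is hypothetical, and an in-memory stream stands in for the real standard input. On the affected releases the `FileInput` call would be expected to fail, because the detached stream has no `buffer` attribute.

```python
import fileinput
import io
import sys


def read_binary_lines_from_stdin(fake_stdin):
    """Read all lines through FileInput(mode='rb') with sys.stdin replaced.

    Hypothetical helper: sys.stdin is swapped temporarily and always restored.
    """
    saved = sys.stdin
    sys.stdin = fake_stdin
    try:
        # '-' is the special file name FileInput uses for standard input.
        with fileinput.FileInput(files=('-',), mode='rb') as fi:
            return list(fi)
    finally:
        sys.stdin = saved


# Stand-in for a real text-mode stdin, then detached the way old scripts did it.
text_stdin = io.TextIOWrapper(io.BytesIO(b"one\ntwo\n"))
detached = text_stdin.detach()          # binary stream without a 'buffer' attribute

lines = read_binary_lines_from_stdin(detached)
print(lines)
```

On a Python version that includes the fix, the lines come back as bytes objects.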
msg229865 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-23 09:12
The code
 sys.stdin = sys.stdin.detach()
is incorrect because sys.stdin should be a text stream, but detach() returns a binary stream.
msg229868 - Author: Akira Li (akira) Date: 2014-10-23 11:26
It is incorrect that sys.stdin is *always* a text stream. It often is,
but not always.
There are cases when it is not e.g., 
 $ tar zcf - stuff | gpg -e | ssh user@server 'cat - > stuff.tar.gz.gpg'
tar's stdout is *not* a text stream.
gpg's stdin/stdout are *not* text streams.
ssh's stdin is *not* a text stream.
etc.
If any of the steps are implemented in Python then it is useful to
consider sys.stdin as a binary stream.
Any script written before Python 3.4.1 (#21075) that used FileInput binary mode
*had to* use sys.stdin = sys.stdin.detach()
A bugfix release should not break working code.
msg229869 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-23 11:49
> It is incorrect that sys.stdin is *always* a text stream. It often is,
> but not always.
> 
> There are cases when it is not e.g.,
> 
> $ tar zcf - stuff | gpg -e | ssh user@server 'cat - > stuff.tar.gz.gpg'
> 
> tar's stdout is *not* a text stream.
> gpg's stdin/stdout are *not* text streams.
> ssh's stdin is *not* a text stream.
> etc.
This is not related to Python. The terms "character", "string", "text", and
"file" can have different meanings in different domains. In Python we use
Python terminology. There is no such thing as sys.stdin in a POSIX-compatible
shell, because a POSIX-compatible shell has no sys module and doesn't use a
dot to access attributes.
> Any script written before Python 3.4.1 (#21075) that used FileInput binary
> mode *had to* use sys.stdin = sys.stdin.detach()
> 
> A bugfix release should not break working code.
The correct solution in this case would be to use the workaround "sys.stdin = 
sys.stdin.detach()" conditionally, only in Python versions that have the bug.
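A conditional workaround of that kind might look like the sketch below. The function names are hypothetical, and the version cutoff (3.4.1) is taken from the earlier message in this thread rather than verified independently; it assumes Python 3 only.

```python
import sys


def needs_detach_workaround(version_info=sys.version_info):
    """Return True for versions assumed to need the detach() workaround.

    Assumption from this thread: before 3.4.1 (#21075), FileInput ignored
    binary mode when reading from standard input.
    """
    return tuple(version_info[:3]) < (3, 4, 1)


def prepare_stdin_for_binary_fileinput():
    """Detach sys.stdin only where the old behaviour is assumed to exist."""
    # hasattr guards against detaching twice or touching a non-text stream.
    if needs_detach_workaround() and hasattr(sys.stdin, "buffer"):
        sys.stdin = sys.stdin.detach()
```

A script would call `prepare_stdin_for_binary_fileinput()` once, before constructing `FileInput(mode='rb')`.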
msg229870 - Author: Akira Li (akira) Date: 2014-10-23 12:38
> This is not related to Python. Terms "character", "string", "text", "file" can have different meaning in different domains. In Python we use Python terminology. There is no such thing as sys.stdin in Posix-compatible shell, because Posix-compatible shell has no the sys module and doesn't use a dot to access attributes.
I use Python terminology (text - Unicode string, binary data - bytes).
The text vs. binary data distinction is language independent, though
(it doesn't matter what the Unicode type is called in a particular language).
Python can be used to implement `tar`, `gpg`, `ssh`, `7z`, etc. I don't
see what POSIX has to do with that fact.
It is very simple actually: 
 text -> encode <character encoding> -> bytes
 bytes -> decode <character encoding> -> text
In most cases text should be human readable.
It doesn't make sense to encode/decode input/output of gpg-like utilities using a character encoding. *Therefore* the notion of 
sys.stdin being a bytes stream (io.BufferedReader) can be useful
in this case.
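As a small illustration of the distinction drawn above (the sample values are made up for this sketch): text survives an encode/decode round trip, while arbitrary binary data, here a few bytes resembling the start of a gzip stream, is generally not valid in a character encoding such as UTF-8.

```python
# Text -> bytes -> text round trip with an explicit character encoding.
text = "naïve café"
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# Arbitrary binary data (illustrative bytes, starting with the gzip magic number).
blob = bytes([0x1F, 0x8B, 0x08, 0x00, 0xFF])
try:
    blob.decode("utf-8")
    blob_is_text = True
except UnicodeDecodeError:
    # 0x8B cannot start a UTF-8 sequence, so decoding fails.
    blob_is_text = False

print(round_trip == text, blob_is_text)
```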
The lines produced by FileInput are often (after optional processing)
written to sys.stdout. If binary mode is used, then FileInput(mode='rb') 
yields bytes; therefore it is also useful to consider sys.stdout
a binary stream (io.BufferedWriter) in this case.
It introduces a nice symmetry:
 text FileInput mode -> text streams
 binary FileInput mode -> binary streams
By design, FileInput treats stdin as any other file. It
even supports a special name for it: '-'. A file may be in
binary mode; stdin should be able to be as well.
sys.stdout is used outside of FileInput, so no changes in 
FileInput itself are necessary for it, but sys.stdin is used inside
FileInput, which is why the change is needed.
> Correct solution in this case would be to use the workaround "sys.stdin = 
> sys.stdin.detach()" conditionally, only in Python versions which have a bug.
Do you mean every Python 3 version before Python 3.4.1?
The correct solution is to avoid blaming users 
(your fault -> you change your programs) for our mistakes 
and to fix the bug in Python itself. The patch is attached.
msg229874 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-10-23 14:32
I actually agree that this should be applied not only for backward compatibility reasons, but because it is better duck typing. It unfortunately leaves code still having to potentially deal with "if python version is 3.4.1 or 3.4.2", but there is nothing that can be done about that.
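The duck-typing idea can be sketched in a few lines. This is a minimal illustration under the assumption that "use the `buffer` attribute if present, otherwise use the stream as-is" captures the intended behaviour; the function name `binary_stream_for` is hypothetical and is not part of fileinput.

```python
import io


def binary_stream_for(stream):
    """Return stream.buffer when it exists, otherwise the stream itself."""
    return getattr(stream, "buffer", stream)


raw = io.BytesIO(b"payload\n")
wrapped = io.TextIOWrapper(raw)       # text stream layered over the binary one

# A text stream yields its underlying binary buffer ...
from_text = binary_stream_for(wrapped)
# ... and an already-binary stream (no 'buffer' attribute) is returned unchanged.
from_binary = binary_stream_for(raw)
```

With this approach the caller does not need to know whether sys.stdin has been detached.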
msg257361 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-01-02 20:45
New changeset ded1336bff49 by R David Murray in branch '3.5':
#22709: Use stdin as-is if it does not have a buffer attribute.
https://hg.python.org/cpython/rev/ded1336bff49
New changeset 688d32cdbc0c by R David Murray in branch 'default':
Merge: #22709: Use stdin as-is if it does not have a buffer attribute.
https://hg.python.org/cpython/rev/688d32cdbc0c 
msg257363 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2016-01-02 20:46
Hopefully 'better late than never' applies to this. Sigh.
History
Date  User  Action  Args
2022-04-11 14:58:09  admin  set  github: 66898
2016-01-02 20:46:26  r.david.murray  set  status: open -> closed; resolution: fixed; messages: + msg257363; stage: commit review -> resolved
2016-01-02 20:45:09  python-dev  set  nosy: + python-dev; messages: + msg257361
2015-12-03 19:38:58  r.david.murray  set  stage: commit review
2014-10-23 14:32:22  r.david.murray  set  nosy: + r.david.murray; messages: + msg229874; versions: - Python 3.6
2014-10-23 12:38:24  akira  set  messages: + msg229870
2014-10-23 11:49:35  serhiy.storchaka  set  messages: + msg229869
2014-10-23 11:26:21  akira  set  messages: + msg229868
2014-10-23 09:12:08  serhiy.storchaka  set  nosy: + serhiy.storchaka; messages: + msg229865
2014-10-23 07:16:00  akira  create
