classification
Title: add 'rbU' mode to open()
Components: Library (Lib)
Versions: Python 3.0, Python 2.6
process
Status: closed
Resolution: not a bug
Nosy List: amaury.forgeotdarc, georg.brandl, skip.montanaro, techtonik
Priority: normal

Created on 2008-07-15 05:21 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (18)
msg69673 - Author: anatoly techtonik (techtonik) Date: 2008-07-15 05:21
'rU' universal newline support is useless, because the lines read end with
'\n' regardless of the actual line ending in the source file. Applications that
care about line endings still open the file in binary mode and gather the stats
manually.
So, to make this mode useful, 'rbU' should be added. Otherwise it
isn't worth the complication in both the C code and the documentation.
msg69679 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-15 11:47
The whole idea of universal newline mode is that the various possible
line endings ('\r', '\n' and '\r\n') are all mapped to '\n' precisely
so the user doesn't have to detect and fiddle with them. Using 'b' and
'U' together makes no sense.
* If you really want to see the line endings use 'rb'.
* If you don't care about the line endings regardless of source, use 'rU'.
* Otherwise use 'r'.
msg69709 - Author: anatoly techtonik (techtonik) Date: 2008-07-15 19:05
If you open file with 'r' - all line endings will be mapped precisely to
'\n' anyways, so it has nothing to do with 'U' mode.
msg69742 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-16 01:39
> If you open file with 'r' - all line endings will be mapped precisely to
> '\n' anyways, so it has nothing to do with 'U' mode.
No they won't -- only the platform-specific newline will. On Unix, 'r'
and 'rb' are the same.
msg69764 - Author: anatoly techtonik (techtonik) Date: 2008-07-16 05:08
That's weird, and the worst part is that it is not documented. The manual says:
"If Python is built without universal newline support a mode with 'U' is
the same as normal text mode."
but gives no information about what "normal text mode" behaviour is.
The behaviour you describe is weird, but true. If a developer
uses the Windows platform, Unix and Windows files will be handled in the
same way, but files from the Mac platform will not. Worse, the developer
can't know this, because he is unlikely to have any Mac files to test with.
This behaviour is like a long-standing landmine for Windows and Mac
Python users. Why not fix it?
msg69845 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-16 22:05
This behavior is inherited from the C-level fopen() and therefore
"normal text mode" is whatever that defines.
Is this really nowhere documented?
msg69862 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-17 01:01
anatoly> If you open file with 'r' - all line endings will be mapped
 anatoly> precisely to '\n' anyways, so it has nothing to do with 'U'
 anatoly> mode.
Before 3.0 at least, if you copy a text file from, say, Windows to Mac, and
open it with 'r', you get lines which end in '\r\n'. Here's a simple
example:
 >>> open("dos.txt", "rb").read()
 'a single line\r\nanother line\r\n'
 >>> f = open("dos.txt")
 >>> f.next()
 'a single line\r\n'
 >>> f = open("dos.txt", "r")
 >>> f.next()
 'a single line\r\n'
 >>> f.next()
 'another line\r\n'
If, on the other hand, you open it with 'rU', the '\r\n' literal line ending
is converted, even though CRLF is not the canonical Mac line ending:
 >>> f = open("dos.txt", "rU")
 >>> f.next()
 'a single line\n'
 >>> f.next()
 'another line\n'
Skip
msg69876 - Author: anatoly techtonik (techtonik) Date: 2008-07-17 06:46
> This behavior is inherited from the C-level fopen() and therefore
> "normal text mode" is whatever that defines.
> Is this really nowhere documented?
Relation to fopen() function may be documented, but there is no
explanation of what "normal text mode" is. Is it really pythonic that a
script writer without former experience with C, stdio and fopen should
be aware of inherited fopen "behavior" when programming Python?
msg70030 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-19 13:50
At least the 2.6 docs say
"The default is to use text mode, which may convert ``'\n'`` characters
to a platform-specific representation on writing and back on reading."
msg70068 - Author: anatoly techtonik (techtonik) Date: 2008-07-20 09:09
That's fine with me. I just need an 'rbU' mode to know in which format
I should write the output file if I want to preserve proper line endings
regardless of platform.
As for the Python 2.6 note, I would replace "may convert" with "converts".
msg70069 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-20 10:03
If you want to write your own line endings, read with "rU" and write
with "wb".
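In Python 3 the same idea (normalize on input, choose the ending on output) can be expressed with the newline argument of open(). The following is only a sketch under that assumption; the helper name convert_newlines and the file names are made up for illustration and do not come from this thread.

```python
import os
import tempfile


def convert_newlines(src_path, dst_path, line_ending):
    """Copy src_path to dst_path, ending every line with line_ending."""
    # newline=None (the default) maps '\r', '\n' and '\r\n' to '\n' on read.
    with open(src_path, "r", encoding="utf-8", newline=None) as src:
        text = src.read()
    # On write, a newline of '\r' or '\r\n' replaces each '\n' written.
    with open(dst_path, "w", encoding="utf-8", newline=line_ending) as dst:
        dst.write(text)


# Demonstration on a throwaway file with mixed line endings.
workdir = tempfile.mkdtemp()
src_name = os.path.join(workdir, "mixed.txt")
dst_name = os.path.join(workdir, "dos.txt")
with open(src_name, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")

convert_newlines(src_name, dst_name, "\r\n")
with open(dst_name, "rb") as f:
    converted = f.read()
print(converted)
```

Note that this approach deliberately discards the original endings, so it does not help when mixed endings must be preserved.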
msg70098 - Author: anatoly techtonik (techtonik) Date: 2008-07-21 06:12
If line endings are mixed, I would like to leave them as they are.
msg70130 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-22 01:10
Did you look at the io.open() function?
It's a new module in python2.6, but also the builtin "open" in py3k!
"""
 * On input, if newline is None, universal newlines mode is
 enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 these are translated into '\n' before being returned to the
 caller. If it is '', universal newline mode is enabled, but line
 endings are returned to the caller untranslated. If it has any of
 the other legal values, input lines are only terminated by the given
 string, and the line ending is returned to the caller untranslated.
"""
I suggest trying
 io.open(filename, newline="")
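A minimal sketch of how newline="" could be used to keep mixed line endings intact through a read/modify/write cycle. It assumes Python 3, where the builtin open() is io.open(); the file names and the upper-casing step are placeholders for illustration only.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "mixed.txt")
copy_path = os.path.join(workdir, "copy.txt")
with open(path, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")

# newline="" still recognises all three endings but does not translate them.
with open(path, "r", encoding="utf-8", newline="") as f:
    lines = f.readlines()

# newline="" on output writes '\r', '\n' and '\r\n' exactly as given.
with open(copy_path, "w", encoding="utf-8", newline="") as f:
    f.writelines(line.upper() for line in lines)

with open(copy_path, "rb") as f:
    copied = f.read()
print(lines)
print(copied)
```

Each line keeps whatever terminator it had in the source file, so the output file has the same mix of endings as the input.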
msg70180 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-23 18:14
As I indicated in msg69679 if you want to see the line endings just open
the file in binary mode ('rb').
msg70202 - Author: anatoly techtonik (techtonik) Date: 2008-07-24 12:39
Thanks for the hints. It turns out that "universal text mode" is not for
cross-platform but for platform-specific programming. =)
So I gave up on it and ended up with my own 'rb' newline counter and a 'wb'
writer which writes lines in the required format.
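The actual counter is not shown in this thread. One way such a binary-mode newline counter might look is sketched below in Python 3; the names NEWLINE_RE and count_newlines are hypothetical.

```python
import re
from collections import Counter

# '\r\n' is listed first so that a CRLF pair is counted once, not as CR + LF.
NEWLINE_RE = re.compile(rb"\r\n|\r|\n")


def count_newlines(data):
    """Return a Counter of the line endings found in a bytes object."""
    return Counter(NEWLINE_RE.findall(data))


# In practice data would come from open(path, "rb").read().
counts = count_newlines(b"a\rb\r\nc\nd\n")
print(counts)
```

The most common ending in the counter could then be used to decide how to write new lines to the output file.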
As for 2.6 io.open()
http://docs.python.org/dev/library/io.html#module-io
- Can anybody point out the difference between text mode with
newline='' and binary mode?
- The comment about newline=<string> says
"If it is '', universal newline mode is enabled, but line endings are
returned to the caller untranslated. If it has any of the other legal
values, input lines are only terminated by the given string, and the
line ending is returned to the caller untranslated."
Does it mean that if newline='\r\n' is specified, all single '\n'
characters are returned inline?
msg70204 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-24 14:20
> does it mean that if newline='\r\n' is specified all single '\n'
> characters are returned inline?
Yes.
Let's take a file with mixed newlines:
>>> io.open("c:/temp/t", "rb").read()
'a\rb\r\nc\nd\n'
rb mode splits only on '\n', so '\r\n' just stays at the end of the line (I'm on Windows)
>>> io.open("c:/temp/t", "rb").readlines()
['a\rb\r\n', 'c\n', 'd\n']
rU mode splits on every newline, and converts everything to \n
>>> io.open("c:/temp/t", "rU").readlines()
[u'a\n', u'b\n', u'c\n', u'd\n']
newline='' splits like rU, but does not translate newlines:
>>> io.open("c:/temp/t", newline='').readlines()
[u'a\r', u'b\r\n', u'c\n', u'd\n']
newline='\r\n' only splits on the specified string:
>>> io.open("c:/temp/t", newline='\r\n').readlines()
[u'a\rb\r\n', u'c\nd\n']
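The session above is from Python 2.6. A self-contained rendering for Python 3 is sketched below, with two assumptions: the 'U' mode flag is replaced by newline=None (the default), and a temporary file stands in for c:/temp/t. The helper read_lines is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t")
with open(path, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")


def read_lines(**kwargs):
    """Open the sample file with the given open() arguments and read lines."""
    with open(path, **kwargs) as f:
        return f.readlines()


binary = read_lines(mode="rb")                              # splits on b'\n' only
universal = read_lines(encoding="utf-8", newline=None)      # translate to '\n'
untranslated = read_lines(encoding="utf-8", newline="")     # split, keep endings
crlf_only = read_lines(encoding="utf-8", newline="\r\n")    # split on '\r\n' only

for result in (binary, universal, untranslated, crlf_only):
    print(result)
```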
msg70218 - Author: anatoly techtonik (techtonik) Date: 2008-07-24 17:32
This '\r' makes things worse. I am also on Windows and didn't realize
that "rb" handles '\r\n' line endings only as a side effect of '\n' being the
last character. Thanks.
newline='' is just what I need. I guess there is no alternative to it in the
2.5 series except manually splitting the lines returned from a binary read.
What about the file.newlines attribute: is it preserved in 2.6/Py3k?
BTW, it would be nice to have this example in the manual.
msg70219 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-24 18:51
Please read
http://docs.python.org/dev/library/io.html#io.TextIOBase.newlines 
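As a small illustration of that attribute, here is a sketch assuming Python 3 and an in-memory stream instead of a real file. With the default newline=None, newlines reports which endings have been translated so far.

```python
import io

# Wrap raw bytes with mixed endings; newline=None (default) enables translation.
stream = io.TextIOWrapper(io.BytesIO(b"a\rb\r\nc\nd\n"), encoding="ascii")

before = stream.newlines   # nothing has been read yet
text = stream.read()       # endings are translated to '\n'
seen = stream.newlines     # the endings encountered while reading

print(before)
print(repr(text))
print(seen)
```

When only one kind of ending has been seen, the attribute is a single string rather than a tuple, so code that inspects it should handle both cases.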
