Created on 2008-07-15 05:21 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (18) | |||
|---|---|---|---|
| msg69673 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-15 05:21 | |
'rU' universal newline support is useless, because the lines read always end with '\n' regardless of the actual line ending in the source file. Applications that care about line endings still open the file in binary mode and gather the stats manually. So, to make this mode useful, an 'rbU' mode should be added. Otherwise it is not worth the complication in both the C code and the documentation. |
|||
| msg69679 - (view) | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2008-07-15 11:47 | |
The whole idea of universal newline mode is that the various possible
line endings ('\r', '\n' and '\r\n') are all mapped to '\n' precisely
so the user doesn't have to detect and fiddle with them. Using 'b' and
'U' together makes no sense.
* If you really want to see the line endings use 'rb'.
* If you don't care about the line endings regardless of source, use 'rU'.
* Otherwise use 'r'.
|
|||
| msg69709 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-15 19:05 | |
If you open a file with 'r', all line endings will be mapped precisely to '\n' anyway, so it has nothing to do with 'U' mode. |
|||
| msg69742 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2008-07-16 01:39 | |
> If you open file with 'r' - all line endings will be mapped precisely to
> '\n' anyways, so it has nothing to do with 'U' mode.
No they won't -- only the platform-specific newline will. On Unix, 'r' and 'rb' are the same.
|
|||
| msg69764 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-16 05:08 | |
That's weird, and the worst part is that it is not documented. The manual says: "If Python is built without universal newline support a mode with 'U' is the same as normal text mode." but gives no information about what "normal text mode" behaviour is. The way Python works that you describe is weird, but true. If a developer uses the Windows platform, Unix and Windows files will be handled in the same way, but not files from the Mac platform. Worse, the developer can't know this, because he is unlikely to have any Mac files to test with. This behavior is like a long-standing landmine for Windows and Mac Python users. Why not fix it? |
|||
| msg69845 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2008-07-16 22:05 | |
This behavior is inherited from the C-level fopen() and therefore "normal text mode" is whatever that defines. Is this really not documented anywhere? |
|||
| msg69862 - (view) | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2008-07-17 01:01 | |
anatoly> If you open file with 'r' - all line endings will be mapped
anatoly> precisely to '\n' anyways, so it has nothing to do with 'U'
anatoly> mode.
Before 3.0 at least, if you copy a text file from, say, Windows to Mac, and
open it with 'r', you get lines which end in '\r\n'. Here's a simple
example:
>>> open("dos.txt", "rb").read()
'a single line\r\nanother line\r\n'
>>> f = open("dos.txt")
>>> f.next()
'a single line\r\n'
>>> f = open("dos.txt", "r")
>>> f.next()
'a single line\r\n'
>>> f.next()
'another line\r\n'
If, on the other hand, you open it with 'rU', the '\r\n' literal line ending
is converted, even though CRLF is not the canonical Mac line ending:
>>> f = open("dos.txt", "rU")
>>> f.next()
'a single line\n'
>>> f.next()
'another line\n'
Skip
|
|||
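For comparison with the Python 2 session above, here is a minimal sketch of the same experiment under Python 3. Python 3 is an assumption here (the thread predates it); there the builtin open() is io.open(), and the default text mode (newline=None) applies universal newline translation on every platform. The file name dos.txt comes from the example; the scratch directory and variable names are only for illustration.

```python
import os
import tempfile

# Recreate the dos.txt file from the example in a scratch directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "dos.txt")
with open(path, "wb") as f:
    f.write(b"a single line\r\nanother line\r\n")

# Binary mode returns the bytes exactly as stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Default text mode (newline=None) maps '\r\n', '\r' and '\n' to '\n'.
with open(path, "r", encoding="ascii") as f:
    lines = f.readlines()

print(repr(raw))
print(lines)
```

In this sketch the binary read keeps both '\r\n' pairs, while the text-mode lines end in a plain '\n', which is what 'rU' produced in the session above.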
| msg69876 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-17 06:46 | |
> This behavior is inherited from the C-level fopen() and therefore
> "normal text mode" is whatever that defines.
> Is this really nowhere documented?
The relation to the fopen() function may be documented, but there is no explanation of what "normal text mode" is. Is it really pythonic that a script writer without prior experience of C, stdio and fopen should have to be aware of inherited fopen "behavior" when programming in Python?
|
|||
| msg70030 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2008-07-19 13:50 | |
At least the 2.6 docs say "The default is to use text mode, which may convert ``'\n'`` characters to a platform-specific representation on writing and back on reading." |
|||
| msg70068 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-20 09:09 | |
That's fine with me. I just need an 'rbU' mode to know in which format I should write the output file if I want to preserve the proper line endings regardless of platform. As for the Python 2.6 note, I would replace "may convert" with "converts". |
|||
| msg70069 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2008-07-20 10:03 | |
If you want to write your own line endings, read with "rU" and write with "wb". |
|||
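The advice above (read in universal newline mode, write in binary mode with the line ending you choose) can be sketched as follows. This is a minimal illustration assuming Python 3, where newline=None plays the role of 'rU'; the function name convert_newlines, its parameters and the file names are hypothetical, not part of any library.

```python
import os
import tempfile

def convert_newlines(src_path, dst_path, eol=b"\r\n", encoding="utf-8"):
    """Rewrite src_path to dst_path with every line ending replaced by eol."""
    # Text mode with newline=None translates '\r', '\n' and '\r\n' to '\n'.
    with open(src_path, "r", encoding=encoding, newline=None) as src:
        text = src.read()
    # Binary mode writes the chosen line ending without further translation.
    with open(dst_path, "wb") as dst:
        dst.write(text.encode(encoding).replace(b"\n", eol))

# Try it on a file with mixed line endings.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "mixed.txt")
dst = os.path.join(workdir, "crlf.txt")
with open(src, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")

convert_newlines(src, dst)
with open(dst, "rb") as f:
    converted = f.read()
print(repr(converted))
```

Note that this normalises every ending to the same style, so it does not help when mixed line endings have to be preserved.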
| msg70098 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-21 06:12 | |
If line endings are mixed, I would like to leave them as is. |
|||
| msg70130 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-07-22 01:10 | |
Did you look at the io.open() function? It lives in the io module, which is new in Python 2.6, and it is also the builtin "open" in py3k!
"""
* On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
"""
I suggest trying io.open(filename, newline="")
|
|||
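As a rough illustration of the suggestion above, the sketch below edits a file with mixed line endings while leaving each ending as it was. It assumes Python 3, where the builtin open() is io.open(); the helper upper_keep_eol and the file names are made up for the example.

```python
import os
import tempfile

def upper_keep_eol(line):
    """Upper-case the text of a line but keep its original terminator."""
    body = line.rstrip("\r\n")
    return body.upper() + line[len(body):]

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "mixed.txt")
dst = os.path.join(workdir, "copy.txt")
with open(src, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")

# newline="" recognises all three line endings but returns them untranslated.
with open(src, "r", encoding="ascii", newline="") as f:
    lines = f.readlines()

# newline="" on output disables translation too, so the endings survive.
with open(dst, "w", encoding="ascii", newline="") as f:
    f.writelines(upper_keep_eol(line) for line in lines)

with open(dst, "rb") as f:
    result = f.read()
print(lines)
print(repr(result))
```

Passing newline="" on both the read and the write side is what keeps the mixed endings intact; with the default newline=None they would all be rewritten.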
| msg70180 - (view) | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2008-07-23 18:14 | |
As I indicated in msg69679, if you want to see the line endings just open the file in binary mode ('rb'). |
|||
| msg70202 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-24 12:39 | |
Thanks for the hints. It turns out that "universal text mode" is not for cross-platform but for platform-specific programming. =) So I gave up on it and ended up with my own 'rb' newline counter and a 'wb' writer which writes lines in the required format.
As for the 2.6 io.open() http://docs.python.org/dev/library/io.html#module-io
- can anybody point out the difference between text mode with newline='' and binary mode?
- the comment about newline=<string> says "If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated." Does it mean that if newline='\r\n' is specified, all single '\n' characters are returned inline?
|
|||
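The 'rb' newline counter mentioned above is not shown in the thread. One possible shape for it, as a hedged sketch only (the function name count_newlines and the returned dictionary are assumptions, not the poster's code), is:

```python
def count_newlines(data: bytes) -> dict:
    """Count each line-ending style in a bytes object read in binary mode."""
    crlf = data.count(b"\r\n")
    # Bare '\r' and bare '\n' are the totals minus those inside '\r\n' pairs.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"\r\n": crlf, "\r": cr, "\n": lf}

counts = count_newlines(b"a\rb\r\nc\nd\n")
print(counts)
```

The most frequent style could then be used as the ending for the 'wb' writer.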
| msg70204 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-07-24 14:20 | |
> does it mean that if newline='\r\n' is specified all single '\n'
> characters are returned inline?
Yes.
Let's take a file with mixed newlines:
>>> io.open("c:/temp/t", "rb").read()
'a\rb\r\nc\nd\n'
rb mode splits only on '\n', so '\r\n' is split as well because it ends in '\n' (I'm on Windows)
>>> io.open("c:/temp/t", "rb").readlines()
['a\rb\r\n', 'c\n', 'd\n']
rU mode splits on every newline, and converts everything to \n
>>> io.open("c:/temp/t", "rU").readlines()
[u'a\n', u'b\n', u'c\n', u'd\n']
newline='' splits like rU, but does not translate newlines:
>>> io.open("c:/temp/t", newline='').readlines()
[u'a\r', u'b\r\n', u'c\n', u'd\n']
newline='\r\n' only splits on the specified string:
>>> io.open("c:/temp/t", newline='\r\n').readlines()
[u'a\rb\r\n', u'c\nd\n']
|
|||
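The session above is from Python 2.6. A minimal sketch of the same comparison under Python 3 (an assumption; 'rU' is replaced by newline=None, and the helper read_lines and the scratch path are only for illustration) could look like this:

```python
import os
import tempfile

# Same mixed-newline content as in the session above.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "t")
with open(path, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")

def read_lines(**kwargs):
    """Open the test file with the given open() arguments and return its lines."""
    with open(path, **kwargs) as f:
        return f.readlines()

# Binary mode: lines end at '\n' only, nothing is translated.
binary = read_lines(mode="rb")
# newline=None: split on every line ending and translate each to '\n'.
universal = read_lines(encoding="ascii", newline=None)
# newline='': split on every line ending but keep it as found.
untranslated = read_lines(encoding="ascii", newline="")
# newline='\r\n': split only on '\r\n'; other endings stay inline.
crlf_only = read_lines(encoding="ascii", newline="\r\n")

for name, value in [("rb", binary), ("None", universal),
                    ("''", untranslated), ("'\\r\\n'", crlf_only)]:
    print(name, value)
```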
| msg70218 - (view) | Author: anatoly techtonik (techtonik) | Date: 2008-07-24 17:32 | |
This '\r' makes things worse. I am also on Windows and didn't think that "rb" handles '\r\n' line endings only as a side effect of '\n' being the last character. Thanks. newline='' is just what I need. I guess there is no alternative to it in the 2.5 series except manually splitting the lines returned from a binary read. What about the file.newlines attribute - is it preserved in 2.6/Py3k? BTW, it would be nice to have this example in the manual. |
|||
| msg70219 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-07-24 18:51 | |
Please read http://docs.python.org/dev/library/io.html#io.TextIOBase.newlines |
|||
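For the newlines attribute referenced above, here is a small sketch assuming Python 3 text files: after reading in universal newline mode, the attribute reports which line endings were seen so far (None, a string, or a tuple of strings). The scratch file and variable names are illustrative.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "mixed.txt")
with open(path, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")

with open(path, "r", encoding="ascii") as f:
    text = f.read()      # endings are translated to '\n' in the returned text
    seen = f.newlines    # but the styles encountered are still recorded

print(repr(text))
print(seen)
```

This gives a way to learn which endings a file used without falling back to binary mode, although it does not say how often each one occurred.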
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:36 | admin | set | github: 47609 |
| 2008-07-24 18:51:45 | amaury.forgeotdarc | set | messages: + msg70219 |
| 2008-07-24 17:32:57 | techtonik | set | messages: + msg70218 |
| 2008-07-24 14:20:18 | amaury.forgeotdarc | set | messages: + msg70204 |
| 2008-07-24 12:39:16 | techtonik | set | messages: + msg70202 |
| 2008-07-23 18:14:19 | skip.montanaro | set | messages: + msg70180 |
| 2008-07-22 01:10:40 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc messages: + msg70130 |
| 2008-07-21 06:13:00 | techtonik | set | messages: + msg70098 |
| 2008-07-20 10:03:09 | georg.brandl | set | messages: + msg70069 |
| 2008-07-20 09:09:51 | techtonik | set | messages: + msg70068 |
| 2008-07-19 13:50:12 | georg.brandl | set | messages: + msg70030 |
| 2008-07-17 06:46:41 | techtonik | set | messages: + msg69876 |
| 2008-07-17 01:01:36 | skip.montanaro | set | messages: + msg69862 |
| 2008-07-16 22:05:15 | georg.brandl | set | messages: + msg69845 |
| 2008-07-16 05:08:16 | techtonik | set | messages: + msg69764 |
| 2008-07-16 01:39:10 | georg.brandl | set | nosy: + georg.brandl messages: + msg69742 |
| 2008-07-15 19:05:40 | techtonik | set | messages: + msg69709 |
| 2008-07-15 11:47:47 | skip.montanaro | set | status: open -> closed resolution: not a bug messages: + msg69679 nosy: + skip.montanaro |
| 2008-07-15 05:21:48 | techtonik | create | |