detecting newline character

Sat Apr 23 19:30:29 EDT 2011

Daniel Geržo wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Chris Rebert wrote:
>>> Daniel Geržo wrote:
>>>> [f.newlines is None after f.readlines()
>>>> when f = codecs.open(…, mode='rU', encoding='ascii'),
>>>> but not when f = codecs.open(…, mode='rU')]
>>>
>>> […]
>>> I would speculate that the upshot of this is that codecs.open() ends
>>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>>> resulting in strange behavior.
>>>
>>> If this explanation is correct, then there are 2 bugs:
>>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>>> reject mode strings which involve both.
>>> 2. codecs.open() should either reject modes involving "U", or be fixed
>>> so that they work as expected.
>>
>> You might be correct that it is a bug (already fixed in versions newer
>> than 2.5), since codecs.open() from my Python 2.6 reads as follows:
>
> Well I am doing this on:
> Python 2.7.1 (r271:86832, Mar 7 2011, 14:28:09)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
>
> So what do you guys advise me to do?

RTSL, fix where necessary (see my other follow-up), check the trunk, and if
necessary submit a patch.

For an immediate solution, do not do what is not supposed to work (calling
codecs.open(…, mode='U')). You can find the three kinds of newlines in the
text with, e.g.:

    # assumes `import re` at module level
    self.newline = list(
        set(re.findall(r'\r?\n|\r', ''.join(fobj.readlines()))))

Please trim your quotes to the relevant minimum (see above for example).
-- 
PointedEars