
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author r.david.murray
Recipients pitrou, r.david.murray
Date 2012-06-12.01:15:22
Message-id <1339463724.73.0.621165878269.issue15049@psf.upfronthosting.co.za>
Content
rdmurray@hey:~/python/p32>cat bad.py
 This line is just ascii
 A second line for good measure.
 This comment contains undecodable stuff: "�" or "\\xe9" in "pass�"" cannot be decoded.
The last line above is in latin-1, with an é inside those quotes.
 rdmurray@hey:~/python/p32>cat bug.py
 import sys
 with open('./bad.py', buffering=int(sys.argv[1])) as f:
     for line in f:
         print(line, end='')
 rdmurray@hey:~/python/p32>python3 bug.py -1
 Traceback (most recent call last):
   File "bug.py", line 3, in <module>
     for line in f:
   File "/usr/lib/python3.2/codecs.py", line 300, in decode
     (result, consumed) = self._buffer_decode(data, self.errors, final)
 UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 99: invalid continuation byte
 rdmurray@hey:~/python/p32>python3 bug.py 1
 Traceback (most recent call last):
   File "bug.py", line 3, in <module>
     for line in f:
   File "/usr/lib/python3.2/codecs.py", line 300, in decode
     (result, consumed) = self._buffer_decode(data, self.errors, final)
 UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 99: invalid continuation byte
 rdmurray@hey:~/python/p32>python3 bug.py 2
 This line is just ascii
 A second line for good measure.
 Traceback (most recent call last):
   File "bug.py", line 3, in <module>
     for line in f:
   File "/usr/lib/python3.2/codecs.py", line 300, in decode
     (result, consumed) = self._buffer_decode(data, self.errors, final)
 UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
So, line buffering does not appear to buffer line by line.
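The io documentation describes line buffering in terms of writing. With line_buffering=True, flush() is implied whenever a call to write() contains a newline. Nothing in that definition controls how much a text file reads and decodes at a time, which is consistent with the tracebacks above.

The sketch below contrasts the two sides. It targets a modern Python 3 and uses in-memory io.BytesIO streams as a stand-in for bad.py. It illustrates the documented behavior only; it does not describe the 3.2 internals, and the variable names are illustrative.

```python
import io

# Write side: with line_buffering=True, a write() containing "\n" is
# followed by an implicit flush(), so the bytes reach the raw buffer at once.
raw_lb = io.BytesIO()
text_lb = io.TextIOWrapper(raw_lb, encoding="utf-8", line_buffering=True)
text_lb.write("first line\n")
flushed_at_once = raw_lb.getvalue()

# Without line buffering the wrapper is allowed to hold small writes until
# an explicit flush(), so the raw buffer may still be empty before flush().
raw_plain = io.BytesIO()
text_plain = io.TextIOWrapper(raw_plain, encoding="utf-8")
text_plain.write("first line\n")
before_flush = raw_plain.getvalue()
text_plain.flush()
after_flush = raw_plain.getvalue()

# Read side: the same flag does not make reads line-sized. The wrapper
# pulls a chunk from the underlying buffer and decodes it as a whole, so
# an invalid byte on line 3 raises before line 1 is ever returned.
data = b"ascii line\nsecond line\nbad: \xe9\n"
reader = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8",
                          line_buffering=True)
lines_seen = []
decode_error = None
try:
    for line in reader:
        lines_seen.append(line)
except UnicodeDecodeError as exc:
    decode_error = exc

print("line-buffered write reached raw buffer:", flushed_at_once)
print("plain write before/after flush:", before_flush, after_flush)
print("lines returned before the decode error:", len(lines_seen))
```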
I ran into this problem because I had a much larger file that I thought
was in UTF-8. When I got the encoding error, I was annoyed that the
error message didn't really tell me which line the error was on, but I
figured, OK, I'll just set line buffering and then I'll be able to tell.
But that didn't work. Fortunately, passing buffering=2 did work, but at a
minimum the docs need to be updated to indicate when line buffering
really is line buffering and when it isn't.
History
Date                 User            Action  Args
2012-06-12 01:15:24  r.david.murray  set     recipients: + r.david.murray, pitrou
2012-06-12 01:15:24  r.david.murray  set     messageid: <1339463724.73.0.621165878269.issue15049@psf.upfronthosting.co.za>
2012-06-12 01:15:23  r.david.murray  link    issue15049 messages
2012-06-12 01:15:22  r.david.murray  create
