Issue 1379393: StreamReader.readline doesn't advance on decode errors


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42686

classification

Type:	Stage:
Title:	StreamReader.readline doesn't advance on decode errors
Components:	Library (Lib)	Versions:

process

Dependencies:	Superseder:
Status:	closed	Resolution:	wont fix
Assigned To:	doerwalter	Nosy List:	doerwalter, donut, georg.brandl
Priority:	normal	Keywords:

Created on 2005-12-13 10:35 by donut, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
python-test_streamreader.py	donut, 2005-12-13 10:35	script to demonstrate the problem
skipbadlines.py	doerwalter, 2005-12-16 17:25

Messages (4)
msg27059	Author: Matthew Mueller (donut)	Date: 2005-12-13 10:35
In previous versions of Python, when there was a Unicode decode error, StreamReader.readline() would advance to the next line. In the current version (2.4.2 and trunk), it doesn't. Tested under Linux AMD64 (Ubuntu 5.10). Attaching an example script.

In python2.3 it prints:

    hi~
    hi
    error: 'utf8' codec can't decode byte 0x80 in position 2: unexpected code byte
    error: 'utf8' codec can't decode byte 0x81 in position 2: unexpected code byte
    all done

In python2.4 and trunk it prints:

    hi~
    hi
    error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    [repeats forever]

Maybe this isn't actually supposed to work (the docs don't mention what should happen with strict error checking), but it would be nice, given the alternatives:

1. Use errors='replace' and then search the result for the replacement character. (ick)
2. Define a custom error handler similar to ignore or replace that also sets some flag. (slightly less ick, but more work.)
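
The second alternative above (a custom error handler that also records that something went wrong) could look roughly like this in modern Python 3. This is only a sketch under stated assumptions: the `RecordingReplace` class, the handler name `recording_replace`, and the sample bytes are made up for illustration and are not taken from the attached script. It decodes a whole byte string in one call instead of going through StreamReader.readline(), because a stream reader may decode bytes beyond the line it returns, so a per-line flag would not be reliable there.

```python
import codecs


class RecordingReplace:
    """Error handler that acts like 'replace' but remembers where it fired.

    Hypothetical helper for illustration; not part of the codecs module.
    """

    def __init__(self):
        self.offsets = []  # byte offsets of every undecodable sequence

    def __call__(self, exc):
        # Only decoding errors are handled here; anything else is re-raised.
        if not isinstance(exc, UnicodeDecodeError):
            raise exc
        self.offsets.append(exc.start)
        # Substitute U+FFFD and resume decoding right after the bad bytes.
        return ("\ufffd", exc.end)


recorder = RecordingReplace()
# "recording_replace" is an assumed name chosen for this example.
codecs.register_error("recording_replace", recorder)

# Made-up sample input: two lines consist of a single invalid UTF-8 byte.
data = b"hi~\nhi\n\x80\n\x81\nall done\n"
text = data.decode("utf-8", errors="recording_replace")
```

After the call, `recorder.offsets` holds the byte positions of the broken sequences, so the caller does not have to search `text` for the replacement character.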
msg27060	Author: Georg Brandl (georg.brandl) * (Python committer)	Date: 2005-12-15 21:42
I don't know what should be correct. Walter?
msg27061	Author: Walter Dörwald (doerwalter) * (Python committer)	Date: 2005-12-16 17:25
IMHO the current behaviour is more consistent. To read the broken UTF-8 stream from the test script, the appropriate error handler should be used. What is the desired outcome? If only the broken byte sequence should be skipped, errors="replace" is appropriate. To skip a complete line that contains a broken byte sequence, do something like in the attached skipbadlines.py. The StreamReader can't know which behaviour is wanted.
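
The contents of skipbadlines.py are not reproduced in this thread, so the following is not that script, just one possible way to skip whole lines that contain a broken byte sequence. It is written for Python 3 rather than the 2.4 discussed here. The function name `split_good_lines` and the sample bytes are hypothetical. The sketch splits the raw bytes into lines first and decodes each line strictly, which assumes an encoding such as UTF-8 where the newline byte never occurs inside a multi-byte sequence (this would not hold for UTF-16). For comparison, errors="replace" substitutes U+FFFD for the bad bytes, while errors="ignore" drops them silently.

```python
import io


def split_good_lines(raw, encoding="utf-8"):
    """Return (decoded_lines, skipped_line_numbers) for a binary stream.

    Hypothetical helper: each raw line is decoded on its own with strict
    error checking, so one bad line never blocks the lines after it.
    """
    good, skipped = [], []
    # Iterating a binary file object yields byte strings split on b"\n".
    for lineno, raw_line in enumerate(raw, start=1):
        try:
            good.append(raw_line.decode(encoding, errors="strict"))
        except UnicodeDecodeError:
            # Remember which line was dropped instead of raising.
            skipped.append(lineno)
    return good, skipped


# Made-up sample input: lines 3 and 4 are not valid UTF-8.
sample = io.BytesIO(b"hi~\nhi\n\x80\n\x81\nall done\n")
good, skipped = split_good_lines(sample)
```

Because the decision to skip is made by the caller, the same loop could just as well log the error or abort, which is the choice the StreamReader itself cannot make.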
msg27062	Author: Georg Brandl (georg.brandl) * (Python committer)	Date: 2006-02-19 00:58
Closing as Won't Fix, then.

History
Date	User	Action	Args
2022-04-11 14:56:14	admin	set	github: 42686
2005-12-13 10:35:47	donut	create
