
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: StreamReader.readline doesn't advance on decode errors
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: doerwalter Nosy List: doerwalter, donut, georg.brandl
Priority: normal Keywords:

Created on 2005-12-13 10:35 by donut, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
python-test_streamreader.py donut, 2005-12-13 10:35 script to demonstrate the problem
skipbadlines.py doerwalter, 2005-12-16 17:25
Messages (4)
msg27059 - Author: Matthew Mueller (donut) Date: 2005-12-13 10:35
In previous versions of Python, when there was a Unicode decode error, StreamReader.readline() would advance to the next line. In the current version (2.4.2 and trunk), it doesn't. Tested under Linux AMD64 (Ubuntu 5.10).
Attaching an example script. In Python 2.3 it prints:
hi~
hi
error: 'utf8' codec can't decode byte 0x80 in position 2: unexpected code byte
error: 'utf8' codec can't decode byte 0x81 in position 2: unexpected code byte
all done
In Python 2.4 and trunk it prints:
hi~
hi
error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
[repeats forever]
Maybe this isn't actually supposed to work (the docs don't say what should happen with strict error checking), but it would be nice, given the alternatives:
1. Use errors='replace' and then search the result for the replacement character. (Ick.)
2. Define a custom error handler, similar to 'ignore' or 'replace', that also sets some flag. (Slightly less ick, but more work.)
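A minimal sketch of alternative 2, assuming Python 3: an error handler registered with `codecs.register_error` that behaves like 'replace' but also records that it fired. The handler name `flag_replace` and the helpers `DecodeErrorFlag` and `decode_line` are hypothetical, chosen only for illustration. The sketch decodes one raw line at a time so the flag refers to exactly that line; a StreamReader decodes ahead in larger chunks, so a flag set during its buffering would not necessarily belong to the line `readline()` returns.

```python
import codecs

# Hypothetical handler name; codecs.register_error accepts any string.
HANDLER_NAME = "flag_replace"


class DecodeErrorFlag:
    """Error handler that substitutes U+FFFD (like 'replace') and sets a flag."""

    def __init__(self):
        self.triggered = False

    def __call__(self, exc):
        if not isinstance(exc, UnicodeDecodeError):
            raise exc  # only decoding errors are handled here
        self.triggered = True
        # Replace the offending byte range and resume decoding at exc.end.
        return ("\ufffd", exc.end)


flag = DecodeErrorFlag()
codecs.register_error(HANDLER_NAME, flag)


def decode_line(raw, encoding="utf-8"):
    """Decode one line of bytes; return (text, had_decode_error)."""
    flag.triggered = False
    text = raw.decode(encoding, HANDLER_NAME)
    return text, flag.triggered


text, had_error = decode_line(b"hi\x80\n")
# text == "hi\ufffd\n" and had_error is True
```

Because the handler object is registered globally, its flag is shared state; this is fine for a single-threaded script but would need per-thread handling otherwise.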
msg27060 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2005-12-15 21:42
I don't know what the correct behaviour should be. Walter?
msg27061 - Author: Walter Dörwald (doerwalter) (Python committer) Date: 2005-12-16 17:25
IMHO the current behaviour is more consistent. To read the broken UTF-8 stream from the test script, the appropriate error handler should be used. What is the desired outcome? If only the broken byte sequence should be skipped, errors="replace" is appropriate. To skip a complete line that contains a broken byte sequence, do something like the attached skipbadlines.py. The StreamReader can't know which behaviour is wanted.
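The contents of skipbadlines.py are not shown on this page, so the following is only a minimal sketch of the "skip a complete line" approach, assuming Python 3 and a binary input stream, and not necessarily what the attachment does. It splits the raw bytes into lines first, decodes each line strictly, and drops any line that raises `UnicodeDecodeError`. The function name `iter_good_lines` is hypothetical.

```python
import io


def iter_good_lines(binary_stream, encoding="utf-8"):
    """Yield decoded lines, skipping any line whose bytes fail to decode.

    Splitting into lines before decoding is only safe for encodings in
    which the newline byte never occurs inside a multi-byte sequence:
    true for UTF-8 and Latin-1, not for UTF-16.
    """
    for raw in binary_stream:  # binary streams iterate over newline-terminated byte lines
        try:
            yield raw.decode(encoding)  # strict mode raises on malformed input
        except UnicodeDecodeError:
            continue  # drop the whole offending line and keep going


if __name__ == "__main__":
    sample = io.BytesIO(b"hi~\nhi\nbad\x80\nok\n")
    for line in iter_good_lines(sample):
        print(line, end="")  # prints hi~, hi and ok; the "bad" line is skipped
```

Decoding each line independently means a malformed line can never stall the loop, which is the outcome the original report expected from readline(); for byte-level skipping within a line, passing an error handler such as errors="replace" is enough.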
msg27062 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2006-02-19 00:58
Closing as Won't Fix, then.
History
Date User Action Args
2022-04-11 14:56:14 admin set github: 42686
2005-12-13 10:35:47 donut create
