Issue1379393
Created on 2005-12-13 10:35 by donut, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (4)

msg27059
Author: Matthew Mueller (donut)
Date: 2005-12-13 10:35
In previous versions of Python, when there was a
Unicode decode error, StreamReader.readline() would
advance to the next line. In the current versions (2.4.2
and trunk), it doesn't. Tested under Linux AMD64
(Ubuntu 5.10).
I'm attaching an example script. In Python 2.3 it prints:
hi~
hi
error: 'utf8' codec can't decode byte 0x80 in position 2: unexpected code byte
error: 'utf8' codec can't decode byte 0x81 in position 2: unexpected code byte
all done
In Python 2.4 and trunk it prints:
hi~
hi
error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
[repeats forever]
Maybe this isn't actually supposed to work (the docs
don't mention what should happen with strict error
checking), but it would be nice, given the alternatives:
1. Use errors='replace' and then search the result for
the replacement character. (ick)
2. Define a custom error handler, similar to ignore or
replace, that also sets some flag. (slightly less ick,
but more work.)
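
For illustration, the two alternatives might look roughly like the sketch below on a current Python 3. The sample bytes, the handler name "flagging_replace", and the helper names are all made up for this example; the attached script is not reproduced here.

```python
import codecs
import io

# Hypothetical sample data standing in for the input of the attached script.
SAMPLE = b"hi~\nhi\n\x80bad\n\x81bad\nall done\n"
REPLACEMENT = "\ufffd"  # U+FFFD, inserted by errors="replace" when decoding

# Alternative 1: decode with errors="replace", then search each line
# for the replacement character.
reader = codecs.getreader("utf-8")(io.BytesIO(SAMPLE), errors="replace")
clean_lines, damaged_lines = [], []
for line in reader:
    if REPLACEMENT in line:
        damaged_lines.append(line)
    else:
        clean_lines.append(line)


# Alternative 2: a custom handler that behaves like "replace" for
# decoding but also sets a flag. Class and handler names are assumptions.
class FlaggingReplace:
    def __init__(self):
        self.error_seen = False

    def __call__(self, exc):
        if not isinstance(exc, UnicodeDecodeError):
            raise exc
        self.error_seen = True
        # Substitute U+FFFD and resume decoding after the bad bytes.
        return (REPLACEMENT, exc.end)


flagger = FlaggingReplace()
codecs.register_error("flagging_replace", flagger)


def decode_line(raw_line):
    """Decode one line of bytes; return (text, had_error)."""
    flagger.error_seen = False
    text = raw_line.decode("utf-8", errors="flagging_replace")
    return text, flagger.error_seen


# Iterating a binary stream yields one bytes object per b"\n"-terminated line.
flagged = [decode_line(raw) for raw in io.BytesIO(SAMPLE)]
```

Two caveats. Alternative 1 also flags lines whose input legitimately contains U+FFFD. In alternative 2 the handler is applied to one line of bytes at a time rather than through a StreamReader, since a StreamReader may decode further ahead than the line it returns, so a flag set during readline() would not reliably belong to that line.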

msg27060
Author: Georg Brandl (georg.brandl) (Python committer)
Date: 2005-12-15 21:42
I don't know what the correct behaviour should be. Walter?

msg27061
Author: Walter Dörwald (doerwalter) (Python committer)
Date: 2005-12-16 17:25
IMHO the current behaviour is more consistent. To read the
broken UTF-8 stream from the test script, the appropriate
error handler should be used. What is the desired outcome?
If only the broken byte sequence should be skipped,
errors="replace" is appropriate. To skip a complete line
that contains a broken byte sequence, do something like in
the attached skipbadlines.py. The StreamReader can't know
which behaviour is wanted.
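
The attached skipbadlines.py is not reproduced in this thread. As a rough sketch of the line-skipping idea only (not the attached script), one option on a current Python 3 is to split the stream into lines at the byte level and decode each line strictly, dropping the ones that fail. The function name, the `skipped` parameter, and the sample bytes are assumptions made for this example.

```python
import io


def iter_decodable_lines(byte_stream, encoding="utf-8", skipped=None):
    """Yield decoded lines, skipping any line that fails strict decoding.

    Assumes an encoding in which b"\n" terminates a line (UTF-8, ASCII,
    Latin-1 and similar); this would not work for UTF-16 or UTF-32.
    """
    for lineno, raw_line in enumerate(byte_stream, start=1):
        try:
            text = raw_line.decode(encoding, errors="strict")
        except UnicodeDecodeError as exc:
            # Optionally record which line was dropped and why.
            if skipped is not None:
                skipped.append((lineno, exc.reason))
            continue
        yield text


# Hypothetical sample data with two undecodable lines.
SAMPLE = b"hi~\nhi\n\x80bad\n\x81bad\nall done\n"
skipped = []
kept = list(iter_decodable_lines(io.BytesIO(SAMPLE), skipped=skipped))
```

Because each line is decoded on its own, a decode error cannot leave the reader stuck on the same bytes, and the caller chooses between skipping the line and some other policy.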

msg27062
Author: Georg Brandl (georg.brandl) (Python committer)
Date: 2006-02-19 00:58
Closing as Won't Fix, then.

History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:14 | admin | set | github: 42686 |
| 2005-12-13 10:35:47 | donut | create | |