
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: codecs error handler is called with a UnicodeDecodeError with the same args
Type: behavior Stage:
Components: Unicode Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, doerwalter, ezio.melotti, lemburg, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2012-01-19 19:56 by amaury.forgeotdarc, last changed 2022-04-11 14:57 by admin.

Messages (4)
msg151650 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2012-01-19 19:56
The script below shows that the error handler is always called with the same error object. The 'start', 'end', and 'reason' attributes are correctly updated, but 'args' is always the same and holds the values used for the first call.
It's a bit weird that error.args[2] is not equal to error.start, for example. All versions are affected: 2.7, 3.2, 3.3.
And by the way, I could not find where these attributes are documented.
import codecs

def custom_handler(error):
    print(error.args,
          (error.start, error.end, error.reason))
    return b'?'.decode(), error.end

codecs.register_error('custom', custom_handler)
b'\x80\xd0'.decode('utf-8', 'custom')
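A variant of the script that records the values instead of printing them makes the comparison easier to inspect. This is only a sketch: the handler name 'record_calls' and the `calls` list are made up for illustration, and whether the `args` slice actually differs from the live attributes depends on the interpreter version being run.

```python
import codecs

calls = []  # hypothetical log: one (args, live attributes) pair per handler call

def recording_handler(error):
    # Snapshot args next to the attributes that the codec updates in place.
    calls.append((error.args, (error.start, error.end, error.reason)))
    return '?', error.end

# 'record_calls' is an arbitrary handler name chosen for this sketch.
codecs.register_error('record_calls', recording_handler)
result = b'\x80\xd0'.decode('utf-8', 'record_calls')

for args, live in calls:
    # args[2:5] holds (start, end, reason) as passed to the constructor;
    # on affected versions it keeps the values from the first call.
    print(args[2:5], live)
```

Comparing the two columns of the printed output shows directly whether `args` has drifted away from `start`, `end`, and `reason`.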
msg152528 - Author: Walter Dörwald (doerwalter) (Python committer) Date: 2012-02-03 15:56
See this ancient posting about this problem:
 http://mail.python.org/pipermail/python-dev/2002-August/027661.html
(see point 4.). So I guess somebody did finally complain! ;)
The error attributes are documented in PEP 293. The existence of the attributes is documented in Doc/c-api/exceptions.rst, but not their meaning.
msg152573 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-02-04 00:52
Codec encoders reuse the same exception object for speed, but only set some attributes (start, end and reason). We could recreate the args tuple each time an attribute is set. Alternatively, UnicodeEncodeError and UnicodeDecodeError could override the args getter to create a new tuple on each access.
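Until something like that exists, a handler that needs consistent values can rebuild the tuple from the attributes itself. The helper below is a Python-level sketch of the idea, not the CPython change discussed here; `fresh_args` is a hypothetical name, and the attribute assignments merely simulate a codec reusing the exception object.

```python
def fresh_args(error):
    """Build an args-style tuple from the error's current attributes."""
    return (error.encoding, error.object, error.start, error.end, error.reason)

err = UnicodeDecodeError('utf-8', b'\x80\xd0', 0, 1, 'invalid start byte')

# Simulate the codec updating the same object for a second error.
err.start, err.end, err.reason = 1, 2, 'unexpected end of data'

print(fresh_args(err))
```

The order of the tuple follows the UnicodeDecodeError constructor signature (encoding, object, start, end, reason), so the result can be compared element by element with `error.args`.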
msg313062 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2018-02-28 18:24
For reference, this behavior has been present from the beginning, since PEP 293 was implemented in issue432401.
History
Date                 User                Action  Args
2022-04-11 14:57:25  admin               set     github: 58038
2018-02-28 18:24:13  serhiy.storchaka    set     messages: + msg313062
2018-02-28 11:32:44  serhiy.storchaka    set     nosy: + serhiy.storchaka
2012-02-04 00:52:46  vstinner            set     messages: + msg152573
2012-02-03 15:56:39  doerwalter          set     nosy: + doerwalter
                                                 messages: + msg152528
2012-02-03 14:37:07  eric.araujo         set     nosy: + lemburg, vstinner
2012-01-19 19:56:36  amaury.forgeotdarc  create
