Message 210075 - Python tracker


In-reply-to
Author	terry.reedy
Recipients	gpolo, kbk, loewis, roger.serwy, serhiy.storchaka, terry.reedy
Date	2014-02-03.02:21:06
Message-id	<1391394066.96.0.578116780492.issue20368@psf.upfronthosting.co.za>

Content

The core of the patch is a wrapper that traps UnicodeDecodeErrors, corrects the strings, and re-decodes. A Python version might look like
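def unicodeFromTclStringAndSize(s, size):
    try:
        return <PyUnicode_DecodeUTF8(s, size, NULL)>
    except UnicodeDecodeError:
        if b'\xc0\x80' in s:
            # bytes are immutable, so rebind the result of replace();
            # the corrected string is shorter, so use its new length
            s = s.replace(b'\xc0\x80', b'\x00')
            return <PyUnicode_DecodeUTF8(s, len(s), NULL)>
        else:
            raise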
This is used in a couple of additional wrappers and all direct decode calls are replaced with wrappers. New tests are added. Overall, a great idea, and I want to see this patch in 3.4. But, how many of the replacement sites are exercised by the tests?
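For readers who want to try the idea outside of C, here is a minimal runnable sketch of the same trap-correct-redecode logic in pure Python. It is not the patch itself: the name `unicode_from_tcl_bytes` is hypothetical, and `bytes.decode('utf-8')` stands in for the `PyUnicode_DecodeUTF8` call, so no explicit size argument is needed.

```python
def unicode_from_tcl_bytes(s: bytes) -> str:
    """Decode bytes from Tcl, tolerating Tcl's two-byte encoding of NUL.

    Hypothetical helper for illustration; bytes.decode() stands in for
    the C-level PyUnicode_DecodeUTF8 call used by the real patch.
    """
    try:
        # Fast path: ordinary, valid UTF-8.
        return s.decode("utf-8")
    except UnicodeDecodeError:
        if b"\xc0\x80" in s:
            # Tcl writes U+0000 as the overlong pair C0 80, which a strict
            # UTF-8 decoder rejects. Substitute a real NUL byte and retry.
            fixed = s.replace(b"\xc0\x80", b"\x00")
            return fixed.decode("utf-8")
        # Some other decoding problem: propagate the original error.
        raise
```

The retry only happens after a failure, so strings without embedded nulls pay no extra cost. If the corrected bytes are still invalid, the second decode raises its own UnicodeDecodeError.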
There are a few changes that seem unrelated to nulls, which might have been left for another patch. Example:
-#if TCL_UTF_MAX==3
         return PyUnicode_FromKindAndData(
-            PyUnicode_2BYTE_KIND, Tcl_GetUnicode(value),
+            sizeof(Tcl_UniChar), Tcl_GetUnicode(value),
             Tcl_GetCharLength(value));
-#else
-        return PyUnicode_FromKindAndData(
-            PyUnicode_4BYTE_KIND, Tcl_GetUnicode(value),
-            Tcl_GetCharLength(value));
-#endif
Do you know if this code block is tested?
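The simplification in that hunk appears to rely on CPython's kind constants being equal to the code-unit width in bytes (PyUnicode_2BYTE_KIND is 2 and PyUnicode_4BYTE_KIND is 4), so sizeof(Tcl_UniChar) can serve as the kind directly. A rough Python model of the behaviour, assuming Tcl_UniChar is 2 bytes wide when TCL_UTF_MAX is 3 and 4 bytes otherwise, as the removed #if implies; `from_kind_and_data` is a hypothetical stand-in, not a real API:

```python
import struct

# struct format characters whose standard sizes match each kind value:
# kind 1 -> 1-byte units, kind 2 -> 2-byte units, kind 4 -> 4-byte units.
_UNIT_FORMATS = {1: "B", 2: "H", 4: "I"}


def from_kind_and_data(kind: int, data: bytes, length: int) -> str:
    """Hypothetical model of PyUnicode_FromKindAndData.

    Reads `length` native-endian code units of `kind` bytes each from
    `data` and treats every unit as one code point.
    """
    fmt = "=%d%s" % (length, _UNIT_FORMATS[kind])
    units = struct.unpack_from(fmt, data)
    return "".join(map(chr, units))


# Two-byte units, as with a 2-byte Tcl_UniChar.
narrow = struct.pack("=3H", 0x61, 0x20AC, 0x00)
print(repr(from_kind_and_data(2, narrow, 3)))

# Four-byte units, as with a 4-byte Tcl_UniChar.
wide = struct.pack("=2I", 0x61, 0x1F600)
print(repr(from_kind_and_data(4, wide, 2)))
```

A test that covers this block would need to round-trip a non-ASCII string through a Tcl object on builds of both widths, which is why the coverage question matters.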

History
Date	User	Action	Args
2014-02-03 02:21:07	terry.reedy	set	recipients: + terry.reedy, loewis, kbk, gpolo, roger.serwy, serhiy.storchaka
2014-02-03 02:21:06	terry.reedy	set	messageid: <1391394066.96.0.578116780492.issue20368@psf.upfronthosting.co.za>
2014-02-03 02:21:06	terry.reedy	link	issue20368 messages
2014-02-03 02:21:06	terry.reedy	create
