Message91728
| Author |
lemburg |
| Recipients |
Arfrever, amaury.forgeotdarc, lemburg, loewis, vstinner |
| Date |
2009-08-19.12:49:56 |
| SpamBayes Score |
9.277135e-12 |
| Marked as misclassified |
No |
| Message-id |
<4A8BF4F2.8020603@egenix.com> |
| In-reply-to |
<1250685240.55.0.880064208852.issue6697@psf.upfronthosting.co.za> |
| Content |
Amaury Forgeot d'Arc wrote:
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> The problem is actually wider::
>     >>> getattr(None, "\udc80")
>     Segmentation fault
> An idea would be to change _PyUnicode_AsDefaultEncodedString and allow
> unpaired surrogates (utf8+surrogateescape, as explained in PEP383), but
> I fear the consequences...
>
> The code that fails seems pretty common:
>     PyErr_Format(PyExc_AttributeError,
>                  "'%.50s' object has no attribute '%.400s'",
>                  tp->tp_name, _PyUnicode_AsString(name));
> It would be unfortunate to replace all usages of _PyUnicode_AsString to
> check the return value.
The use of _PyUnicode_AsString() is wrong here. There are several
cases where it can fail, e.g. MemoryErrors, embedded NULs, or encoding
errors.
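
Two of these failure modes can be reproduced at the Python level. This is only a rough sketch: the helper name as_c_string is hypothetical, and it merely imitates what a conversion to a NUL-terminated UTF-8 buffer has to cope with. It is not the interpreter's actual code.

```python
def as_c_string(text):
    """Hypothetical stand-in for a C-level 'unicode -> char *' conversion."""
    # Strict UTF-8 encoding rejects lone surrogates such as "\udc80".
    data = text.encode("utf-8")
    if b"\x00" in data:
        # A consumer of a C string would stop reading at the first NUL.
        raise ValueError("embedded NUL character")
    return data


for sample in ("attr", "\udc80", "a\x00b"):
    try:
        print(repr(sample), "->", as_c_string(sample))
    except (UnicodeEncodeError, ValueError) as exc:
        print(repr(sample), "-> failed:", type(exc).__name__)
```
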
The same is true for _PyUnicode_AsStringAndSize(), which is why
I turned them into Python interpreter private APIs before 3.0
shipped.
If you want a fail-safe stringified version of a Unicode object,
your only choice is to create a new API that does error checking,
properly clears the error and then returns a reference to a constant
string, e.g. "<repr-error>". |
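
A minimal Python-level sketch of that idea follows. The names safe_str and REPR_ERROR are hypothetical and no such API exists in the interpreter. In C the equivalent would presumably check for a NULL result, call PyErr_Clear(), and return a pointer to a static string.

```python
REPR_ERROR = "<repr-error>"  # constant fallback string named in the message


def safe_str(obj):
    """Hypothetical helper: return a printable string and never raise."""
    try:
        text = str(obj)
        # Force the same checks a C-level UTF-8 conversion would hit.
        text.encode("utf-8")
        if "\x00" in text:
            return REPR_ERROR
        return text
    except Exception:
        # Leaving the except block discards the error, the Python-level
        # counterpart of clearing the error indicator in C.
        return REPR_ERROR


# Usage: build the AttributeError message without risking a second failure.
message = "'%.50s' object has no attribute '%.400s'" % (
    type(None).__name__, safe_str("\udc80"))
print(message)
```

The point of returning a constant rather than propagating the error is that the caller is usually already formatting an error message and has no sensible way to handle a second failure.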
|