Message100687
| Author |
vstinner |
| Recipients |
vstinner |
| Date |
2010-03-09 01:11:54 |
| Message-id |
<1268097120.16.0.950101613769.issue8092@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
This issue is a regression introduced by r72208, which was committed to fix issue #3672.
The attached patch fixes PyUnicode_EncodeUTF8() when unicode_encode_call_errorhandler() returns a unicode string (e.g. with the backslashreplace error handler). I don't know the unicodeobject.c code very well, so my patch is probably far from perfect.
I assume that the maximum length of an escaped character is 8 bytes (the xmlcharrefreplace error handler for U+DFFF, which gives "&#57343;"). When the first lone surrogate is found, the buffer is reallocated to size*8 bytes. The escaped characters have to be ASCII; otherwise a UnicodeEncodeError is raised.
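As a quick sanity check of the 8-byte figure, here is a minimal sketch in Python 3 (not part of the attached patch). It assumes a current interpreter where the UTF-8 encoder passes lone surrogates to the error handler; the helper names `escaped_length` and `max_escaped_length` are made up for illustration.

```python
# Lone surrogates occupy the range U+D800..U+DFFF.
FIRST_SURROGATE = 0xD800
LAST_SURROGATE = 0xDFFF


def escaped_length(code_point, handler):
    """Number of bytes the given error handler emits for one lone surrogate."""
    return len(chr(code_point).encode("utf-8", handler))


def max_escaped_length(handler):
    """Largest replacement produced by `handler` over all lone surrogates."""
    return max(
        escaped_length(cp, handler)
        for cp in range(FIRST_SURROGATE, LAST_SURROGATE + 1)
    )


if __name__ == "__main__":
    for handler in ("xmlcharrefreplace", "backslashreplace"):
        sample = chr(LAST_SURROGATE).encode("utf-8", handler)
        print(handler, sample, max_escaped_length(handler))
```

With these two built-in handlers the longest replacement is the one from xmlcharrefreplace, which is where the size*8 factor comes from. A custom error handler could return a longer string, so the factor is only safe for handlers with a known bound.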
Note: unicode_encode_ucs1() doesn't have a hardcoded limit on the maximum length of the escaped string. Its code might be reused in PyUnicode_EncodeUTF8() to remove the hardcoded limit. |
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2010-04-20 19:36:44 | vstinner | unlink | issue8092 messages |
| 2010-03-09 01:12:00 | vstinner | set | recipients: + vstinner |
| 2010-03-09 01:12:00 | vstinner | set | messageid: <1268097120.16.0.950101613769.issue8092@psf.upfronthosting.co.za> |
| 2010-03-09 01:11:58 | vstinner | link | issue8092 messages |
| 2010-03-09 01:11:57 | vstinner | create | |