[Python-Dev] PEP 393 Summer of Code Project

Wed Aug 24 10:17:58 CEST 2011

On 24/08/2011 04:41, Torsten Becker wrote:
> On Tue, Aug 23, 2011 at 18:27, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> I posted a patch to re-add it:
>> http://bugs.python.org/issue12819#msg142867
>
> Thank you for the patch! Note that this patch adds the fast path only
> to the helper function which determines the length of the string and
> the maximum character. The decoding part is still without a fast path
> for ASCII runs.

Ah? If utf8_max_char_size_and_has_errors() returns no error and
maxchar=127, memcpy() is used. You mean that memcpy() is too slow? :-)

    maxchar = utf8_max_char_size_and_has_errors(s, size, &unicode_size,
                                                &has_errors);
    if (has_errors) {
        ...
    }
    else {
        unicode = (PyUnicodeObject *)PyUnicode_New(unicode_size, maxchar);
        if (!unicode) return NULL;
        /* When the string is ASCII only, just use memcpy and return. */
        if (maxchar < 128) {
            assert(unicode_size == size);
            Py_MEMCPY(PyUnicode_1BYTE_DATA(unicode), s, unicode_size);
            return (PyObject *)unicode;
        }
        ...
    }

But yes, my patch only optimizes ASCII-only strings, not "mostly-ASCII"
strings (e.g. 100 ASCII characters + 1 Latin-1 character). That case can
be optimized later. I have not benchmarked my patch.
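
For illustration only, here is a rough sketch of what a fast path for
ASCII runs inside the decoding loop could look like. It is not code from
the patch or from CPython: ascii_run_length() and decode_utf8_to_latin1()
are hypothetical names, the output is a plain 1-byte buffer rather than a
PyUnicodeObject, and only code points up to U+00FF are accepted, to keep
the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the leading run of ASCII bytes
   (values below 0x80) in s[0..size). */
static size_t ascii_run_length(const unsigned char *s, size_t size)
{
    size_t i = 0;
    while (i < size && s[i] < 0x80)
        i++;
    return i;
}

/* Hypothetical decoder: converts UTF-8 restricted to U+0000..U+00FF
   into a 1-byte-per-character buffer. ASCII runs are copied with
   memcpy(); other characters are decoded one at a time.
   Returns the number of characters written, or (size_t)-1 if the
   input is invalid or contains a code point above U+00FF.
   The caller must provide an output buffer of at least `size` bytes. */
static size_t decode_utf8_to_latin1(const unsigned char *s, size_t size,
                                    unsigned char *out)
{
    size_t i = 0, n = 0;

    assert(s != NULL && out != NULL);
    while (i < size) {
        size_t run = ascii_run_length(s + i, size - i);
        if (run > 0) {
            /* Fast path: copy the whole ASCII run at once. */
            memcpy(out + n, s + i, run);
            i += run;
            n += run;
            continue;
        }
        /* Slow path: only the lead bytes 0xC2 and 0xC3 encode
           U+0080..U+00FF; they need exactly one continuation byte. */
        if ((s[i] == 0xC2 || s[i] == 0xC3)
            && i + 1 < size && (s[i + 1] & 0xC0) == 0x80) {
            out[n++] = (unsigned char)(((s[i] & 0x03) << 6)
                                       | (s[i + 1] & 0x3F));
            i += 2;
        }
        else {
            return (size_t)-1;
        }
    }
    return n;
}
```

With a "mostly-ASCII" input, nearly all bytes go through memcpy() and
only the occasional non-ASCII character takes the per-character branch.
Whether this is actually faster than the existing loop would have to be
measured.
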
Victor