Message171413
| Author |
vstinner |
| Recipients |
BreamoreBoy, ezio.melotti, serhiy.storchaka, thomaslee, vstinner |
| Date |
2012-09-28.08:28:00 |
| SpamBayes Score |
-1.0 |
| Marked as misclassified |
Yes |
| Message-id |
<CAMpsgwah4_2G9nr6VYpjTarTqj0zwv6DUYU7gXSBd4c9Nuo8Lw@mail.gmail.com> |
| In-reply-to |
<1348814388.88.0.637801900455.issue16061@psf.upfronthosting.co.za> |
| Content |
Python 3.3 is 2x faster than Python 3.2 at replacing one character with
another when the string contains that character only 3 times. This is
not acceptable, Python 3.3 must be as slow as Python 3.2!
$ python3.2 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='à'; s.replace(ch, after)"
100000 loops, best of 3: 3.62 usec per loop
$ python3.3 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='à'; s.replace(ch, after)"
1000000 loops, best of 3: 1.36 usec per loop
$ python3.2 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='Ł'; s.replace(ch, after)"
100000 loops, best of 3: 3.15 usec per loop
$ python3.3 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='Ł'; s.replace(ch, after)"
1000000 loops, best of 3: 1.91 usec per loop
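The measurements above can be approximated from a single script, which makes it easy to run the same file under both interpreters. This is only a sketch: the helper name bench() is hypothetical, and unlike the command lines above it times only the s.replace() call (plus the overhead of a lambda), not the construction of the string, so the absolute numbers will differ.

```python
import timeit


def bench(ch, after, number=100000):
    """Return (replaced string, microseconds per call) for s.replace(ch, after)."""
    sp = " " * 1000
    s = ch + sp + ch + sp + ch  # the string contains ch only three times
    seconds = timeit.timeit(lambda: s.replace(ch, after), number=number)
    return s.replace(ch, after), seconds / number * 1e6


if __name__ == "__main__":
    # Same character pairs as in the timeit command lines.
    for ch, after in (("é", "à"), ("€", "Ł")):
        _, usec = bench(ch, after)
        print("%r -> %r: %.2f usec per call" % (ch, after, usec))
```

Running the file once with python3.2 and once with python3.3 should show the same trend as the figures quoted here, although the exact values depend on the machine and build.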
More seriously, I changed the algorithm of str.replace(before, after)
for the case where before and after are each a single character:
changeset c802bfc8acfc. The code now uses the heavily optimized
findchar() function. PyUnicode_READ() is slow and should be avoided
when possible: the PyUnicode_READ() macro expands to 2 "if" tests,
whereas findchar() directly uses a pointer of the right type
(Py_UCS1*, Py_UCS2* or Py_UCS4*).
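The difference between testing every position and jumping from match to match can be sketched in Python. This is an illustration only, not the C implementation: the function names replace_by_index() and replace_by_find() are hypothetical, and str.find() stands in for findchar() on the assumption that both delegate the scan to an optimized search routine.

```python
def replace_by_index(s, before, after, maxcount=-1):
    """Test every position one by one; a negative maxcount means no limit."""
    chars = list(s)
    for i in range(len(chars)):
        if chars[i] == before:
            if maxcount == 0:
                break
            maxcount -= 1
            chars[i] = after
    return "".join(chars)


def replace_by_find(s, before, after, maxcount=-1):
    """Jump from match to match, leaving the scan itself to str.find()."""
    chars = list(s)
    pos = s.find(before)
    while pos >= 0 and maxcount != 0:
        chars[pos] = after
        maxcount -= 1
        pos = s.find(before, pos + 1)
    return "".join(chars)
```

Both helpers should return the same result as s.replace(before, after, maxcount) for single-character arguments. The second one runs its Python-level loop once per match rather than once per character, which is the situation the benchmark exercises with only 3 matches in about 2000 characters.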
In Python 3.2, the code looks like:
    for (i = 0; i < u->length; i++) {
        if (u->str[i] == u1) {
            if (--maxcount < 0)
                break;
            u->str[i] = u2;
        }
    }
In Python 3.3, the code looks like:
    pos = findchar(sbuf, PyUnicode_KIND(self), slen, u1, 1);
    if (pos < 0)
        goto nothing;
    ...
    while (--maxcount)
    {
        pos++;
        src += pos * PyUnicode_KIND(self);
        slen -= pos;
        index += pos;
        pos = findchar(src, PyUnicode_KIND(self), slen, u1, 1);
        if (pos < 0)
            break;
        PyUnicode_WRITE(rkind, PyUnicode_DATA(u), index + pos, u2);
    } |
|