
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: performance regression in string replace for 3.3
Type: performance Stage: resolved
Components: Interpreter Core, Unicode Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: BreamoreBoy, benjamin.peterson, ezio.melotti, jcea, kushal.das, loewis, pitrou, python-dev, serhiy.storchaka, terry.reedy, thomaslee, vstinner
Priority: normal Keywords: 3.3regression, patch

Created on 2012-09-27 16:03 by BreamoreBoy, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                  Uploaded                              Description
unicode.patch              vstinner, 2012-10-10 20:38
unicode_2.patch            serhiy.storchaka, 2012-10-11 08:36
replacebench.py            serhiy.storchaka, 2012-10-12 18:39    Microbenchmark for regularly distributed replacements
replacebench2.py           serhiy.storchaka, 2012-10-12 18:39    Microbenchmark for randomly distributed replacements
str_replace_1char.patch    serhiy.storchaka, 2012-10-13 18:03
str_replace_1char_2.patch  serhiy.storchaka, 2013-04-07 21:36
Messages (22)
msg171378 - Author: Mark Lawrence (BreamoreBoy) * Date: 2012-09-27 16:03
Quoting Steven D'Aprano on c.l.p.
"But add a call to replace, and things are very different:
[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 9.3 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 5.43 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 18.3 usec per loop
Three times slower, even for pure-ASCII strings. I get comparable results for Unicode. Notice how slow Python 2.7 is:
[steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"
10000 loops, best of 3: 65.6 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
100000 loops, best of 3: 2.79 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
10000 loops, best of 3: 23.7 usec per loop
Even with the performance regression, it is still over twice as fast as
Python 2.7.
Nevertheless, I think there is something here. The consequences are nowhere near as dramatic as jmf claims, but it does seem that replace() has taken a serious performance hit. Perhaps it is unavoidable, but perhaps not.
If anyone else can confirm similar results, I think this should be raised as a performance regression."
Quoting Serhiy Storchaka in response.
"Yes, I confirm, it's a performance regression. It should be avoidable.
Almost any PEP393 performance regression can be avoided. At least for
such corner case. Just no one has yet optimized this case."
msg171407 - Author: Thomas Lee (thomaslee) (Python committer) Date: 2012-09-28 06:39
My results aren't quite as dramatic as yours, but there does appear to be a regression:
$ ./python -V
Python 2.7.3+
$ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 16.5 usec per loop
$ ./python -V
Python 3.3.0rc3+
$ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
10000 loops, best of 3: 22.7 usec per loop
msg171413 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-09-28 08:28
Python 3.3 is 2x faster than Python 3.2 to replace a character with
another if the string only contains the character 3 times. This is not
acceptable, Python 3.3 must be as slow as Python 3.2!
$ python3.2 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='à'; s.replace(ch, after)"
100000 loops, best of 3: 3.62 usec per loop
$ python3.3 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='à'; s.replace(ch, after)"
1000000 loops, best of 3: 1.36 usec per loop
$ python3.2 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='Ł'; s.replace(ch, after)"
100000 loops, best of 3: 3.15 usec per loop
$ python3.2 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch;
after='Ł'; s.replace(ch, after)"
1000000 loops, best of 3: 1.91 usec per loop
More seriously, I changed the algorithm of str.replace(before, after)
when before and after are a single character: changeset c802bfc8acfc.
The code is now using the heavily optimized findchar() function.
PyUnicode_READ() is slow and should be avoided when possible: the
PyUnicode_READ() macro expands to two ifs, whereas findchar() works
directly on a pointer of the right type (Py_UCS1*, Py_UCS2* or Py_UCS4*).
In Python 3.2, the code looks like:
 for (i = 0; i < u->length; i++) {
 if (u->str[i] == u1) {
 if (--maxcount < 0)
 break;
 u->str[i] = u2;
 }
 }
In Python 3.3, the code looks like:
 pos = findchar(sbuf, PyUnicode_KIND(self), slen, u1, 1);
 if (pos < 0)
 goto nothing;
 ...
 while (--maxcount)
 {
 pos++;
 src += pos * PyUnicode_KIND(self);
 slen -= pos;
 index += pos;
 pos = findchar(src, PyUnicode_KIND(self), slen, u1, 1);
 if (pos < 0)
 break;
 PyUnicode_WRITE(rkind, PyUnicode_DATA(u), index + pos, u2);
 }
msg172602 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-10 20:38
> The code is now using the heavily optimized findchar() function.
I compared the performance of the two methods: dummy loop vs find. Results with a string of 100,000 characters:
 * Replace 100% (rewrite all characters): find is 12.5x slower than a loop
 * Replace 50%: find is 3.3x slower
 * Replace only 2 characters (0.001%): find is 10.4x faster
In practice, I bet that the most common case is to replace only a few characters. Replacing all characters is a rare use case.
Use the attached "unicode.patch" on Python 3.4 with the following commands to reproduce my benchmark:
python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"
--
An option is to use the find method, and then switch to the dummy loop method if there are too many characters to replace. I don't know if it's necessary to develop such a complex algorithm. It would be better to have a benchmark extracted from a real-world application like a template engine.
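A minimal sketch of what such a find-then-loop switch could look like (purely illustrative, not from any attached patch; the function name, the minimum scan length and the 1/16 density threshold are invented here, and memchr() stands in for the fast find):

    /* Illustrative sketch: start with memchr()-based search and, once the
       matches found so far look dense, finish with a plain loop. */
    #include <stddef.h>
    #include <string.h>

    static void
    replace_find_then_loop(unsigned char *s, size_t len,
                           unsigned char u1, unsigned char u2)
    {
        size_t i = 0, replaced = 0;
        while (i < len) {
            unsigned char *p = memchr(s + i, u1, len - i);
            if (p == NULL)
                return;                     /* no more matches: done */
            i = (size_t)(p - s);
            s[i++] = u2;
            replaced++;
            /* invented density check: after 64 scanned characters, if more
               than 1/16 of them were replaced, switch to the dumb loop */
            if (i >= 64 && replaced > i / 16) {
                for (; i < len; i++)
                    if (s[i] == u1)
                        s[i] = u2;
                return;
            }
        }
    }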
msg172628 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-11 08:36
> I compared the performance of the two methods: dummy loop vs find.
You can hybridize them. First just compare chars, and if they don't match then
use memcmp(). This speeds up the case of repeated chars.
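As a rough illustration of this hybrid idea (a sketch only, not the code from unicode_2.patch; the function name is invented and memchr() stands in for the fast character search used for the one-character case):

    /* Illustrative sketch: dense matches are handled by the cheap direct
       comparison, and memchr() skips over stretches without any match. */
    #include <stddef.h>
    #include <string.h>

    static void
    replace_char_hybrid(unsigned char *s, size_t len,
                        unsigned char u1, unsigned char u2)
    {
        size_t i = 0;
        while (i < len) {
            if (s[i] == u1) {
                s[i++] = u2;                /* match: stay in the simple loop */
            }
            else {
                /* miss: jump straight to the next occurrence, if any */
                unsigned char *p = memchr(s + i, u1, len - i);
                if (p == NULL)
                    break;
                i = (size_t)(p - s);
            }
        }
    }

With this structure, a string like 'b'*1000 never leaves the simple loop, while a string with only a couple of matches spends almost all of its time inside memchr().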
msg172691 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-11 20:38
> You can hybridize them. First just compare chars, and if they don't match
> then use memcmp(). This speeds up the case of repeated chars.
Oh, your patch is simple and it's amazingly fast! I compared unicode with
Python 2.7, 3.2, 3.4 and 3.4 patched, and bytes with 2.7. Using your patch,
Python 3.4 is the fastest implementation in most cases.
Common platform:
CPU model: Intel(R) Core(TM) i5 CPU 661 @ 3.33GHz
Bits: int=32, long=32, long long=64, pointer=32
Platform: Linux-3.2.0-31-generic-pae-i686-with-debian-wheezy-sid
Platform of campaign 2.7-bytes:
Python unicode implementation: UTF-16
Python version: 2.7.3+ (2.7:19d37c8d1882+, Oct 9 2012, 14:37:36) [GCC 4.6.3]
CFLAGS: -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
-0700"
Date: 2012-10-11 14:41:49
Platform of campaign 2.7-unicode:
Python unicode implementation: UTF-16
Python version: 2.7.3+ (2.7:19d37c8d1882+, Oct 9 2012, 14:37:36) [GCC 4.6.3]
CFLAGS: -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
-0700"
Date: 2012-10-11 14:42:55
Platform of campaign 3.2-wide:
Python unicode implementation: UCS-4
Python version: 3.2.3+ (3.2:f7615ee43318, Sep 27 2012, 15:00:15) [GCC 4.6.3]
CFLAGS: -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
-0700"
Date: 2012-10-11 14:41:30
Platform of campaign 3.4:
Python unicode implementation: PEP 393
Python version: 3.4.0a0 (default:ad51ed93377c, Oct 11 2012, 14:40:51) [GCC
4.6.3]
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
-0700"
Date: 2012-10-11 14:40:52
Platform of campaign 3.4-patch:
Date: 2012-10-11 14:40:25
Python version: 3.4.0a0 (default:ad51ed93377c+, Oct 11 2012, 14:33:04) [GCC
4.6.3]
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c+ tag=tip branch=default date="2012-10-11
00:11 -0700"
Python unicode implementation: PEP 393
----------------+-----------------+-----------------+-----------------+-----------------+----------------
Tests           | 2.7-bytes       | 2.7-unicode     | 3.2-wide        | 3.4             | 3.4-patch
----------------+-----------------+-----------------+-----------------+-----------------+----------------
all             | 7.83 ms (+552%) | 2.05 ms (+71%)  | 3.45 ms (+188%) | 15 ms (+1152%)  | 1.2 ms (*)
replace 50%     | 4.14 ms (+135%) | 1.76 ms (*)     | 3.17 ms (+81%)  | 7.76 ms (+342%) | 4.18 ms (+138%)
replace 10%     | 1.21 ms (*)     | 1.52 ms (+26%)  | 3.01 ms (+150%) | 2.01 ms (+67%)  | 1.23 ms
replace 1%      | 490 us          | 1.55 ms (+217%) | 2.94 ms (+501%) | 589 us (+20%)   | 489 us (*)
replace 2 chars | 398 us          | 1.47 ms (+271%) | 2.89 ms (+632%) | 398 us          | 395 us (*)
----------------+-----------------+-----------------+-----------------+-----------------+----------------
Total           | 14.1 ms (+88%)  | 8.34 ms (+11%)  | 15.5 ms (+106%) | 25.8 ms (+244%) | 7.49 ms (*)
----------------+-----------------+-----------------+-----------------+-----------------+----------------
Compare 3.2, 3.4 and 3.4 patched:
----------------+-------------+-----------------+---------------
Tests | 3.2-wide | 3.4 | 3.4-patch
----------------+-------------+-----------------+---------------
all | 3.45 ms (*) | 15 ms (+335%) | 1.2 ms (-65%)
replace 50% | 3.17 ms (*) | 7.76 ms (+145%) | 4.18 ms (+32%)
replace 10% | 3.01 ms (*) | 2.01 ms (-33%) | 1.23 ms (-59%)
replace 1% | 2.94 ms (*) | 589 us (-80%) | 489 us (-83%)
replace 2 chars | 2.89 ms (*) | 398 us (-86%) | 395 us (-86%)
----------------+-------------+-----------------+---------------
Total | 15.5 ms (*) | 25.8 ms (+67%) | 7.49 ms (-52%)
----------------+-------------+-----------------+---------------
The patch should be completed to also optimize the other Unicode kinds.
msg172769 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-12 18:39
> The patch should be completed to also optimize the other Unicode kinds.
I'm working on it.
Here are the benchmark scripts which I use. The first tests regular strings (replace every n-th char), the second tests random strings (replace 1/n of the total, randomly distributed chars).
msg172780 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-12 21:05
The performance numbers are very nice, but the patch needs a comment about the optimization, IMO.
msg172821 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-13 18:03
After much experimentation, I suggest a new patch.
Benchmark results (time of replacing 1 of n characters (ch1 to ch2) in a 100000-char string).
Py3.2 Py3.3 patch n ch1 ch2 fill
231 (-13%) 3025 (-93%) 200 1 'a' 'b' 'c'
626 (-18%) 2035 (-75%) 511 2 'a' 'b' 'c'
444 (-26%) 957 (-66%) 327 5 'a' 'b' 'c'
349 (-30%) 530 (-54%) 243 10 'a' 'b' 'c'
306 (-40%) 300 (-38%) 185 20 'a' 'b' 'c'
280 (-54%) 169 (-23%) 130 50 'a' 'b' 'c'
273 (-62%) 123 (-15%) 105 100 'a' 'b' 'c'
265 (-70%) 82 (-4%) 79 1000 'a' 'b' 'c'
230 (+4%) 3012 (-92%) 239 1 '\u010a' '\u010b' '\u010c'
624 (-17%) 1907 (-73%) 518 2 '\u010a' '\u010b' '\u010c'
442 (-16%) 962 (-62%) 370 5 '\u010a' '\u010b' '\u010c'
347 (-5%) 566 (-42%) 330 10 '\u010a' '\u010b' '\u010c'
305 (-10%) 357 (-23%) 275 20 '\u010a' '\u010b' '\u010c'
285 (-26%) 241 (-12%) 212 50 '\u010a' '\u010b' '\u010c'
280 (-33%) 190 (-2%) 187 100 '\u010a' '\u010b' '\u010c'
263 (-41%) 170 (-8%) 156 1000 '\u010a' '\u010b' '\u010c'
3355 (-85%) 3309 (-85%) 498 1 '\U0001000a' '\U0001000b' '\U0001000c'
2290 (-65%) 2267 (-65%) 800 2 '\U0001000a' '\U0001000b' '\U0001000c'
1598 (-62%) 1279 (-52%) 612 5 '\U0001000a' '\U0001000b' '\U0001000c'
1313 (-60%) 950 (-45%) 519 10 '\U0001000a' '\U0001000b' '\U0001000c'
1195 (-61%) 824 (-44%) 464 20 '\U0001000a' '\U0001000b' '\U0001000c'
1055 (-59%) 640 (-32%) 434 50 '\U0001000a' '\U0001000b' '\U0001000c'
982 (-55%) 549 (-20%) 439 100 '\U0001000a' '\U0001000b' '\U0001000c'
941 (-56%) 473 (-12%) 417 1000 '\U0001000a' '\U0001000b' '\U0001000c'
Other numbers are possible on other platforms; I'm especially interested in the results on Windows and on 64-bit. For the test I used the script replacebench2.py, and compared the results with the help of the script https://bitbucket.org/storchaka/cpython-stuff/raw/default/bench/bench-diff.py.
msg178604 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-30 19:23
I'm going to speed up other cases for replace(), but for now I have only this patch. Is it good? Should I apply it to 3.3, since there is a 3.3 regression?
msg178606 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-12-30 19:38
As __ap__ says, it would be nice to have a comment.
msg178607 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-30 19:43
64-bit linux results:
3.2 3.3 patch
133 (-28%) 1343 (-93%) 96 1 'a' 'b' 'c'
414 (-9%) 704 (-47%) 375 2 'a' 'b' 'c'
319 (-8%) 491 (-40%) 293 3 'a' 'b' 'c'
253 (-7%) 384 (-39%) 235 4 'a' 'b' 'c'
216 (-8%) 320 (-38%) 199 5 'a' 'b' 'c'
192 (-9%) 278 (-37%) 175 6 'a' 'b' 'c'
175 (-10%) 247 (-36%) 157 7 'a' 'b' 'c'
162 (-11%) 223 (-35%) 144 8 'a' 'b' 'c'
153 (-14%) 204 (-35%) 132 9 'a' 'b' 'c'
145 (-15%) 188 (-35%) 123 10 'a' 'b' 'c'
108 (-36%) 112 (-38%) 69 20 'a' 'b' 'c'
86 (-59%) 53 (-34%) 35 50 'a' 'b' 'c'
78 (-71%) 31 (-26%) 23 100 'a' 'b' 'c'
73 (-84%) 12 (+0%) 12 1000 'a' 'b' 'c'
81 (-88%) 10 (+0%) 10 10000 'a' 'b' 'c'
133 (-23%) 1470 (-93%) 103 1 '\u010a' '\u010b' '\u010c'
414 (-10%) 799 (-54%) 371 2 '\u010a' '\u010b' '\u010c'
319 (-5%) 576 (-47%) 303 3 '\u010a' '\u010b' '\u010c'
254 (-1%) 461 (-46%) 251 4 '\u010a' '\u010b' '\u010c'
216 (+2%) 391 (-44%) 220 5 '\u010a' '\u010b' '\u010c'
193 (+4%) 341 (-41%) 200 6 '\u010a' '\u010b' '\u010c'
175 (+5%) 303 (-39%) 184 7 '\u010a' '\u010b' '\u010c'
163 (+6%) 275 (-37%) 172 8 '\u010a' '\u010b' '\u010c'
153 (+6%) 252 (-36%) 162 9 '\u010a' '\u010b' '\u010c'
145 (+7%) 235 (-34%) 155 10 '\u010a' '\u010b' '\u010c'
108 (-1%) 133 (-20%) 107 20 '\u010a' '\u010b' '\u010c'
86 (-27%) 66 (-5%) 63 50 '\u010a' '\u010b' '\u010c'
79 (-44%) 44 (+0%) 44 100 '\u010a' '\u010b' '\u010c'
74 (-66%) 24 (+4%) 25 1000 '\u010a' '\u010b' '\u010c'
75 (-71%) 22 (+0%) 22 10000 '\u010a' '\u010b' '\u010c'
1687 (-91%) 1362 (-89%) 150 1 '\U0001000a' '\U0001000b' '\U0001000c'
1146 (-58%) 817 (-41%) 479 2 '\U0001000a' '\U0001000b' '\U0001000c'
919 (-61%) 627 (-43%) 358 3 '\U0001000a' '\U0001000b' '\U0001000c'
802 (-63%) 521 (-44%) 294 4 '\U0001000a' '\U0001000b' '\U0001000c'
729 (-64%) 446 (-42%) 259 5 '\U0001000a' '\U0001000b' '\U0001000c'
678 (-65%) 394 (-40%) 237 6 '\U0001000a' '\U0001000b' '\U0001000c'
643 (-66%) 350 (-37%) 220 7 '\U0001000a' '\U0001000b' '\U0001000c'
617 (-66%) 313 (-34%) 207 8 '\U0001000a' '\U0001000b' '\U0001000c'
598 (-67%) 283 (-30%) 198 9 '\U0001000a' '\U0001000b' '\U0001000c'
581 (-67%) 258 (-27%) 189 10 '\U0001000a' '\U0001000b' '\U0001000c'
511 (-71%) 152 (-3%) 148 20 '\U0001000a' '\U0001000b' '\U0001000c'
472 (-76%) 89 (+28%) 114 50 '\U0001000a' '\U0001000b' '\U0001000c'
461 (-78%) 68 (+47%) 100 100 '\U0001000a' '\U0001000b' '\U0001000c'
452 (-81%) 48 (+81%) 87 1000 '\U0001000a' '\U0001000b' '\U0001000c'
452 (-81%) 46 (+85%) 85 10000 '\U0001000a' '\U0001000b' '\U0001000c'
msg178609 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-30 19:48
64-bit windows results:
3.3 patched
925 (-90%) 97 1 'a' 'b' 'c'
881 (-54%) 405 2 'a' 'b' 'c'
623 (-51%) 308 3 'a' 'b' 'c'
482 (-48%) 252 4 'a' 'b' 'c'
396 (-44%) 223 5 'a' 'b' 'c'
344 (-40%) 208 6 'a' 'b' 'c'
306 (-38%) 190 7 'a' 'b' 'c'
276 (-37%) 173 8 'a' 'b' 'c'
254 (-36%) 162 9 'a' 'b' 'c'
241 (-35%) 156 10 'a' 'b' 'c'
158 (-24%) 120 20 'a' 'b' 'c'
107 (-12%) 94 50 'a' 'b' 'c'
87 (+7%) 93 100 'a' 'b' 'c'
70 (+3%) 72 1000 'a' 'b' 'c'
63 (+8%) 68 10000 'a' 'b' 'c'
1332 (-92%) 106 1 '\u010a' '\u010b' '\u010c'
1137 (-60%) 459 2 '\u010a' '\u010b' '\u010c'
836 (-58%) 347 3 '\u010a' '\u010b' '\u010c'
660 (-56%) 288 4 '\u010a' '\u010b' '\u010c'
567 (-55%) 256 5 '\u010a' '\u010b' '\u010c'
503 (-51%) 245 6 '\u010a' '\u010b' '\u010c'
455 (-47%) 242 7 '\u010a' '\u010b' '\u010c'
414 (-44%) 231 8 '\u010a' '\u010b' '\u010c'
387 (-41%) 227 9 '\u010a' '\u010b' '\u010c'
365 (-35%) 237 10 '\u010a' '\u010b' '\u010c'
256 (-21%) 201 20 '\u010a' '\u010b' '\u010c'
185 (-9%) 168 50 '\u010a' '\u010b' '\u010c'
186 (-19%) 150 100 '\u010a' '\u010b' '\u010c'
137 (-1%) 136 1000 '\u010a' '\u010b' '\u010c'
131 (+3%) 135 10000 '\u010a' '\u010b' '\u010c'
1346 (-88%) 160 1 '\U0001000a' '\U0001000b' '\U0001000c'
1247 (-62%) 469 2 '\U0001000a' '\U0001000b' '\U0001000c'
965 (-64%) 352 3 '\U0001000a' '\U0001000b' '\U0001000c'
845 (-64%) 303 4 '\U0001000a' '\U0001000b' '\U0001000c'
720 (-65%) 251 5 '\U0001000a' '\U0001000b' '\U0001000c'
655 (-65%) 230 6 '\U0001000a' '\U0001000b' '\U0001000c'
604 (-58%) 256 7 '\U0001000a' '\U0001000b' '\U0001000c'
570 (-62%) 214 8 '\U0001000a' '\U0001000b' '\U0001000c'
546 (-63%) 203 9 '\U0001000a' '\U0001000b' '\U0001000c'
515 (-63%) 190 10 '\U0001000a' '\U0001000b' '\U0001000c'
404 (-61%) 157 20 '\U0001000a' '\U0001000b' '\U0001000c'
339 (-62%) 130 50 '\U0001000a' '\U0001000b' '\U0001000c'
308 (-60%) 122 100 '\U0001000a' '\U0001000b' '\U0001000c'
284 (-54%) 130 1000 '\U0001000a' '\U0001000b' '\U0001000c'
281 (-60%) 113 10000 '\U0001000a' '\U0001000b' '\U0001000c'
msg178612 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-30 20:04
> As __ap__ says, it would be nice to have a comment.
Oh, I thought I had already done this.
msg178615 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-12-30 21:36
str_replace_1char.patch: why not implement replace_1char_inplace() in stringlib, with one version per character type (UCS1, UCS2, UCS4)?
I prefer unicode_2.patch algorithm because it's simpler: only one loop (vs two loops for str_replace_1char.patch, with a threshold of 10 different characters).
Why did you change your algorithm? Is the str_replace_1char.patch algorithm more efficient than the unicode_2.patch algorithm? Is the speedup really worthwhile?
msg178666 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-31 11:21
> str_replace_1char.patch: why not implement replace_1char_inplace() in
> stringlib, with one version per character type (UCS1, UCS2, UCS4)?
Because there is no benefit in doing so. The three versions (UCS1, UCS2, and
UCS4) would share no common code; the best implementation is used for each kind
of string. For UCS1 it uses the fast memchr() (findchar() has some overhead
here), for UCS2 it uses findchar(), and for UCS4 it uses a dumb loop, because
findchar() would be too inefficient there.
> I prefer unicode_2.patch algorithm because it's simpler: only one loop (vs
> two loops for str_replace_1char.patch, with a threshold of 10 different
> characters).
Yes, the UCS1 implementation in str_replace_1char.patch is more complicated, but
it is faster for more input strings. memchr() is more effective than a simple
loop when the replaceable characters are rare, but when they occur often, a
simple loop is more efficient. The "attempts" counter determines how many
characters are checked before falling back to memchr(). This speeds up the
replacement in strings with frequent replacements, but slightly slows down the
replacement in strings with rare replacements; 10 is a compromise.
str_replace_1char.patch speeds up not only the case when *every* character is
replaced, but also the cases when 1/2, 1/3, 1/5, ... of the characters are replaced.
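A rough sketch of that counter-based fallback for the UCS1 case (illustrative only, not the code from str_replace_1char.patch; the function name and the ATTEMPTS constant are invented, with 10 standing in for the compromise mentioned above):

    /* Illustrative sketch: scan directly while matches keep appearing; after
       ATTEMPTS consecutive misses, assume matches are rare and use memchr(). */
    #include <stddef.h>
    #include <string.h>

    #define ATTEMPTS 10

    static void
    replace_1char_ucs1(unsigned char *s, size_t len,
                       unsigned char u1, unsigned char u2)
    {
        size_t i = 0;
        int attempts = ATTEMPTS;
        while (i < len) {
            if (s[i] == u1) {
                s[i++] = u2;
                attempts = ATTEMPTS;        /* match: keep scanning directly */
            }
            else if (--attempts > 0) {
                i++;                        /* miss, but within the budget */
            }
            else {
                /* too many misses in a row: skip ahead with memchr() */
                unsigned char *p = memchr(s + i, u1, len - i);
                if (p == NULL)
                    break;
                i = (size_t)(p - s);
                attempts = ATTEMPTS;
            }
        }
    }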
> Why did you change your algorithm? Is the str_replace_1char.patch algorithm
> more efficient than the unicode_2.patch algorithm? Is the speedup really
> worthwhile?
You can run the benchmarks and compare the results. str_replace_1char.patch does
not provide the absolute best performance, but it gives the most stable results
across a wide range of strings, and it has no regressions compared with 3.2.
msg185864 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-02 22:42
How can we move this issue forward? I still prefer unicode_2.patch over str_replace_1char.patch because the code is simpler and thus easier to maintain.
str_replace_1char.patch has a "bug": replace_1char() does not use "pos" for the latin1 path.
msg185948 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-04-03 20:00
My experiments last September, before this was filed, showed that str.find (index) had most of the relative slowdown of str.replace. I assumed at that time that .replace used .find or .index to find substrings to replace, so that the fix for .replace would include speeding up .find/index. Looking at the patches, I see that I was wrong; I guess because .replace copies as it goes. I will open a separate issue sometime unless .find gets fixed here.
For both .find and .replace, the regression was worse on Windows than on Linux, so the patches should be tested on Windows also. If I can get VC++ Express 2008 installed and working on my current substitute machine, I will give them a try.
msg186248 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-07 21:36
Here is an updated patch. Some comments have been added (I would be grateful for help improving these comments), and the implementation has been moved to stringlib (a new file, Objects/stringlib/replace.h, has been added).
unicode_2.patch optimizes only a very narrow special case, and I consider that not worth the effort. The str_replace_1char*.patch patches cover a wider area and are designed to be faster than 3.2 and 3.3 in most cases, and not significantly slower in corner cases.
msg186249 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-07 21:47
str_replace_1char_2.patch looks good to me. Just one nit: please add a reference to this issue in the comment (in replace.h).
msg186807 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-13 19:45
New changeset d396e0716bf4 by Serhiy Storchaka in branch 'default':
Issue #16061: Speed up str.replace() for replacing 1-character strings.
http://hg.python.org/cpython/rev/d396e0716bf4 
msg186808 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-13 19:47
Thanks to Ezio Melotti and Daniel Shahaf for their great help in correcting my clumsy wording.
History
Date User Action Args
2022-04-11 14:57:36  admin              set  github: 60265
2013-04-13 19:47:21  serhiy.storchaka   set  status: open -> closed
                                             resolution: fixed
                                             messages: + msg186808
                                             stage: patch review -> resolved
2013-04-13 19:45:50  python-dev         set  nosy: + python-dev
                                             messages: + msg186807
2013-04-07 21:47:14  vstinner           set  messages: + msg186249
2013-04-07 21:36:05  serhiy.storchaka   set  files: + str_replace_1char_2.patch
                                             messages: + msg186248
2013-04-03 20:00:57  terry.reedy        set  nosy: + terry.reedy
                                             messages: + msg185948
2013-04-02 22:42:52  vstinner           set  messages: + msg185864
2013-02-18 16:09:09  jcea               set  nosy: + jcea
2012-12-31 11:21:26  serhiy.storchaka   set  messages: + msg178666
2012-12-30 21:36:50  vstinner           set  messages: + msg178615
2012-12-30 20:04:34  serhiy.storchaka   set  messages: + msg178612
2012-12-30 19:48:07  pitrou             set  messages: + msg178609
2012-12-30 19:43:24  pitrou             set  messages: + msg178607
2012-12-30 19:38:46  benjamin.peterson  set  nosy: + benjamin.peterson
                                             messages: + msg178606
2012-12-30 19:23:56  serhiy.storchaka   set  keywords: + 3.3regression
                                             messages: + msg178604
2012-12-29 22:20:06  serhiy.storchaka   set  assignee: serhiy.storchaka
2012-10-13 18:03:28  serhiy.storchaka   set  files: + str_replace_1char.patch
                                             messages: + msg172821
2012-10-13 14:01:46  pitrou             set  stage: needs patch -> patch review
2012-10-12 21:05:02  pitrou             set  nosy: + pitrou
                                             messages: + msg172780
2012-10-12 18:39:14  serhiy.storchaka   set  files: + replacebench.py, replacebench2.py
                                             messages: + msg172769
2012-10-11 20:38:25  vstinner           set  messages: + msg172691
2012-10-11 08:36:27  serhiy.storchaka   set  files: + unicode_2.patch
                                             messages: + msg172628
2012-10-10 20:42:48  vstinner           set  nosy: + loewis
2012-10-10 20:38:40  vstinner           set  files: + unicode.patch
                                             keywords: + patch
                                             messages: + msg172602
2012-10-09 09:16:24  kushal.das         set  nosy: + kushal.das
2012-09-28 08:28:01  vstinner           set  messages: + msg171413
2012-09-28 06:39:48  thomaslee          set  nosy: + thomaslee
                                             messages: + msg171407
2012-09-27 18:56:21  ezio.melotti       set  nosy: + vstinner, ezio.melotti
                                             stage: needs patch
                                             components: + Unicode
                                             versions: + Python 3.4, - Python 3.3
2012-09-27 17:33:47  serhiy.storchaka   set  nosy: + serhiy.storchaka
                                             components: + Interpreter Core
2012-09-27 16:03:07  BreamoreBoy        create
