Message 172602
Author: vstinner
Recipients: BreamoreBoy, ezio.melotti, kushal.das, serhiy.storchaka, thomaslee, vstinner
Date: 2012-10-10 20:38:39
Message-id: <1349901520.73.0.991489146519.issue16061@psf.upfronthosting.co.za>
Content:
> The code is now using the heavily optimized findchar() function.
I compared the performance of the two methods, dummy loop vs. find. Results with a string of 100,000 characters:
* Replace 100% (rewrite all characters): find is 12.5x slower than the loop
* Replace 50% of the characters: find is 3.3x slower
* Replace only 2 characters (0.001%): find is 10.4x faster
In practice, I bet that the most common case is to replace only a few characters. Replacing all characters is a rare use case.
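To make the two strategies concrete, here is a rough pure-Python model of them. This is only an illustration of the control flow: the real code is C (findchar() operates on raw character buffers in CPython's unicode implementation), and the function names here are mine.

def replace_loop(text, old, new):
    # Dummy-loop strategy: examine every character unconditionally.
    out = []
    for ch in text:
        out.append(new if ch == old else ch)
    return "".join(out)

def replace_find(text, old, new):
    # find-based strategy: jump from match to match; the C code uses
    # the optimized findchar() (memchr-like) for this search.
    out = []
    pos = 0
    while True:
        i = text.find(old, pos)
        if i < 0:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:i])
        out.append(new)
        pos = i + len(old)

This also explains the numbers above: when matches are sparse, replace_find skips long runs of text in a single optimized search, while when every character matches, the per-match bookkeeping costs more than simply walking the string.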
Apply the attached "unicode.patch" to Python 3.4 and use the following commands to reproduce my benchmark:
# Replace 100% of the characters:
python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
# Replace 50%:
python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
# Replace only 2 characters:
python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"
--
An option is to use the find method and then switch to the dummy-loop method if there are too many characters to replace. I don't know if it's necessary to develop such a complex algorithm. It would be better to have a benchmark extracted from a real-world application, such as a template engine.
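For illustration only, here is a minimal sketch of that hybrid idea, reusing the model functions above. The sampling window and the 10% threshold are arbitrary values chosen for the example, not tuned numbers.

def replace_hybrid(text, old, new, sample=1024, threshold=0.10):
    # Guess the match density from a prefix of the string; if matches
    # look dense, the plain loop wins, otherwise use find.
    window = text[:sample]
    if window and window.count(old) / len(window) > threshold:
        return replace_loop(text, old, new)
    return replace_find(text, old, new)

The difficulty is exactly the one mentioned above: picking a good threshold needs a benchmark from a real workload.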