Message 172602
Author: vstinner
Recipients: BreamoreBoy, ezio.melotti, kushal.das, serhiy.storchaka, thomaslee, vstinner
Date: 2012-10-10 20:38:39
Message-id: <1349901520.73.0.991489146519.issue16061@psf.upfronthosting.co.za>
Content:
> The code is now using the heavily optimized findchar() function.
I compared the performance of the two methods, dummy loop vs. find. Results with a string of 100,000 characters:
* Replace 100% (rewrite all characters): find is 12.5x slower than the loop
* Replace 50% of the characters: find is 3.3x slower
* Replace only 2 characters (0.001%): find is 10.4x faster
In practice, I bet that the most common case is to replace only a few characters. Replacing all characters is a rare use case.
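To make the two strategies concrete, here is a rough pure-Python model of them. This is only an illustration of the control flow: the real code is C (findchar() operates on raw character buffers in CPython's unicode implementation), and the function names here are mine.

def replace_loop(text, old, new):
    # Dummy-loop strategy: examine every character unconditionally.
    out = []
    for ch in text:
        out.append(new if ch == old else ch)
    return "".join(out)

def replace_find(text, old, new):
    # find-based strategy: jump from match to match; the C code uses
    # the optimized findchar() (memchr-like) for this search.
    out = []
    pos = 0
    while True:
        i = text.find(old, pos)
        if i < 0:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:i])
        out.append(new)
        pos = i + len(old)

This also explains the numbers above: when matches are sparse, replace_find skips long runs of text in a single optimized search, while when every character matches, the per-match bookkeeping costs more than simply walking the string.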
Apply the attached "unicode.patch" to Python 3.4 and use the following commands to reproduce my benchmark:
# Replace 100% of the characters:
python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
# Replace 50%:
python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
# Replace only 2 characters:
python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"
--
An option is to use the find method and then switch to the dummy-loop method if there are too many characters to replace. I don't know if it's necessary to develop such a complex algorithm. It would be better to have a benchmark extracted from a real-world application, such as a template engine.
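For illustration only, here is a minimal sketch of that hybrid idea, reusing the model functions above. The sampling window and the 10% threshold are arbitrary values chosen for the example, not tuned numbers.

def replace_hybrid(text, old, new, sample=1024, threshold=0.10):
    # Guess the match density from a prefix of the string; if matches
    # look dense, the plain loop wins, otherwise use find.
    window = text[:sample]
    if window and window.count(old) / len(window) > threshold:
        return replace_loop(text, old, new)
    return replace_find(text, old, new)

The difficulty is exactly the one mentioned above: picking a good threshold needs a benchmark from a real workload.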