Issue 16001: small ints: cache string representation

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60205

classification

Title:	small ints: cache string representation
Type:	performance	Stage:
Components:	Unicode	Versions:	Python 3.4

process

Dependencies:	Superseder:
Status:	closed	Resolution:	rejected
Assigned To:	Nosy List:	jcea, loewis, mark.dickinson, pitrou, serhiy.storchaka, terry.reedy, vstinner
Priority:	normal	Keywords:	patch

Created on 2012年09月22日 00:11 by vstinner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
small_ints_cache_str.patch	vstinner, 2012年09月22日 00:11	review
bench_int_str.py	vstinner, 2012年09月24日 23:25
small_ints_cache_str-2.patch	vstinner, 2012年09月24日 23:28	review

Messages (9)
msg170938 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年09月22日 00:11
Integers in range [-5; 256] are singleton. It is possible to cache their string representation (in base 10) to make str(int) faster, but also to reduce memory consumption (and memory fragmentation, Unicode strings don't use free list). Attached patch implements this micro-optimization. It reuses singleton for latin1 characters (so digits 0..9): str(0) is chr(48). - /* Special-case boolean: we want 0/1 */ - if (PyBool_Check(val)) - result = PyNumber_ToBase(val, 10); - else - result = Py_TYPE(val)->tp_str(val); + result = PyNumber_ToBase(val, 10); This change is required because Py_TYPE(val)->tp_str(val); may return a singleton, whereas formatlong() requires a "mutable" string (string not used yet). See also issue #10044.
msg170943 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012年09月22日 00:46
Honestly, I'm not sure I understand the point of this optimization. Have you identified a workload where this would help?
msg170947 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月22日 01:13
Also, can you report benchmark results?
msg171202 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年09月24日 23:25
Here is a micro benchmark: --- # run it using: # benchmark.py script bench_int_str.py [--file=output] # https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py def run_benchmark(bench): bench.timeit('S(123)', setup='S=str') bench.timeit('S(1) == S(2)', setup='S=str') bench.timeit('S(12345)', setup='S=str') bench.timeit('{S(x): x for x in data}', setup='data=tuple(range(100)); S=str') bench.timeit('"x=%s" % x', setup='x=123') bench.timeit('"x=%s" % x', setup='x=12345') --- Output: -------------------------------------------------------+-------------+--------------- Tests \| unpatched \| patched -------------------------------------------------------+-------------+--------------- S=str; S(123) \| 158 ns () \| 112 ns (-29%) S=str; S(1) == S(2) \| 329 ns () \| 248 ns (-25%) S=str; S(12345) \| 161 ns () \| 161 ns data=tuple(range(100)); S=str; {S(x): x for x in data} \| 23 us () \| 16.5 us (-28%) x=123; "x=%s" % x \| 145 ns () \| 133 ns (-8%) x=12345; "x=%s" % x \| 149 ns () \| 145 ns -------------------------------------------------------+-------------+--------------- Total \| 23.9 us (*) \| 17.3 us (-27%) -------------------------------------------------------+-------------+--------------- I expected more important speedup.
msg171203 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年09月24日 23:25
My initial idea was to cache str(int) for any integer, but it may waste memory for huge numbers.
msg171204 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年09月24日 23:28
Oops, I forgot to attach the new patch: small_ints_cache_str-2.patch optimizes also str % args (copy the string when needed if the reference count is not 1).
msg171211 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月25日 03:12
-1 for this entire effort.
msg171321 - (view)	Author: Terry J. Reedy (terry.reedy) * (Python committer)	Date: 2012年09月25日 21:58
Small int caching saves both time and space. On a nearly fresh IDLE session: >>> sys.getrefcount(0) 772 >>> sys.getrefcount(1) 854 >>> sum(sys.getrefcount(i) for i in range(-5, 257)) 4878 While an interesting idea, I do not see the same gain here, and agree with Martin. Array lookup is faster than string conversion: >>> ti.repeat(setup = "ar = [str(i) for i in range(101)]", stmt = "ar[100]") [0.058166605132757995, 0.03438449234832762, 0.034402937150259674] >>> ti.repeat(setup = "S = str", stmt = 'S(100)') [0.21833603908330446, 0.19469564386039195, 0.1947128590088596] but 1: converting ints to decimal digits is nearly always done for output, and conversion is blazingly fast compared to output, so output time will dominate. >>> ti.repeat(setup = "S = str", stmt = 'S(100)', number = 20) [1.0144641009901534e-05, 8.914987631669646e-06, 8.914987574826228e-06] >>> ti.repeat(setup = "p = print", stmt = 'p(100)', number = 20) ... [0.11873041968999587, 0.039060557051357137, 0.03859697769621562] 2. I presume the conversion of 0 - 9 to '0' - '9' within the conversion routines is already optimized. I don't see that 10 - 259 should be more common that 257 - 999, let alone more common than all higher ints. So the limited optimization can have only limited effect. 3. Much production numerical output is float or decimal rather than int. The 3.3 optimization of ascii-only strings to bytes helped here.
msg171575 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012年09月29日 16:14
str(int) is less common than "%s" % int or "%i" % int, and str%args has now "fast path" in Python 3.3 for integers. So I agree that my patch adds useless complexity and I'm closing this idea, thanks.

History
Date	User	Action	Args
2022年04月11日 14:57:36	admin	set	github: 60205
2012年09月29日 16:14:28	vstinner	set	status: open -> closed resolution: rejected messages: + msg171575
2012年09月28日 08:51:37	jcea	set	nosy: + jcea
2012年09月25日 21:58:41	terry.reedy	set	nosy: + terry.reedy messages: + msg171321
2012年09月25日 03:12:12	loewis	set	messages: + msg171211
2012年09月24日 23:28:31	vstinner	set	files: + small_ints_cache_str-2.patch messages: + msg171204
2012年09月24日 23:25:56	vstinner	set	messages: + msg171203
2012年09月24日 23:25:20	vstinner	set	files: + bench_int_str.py messages: + msg171202
2012年09月22日 01:13:44	loewis	set	nosy: + loewis messages: + msg170947
2012年09月22日 00:46:41	pitrou	set	nosy: + mark.dickinson messages: + msg170943
2012年09月22日 00:12:01	vstinner	set	nosy: - ezio.melotti
2012年09月22日 00:11:49	vstinner	set	nosy: + serhiy.storchaka
2012年09月22日 00:11:19	vstinner	create

homepage