homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Building a list of tuples has non-linear performance
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Rhamphoryncus, collinwinter, gregory.p.smith, jcea, loewis, pitrou
Priority: normal Keywords: patch

Created on 2008年10月08日 04:00 by gregory.p.smith, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
tuple_gc_hell.py pitrou, 2008年12月17日 21:24
gctrigger5.patch pitrou, 2008年12月17日 23:36
gctrigger6.patch pitrou, 2009年01月09日 21:08
Messages (18)
msg74512 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年10月08日 04:00
The attached script simply loops building a list of tuples. It has
horrible performance as the list gets larger compared to something
appending simple objects like ints to the list.
% python tuple_gc_hell.py ~
...
1000000 1.3615500927
2000000 2.95893096924
3000000 4.53531980515
4000000 5.62795209885
5000000 7.70974612236
...
The time it takes to execute grows linearly with the number of tuples
already appended to the list.
Turning on gc.debug(gc.DEBUG_STATS) shows why as does running python
under a profiler:
 % cumulative self self total 
 time seconds seconds calls s/call s/call name 
 27.85 115.84 115.84 14270 0.01 0.02 collect
 21.19 204.01 88.17 1115120314 0.00 0.00 tupletraverse
 13.14 258.65 54.64 1655624731 0.00 0.00 visit_reachable
 10.72 303.25 44.60 1655624731 0.00 0.00 visit_decref
 5.97 328.10 24.85 338 0.07 1.10 PyEval_EvalFrame
It appears the cyclic gc is rechecking all of these tuples repeatedly. 
disabling gc during the loop or using a very high gc.set_threshold hides
the problem.
msg74513 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008年10月08日 05:09
This is a known problem; see the GC discussions in June for an example, e.g.
http://mail.python.org/pipermail/python-dev/2008-June/080579.html 
msg74598 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年10月09日 20:19
Someone should try implementing Martin's suggestion one day :)
msg77802 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月14日 16:11
Here is a simple patch implementing Martin's proposal with a few basic
tweaks.
Using Greg's script, we get:
-> without patch:
1000000 2.64134001732
2000000 3.60712885857
3000000 5.40855813026
4000000 6.46308898926
5000000 8.65147781372
6000000 10.3949871063
7000000 12.1376650333
8000000 12.7046239376
9000000 15.4652290344
-> with patch:
1000000 2.692289114
2000000 1.91455483437
3000000 2.02900099754
4000000 1.72369813919
5000000 1.87697696686
6000000 2.08952093124
7000000 1.08168196678
8000000 2.32068109512
9000000 1.1202070713
-> with GC disabled:
1000000 1.6810901165
2000000 0.955595970154
3000000 0.959649085999
4000000 0.933673858643
5000000 0.954123973846
6000000 0.929254055023
7000000 0.901160001755
8000000 0.921751022339
9000000 0.894830942154 
msg77812 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008年12月14日 18:07
I didn't test it, but the patch looks okay to me.
msg77966 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 15:00
This new patch adds another improvement where tuples can be "optimized".
Optimized means that tuples which don't contain any GC-tracked object
become themselves untracked. Since tuples are immutable this
optimization is valid, and since it is common to store lots of tuples as
very simple containers of atomic objects this can be an interesting
optimization.
msg77967 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 15:21
This new patch also adds a function named gc.is_tracked() which returns
True if the object is tracked by the GC:
>>> import gc
>>> gc.is_tracked(1)
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked(())
True
>>> gc.is_tracked((0,1))
False
>>> gc.is_tracked((0,"a"))
False
>>> gc.is_tracked((0,[]))
True
>>> gc.is_tracked((0,(1,2)))
False
>>> gc.is_tracked((0,(1,0.55)))
False
>>> gc.is_tracked((0,(1,{})))
True
>>> gc.is_tracked((None, True, False, "a", (1,2,u"z",("another",
"nested", u"tuple")), int))
False
>>> gc.is_tracked(gc)
True
(as you see the empty tuple is considered tracked, this is not important
since it is a singleton anyway)
msg77974 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 17:18
Here is a new patch adding tests for gc.is_tracked().
msg77975 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 17:20
FWIW, with the tuple optimization, the number of objects in
gc.get_objects() after the regression tests has fallen from 150000 to
140000.
msg77985 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 21:24
New version of Greg's script, with different choices (list of tuples,
list of lists, list of dicts, dict of tuples).
msg77987 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 21:36
Now an additional patch which applies over the basic gctrigger4.patch.
It adds dict optimizations so that dicts with only atomic or immutable
elements are also untracked (they get tracked later when other trackable
elements are added).
Since dicts are often used as mappings of simple objects (int, str,
tuple) over other simple objects, this can be very useful.
msg77994 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 23:36
Cleanup: this patch only has the algorithmic change. Tuple and dict opts
will go to a separate tracker issue.
msg77997 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年12月17日 23:49
The optimization issue for tuples and dicts is #4688.
msg79445 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009年01月08日 21:52
Looking at gctrigger5.patch, my only thought is that it would be 
helpful for posterity to include more verbose comments about why this 
new behaviour was chosen; chunks of Martin's proposal from the 
referenced python-dev thread could be included verbatim.
Otherwise, LGTM.
msg79506 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年01月09日 21:08
Patch with updated comments.
msg79510 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009年01月09日 21:26
LGTM. Go ahead and commit this.
msg79520 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年01月09日 22:28
Ok, committed in trunk and py3k. Thanks!
msg79523 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009年01月09日 23:10
> Ok, committed in trunk and py3k. Thanks!
Thanks!
History
Date User Action Args
2022年04月11日 14:56:40adminsetgithub: 48324
2009年01月09日 23:10:37loewissetmessages: + msg79523
2009年01月09日 22:28:31pitrousetstatus: open -> closed
resolution: fixed
messages: + msg79520
2009年01月09日 21:26:58collinwintersetmessages: + msg79510
2009年01月09日 21:08:25pitrousetfiles: + gctrigger6.patch
messages: + msg79506
2009年01月08日 21:52:12collinwintersetmessages: + msg79445
2008年12月30日 22:22:00jceasetnosy: + jcea
2008年12月17日 23:49:27pitrousetmessages: + msg77997
2008年12月17日 23:36:56pitrousetfiles: + gctrigger5.patch
messages: + msg77994
2008年12月17日 23:36:07pitrousetfiles: - gctrigger4.patch
2008年12月17日 23:36:04pitrousetfiles: - gctrigger3.patch
2008年12月17日 23:36:01pitrousetfiles: - gctrigger2.patch
2008年12月17日 23:35:58pitrousetfiles: - gctrigger.patch
2008年12月17日 23:35:54pitrousetfiles: - dictopts.patch
2008年12月17日 21:36:42pitrousetfiles: + dictopts.patch
messages: + msg77987
2008年12月17日 21:24:13pitrousetfiles: + tuple_gc_hell.py
messages: + msg77985
2008年12月17日 21:23:38pitrousetfiles: - tuple_gc_hell.py
2008年12月17日 17:20:21pitrousetmessages: + msg77975
2008年12月17日 17:18:53pitrousetfiles: + gctrigger4.patch
messages: + msg77974
2008年12月17日 15:21:13pitrousetfiles: + gctrigger3.patch
messages: + msg77967
2008年12月17日 15:00:34pitrousetfiles: + gctrigger2.patch
messages: + msg77966
2008年12月17日 03:25:51collinwintersetnosy: + collinwinter
2008年12月14日 18:07:19Rhamphoryncussetnosy: + Rhamphoryncus
messages: + msg77812
2008年12月14日 16:11:53pitrousetfiles: + gctrigger.patch
keywords: + patch
messages: + msg77802
stage: patch review
2008年10月09日 20:19:40pitrousetnosy: + pitrou
messages: + msg74598
versions: + Python 3.1, - Python 2.6, Python 2.5
2008年10月08日 05:09:33loewissetnosy: + loewis
messages: + msg74513
2008年10月08日 04:00:51gregory.p.smithcreate

AltStyle によって変換されたページ (->オリジナル) /