homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: weak dict iterators are fragile because of unpredictable GC runs
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, benjamin.peterson, dcjim, dfortunov, elachuni, jon, kristjan.jonsson, mark.dickinson, neologix, pitrou, python-dev, qelan, tseaver, vdupras
Priority: normal Keywords: patch

Created on 2009年10月11日 17:29 by pitrou, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
weakiter3.patch pitrou, 2009年10月13日 17:24
issue7105.patch kristjan.jonsson, 2013年11月18日 12:14
weakref.patch kristjan.jonsson, 2013年12月01日 23:04
Pull Requests
URL Status Linked Edit
PR 20687 open dfortunov, 2020年06月06日 23:43
Messages (30)
msg93863 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月11日 17:29
As mentioned in issue7060, weak dict iterators are easily broken by
cyclic garbage collection changing the size of the underlying dict storage:
 File "/home/rdmurray/python/py3k/Lib/weakref.py", line 121, in items
 for wr in self.data.values():
RuntimeError: dictionary changed size during iteration
One possible solution is to delay all removals until all iterators are
done. Here is a context manager-based solution.
msg93865 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009年10月11日 17:57
> delay all removals until all iterators are done
+1
msg93912 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月12日 23:50
This new patch makes it possible to mutate the dict without messing with
the delayed removal when an iterator exists.
msg93934 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月13日 17:24
It occurs weaksets have the same problem, here is a new patch fixing
them as well.
msg97392 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010年01月08日 02:11
LGTM
msg97425 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年01月08日 17:57
Committed in r77365 (py3k) and r77366 (3.1). Thank you.
msg111259 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年07月23日 09:57
The issue still exists in 2.6 and 2.7.
Closing issue 839159 as a duplicate of this one.
msg142392 - (view) Author: Matthew Schwickerath (qelan) Date: 2011年08月18日 18:31
Any plans on actually patching this in 2.7 any time soon? This is affecting our software and hanging it on random occasions.
msg203250 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013年11月18日 08:14
What is the status of this issue? Does anyone still want to backport the fix to Python 2.7?
(I found this issue while searching for test_multiprocessing failures, and I found #7060 which refers this issue.)
msg203286 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年11月18日 12:14
Attached is a version for 2.7
Btw, I think a cleaner way to deal with unpredictable GC runs is to be able to temporarily disable GC with a context manager.
msg203318 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年11月18日 20:11
> Btw, I think a cleaner way to deal with unpredictable GC runs is to be able to temporarily disable GC with a context manager
Disabling the GC in a library function sounds very ugly to me.
msg203345 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年11月19日 09:51
No matter how it sounds, it certainly looks cleaner in code.
Look at all this code, designed to work around an unexpected GC collection with various pointy bits and edge cases and special corners.
Compare to explicitly just asking GC to relent, for a bit:
def getitems(self):
 with gc.disabled():
 for each in self.data.items():
 yield each
That's it.
While a native implementation of such a context manager would be better (faster, and could be made overriding), a simple one can be constructed thus:
@contextlib.contextmanagerd
def gc_disabled():
 enabled = gc.isenabled()
 gs.disable()
 try:
 yield
 finally:
 if enabled:
 gc.enable()
 
Such global "atomic" context managers are well known to stackless programmers. It's a very common idiom when building higher level primitives (such as locks) from lower level ones.
with stackless.atomic():
 do()
 various()
 stuff_that_does_not_like_being_interrupted()
(stackless.atomic prevents involuntary tasklet switching _and_ involuntary thread switching)
msg203349 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年11月19日 10:24
> No matter how it sounds, it certainly looks cleaner in code.
It's also unsafe and invasive, since it's a process-wide setting. An
iterator can be long-lived if it's being consumed slowly, so you've
disabled garbage collection for an unknown amount of time, without the
user knowing about it. Another thread could kick in and perhaps
re-enable it for whatever reason.
Oh if someone calls gc.collect() explicitly, the solution is suddenly
defeated.
msg203354 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年11月19日 10:34
Yes, the "long iterator" scenario is the reason it is not ideal for this scenario.
The other one (gc.collect()) is easily solved by implementing this construct natively. It can be done rather simply by adding an overriding "pause" property to gc, with the following api:
def pause(increment):
 """
 pause or unpause garbage collection. A positive value
 increases the pause level, while a negative one reduces it.
 when paused, gc won't happen even when explicitly requested with
 gc.collect(), until the pause level drops to 0.
 """
I'm sure there are other places in the code with local execution that would benefit from not having an accidental GC run happen. I'm sure I've seen such places, with elaborate scaffolding to safeguard itself from such cases.
Anyway, my 2 aurar worth of lateral thinking applied to the problem at hand :)
What about the patch itself?
msg203355 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年11月19日 10:39
Speaking of which, this is not only about GC runs, although it is the most annoying scenario (since you basically don't control when it happens). Regular resource reaping because of reference counting can also wreak havoc.
About the patch, I don't know. It introduces new complexity in 2.7 which should be fairly stable by now. The one thing I don't like is your replacement of "iterator" by "iterable" in the docs.
(you also have one line commented out in the tests)
msg203356 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年11月19日 10:47
The changes for the docs are just a port of the original patch. And indeed, these functions don't return an iterator, but a generator object.
I admit I'm confused by the difference, since next() can be called directly on the generator. Still, a lot of code in the test explicitly calls iter() on the iterators before doing next(). Not sure why.
The commented out line is an artifact, I'll remove it, the correct one is the test for 20 items.
Otherwise, I have no vested interest in getting this in. My porting this is just me contributing to 2.7. If it's vetoed, we'll just put it in 2.8.
msg204975 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年12月01日 23:04
Here's a different approach.
Simply avoid the use of iterators over the underlying container.
Instead, we iterate over lists of items/keys/values etc.
msg204980 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013年12月02日 00:19
I appreciate the simplicity, but I don't think it is acceptable -- if the dict is large, materializing the entire list of keys/values/items might allocate a prohibitive amount of memory. It's worse if you have code that is expected to break out of the loop.
msg205013 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年12月02日 14:11
Yes, the old memory argument.
But is it valid? Is there a conceivable application where a dict of weak references would be storing a large chunk of the application memory?
Remember, all of the data must be referred to from elsewhere, or else, the weak refs would not exist. An extra list of pointers is unlikely to make a difference.
I think the chief reason to use iterators has to do with performance by avoiding the creation of temporary objects, not saving memory per-se.
Before the invention of "iteritems()" and friends, all such iteration was by lists (and hence, memory usage). We should try to remain nimble enough so that we can undo an optimization previously done, if the requirements merit us doing so.
As a completely unrelated example of such nimbleness: Faced with stricter regulations in the 70s, american car makers had to sell their muscle cars with increasingly less powerful engines, efectively rolling back previous optimizations :)
Anyway, it's not for me to decide. We have currently three options:
a) my first patch, which is a duplication of the 3.x work but is non-trivial and could bring stability issues
b) my second patch, which will increase memory use, but to no more than previous versions of python used while iterating
c) do nothing and have iterations over weak dicts randomly break when an underlying cycle is unraveled during iteration.
Cheers!
msg205014 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年12月02日 14:30
> Anyway, it's not for me to decide. We have currently three options:
> a) my first patch, which is a duplication of the 3.x work but is
> non-trivial and could bring stability issues
> b) my second patch, which will increase memory use, but to no more
> than previous versions of python used while iterating
> c) do nothing and have iterations over weak dicts randomly break when
> an underlying cycle is unraveled during iteration.
Either a) or c), for me. We shouldn't change semantics in bugfix
releases.
msg205019 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013年12月02日 15:41
I'm with Antoine. Have we heard of any problems with the 3.x version of the patch? How different is it?
msg205090 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年12月03日 09:16
Strictly speaking b) is not a semantic change. Depending on your semantic definition of semantics. At any rate it is even less so than a) since the temporary list is hidden from view and the only side effect is additional memory usage.
msg205092 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年12月03日 09:19
d), We could also simply issue a (documentation) warning, that the "iterator" methods of these dictionares are known to be fragile, and recommend that people use the keys(), values() and items() instead.
msg205093 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年12月03日 09:20
d) sounds like a good enough resolution at this point.
msg205124 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013年12月03日 15:53
I'm not sure I understand the hesitation about backporting the Python 3 solution. We're acknowledging it's a bug, so the fix is not a feature. The Python 3 solution is the future. So why not fix it?
msg205215 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013年12月04日 10:57
That's the spirit, Guido :)
I just think people are being extra careful after the "regression" introduced in 2.7.5.
However, IMHO we must never let the odd mistake scare us from making necessary moves.
Unless Antoine explicitly objects, I think I'll submit my patch from november and we'll just watch what happens.
msg205224 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年12月04日 15:07
I'm ok with the backport.
msg205285 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013年12月05日 10:05
New changeset 03fcc12282fc by Kristján Valur Jónsson in branch '2.7':
Issue #7105: weak dict iterators are fragile because of unpredictable GC runs
http://hg.python.org/cpython/rev/03fcc12282fc 
msg217444 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014年04月28日 23:32
Since this is backported, shouldn't it be closed?
msg266818 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2016年06月01日 14:37
Shouldn't the documentation be updated?
https://docs.python.org/3.6/library/weakref.html#weakref.WeakKeyDictionary
Note Caution: Because a WeakKeyDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakKeyDictionary because actions performed by the program during iteration may cause items in the dictionary to vanish "by magic" (as a side effect of garbage collection).
History
Date User Action Args
2022年04月11日 14:56:53adminsetgithub: 51354
2020年06月08日 05:00:46gvanrossumsetnosy: - gvanrossum
2020年06月07日 22:00:59vstinnersetnosy: - vstinner
2020年06月06日 23:43:08dfortunovsetnosy: + dfortunov

pull_requests: + pull_request19902
2016年06月01日 14:37:00neologixsetnosy: + neologix
messages: + msg266818
2014年04月29日 10:06:43kristjan.jonssonsetstatus: open -> closed
resolution: fixed
2014年04月28日 23:32:37pitrousetmessages: + msg217444
2014年02月03日 15:44:09BreamoreBoysetnosy: - BreamoreBoy
2013年12月05日 10:05:11python-devsetnosy: + python-dev
messages: + msg205285
2013年12月04日 15:07:27pitrousetmessages: + msg205224
2013年12月04日 10:57:27kristjan.jonssonsetmessages: + msg205215
2013年12月03日 15:53:12gvanrossumsetmessages: + msg205124
2013年12月03日 09:20:18pitrousetmessages: + msg205093
2013年12月03日 09:19:21kristjan.jonssonsetmessages: + msg205092
2013年12月03日 09:16:39kristjan.jonssonsetmessages: + msg205090
2013年12月02日 15:41:01gvanrossumsetmessages: + msg205019
2013年12月02日 14:30:24pitrousetmessages: + msg205014
2013年12月02日 14:11:53kristjan.jonssonsetmessages: + msg205013
2013年12月02日 00:19:31gvanrossumsetmessages: + msg204980
2013年12月01日 23:04:43kristjan.jonssonsetfiles: + weakref.patch

messages: + msg204975
2013年11月19日 10:47:25kristjan.jonssonsetmessages: + msg203356
2013年11月19日 10:39:12pitrousetmessages: + msg203355
2013年11月19日 10:34:59kristjan.jonssonsetmessages: + msg203354
2013年11月19日 10:24:05pitrousetmessages: + msg203349
2013年11月19日 09:51:03kristjan.jonssonsetmessages: + msg203345
2013年11月18日 20:11:32pitrousetmessages: + msg203318
2013年11月18日 12:14:04kristjan.jonssonsetfiles: + issue7105.patch
nosy: + kristjan.jonsson
messages: + msg203286

2013年11月18日 08:14:57vstinnersetnosy: + vstinner
messages: + msg203250
2011年08月18日 18:31:12qelansetnosy: + qelan
messages: + msg142392
2010年07月23日 09:58:05mark.dickinsonsetnosy: + dcjim, tseaver, ajaksu2, vdupras, elachuni, BreamoreBoy
2010年07月23日 09:57:23mark.dickinsonsetstatus: closed -> open

versions: + Python 2.6, Python 2.7, - Python 3.1, Python 3.2
nosy: + mark.dickinson

messages: + msg111259
resolution: fixed -> (no value)
stage: resolved ->
2010年07月23日 09:56:04mark.dickinsonlinkissue839159 superseder
2010年01月08日 17:57:03pitrousetstatus: open -> closed
resolution: fixed
messages: + msg97425

stage: patch review -> resolved
2010年01月08日 02:11:35benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg97392
2009年11月13日 17:39:30pitrousetfiles: - weakiter2.patch
2009年11月13日 17:39:28pitrousetfiles: - weakiter.patch
2009年10月24日 12:55:46pitroulinkissue3578 superseder
2009年10月13日 17:24:59pitrousetfiles: + weakiter3.patch

messages: + msg93934
2009年10月12日 23:51:00pitrousetfiles: + weakiter2.patch

messages: + msg93912
2009年10月11日 17:57:27gvanrossumsetnosy: + gvanrossum
messages: + msg93865
2009年10月11日 17:36:34jonsetnosy: + jon
2009年10月11日 17:29:50pitroulinkissue7060 dependencies
2009年10月11日 17:29:21pitroucreate

AltStyle によって変換されたページ (->オリジナル) /