homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_multiprocessing dictionary changed size errors and hang
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: 7105 7318 Superseder:
Assigned To: jnoller Nosy List: asksol, benjamin.peterson, gregory.p.smith, jnoller, jon, kristjan.jonsson, pitrou, r.david.murray, vstinner
Priority: normal Keywords:

Created on 2009年10月05日 01:40 by r.david.murray, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_multiprocessing.log r.david.murray, 2009年10月05日 01:40
Messages (30)
msg93570 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009年10月05日 01:40
test_multiprocessing is producing tracebacks and hanging on py3k trunk.
 I think this started happening fairly recently, as it seemed to work in
one not-too-old checkout until I did an svn up and then it started
failing. 3.1 seems fine. I've attached the tracebacks, including the
tracebacks I get when I ctrl-C to break the hang. My system is Gentoo
Linux.
msg93571 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009年10月05日 02:14
r75241 seems to be the cause. Tests passes on r75240, but gives the
dict error and hang on r75241.
msg93601 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009年10月05日 13:18
Adding gps - I haven't had a chance to grok the changes, and me getting to 
a fix is going to take a few.
msg93605 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009年10月05日 15:16
It's passing for me on trunk/py3k at least on OS/X - I'm going to need to 
dredge up a linuxvm
msg93815 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009年10月10日 00:55
Benjamin tried reverting the weakref patch, but that didn't fix it. 
Turns out the problem is the other patch in that merge, that adds saving
of the exception to AssertRaises. Adding Kristjan as that was his patch.
(The tests pass with the weakref patch applied and the unittest patch
reverted, by the way.)
msg93825 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年10月10日 11:04
Interesting. I don't have a linux machine to debug this. My patch 
appears harmless enough. We are only storing an exception _object_, not 
any tracebacks or such.
If this were happening on my windows machine I would put in breakpoints 
where the ignored assertion is being raised and find out what's going on.
msg93854 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月11日 12:03
It was a bad idea to revert the change, because we may now forget about
the problem rather than fix it.
It is likely that the "dictionary changed size during iteration" occurs
because a GC collection gets triggered during the iteration on the
weakdict. A GC collection can destroy some objects in that case because
keeping an exception around (which the unittest change does) usually
creates reference cycles through the frame's locals.
It seems that WeakValueDictionary.items() isn't very robust in py3k ;)
We probably need to add list-returning variants, e.g. a listitems()
method (and, probably, listkeys() and listvalues() too)?
msg93855 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年10月11日 12:06
The unittest only keeps an exception _object_ around, not the associated 
traceback. There should be no frames and thus reference cycles associated 
with this.
I agree that the problem should be tackled, rather than swept under the 
carpet like a revert does.
msg93856 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月11日 12:14
> The unittest only keeps an exception _object_ around, not the associated 
> traceback.
In py3k, the traceback is attached to the exception (on the
__traceback__ attribute) ;)
msg93857 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009年10月11日 12:32
I disagree about reverting it. We have this bug report, and that change
is sitting on the merge queue again...
msg93859 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年10月11日 14:08
I didn't realize that the traceback was attached to the exception object 
in py3k. This makes the use of such objects more dangerous because of the 
circular references. The recommended practice of
exc_type, exc_obj = sys.exc_info()[:2]
is precisely to avoid any "accidents" with the tracebacks.
See also http://mail.python.org/pipermail/python-dev/2005-
August/055251.html
So, can one just clear the __traceback__ member? That would make sense 
for the unittests, but it would make this manifestation of a GC bug go 
into hiding again.
msg94413 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月24日 12:59
Kristján:
> So, can one just clear the __traceback__ member?
Yes, or use `exc_value.with_traceback(None)`.
msg95173 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009年11月13日 02:37
If this is not ported by the time we get to a 2.7 or 3.2 release, I'd
like to back it out.
msg95175 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年11月13日 09:16
I can easily port it again, and remove the traceback from the exception.
But won't we be sweeping that GC problem under the carpet then?
msg95178 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月13日 15:04
Please note there's a patch in issue7105 to allow safer iteration of
weak dicts.
msg95182 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年11月13日 16:10
Ok, I ported the change again in revision 76238, removing the traceback 
from the stored exception object.
msg95183 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月13日 16:12
> Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment:
> 
> Ok, I ported the change again in revision 76238, removing the traceback 
> from the stored exception object.
"versionchanged" should be 3.2, not 3.1.
Thanks!
msg95184 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年11月13日 16:16
Thanks. Fixed in revision 76239.
msg95188 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月13日 17:04
I don't know if it's your last change, but test_multiprocessing is now
hanging on all buildbots.
msg95194 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009年11月13日 17:35
It is. TestZZZNumberOfObjects hangs at r76238, and does not hang at
r76237 on my Gentoo Linux box.
msg95195 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月13日 18:04
It seems the error is different, though. This is what I get after Ctrl-C
(I omit the tracebacks from the other processes):
Traceback (most recent call last):
 File "/home/antoine/py3k/__svn__/Lib/test/test_multiprocessing.py",
line 1071, in test_number_of_objects
 refs = self.manager._number_of_objects()
 File "Z:\py3k\__svn__\lib\multiprocessing\managers.py", line 569, in
_number_of_objects
 conn = self._Client(self._address, authkey=self._authkey)
 File "Z:\py3k\__svn__\lib\multiprocessing\connection.py", line 134, in
Client
 c = SocketClient(address)
 File "Z:\py3k\__svn__\lib\multiprocessing\connection.py", line 253, in
SocketClient
 s.connect(address)
socket.error: [Errno 2] No such file or directory
msg95196 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月13日 19:04
Ok, the problem is that a finalizer calls _decref which tries to open an
XMLRPC connection (!) to an address on which nobody listens anymore.
Then stupid SocketClient (in multiprocessing.connection) loops endlessly
because it is designed to do so when it gets ECONNREFUSED.
(when you Ctrl-C, you don't see all this because it happens in a
finalizer, and only a terse message is displayed)
So there are two problems:
- why does test_multiprocessing think it should open an XMLRPC
connection to a closed endpoint
- why doesn't SocketClient have some kind of timeout rather than looping
stupidly
msg95203 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年11月13日 21:39
Yes, this is quite different.
After seeing this, I created issue 7314.
I didn't realize that it had anything to do with my patch but rather 
blamed it on my brand new windows 7.
msg95204 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月13日 21:41
> Yes, this is quite different.
> After seeing this, I created issue 7314.
> I didn't realize that it had anything to do with my patch but rather 
> blamed it on my brand new windows 7.
I don't think it has anything to do with your patch. While diagnosing
the problem, I noticed that adding some random print() lines would
sometimes make the problem disappear. So it is really timing-dependent
and your patch happens to trigger the right (or wrong ;-)) timing.
msg95205 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年11月13日 21:42
See my analysis from issue 7314:
The other processes are all waiting to ReadFile(). Maybe they have yet 
to run to the point where they open the socket to which the master 
process is trying to connect to.
The mystery is also: Why does the harmless act of setting an instance 
attribute on a context manager (and without any traceback) trigger this 
behaviour?
msg95206 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009年11月13日 21:44
Ah, okay then. Let's focus on solving 7414 then.
msg95372 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年11月17日 10:07
I don't think this is a blocker anymore. Kristján's patch has been
ported and the test_multiprocessing freezes have been "fixed" by adding
a timeout.
As for the original purpose of this ticket, issue7105 is still (part
of?) the answer.
msg203249 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013年11月18日 08:13
I never seen this issue, can we close it?
#7105 has been fixed in Python 3, not in Python 2.7.
msg205226 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年12月04日 15:09
Okay, let's say it is fixed. Adding Richard to nosy so that he can review the issue if he's interested.
msg320627 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018年06月27日 21:30
> RuntimeError: dictionary changed size during iteration
I didn't see this bug the last 5 years, so I consider that the issue has been fixed in the meanwhile :-)
History
Date User Action Args
2022年04月11日 14:56:53adminsetgithub: 51309
2018年06月27日 21:30:55vstinnersetstatus: open -> closed
resolution: out of date -> fixed
messages: + msg320627
2013年12月04日 15:09:49pitrousetresolution: out of date
messages: + msg205226
stage: needs patch -> resolved
2013年11月18日 08:13:54vstinnersetnosy: + vstinner
messages: + msg203249
2010年08月27日 14:13:22asksolsetnosy: + asksol
2009年11月17日 10:07:19pitrousetpriority: release blocker -> normal

messages: + msg95372
2009年11月13日 21:44:04kristjan.jonssonsetmessages: + msg95206
2009年11月13日 21:42:47kristjan.jonssonsetmessages: + msg95205
2009年11月13日 21:41:24pitrousetmessages: + msg95204
2009年11月13日 21:39:10kristjan.jonssonsetmessages: + msg95203
2009年11月13日 19:29:14pitrousetdependencies: + multiprocessing should not wait endlessly
2009年11月13日 19:04:46pitrousetmessages: + msg95196
2009年11月13日 18:04:33pitrousetmessages: + msg95195
2009年11月13日 17:35:39r.david.murraysetmessages: + msg95194
2009年11月13日 17:35:22r.david.murraysetmessages: - msg95193
2009年11月13日 17:34:57r.david.murraysetmessages: + msg95193
2009年11月13日 17:34:35r.david.murraysetmessages: - msg95190
2009年11月13日 17:26:16r.david.murraysetmessages: + msg95190
2009年11月13日 17:04:40pitrousetmessages: + msg95188
2009年11月13日 16:16:30kristjan.jonssonsetmessages: + msg95184
2009年11月13日 16:12:57pitrousetmessages: + msg95183
2009年11月13日 16:10:58kristjan.jonssonsetmessages: + msg95182
2009年11月13日 15:04:44pitrousetmessages: + msg95178
2009年11月13日 09:16:22kristjan.jonssonsetmessages: + msg95175
2009年11月13日 02:37:21benjamin.petersonsetpriority: normal -> release blocker

messages: + msg95173
2009年10月24日 12:59:50pitrousetmessages: + msg94413
2009年10月11日 17:32:42jonsetnosy: + jon
2009年10月11日 17:29:50pitrousetdependencies: + weak dict iterators are fragile because of unpredictable GC runs
2009年10月11日 14:08:33kristjan.jonssonsetmessages: + msg93859
2009年10月11日 12:32:10benjamin.petersonsetmessages: + msg93857
2009年10月11日 12:14:41pitrousetmessages: + msg93856
2009年10月11日 12:06:47kristjan.jonssonsetmessages: + msg93855
2009年10月11日 12:03:10pitrousetnosy: + pitrou, benjamin.peterson
messages: + msg93854
2009年10月10日 11:04:36kristjan.jonssonsetmessages: + msg93825
2009年10月10日 00:55:10r.david.murraysetnosy: + kristjan.jonsson
messages: + msg93815
2009年10月05日 15:16:21jnollersetmessages: + msg93605
2009年10月05日 13:18:11jnollersetnosy: + gregory.p.smith
messages: + msg93601
2009年10月05日 02:14:53r.david.murraysetmessages: + msg93571
2009年10月05日 01:41:17r.david.murraysetnosy: jnoller, r.david.murray
components: + Library (Lib)
2009年10月05日 01:40:55r.david.murraycreate

AltStyle によって変換されたページ (->オリジナル) /