This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2010-10-06 11:25 by asksol, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| multiprocessing-worker-poll.patch | asksol, 2010-10-06 11:25 | |
Messages (8)
msg118062 - Author: Ask Solem (asksol) (Python committer) - Date: 2010-10-06 11:25

While working on an "autoscaling" (yes, people call it that...) feature for Celery, I noticed that the processes created by the _handle_workers thread don't always work. I have reproduced this in general by just using the maxtasksperchild feature and letting the workers terminate themselves, so this seems to have always been an issue (it is just not easy to reproduce unless workers are created with some frequency).

I'm not quite sure of the reason yet, but I finally managed to track it down to the workers being stuck while receiving from the queue. The attached patch seems to resolve the issue by polling the queue before trying to receive. I know this is brief; I may have more data later.
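For readers without the patch file at hand, here is a minimal sketch of the poll-before-receive idea, modeled on the `multiprocessing.pool.worker` loop of that era. It illustrates the approach, not the attached patch itself; the 1.0-second timeout and the `queue.Empty` handling are assumptions for the sketch.

```python
import queue

def worker(inqueue, outqueue, maxtasks=None):
    """Simplified pool worker loop with the poll-before-receive change."""
    completed = 0
    while maxtasks is None or completed < maxtasks:
        if hasattr(inqueue, '_reader'):
            # Poll the underlying pipe with a timeout instead of blocking
            # in recv() forever; a worker stuck there is the reported bug.
            if not inqueue._reader.poll(1.0):
                continue                  # nothing arrived; poll again
            task = inqueue.get()
        else:
            try:
                task = inqueue.get(timeout=1.0)
            except queue.Empty:
                continue
        if task is None:                  # sentinel: the pool is shutting down
            break
        job, i, func, args, kwds = task
        try:
            result = (True, func(*args, **kwds))
        except Exception as exc:
            result = (False, exc)
        outqueue.put((job, i, result))
        completed += 1
```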
msg122262 - Author: ysj.ray (ysj.ray) - Date: 2010-11-24 07:11

Could you give example code that reproduces this issue?
msg155556 - Author: Sean Reifschneider (jafo) (Python committer) - Date: 2012-03-13 02:04

The attached patch does change the semantics somewhat, but I don't fully understand how much. In particular:

- It changes the "get()" call into "get(timeout=1.0)" if inqueue doesn't have a _reader attribute.
- If inqueue does have a _reader attribute and "inqueue._reader.poll(timeout)" is false, "get()" isn't called at all; a continue is introduced instead.

I'd want Jesse to pronounce on this.
msg162412 - Author: Richard Oudkerk (sbt) (Python committer) - Date: 2012-06-06 14:12

It is not clear to me how to reproduce the bug. When you say "letting the workers terminate themselves", do you mean calling sys.exit() or os._exit() in the submitted task? Are you trying to get the result of a task which caused the worker to exit?

I'm not sure how the patch would change the current behaviour. The following seems to work for me:

```python
import sys, os
import multiprocessing as mp

if __name__ == '__main__':
    p = mp.Pool(4, maxtasksperchild=5)
    results = []
    for i in range(100):
        if i % 10 == 0:
            results.append(p.apply_async(sys.exit))
        else:
            results.append(p.apply_async(os.getpid))
    for i, res in enumerate(results):
        if i % 10 != 0:
            print(res.get())
        else:
            pass  # trying res.get() would block forever
```
msg162467 - Author: Ask Solem (asksol) (Python committer) - Date: 2012-06-07 09:09

Well, I still don't know exactly why restarting the socket read made it work, but the patch solved an issue where newly started pool processes would be stuck in a socket read forever (this happened to maybe 1 in 500 new processes).

This and a dozen other pool-related fixes are in my billiard fork of multiprocessing. For example, what you describe in your comment ("trying res.get() would block forever") works in billiard, where res.get() will raise WorkerLostError in that case. https://github.com/celery/billiard/

Earlier commit history for the pool can be found in Celery: https://github.com/ask/celery/commits/2.5/celery/concurrency/processes/pool.py

My eventual goal is to merge these fixes back into Python, but except for people using Python 3.x, they would have to use billiard for quite some time anyway, so I don't feel in a hurry.

I think this issue can be closed; the worker handler is simply borked, and we could open a new issue to decide how to fix it (merging billiard.Pool or something else).

(By the way, Richard, you're sbt? I was trying to find your real name to give you credit for the no_execv patch in billiard.)
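A sketch of the billiard behaviour described above, assuming billiard is installed. WorkerLostError is exported from billiard.exceptions in current releases, but treat the exact import path and timing semantics here as assumptions rather than a verified contract:

```python
import sys
from billiard import Pool                       # drop-in fork of multiprocessing
from billiard.exceptions import WorkerLostError

if __name__ == '__main__':
    p = Pool(2)
    res = p.apply_async(sys.exit)   # the task kills its own worker process
    try:
        res.get(timeout=10)
    except WorkerLostError as exc:
        # billiard notices the dead worker and fails the job, instead of
        # leaving res.get() blocked forever as the stock Pool did at the time
        print('worker died:', exc)
    finally:
        p.close()
        p.join()
```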
msg162495 - Author: Richard Oudkerk (sbt) (Python committer) - Date: 2012-06-07 20:20

> I think this issue can be closed; the worker handler is simply borked, and
> we could open a new issue to decide how to fix it (merging billiard.Pool
> or something else).

OK. I am not sure which option under "Resolution" should be chosen. "Later"?

> (By the way, Richard, you're sbt?

Yes.

> I was trying to find your real name to give you credit for the no_execv
> patch in billiard.)

The execv stuff certainly won't go in by Py3.3. There has not been consensus that adding it is a good idea.

(I also have the unit tests passing with a "fork server": the server process is forked at the beginning of the program, and then forked children of the server process are started on request. It is about 10 times faster than using execv, and almost as fast as simple forking.)
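A minimal sketch of the fork-server idea described above (hypothetical code, not Richard's implementation): the server is forked once, early, before the program accumulates threads or state, and each request makes it fork a fresh child. CPython later shipped a production version of this idea as the "forkserver" start method in Python 3.4.

```python
import os
from multiprocessing import Pipe

def say_hello():
    print('hello from child', os.getpid())

def start_fork_server():
    """Fork the server once; afterwards ask it (over a pipe) to fork
    fresh children on demand."""
    client, server = Pipe()
    if os.fork() == 0:                    # --- the server process ---
        client.close()
        while True:
            try:
                func = server.recv()      # a picklable, module-level callable
            except EOFError:
                os._exit(0)               # client end closed: shut down
            pid = os.fork()
            if pid == 0:                  # --- a fresh child of the server ---
                server.close()
                func()
                os._exit(0)
            os.waitpid(pid, 0)            # reap (a real server wouldn't block)
            server.send(pid)
    server.close()
    return client

if __name__ == '__main__':                # POSIX-only: relies on os.fork()
    conn = start_fork_server()            # fork *before* starting any threads
    conn.send(say_hello)
    print('task ran in pid', conn.recv())
```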
msg162497 - Author: Ask Solem (asksol) (Python committer) - Date: 2012-06-07 20:57

"Later" works, or just close it. I can open a new issue later to merge the improvements from billiard.

> The execv stuff certainly won't go in by Py3.3. There has not been
> consensus that adding it is a good idea.
> (I also have the unit tests passing with a "fork server": the server
> process is forked at the beginning of the program, and then forked
> children of the server process are started on request. It is about 10
> times faster than using execv, and almost as fast as simple forking.)

Ah, a working 'fork server' would be just as good.

By the way, billiard now supports running Pool without threads, using epoll/kqueue/select instead. So Celery uses that when it can be nonblocking, and execv when it can't. It performs far better without threads, and in addition shutdown and replacing worker processes are much more responsive.

Changing the default Pool is not going to happen, but including a simple select()-based Pool would be possible, and then it could also easily work with Twisted, Eventlet, Gevent, etc. (especially now that Connection is rewritten in pure Python).
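A toy illustration of the threadless idea, far simpler than billiard's actual implementation: the parent multiplexes all the result pipes with select.select() instead of dedicating helper threads to the queues. All names here are hypothetical; only the stdlib calls are real.

```python
import select
from multiprocessing import Pipe, Process

def worker(conn, x):
    conn.send(x * x)              # do the "work" and report back
    conn.close()

if __name__ == '__main__':
    conns, procs = [], []
    for x in range(4):
        parent_end, child_end = Pipe()
        proc = Process(target=worker, args=(child_end, x))
        proc.start()
        child_end.close()         # the parent keeps only its own end
        conns.append(parent_end)
        procs.append(proc)

    # One event loop in the parent: no helper threads, just select() over
    # the Connections' file descriptors. (POSIX-only as written; billiard
    # can use epoll or kqueue where available.)
    while conns:
        ready, _, _ = select.select(conns, [], [])
        for conn in ready:
            try:
                print('result:', conn.recv())
            except EOFError:      # child closed without sending
                pass
            conn.close()
            conns.remove(conn)

    for proc in procs:
        proc.join()
```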
msg162502 - Author: Richard Oudkerk (sbt) (Python committer) - Date: 2012-06-07 21:13

> Ah, a working 'fork server' would be just as good.

Only problem is that it depends on fd passing, which is apparently broken on Mac OS X.

> By the way, billiard now supports running Pool without threads, using
> epoll/kqueue/select instead. So Celery uses that when it can be
> nonblocking, and execv when it can't. It performs far better without
> threads, and in addition shutdown and replacing worker processes are
> much more responsive.

If it were not for Windows, I would have tried to avoid using threads.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:07 | admin | set | github: 54246 |
| 2012-06-11 15:28:01 | sbt | set | status: open -> closed; resolution: later; stage: patch review -> resolved |
| 2012-06-07 21:13:42 | sbt | set | messages: + msg162502 |
| 2012-06-07 20:57:48 | asksol | set | messages: + msg162497 |
| 2012-06-07 20:20:37 | sbt | set | messages: + msg162495 |
| 2012-06-07 09:09:22 | asksol | set | messages: + msg162467 |
| 2012-06-06 14:12:46 | sbt | set | messages: + msg162412 |
| 2012-06-06 12:35:14 | sbt | set | nosy: + sbt |
| 2012-03-13 02:04:01 | jafo | set | assignee: jnoller; messages: + msg155556; nosy: + jafo, jnoller |
| 2011-06-12 18:36:09 | terry.reedy | set | versions: - Python 3.1 |
| 2011-04-15 19:53:45 | nirai | set | nosy: + nirai |
| 2010-11-24 07:11:02 | ysj.ray | set | nosy: + ysj.ray; messages: + msg122262 |
| 2010-10-06 11:25:57 | asksol | create | |