
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Support different contexts in multiprocessing
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Olivier.Grisel, jnoller, lars, python-dev, sbt, vstinner
Priority: normal
Keywords: patch

Created on 2013-09-10 14:53 by lars, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                      Uploaded
mp_getset_start_method.patch   lars, 2013-09-10 15:39
context.patch                  sbt, 2013-10-10 14:35
Messages (26)
msg197444 - Author: Lars (lars) Date: 2013-09-10 14:53
The new multiprocessing based on forkserver (issue8713) looks great, but it has two problems. The first:
"set_start_method() should not be used more than once in the program."
The documentation does not explain what the effect of calling it twice would be. Judging from the documentation, it should be possible to do
start_method = get_start_method()
if start_method is None:
    set_start_method('forkserver')
but this code exposes the second problem: it always succeeds, because get_start_method() has the (undocumented!) side effect of setting the start method to the system default if it has not been set already.
I was just going to put together a patch for joblib (http://pythonhosted.org/joblib/) to set the start method to forkserver at import time. But as things stand, it would be impossible for the user to safely override the start method before importing joblib, because joblib cannot find out whether it has already been set without setting it.
The enclosed patch solves the problem by making the new functions more robust:
* get_start_method no longer sets anything, but returns None if the start method has not been set already;
* set_start_method raises a RuntimeError (for want of a better exception type) when resetting the start method is attempted.
Unfortunately, I had to hack up the tests a bit, because they were violating the set_start_method contract. There is a test for the new set_start_method behavior, though, and all {fork,forkserver,spawn} tests pass on Linux.
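To make the intent concrete, here is a minimal sketch of how a library such as joblib could use the proposed semantics (illustrative only, not the patch itself; it assumes the patched get_start_method() returns None until a method has been chosen, and that set_start_method() refuses a second call):

    # Sketch only: relies on the patched behaviour described above.
    from multiprocessing import get_start_method, set_start_method

    def use_forkserver_unless_user_chose():
        if get_start_method() is None:        # no side effect under the patch
            set_start_method('forkserver')    # would raise RuntimeError if already set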
msg197447 - Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-10 15:13
Related question: is there any good reason that would prevent passing a custom `start_method` kwarg to the `Pool` constructor, to make it use an alternative `Popen` instance (that is, an instance different from the `multiprocessing._Popen` singleton)?
This would allow libraries such as joblib to keep their side effects minimal by impacting the default multiprocessing runtime as little as possible.
msg197450 - Author: Lars (lars) Date: 2013-09-10 15:39
Cleaned up the patch.
msg197453 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-10 16:04
With your patch, I think if you call get_start_method() without later calling set_start_method() then the helper process(es) will never be started.
With the current code, popen.Popen() automatically starts the helper processes if they have not already been started.
Also, set_start_method() can have the side-effect of starting helper process(es). I do not really approve of new processes being started as a side-effect of importing a library. But it is reasonable for a library to want a specific start method unless the user demands otherwise.
I will have to think this over.
BTW, the reason for discouraging calling set_start_method() more than once is that some shared resources are created differently depending on what the current start method is.
For instance, with the fork method, semaphores are created and then immediately unlinked. But with the other start methods we must not unlink a semaphore until we are finished with it (while being paranoid about cleanup).
Maybe it would be better to have separate contexts for each start method. That way joblib could use the forkserver context without interfering with the rest of the user's program.
    from multiprocessing import forkserver_context as mp
    l = mp.Lock()
    p = mp.Process(...)
    with mp.Pool() as pool:
        ...
msg197467 - Author: Lars (lars) Date: 2013-09-10 21:15
In my patched version, the private popen.get_start_method gets a kwarg set_if_needed=True. popen.Popen calls that as before, so its behavior should not change, while the public get_start_method sets the kwarg to False.
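Roughly, the split looks like this (a sketch of the idea with simplified, hypothetical names; the real change is in the private popen module):

    # Sketch of the behaviour described above, not the actual patch.
    _start_method = None

    def _get_start_method(set_if_needed=True):
        # private helper: Popen keeps calling this with the old behaviour
        global _start_method
        if _start_method is None and set_if_needed:
            _start_method = 'fork'            # stand-in for the platform default
        return _start_method

    def get_start_method():
        # public function: never sets anything as a side effect
        return _get_start_method(set_if_needed=False)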
I realise now that this has the side effect that get_start_method's output changes once multiprocessing has first been used, but that simply reflects how the library works. Maybe this should be documented.
As for the contexts, those would be great.
msg197468 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-10 21:48
> In my patched version, the private popen.get_start_method gets a kwarg 
> set_if_needed=True. popen.Popen calls that as before, so its behavior 
> should not change, while the public get_start_method sets the kwarg to 
> False.
My mistake.
msg197486 - Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-11 11:02
> Maybe it would be better to have separate contexts for each start method. That way joblib could use the forkserver context without interfering with the rest of the user's program.
Yes, in general it would be great if libraries could customize the multiprocessing default options without impacting any of the module singletons. That also includes the ForkingPickler registry for custom reducers: right now it is a class attribute. It would be great to be able to scope custom reducer registration to a given multiprocessing.Pool or multiprocessing.Process instance.
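For reference, this is the kind of module-global registration I mean (a sketch assuming the Python 3.4 layout, where ForkingPickler lives in multiprocessing.reduction; the Payload class and its reducer are made up for illustration):

    from multiprocessing.reduction import ForkingPickler

    class Payload:
        def __init__(self, data):
            self.data = data

    def _reduce_payload(obj):
        # tell the pickler how to rebuild a Payload in the child process
        return Payload, (obj.data,)

    # register() mutates a class attribute, so the reducer applies to every
    # Process/Pool in the program rather than to a single instance.
    ForkingPickler.register(Payload, _reduce_payload)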
As for how to implement that kind of isolation: it could be done either by adding new constructor parameters or new public methods to the Process and Pool classes (to customize their behavior while sticking to the OOP paradigm where possible), or by using a context manager as you suggested.
I am not sure which option is best. Prototyping both is probably the best way to feel out the tradeoffs.
msg197518 - Author: Lars (lars) Date: 2013-09-12 10:51
I don't really see the benefit of a context manager over an argument. It's a power user feature anyway, and context managers (at least to me) signal cleanup actions, rather than construction options.
msg197523 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-12 12:37
By "context" I did not really mean a context manager. I just meant an object (possibly a singleton or module) which implements the same interface as multiprocessing.
(However, it may be a good idea to also make it a context manager whose __enter__() method starts the helper processes, and whose __exit__() method shuts them down.)
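Something along these lines, say (a very rough sketch of the idea, not code from any patch):

    class ForkServerContext:
        # exposes the same interface as the multiprocessing module ...
        def Process(self, *args, **kwargs): ...
        def Pool(self, *args, **kwargs): ...
        def Lock(self): ...
        def Semaphore(self, value=1): ...

        # ... and optionally manages the helper processes' lifetime
        def __enter__(self):
            # would ensure the fork server helper is running
            return self

        def __exit__(self, *exc_info):
            # would shut the helper processes down
            return False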
msg197524 - Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-12 12:54
The process pool executor [1] from the concurrent.futures API would be suitable for explicitly starting and stopping the helper process needed by the `forkserver` mode.
[1] http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
The point would be to have as little state as possible encoded in the multiprocessing module (and its singletons), and to move that state so that it is directly managed by multiprocessing Process and Pool class instances. Libraries could then customize the behavior (start_method, executable, ForkingPickler reducer registry and so on) without mutating the state of the multiprocessing module singletons.
msg197532 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-12 15:10
There are lots of things that behave differently depending on the currently set start method: Lock(), Semaphore(), Queue(), Value(), ... It is not just when creating a Process or Pool that you need to know the start method.
Passing a context or start_method argument to all of these constructors would be very awkward, which is why I think it is better to treat the context as an object with methods Process(), Pool(), Lock(), Semaphore(), etc.
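To illustrate the difference (variant (a) is hypothetical and was never implemented; variant (b) matches the context objects introduced later in this issue):

    # (a) hypothetical per-call arguments -- every constructor must be told
    #     the start method:
    #         lock = Lock(start_method='forkserver')
    #         q = Queue(start_method='forkserver')

    # (b) a context object fixes the start method once, and everything
    #     created through it agrees with it:
    from multiprocessing import get_context

    ctx = get_context('forkserver')
    lock = ctx.Lock()
    q = ctx.Queue()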
Unfortunately, I do not have time to work on this just now...
msg197534 - Author: Olivier Grisel (Olivier.Grisel) * Date: 2013-09-12 15:16
Richard Oudkerk: thanks for the clarification, that makes sense. I don't have the time either in the coming month, maybe later.
msg197562 - Author: Lars (lars) Date: 2013-09-13 08:53
Ok. Do you (or jnoller?) have time to review my proposed patch, at least before 3.4 is released? I didn't see it in the release schedule, so it's probably not planned soon, but I wouldn't want the API to change *again* in 3.5.
msg197669 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-13 21:46
I'll review the patch. (According to http://www.python.org/dev/peps/pep-0429/ feature freeze is expected in late November, so there is not too much of a rush.)
msg199389 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-10 14:35
Attached is a patch which allows the use of separate contexts. For example
    try:
        ctx = multiprocessing.get_context('forkserver')
    except ValueError:
        ctx = multiprocessing.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    ...
Also, get_start_method(allow_none=True) will return None if the start method has not yet been fixed.
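For example, with the patch applied in a fresh interpreter:

    import multiprocessing

    print(multiprocessing.get_start_method(allow_none=True))   # None: not fixed yet
    multiprocessing.set_start_method('spawn')
    print(multiprocessing.get_start_method(allow_none=True))   # 'spawn'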
msg199390 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-10 14:44
BTW, the context objects are singletons.
I could not see a sensible way to make ctx.Process be a picklable class (rather than a method) if there can be multiple instances of a context type. This means that the helper processes survive until the program closes down.
msg199706 - Author: Lars (lars) Date: 2013-10-13 14:12
> BTW, the context objects are singletons.
I haven't read all of your patch yet, but does this mean a forkserver will be started regardless of whether it is later used?
That would be a good thing, since starting the fork server after reading in large data sets would mean the fork server would hold on to large swaths of memory even when the data set is deallocated in the master process.
msg199709 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-13 14:38
> I haven't read all of your patch yet, but does this mean a forkserver 
> will be started regardless of whether it is later used?
No, it is started on demand. But since it is started using _posixsubprocess.fork_exec(), nothing is inherited from the main process.
msg199710 - Author: Lars (lars) Date: 2013-10-13 14:40
Ok, great.
msg200060 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-16 15:44
New changeset 72a5ac909c7a by Richard Oudkerk in branch 'default':
Issue #18999: Make multiprocessing use context objects.
http://hg.python.org/cpython/rev/72a5ac909c7a 
msg200061 - Author: Lars (lars) Date: 2013-10-16 16:37
Thanks, much better than my solution!
msg200497 - Author: Lars (lars) Date: 2013-10-19 21:07
Strange, I can't actually get it to work:
>>> from multiprocessing import Pool, get_context
>>> forkserver = get_context('forkserver')
>>> Pool(context=forkserver)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: Pool() got an unexpected keyword argument 'context'
msg200499 - Author: Lars (lars) Date: 2013-10-19 21:12
I also tried
from multiprocessing.pool import Pool
but that died with
ImportError: cannot import name get_context
msg200507 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-10-19 22:28
I guess this should be clarified in the docs, but multiprocessing.pool.Pool is a *class* whose constructor takes a context argument, whereas multiprocessing.Pool() is a *bound method* of the default context. (In previous versions multiprocessing.Pool was a *function*.)
The only reason you might need the context argument is if you have subclassed multiprocessing.pool.Pool.
>>> from multiprocessing import pool, get_context
>>> forkserver = get_context('forkserver')
>>> p = forkserver.Pool()
>>> q = pool.Pool(context=forkserver)
>>> p, q
(<multiprocessing.pool.Pool object at 0xb71f3eec>, <multiprocessing.pool.Pool object at 0xb6edb06c>)
I suppose we could just make the bound methods accept a context argument which (if not None) is used instead of self.
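A sketch of what that might look like (hypothetical, not what the current code does):

    # Hypothetical module-level Pool() that honours an optional context argument
    # and otherwise falls back to the default context.
    def Pool(processes=None, initializer=None, initargs=(), maxtasksperchild=None,
             context=None):
        from multiprocessing import pool, get_context
        return pool.Pool(processes, initializer, initargs, maxtasksperchild,
                         context=context or get_context())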
msg213100 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-03-10 22:11
New changeset b941a320601a by R David Murray in branch 'default':
whatsnew: multiprocessing start methods and context (#8713 and #18999)
http://hg.python.org/cpython/rev/b941a320601a 
msg367431 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-27 15:45
It seems like this issue has been fixed, so I set its status to closed.
History
Date User Action Args
2022-04-11 14:57:50  admin           set     github: 63199
2020-04-27 15:45:22  vstinner        set     status: open -> closed; messages: + msg367431; nosy: + vstinner
2014-03-10 22:11:24  python-dev      set     messages: + msg213100
2013-10-19 22:28:50  sbt             set     status: closed -> open; messages: + msg200507
2013-10-19 21:12:50  lars            set     messages: + msg200499
2013-10-19 21:07:03  lars            set     messages: + msg200497
2013-10-16 17:20:09  sbt             set     status: open -> closed
2013-10-16 16:37:46  lars            set     status: pending -> open; messages: + msg200061
2013-10-16 15:48:23  sbt             set     status: open -> pending; type: behavior -> enhancement; title: Robustness issues in multiprocessing.{get,set}_start_method -> Support different contexts in multiprocessing; resolution: fixed; stage: resolved
2013-10-16 15:44:24  python-dev      set     nosy: + python-dev; messages: + msg200060
2013-10-13 14:40:48  lars            set     messages: + msg199710
2013-10-13 14:38:12  sbt             set     messages: + msg199709
2013-10-13 14:12:15  lars            set     messages: + msg199706
2013-10-10 14:44:47  sbt             set     messages: + msg199390
2013-10-10 14:35:26  sbt             set     files: + context.patch; messages: + msg199389
2013-09-13 21:46:34  sbt             set     messages: + msg197669
2013-09-13 08:53:03  lars            set     messages: + msg197562
2013-09-12 15:16:57  Olivier.Grisel  set     messages: + msg197534
2013-09-12 15:10:24  sbt             set     messages: + msg197532
2013-09-12 12:54:11  Olivier.Grisel  set     messages: + msg197524
2013-09-12 12:37:02  sbt             set     messages: + msg197523
2013-09-12 10:51:26  lars            set     messages: + msg197518
2013-09-12 10:33:26  lars            set     nosy: + jnoller
2013-09-11 11:02:26  Olivier.Grisel  set     messages: + msg197486
2013-09-10 21:48:54  sbt             set     messages: + msg197468
2013-09-10 21:15:33  lars            set     messages: + msg197467
2013-09-10 16:04:40  sbt             set     messages: + msg197453
2013-09-10 15:39:13  lars            set     files: + mp_getset_start_method.patch; messages: + msg197450
2013-09-10 15:38:48  lars            set     files: - mp_getset_start_method.patch
2013-09-10 15:13:25  Olivier.Grisel  set     nosy: + Olivier.Grisel; messages: + msg197447
2013-09-10 15:05:56  lars            set     title: Allow multiple calls to multiprocessing.set_start_method -> Robustness issues in multiprocessing.{get,set}_start_method
2013-09-10 14:59:57  lars            set     nosy: + sbt
2013-09-10 14:53:41  lars            create
