homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Race condition in subprocess using stdin
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: DragonSA, Pankrat, Rhamphoryncus, benjamin.peterson, bpcreech, chrismiles, ggenellina, gregory.p.smith, jyasskin, oefe, pitrou, rcronk, rosslagerwall
Priority: normal Keywords:

Created on 2008年03月17日 13:31 by Pankrat, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
subprocess_race_condition.py bpcreech, 2008年10月24日 22:49 Regression test for bug
Messages (17)
msg63637 - (view) Author: Ludwig Hähne (Pankrat) * Date: 2008年03月17日 13:31
Pythons subprocess module has a race condition when stdin is used. The
problem can be reproduced with the following script (based on the script
in issue "#1731717"/"msg32210" (slightly changed to use stdin):
----
import sys, os, threading, subprocess, time
class X(threading.Thread):
 def __init__(self, *args, **kwargs):
 super(X, self).__init__(*args, **kwargs)
 self.start()
def tt():
 s = subprocess.Popen(("cat"), stdin=subprocess.PIPE)
 s.communicate(input = '#')
for i in xrange(20):
 X(target = tt)
----
On multi-processor (or multi-core) machines the script hangs fairly
reliably.
Protecting the Popen call with a lock solves the problem. I searched the
documentation if using stdin with subprocess.Popen was not thread-safe,
but found no indication.
Tested with Python 2.5.1, 2.5.2 and 2.6a1. The problem can be reproduced
with all mentioned versions.
msg63640 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008年03月17日 13:38
Unless noted, none of Python's stdlib guarantees that it's thread safe.
So , you have to lock.
msg63850 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008年03月18日 02:34
Ludwig isn't really proposing that subprocess.Popen be thread-safe. That
would imply that you could mess with the same Popen instance
concurrently from separate threads, which shouldn't be allowed. But
instead, he's asking that it not be thread-hostile: that the constructor
can be called from multiple threads. Since every call in a threaded app
would need to be protected by the same lock, and there's no good place
to put that lock, it's a reasonable request. Most existing python types
provide this guarantee too: list() can be called concurrently from lots
of threads. So I think it's a real bug.
msg63912 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008年03月18日 13:32
Thanks for clarifying, Jeffrey! I agree the Popen should be reentrant,
since it's likey to be used that way.
msg64136 - (view) Author: Ludwig Hähne (Pankrat) * Date: 2008年03月20日 01:32
Hi, I narrowed the problem down to the creation of the error pipe in
_execute_child:
>> errpipe_read, errpipe_write = os.pipe()
>> self._set_cloexec_flag(errpipe_write)
If I protect these two lines with a lock, I cannot reproduce the hang
anymore (tested against Python 2.5.1).
I'm not yet sure why this should be atomic (as there are no classic race
condition problems involved AFAICT).
By the way, this is on Linux:
> uname -a
Linux jade 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686
GNU/Linux
msg64164 - (view) Author: Ludwig Hähne (Pankrat) * Date: 2008年03月20日 10:39
Just realized that passing 'close_fds=True' also circumvents the problem:
s = subprocess.Popen(("cat"), stdin=subprocess.PIPE, close_fds=True)
Should this issue be closed as it's that easy to avoid? I would still
like to know what happens here, though.
msg64178 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年03月20日 16:27
This is easily reproducable on my OS X 10.4 macbookpro.
However your suggested two lines with the os.pipe to lock to prevent the
problem are a red herring... Locking those does not fix it.
msg68199 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008年06月14日 07:00
This is messy. File descriptors from other threads are leaking into
child processes, and if the write end of a pipe never gets closed in all
of them the read end won't get EOF.
I suspect "cat"'s stdin is getting duplicated like that, but I haven't
been able to verify - /proc/<pid>/fd claims fd 0 is /dev/pts/2. Maybe
libc does some remapping.
msg69323 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年07月06日 07:42
lowering the priority on this back to normal as there is a workaround:
use close_fds=True.
I agree that this is messy, I'm not sure if we can really fix it or even
if we should.
Running lsof shows the /bin/cat processes on OS X all having a varying
numbers of the same PIPEs open supporting Adam's hypothesis.
msg72707 - (view) Author: David Naylor (DragonSA) * Date: 2008年09月06日 21:42
I'm currently developing a script that makes extensive use of threads 
and Popen, with threads being created dynamically and each thread 
creating a large number of Popen processes. 
If I limit the thread count to 2 (main + worker) then the problem 
appears to disappear (or is just intermittent) however if I run with 
more than 2 threads or from within winpdb then the dead lock occurres 
rather consistently (and in the case of winpdb, always)
According to winpdb the script hangs on line 1086 of subprocess.py 
(from 2.5.2), strangely all remaining worker threads hand at this 
point:
 # Wait for exec to fail or succeed; possibly raising exception
==> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
 os.close(errpipe_read)
 if data != "":
 os.waitpid(self.pid, 0)
 child_exception = pickle.loads(data)
 raise child_exception
I tried the suggestion of adding close_fds=True or using a global lock 
but the script still hangs under winpdb. A solution that did appear 
to work was having both a global lock and adding close_fds=True to the 
call list. 
Running the script under pdb or cProfile appears to fix the problem as 
well...
NOTE: winpdb appears to bring out the worst case scenario and reliably 
reproduces the problem.
This is running on FreeBSD 8-Current amd64 (from early August) with 2 
cores.
msg75196 - (view) Author: Ben Creech (bpcreech) Date: 2008年10月24日 22:49
You can trigger this bug without use of stdin=subprocess.PIPE, or for
that matter any subprocess.PIPE's, although that makes it worse. (So
the name for this issue may be overly specific.)
I have seen this happening intermittently in our parallel job manager,
and verified the lockup via gdb.
I've attached a regression test that should reliably trigger it without
use of any pipes for stdin, stderr, or stdout. By kludging
subprocess.py with an extra "sleep", we can reliably trigger the race
condition.
The test launches two children, each from a thread. Since there are no
calls to Popen.wait(), the main process should exit quickly, but it
never does.
Indeed, setting close_fds=True will correct the problem, although there
may be rare instances wherein the user wants the child to inherit some
file descriptors (for instance, this is how you usually use UNIX domain
sockets).
The problem is, as hinted at by Adam Olsen, as follows.
Say we have two threads, A and B, concurrently executing
subprocess.Popen.__init__.
In subprocess.Popen._execute_child, thread B is sitting between the
following lines:
errpipe_read, errpipe_write = os.pipe()
self._set_cloexec_flag(errpipe_write)
... when thread A calls fork().
Thus, thread A's child inherits both pipes, and never closes either one
when it calls execv.
Until thread A's child exits, that errpipe_write handle created in
thread B will remain open.
So, thread B will not return from the "os.read(errpipe_read...)" line,
and thus Popen.__init__ until thread A's child exits. That is surely a bug.
On the other hand, it is hard to fix. Ludwig Hähne suggested wrapping
the pipe creation and cloexec in a mutex. As indicated by Gregory P.
Smith, that will not work reliably, as fork() could be called in one
thread while another thread is between those operations. The mutex
would have to block the fork from happening during the atomic
create/cloexec operations.
Additionally, a parent could be creating a stdin output pipe end and
stdout/stderr input pipe ends for a particular subprocess. Due to the
pooled nature of POSIX file descriptors, if another subprocess is
launched in a separate thread, it will inherit those pipe ends! That
could cause unforeseen artifacts such as Popen.stdout hanging after
termination of the process that stdout was intended (because ANOTHER
process also inherited those pipe ends, and is still running).
Anyways, like Mr. Olson said, this is a mess. close_fds=True is
probably the best option. Perhaps it would be best to add an argument
allowing for a list of FD's to NOT close, in case the user is
intentionally trying to pass on additional file descriptors (eg UNIX
domain sockets).
I would vote for a change to set close_fds=True by default, and adding
an explicit list of inherited_fds. This is based on the logic that
setting close_fds=False is unsafe and should not be default.
If that is too extreme, foolish users like myself at least need a huge
warning in the documentation, and an inherited_fds parameter would still
be useful.
msg89219 - (view) Author: Robert Cronk (rcronk) Date: 2009年06月10日 16:58
Could this problem be associated with issue4749? It was found that 
something goes wrong when two cmd children processes are spawned from 
different threads, when the first exits, it is closing file handles 
shared with the first (or something like that) and it's causing a 
problem with logging in issue4749. That bug has been closed since it's 
not a problem with logging itself so I'm searching for other similar 
bugs to see if we can create a new bug that documents the cause and 
link to these other bugs that are all showing different symptoms of the 
bug. Thoughts?
msg92674 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009年09月16日 09:52
@Robert: Yes, I'd say this is the same problem as issue4749 
msg125509 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月06日 06:36
This is likely solved in py3k 3.2 with the C _posixsubprocess.c module. The pipe creation and cloexec flag setting is all done atomically when possible, and at least with the GIL held when not. close_fds also now defaults to True.
msg125910 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年01月10日 18:01
I confirm that it works reliably under 3.2, while it hangs as reliably under 2.7 and 3.1. Since fixing involves a C extension that is too heavy to backport to bugfix branches, I suggest closing.
By the way: triagers, please don't set stage to "unit test needed". We don't need an unit test before a proper decision about the issue has been done, and actually we don't require reporters to write an unit test either (it's done as part of the patch, usually).
msg125936 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011年01月10日 22:03
btw, I will be backporting all recent subprocess changes to http://code.google.com/p/python-subprocess32/
there have been a lot of changes recently, i was waiting for that to settle down before bring it up to current. 3.2rc1 sounds like a good time.
msg133046 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011年04月05日 16:20
Closing this as fixed since it has been fixed on 3.2 and decided not to be backported on 3.1 and 2.7.
History
Date User Action Args
2022年04月11日 14:56:31adminsetgithub: 46573
2011年04月05日 16:20:40rosslagerwallsetstatus: open -> closed

nosy: + rosslagerwall
messages: + msg133046

resolution: out of date -> fixed
2011年01月10日 22:03:13gregory.p.smithsetstatus: pending -> open
nosy: gregory.p.smith, oefe, ggenellina, Rhamphoryncus, pitrou, jyasskin, benjamin.peterson, Pankrat, bpcreech, DragonSA, chrismiles, rcronk
messages: + msg125936
2011年01月10日 18:01:26pitrousetstatus: open -> pending

versions: + Python 3.1, - Python 3.2
nosy: + pitrou

messages: + msg125910
resolution: out of date
stage: test needed ->
2011年01月06日 06:36:24gregory.p.smithsetnosy: gregory.p.smith, oefe, ggenellina, Rhamphoryncus, jyasskin, benjamin.peterson, Pankrat, bpcreech, DragonSA, chrismiles, rcronk
messages: + msg125509
2010年08月04日 04:45:49terry.reedysetstage: test needed
versions: + Python 2.7, Python 3.2, - Python 2.6, Python 2.5
2010年03月04日 03:07:20gregory.p.smithsetassignee: gregory.p.smith
2009年09月16日 09:52:24ggenellinasetnosy: + ggenellina
messages: + msg92674
2009年09月08日 05:23:39chrismilessetnosy: + chrismiles
2009年06月10日 16:58:42rcronksetnosy: + rcronk
messages: + msg89219
2009年01月11日 16:44:07oefesetnosy: + oefe
2008年10月24日 22:49:17bpcreechsetfiles: + subprocess_race_condition.py
nosy: + bpcreech
messages: + msg75196
2008年09月06日 21:42:27DragonSAsetnosy: + DragonSA
messages: + msg72707
2008年07月06日 07:42:26gregory.p.smithsetpriority: critical -> normal
assignee: gregory.p.smith -> (no value)
messages: + msg69323
2008年06月14日 07:00:04Rhamphoryncussetmessages: + msg68199
2008年06月10日 19:16:38Rhamphoryncussetnosy: + Rhamphoryncus
2008年03月21日 18:24:24christian.heimessetpriority: critical
type: behavior
2008年03月20日 16:27:45gregory.p.smithsetmessages: + msg64178
2008年03月20日 15:41:51gregory.p.smithsetassignee: gregory.p.smith
nosy: + gregory.p.smith
2008年03月20日 10:39:14Pankratsetmessages: + msg64164
2008年03月20日 01:32:51Pankratsetmessages: + msg64136
2008年03月18日 13:32:01benjamin.petersonsetmessages: + msg63912
2008年03月18日 02:34:19jyasskinsetnosy: + jyasskin
messages: + msg63850
2008年03月17日 13:38:07benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg63640
2008年03月17日 13:31:51Pankratcreate

AltStyle によって変換されたページ (->オリジナル) /