I have a couple of subprocess instances that I'd like to chain together into a pipeline, but I'm stuck and would appreciate some advice.
For example, to mimic:
cat data | foo - | bar - > result
Or:
foo - < data | bar - > result
...I first tried the following, which hangs:
import subprocess, sys
firstProcess = subprocess.Popen(['foo', '-'], stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
secondProcess = subprocess.Popen(['bar', '-'], stdin=firstProcess.stdout,
                                 stdout=sys.stdout)
for line in sys.stdin:
    firstProcess.stdin.write(line)
    firstProcess.stdin.flush()
firstProcess.stdin.close()
firstProcess.wait()
My second attempt uses one subprocess instance with the shell=True parameter, which works:
import subprocess, sys
pipedProcess = subprocess.Popen(" ".join(['foo', '-', '|', 'bar', '-']),
                                stdin=subprocess.PIPE, shell=True)
for line in sys.stdin:
    pipedProcess.stdin.write(line)
    pipedProcess.stdin.flush()
pipedProcess.stdin.close()
pipedProcess.wait()
What am I doing wrong with the first, chained subprocess approach? I've read that it is best not to use shell=True, so I'd like to get the chained version working. Thanks for your advice.
EDIT
I fixed a typo in my question and fixed the stdin parameter of secondProcess. It still hangs.
I also tried removing firstProcess.wait(), which resolves the hang, but then result is a 0-byte file.
I'll stick with the pipedProcess, since it works fine. But if anyone knows why the first setup hangs or makes a 0-byte file as output, I'd be interested to know why as well.
2 Answers
shell=True works because you're asking the shell to interpret your entire command line and handle the piping itself. It is effectively as if you typed foo - | bar - directly into the shell.
(This is also why it can be unsafe to use shell=True; there are many ways to fool the shell into doing bad things that won't happen if you directly pass the command and arguments in as a list that isn't subject to parsing by any intermediaries.)
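As a minimal sketch of that difference (assuming a POSIX system where echo is available; user_input is a hypothetical untrusted value, not something from the question), compare how the same string is treated with and without the shell:

```python
import subprocess

# Hypothetical untrusted value; the semicolon is shell syntax.
user_input = "hello; echo INJECTED"

# shell=True: the whole string is parsed by the shell, so the text after
# ';' runs as a second, separate command.
shell_output = subprocess.check_output("echo " + user_input, shell=True)

# Argument list: no shell is involved, so echo receives the value as one
# literal argument.
list_output = subprocess.check_output(["echo", user_input])

print(shell_output)
print(list_output)
```

In the first call the shell runs two commands; in the second, echo just prints the string, semicolon included.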
2 Comments
foo and bar though (sort and uniq, actually), reading some random text on stdin. (On further thought, that's probably because uniq wouldn't exit before sort...)

To fix the first example, add foo_process.stdout.close() as the docs suggest. The following code emulates foo - | bar - command:
#!/usr/bin/python
from subprocess import Popen, PIPE
foo_process = Popen(['foo', '-'], stdout=PIPE)
bar_process = Popen(['bar', '-'], stdin=foo_process.stdout)
foo_process.stdout.close() # allow foo to know if bar ends
bar_process.communicate() # equivalent to bar_process.wait() in this case
You don't need to use sys.stdin or sys.stdout explicitly here unless they're different from sys.__stdin__ and sys.__stdout__.
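The comment in the snippet above says communicate() is equivalent to wait() there. A small sketch of why, using a throwaway child (the current Python interpreter running "pass", chosen here only for illustration): when no PIPE is requested, there is nothing for communicate() to send or collect, so it simply waits and returns (None, None).

```python
import sys
from subprocess import Popen

# No stdin/stdout/stderr=PIPE here, so Python holds no pipes to this child.
child = Popen([sys.executable, "-c", "pass"])

# With no pipes to service, communicate() only waits for the child to exit.
out, err = child.communicate()

print(out, err, child.returncode)
```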
To emulate the foo - < data | bar - > result command:
#!/usr/bin/python
from subprocess import Popen, PIPE
with open('data', 'rb') as input_file, open('result', 'wb') as output_file:
    foo = Popen(['foo', '-'], stdin=input_file, stdout=PIPE)
    bar = Popen(['bar', '-'], stdin=foo.stdout, stdout=output_file)
    foo.stdout.close() # allow foo to know if bar ends
    bar.wait()
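For a version that can actually be run, here is a sketch of the same pattern with sort and uniq standing in for foo and bar (the comments under the first answer mention those two commands). The temporary directory, file contents, and variable names are assumptions made for the demo, and it expects sort and uniq to be on the PATH.

```python
import os
import tempfile
from subprocess import Popen, PIPE

# Demo-only input and output files in a scratch directory.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data")
result_path = os.path.join(workdir, "result")

with open(data_path, "wb") as f:
    f.write(b"pear\napple\npear\nfig\napple\n")

# Equivalent of: sort < data | uniq > result
with open(data_path, "rb") as input_file, open(result_path, "wb") as output_file:
    sort_process = Popen(["sort"], stdin=input_file, stdout=PIPE)
    uniq_process = Popen(["uniq"], stdin=sort_process.stdout, stdout=output_file)
    sort_process.stdout.close()  # parent drops its copy of the pipe's read end
    uniq_process.wait()
    sort_process.wait()          # reap the first child as well

with open(result_path, "rb") as f:
    result = f.read()

print(result)
```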
If you want to feed modified input line by line to the foo process, i.e., to emulate the python modify_input.py | foo - | bar - command:
#!/usr/bin/python
import sys
from subprocess import Popen, PIPE
# universal_newlines=True opens the pipes in text mode, so the str lines
# read from sys.stdin can be written to foo (needed on Python 3)
foo_process = Popen(['foo', '-'], stdin=PIPE, stdout=PIPE,
                    universal_newlines=True)
bar_process = Popen(['bar', '-'], stdin=foo_process.stdout)
foo_process.stdout.close() # allow foo to know if bar ends
for line in sys.stdin:
    foo_process.stdin.write("PY " + line) # modify input, feed it to `foo`
foo_process.stdin.close() # tell foo there is no more input
bar_process.wait()
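A runnable sketch of this third pattern, again with sort and uniq standing in for foo and bar and written for Python 3. Two things here are assumptions for the demo rather than part of the answer: the lines list replaces sys.stdin, and the last process uses stdout=PIPE so the result can be inspected instead of going to the terminal.

```python
from subprocess import Popen, PIPE

lines = ["b\n", "a\n", "b\n"]  # stand-in for the lines read from sys.stdin

# Text mode so that str can be written to and read from the pipes.
sort_process = Popen(["sort"], stdin=PIPE, stdout=PIPE,
                     universal_newlines=True)
uniq_process = Popen(["uniq"], stdin=sort_process.stdout, stdout=PIPE,
                     universal_newlines=True)
sort_process.stdout.close()  # parent drops its copy of the pipe's read end

for line in lines:
    sort_process.stdin.write("PY " + line)  # modify input, feed it to sort
sort_process.stdin.close()  # signal end of input to sort

# Only uniq's stdout is a pipe held by Python, so communicate() reads it
# to the end and then waits for uniq to exit.
output, _ = uniq_process.communicate()
sort_process.wait()

print(output)
```

For large inputs, writing everything before reading could in principle block if a command produced output while still consuming input; sort does not, since it must see all of its input first.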
7 Comments
bar_process.communicate()? Wouldn't I write data to foo_process?
communicate() would likely be standing in for wait(), since bar_process will never provide output to Python.
foo. foo process can read from stdin by itself).
bar_process.stdin is None so indeed bar_process.communicate() is just bar_process.wait() in this case.
stdin. It doesn't look like for line in sys.stdin: ... bar_process.communicate(line) works.
bar be the stdout of foo rather than its stdin?
firstProcess = Popen(['foo', '-'], stdin=sys.stdin, stdout=PIPE) works.