I have a couple of subprocess instances that I'd like to chain together into a pipeline, but I'm stuck and would appreciate some advice.

For example, to mimic:

cat data | foo - | bar - > result

Or:

foo - < data | bar - > result

...I first tried the following, which hangs:

import subprocess, sys
firstProcess = subprocess.Popen(['foo', '-'], stdin=subprocess.PIPE,
 stdout=subprocess.PIPE)
secondProcess = subprocess.Popen(['bar', '-'], stdin=firstProcess.stdout,
 stdout=sys.stdout)
for line in sys.stdin:
 firstProcess.stdin.write(line)
 firstProcess.stdin.flush()
firstProcess.stdin.close()
firstProcess.wait()

My second attempt uses one subprocess instance with the shell=True parameter, which works:

import subprocess, sys
pipedProcess = subprocess.Popen(" ".join(['foo', '-', '|', 'bar', '-']),
 stdin=subprocess.PIPE, shell=True)
for line in sys.stdin:
 pipedProcess.stdin.write(line)
 pipedProcess.stdin.flush()
pipedProcess.stdin.close()
pipedProcess.wait()

What am I doing wrong with the first, chained subprocess approach? I've read that it is best to avoid shell=True, so I'd like to understand why chaining the two instances fails. Thanks for your advice.

EDIT

I fixed a typo in my question and fixed the stdin parameter of secondProcess. It still hangs.

I also tried removing firstProcess.wait(), which resolves the hang, but then result is a 0-byte file.

I'll stick with the pipedProcess, since it works fine. But if anyone knows why the first setup hangs or makes a 0-byte file as output, I'd be interested to know why as well.

Mattie B
asked Mar 12, 2013 at 23:18
  • Shouldn't the stdin for bar be the stdout of foo rather than its stdin? Commented Mar 12, 2013 at 23:47
  • It should: I had a typo. I have fixed it in my question. Commented Mar 13, 2013 at 0:02
  • If you're only copying your stdin to the child's stdin then firstProcess = Popen(['foo', '-'], stdin=sys.stdin, stdout=PIPE) works. Commented Mar 13, 2013 at 15:31

2 Answers

shell=True works because you're asking the shell to interpret your entire command line and handle the piping itself. It is effectively as if you typed foo - | bar - directly into the shell.

(This is also why it can be unsafe to use shell=True; there are many ways to fool the shell into doing bad things that won't happen if you directly pass the command and arguments in as a list that isn't subject to parsing by any intermediaries.)
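
The difference can be seen in a minimal sketch. It assumes a POSIX system where echo is available, and the untrusted value is a made-up example rather than anything from the question:

```python
import subprocess

# Hypothetical untrusted value, e.g. a file name supplied by a user.
untrusted = "data; echo INJECTED"

# List form: no shell is involved, so the value reaches echo as one literal argument.
safe_output = subprocess.check_output(["echo", untrusted])

# shell=True: the shell parses the whole string, so ';' starts a second command.
unsafe_output = subprocess.check_output("echo " + untrusted, shell=True)

print(safe_output)
print(unsafe_output)
```

In the list form the semicolon is just text; with shell=True it ends the first command and runs whatever follows it.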

answered Mar 12, 2013 at 23:50

2 Comments

I think I added a typo and meant to have it the way you have it. And indeed I checked and that's the case. I'll edit my question accordingly.
Gotcha. I removed the code as it is now superfluous. I should add that it worked for me with some basic standins for foo and bar though (sort and uniq, actually), reading some random text on stdin. (On further thought, that's probably because uniq wouldn't exit before sort...)

To fix the first example, add foo_process.stdout.close() as the docs suggest. The following code emulates the foo - | bar - command:

#!/usr/bin/python
from subprocess import Popen, PIPE
foo_process = Popen(['foo', '-'], stdout=PIPE)
bar_process = Popen(['bar', '-'], stdin=foo_process.stdout)
foo_process.stdout.close() # allow foo to know if bar ends
bar_process.communicate() # equivalent to bar_process.wait() in this case 

You don't need to use sys.stdin or sys.stdout explicitly here unless they're different from sys.__stdin__ and sys.__stdout__.

To emulate the foo - < data | bar - > result command:

#!/usr/bin/python
from subprocess import Popen, PIPE
with open('data','rb') as input_file, open('result', 'wb') as output_file:
 foo = Popen(['foo', '-'], stdin=input_file, stdout=PIPE)
 bar = Popen(['bar', '-'], stdin=foo.stdout, stdout=output_file)
 foo.stdout.close() # allow foo to know if bar ends
bar.wait()
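
For trying this out without real foo and bar programs, here is a sketch of the same file-to-file pipeline wrapped in a function. The name run_file_pipeline and the two stand-in commands (UPPER and REVERSE, small Python one-liners) are hypothetical and only there to make the example runnable. Unlike the snippet above, it also waits for the first process so that both exit codes can be inspected:

```python
import sys
from subprocess import Popen, PIPE

def run_file_pipeline(first_cmd, second_cmd, input_path, output_path):
    """Emulate `first_cmd < input_path | second_cmd > output_path`.

    Returns the exit codes of both processes as (first, second).
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        first = Popen(first_cmd, stdin=src, stdout=PIPE)
        second = Popen(second_cmd, stdin=first.stdout, stdout=dst)
        first.stdout.close()  # parent's copy is not needed; lets first see a broken pipe
        second_rc = second.wait()
        first_rc = first.wait()
    return first_rc, second_rc

# Hypothetical stand-ins for foo and bar so the sketch runs anywhere Python does.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.writelines(reversed(sys.stdin.readlines()))"]
```

Calling run_file_pipeline(UPPER, REVERSE, 'data', 'result') would upper-case the contents of data, reverse the line order, and write the outcome to result.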

If you want to feed modified input line by line to the foo process, i.e. to emulate the python modify_input.py | foo - | bar - command:

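#!/usr/bin/python
import sys
from subprocess import Popen, PIPE
# universal_newlines=True makes foo_process.stdin a text stream that accepts str lines
foo_process = Popen(['foo', '-'], stdin=PIPE, stdout=PIPE, universal_newlines=True)
# close_fds=True (the default since Python 3.2) keeps bar from inheriting
# the write end of foo's stdin pipe, which would stop foo from ever seeing EOF
bar_process = Popen(['bar', '-'], stdin=foo_process.stdout, close_fds=True)
foo_process.stdout.close() # allow foo to know if bar ends
for line in sys.stdin:
 foo_process.stdin.write("PY " + line) # modify input, feed it to `foo`
foo_process.stdin.close() # tell foo there is no more input
bar_process.wait()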
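
The same pattern can take lines from any iterable, such as an open file handle, instead of only sys.stdin. The following is a sketch under assumptions: feed_lines and the stand-in commands UPPER and REVERSE are hypothetical names introduced for illustration, and the output goes to a file so the parent never has to read from the pipeline while it is still writing to it:

```python
import sys
from subprocess import Popen, PIPE

def feed_lines(lines, first_cmd, second_cmd, output_path):
    """Write each str line to first_cmd, pipe its output into second_cmd,
    and save second_cmd's output to output_path. Returns both exit codes."""
    with open(output_path, "wb") as dst:
        first = Popen(first_cmd, stdin=PIPE, stdout=PIPE)
        second = Popen(second_cmd, stdin=first.stdout, stdout=dst)
        first.stdout.close()  # parent's copy is not needed
        for line in lines:
            first.stdin.write(line.encode("utf-8"))
        first.stdin.close()  # signal end of input to first_cmd
        second_rc = second.wait()
        first_rc = first.wait()
    return first_rc, second_rc

# Hypothetical stand-ins for foo and bar so the sketch runs anywhere Python does.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.writelines(reversed(sys.stdin.readlines()))"]
```

For example, feed_lines(("PY " + line for line in handle), UPPER, REVERSE, 'result') would prefix each line read from handle before it enters the pipeline. If the first command exits before all input has been written, the write raises an exception that this sketch does not handle.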
answered Mar 13, 2013 at 0:17

7 Comments

Why would I use bar_process.communicate()? Wouldn't I write data to foo_process?
communicate() would likely be standing in for wait(), since bar_process will never provide output to Python.
@AlexReynolds: the code emulates your 1st example (there is no need to first read from stdin, only to write it immediately to foo. foo process can read from stdin by itself).
@AlexReynolds: bar_process.stdin is None so indeed bar_process.communicate() is just bar_process.wait() in this case.
Unfortunately, this didn't work. I need to process data one line at a time, in any case, and also be able to handle file handles other than stdin. It doesn't look like for line in sys.stdin: ... bar_process.communicate(line) works.