Normally, pipelines in Unix are used to connect two commands, so that the output of the first command becomes the input of the second. However, I recently came up with the idea (which may not be new, but I didn't find much by Googling) of using a pipeline to run several commands in parallel, like this:
command1 | command2
This will invoke command1 and command2 in parallel, even if command2 does not read from standard input and command1 does not write to standard output. A minimal example to illustrate this is (please run it in an interactive shell):
ls . -R 1>&2 | ls . -R
My question is: are there any downsides to using a pipeline to parallelize the execution of two commands in this way? Is there anything I have missed in this idea?
Thank you very much in advance.
2 Answers
Command pipelines already run in parallel. With the command:
command1 | command2
Both command1 and command2 are started. If command2 is scheduled and the pipe is empty, it blocks waiting to read. If command1 tries to write to the pipe and it's full, command1 blocks until there's room to write. Otherwise, both command1 and command2 execute in parallel, writing to and reading from the pipe.
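A minimal way to see this for yourself (a sketch; the timestamps go to stderr because the left side's stdout is the pipe, which nobody reads here):

# Both sides of the pipe start immediately; the two timestamps,
# printed to stderr so the pipe does not swallow the left one,
# show the same second.
( date '+left  side started: %T' >&2; sleep 2 ) |
( date '+right side started: %T' >&2; sleep 2 )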
- Thank you for your explanation. So I can use this to parallelize commands without resorting to parallel, and there are no downsides; furthermore, this is normal practice. Is it fair to say that? – Weijun Zhou, Dec 8, 2017 at 22:00
- The behavior that you described is today's reality. If you run command1 | command2, where command1 does not write to standard output and command2 does not read from standard input, they will run in parallel. – Andy Dalton, Dec 8, 2017 at 22:02
- Thank you for clarifying, especially about the blocking mechanism. – Weijun Zhou, Dec 8, 2017 at 22:13
There are downsides...
- you cannot see the output of command1 (its stdout is connected to the pipe)
- if command2 doesn't read the output of command1, the latter will hang once the pipe buffer fills. The capacity is 4 KiB on old Linux kernels and 64 KiB by default on modern ones; exactly when the writer blocks also depends on the buffering of the runtime used by command1 (experimentally, a Python writer stalls at around 58 KiB; see below)
- if command2 stops before command1 and command1 then writes to its stdout, it will get [Errno 32] Broken pipe (SIGPIPE)
Experiment:

cmd1:

#! /usr/bin/python3
import sys, time

# Write 1 KiB to stdout on each iteration (64 KiB total), reporting
# progress on stderr so it stays visible outside the pipe.
for i in range(64):
    print("*" * 1023, file=sys.stdout)
    print("cmd1 here (%d)" % i, file=sys.stderr)
    time.sleep(.1)
print("cmd1 exiting", file=sys.stderr)
cmd2:

#! /usr/bin/python3
import sys, time

# Never read stdin; just report progress on stderr, then exit.
for i in range(16):
    print("cmd2 here (%d)" % i, file=sys.stderr)
    time.sleep(1)
print("cmd2 exiting", file=sys.stderr)
Run:
./cmd1 | ./cmd2
You will see:

- cmd1 stalling at iteration 58 (because cmd2 never reads anything from the pipe)
- cmd1 crashing (broken pipe) when cmd2 exits
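Both failure modes can also be reproduced with standard tools (a sketch; PIPESTATUS is bash-specific):

# head reads one line and exits; the next write by yes then fails,
# which is the same broken-pipe error cmd1 hits above.
yes | head -n 1
echo "${PIPESTATUS[@]}"   # typically "141 0": 141 = 128 + SIGPIPE (signal 13)

# sleep never reads, so yes blocks once the pipe buffer is full,
# then dies of SIGPIPE when sleep exits after 3 seconds.
yes | sleep 3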
So yes, maybe it can work. And maybe not.
- Thank you for providing more insight into the problem. I have tested it, and the result is as you described if cmd1 does write to standard output and I silently ignore it. However, I see no problems if cmd1 does not write to standard output at all (commenting out the line that writes to standard output), which is what I asked about in the original question. – Weijun Zhou, Dec 8, 2017 at 23:00
- If you want to, you can do cmd >/dev/null | othercmd so you don't have the blocking problem, at least regarding the output of cmd. It looks rather silly, but it works in bash, ksh and dash (not in zsh, but I think zsh splits the output to both redirections). – ilkkachu, Dec 8, 2017 at 23:40
- Why not command1 & command2?
- If you replace | with & in my example and run it in an interactive shell, you can see the difference. Thank you anyway for your comment. – Weijun Zhou
- Replacing | with & in your example created exactly the same output. The difference is that & is specifically designed to execute several commands in parallel. A pipeline is simply the wrong tool for the task.
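For comparison, here is the question's example rewritten with & (a sketch; in an interactive shell the visible difference is the job-control notices bash prints for the background job):

# Start the first ls in the background, run the second in the
# foreground at the same time, then wait for the background job.
ls . -R 1>&2 &
ls . -R
wait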