I have an application that produces a large amount of data which I do not want to store on disk. Most of its output is junk that I have no use for, but interspersed in it is a set of useful lines that must be split into separate files. For example, given the following output:
JUNK
JUNK
JUNK
JUNK
A 1
JUNK
B 5
C 1
JUNK
I could run the application three times like so:
./app | grep A > A.out
./app | grep B > B.out
./app | grep C > C.out
This would get me what I want, but it would take too long. I also don't want to dump all the outputs to a single file and parse through that.
Is there any way to combine the three operations shown above in such a way that I only need to run the application once and still get three separate output files?
6 Answers
If you have tee:
./app | tee >(grep A > A.out) >(grep B > B.out) >(grep C > C.out) > /dev/null
(from here)
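The >(...) constructs are process substitutions, which bash, zsh and ksh93 support. On a plain POSIX shell the same fan-out can be sketched with explicit named pipes (the fifo names here are arbitrary):

```shell
# Same idea without process substitution: tee writes to two FIFOs,
# each drained by a background grep; the third grep reads the pipe directly.
mkfifo a.fifo b.fifo
grep A <a.fifo >A.out &
grep B <b.fifo >B.out &
./app | tee a.fifo b.fifo | grep C >C.out
wait                      # let the background greps finish
rm -f a.fifo b.fifo
```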
-
4 Awesome, this could also be rendered as:
./app | tee >(grep A > A.out) >(grep B > B.out) | grep C > C.out
evilsoup – Commented Oct 26, 2013 at 18:13
-
7 This answer is currently the only accurate one, given the question's original title "pipe to multiple processes". acelent – Commented Oct 26, 2013 at 18:52
-
3 +1. This is the most generally applicable answer, since it doesn't depend on the fact that the specific filtering command was grep. ruakh – Commented Oct 26, 2013 at 19:43
-
1 I would agree that this is the best answer for the question posed and should be marked so. Parallel is another solution (as posted), but having done some timed comparisons, the example above is more efficient. If the op instead involved highly CPU-intensive operations, such as multiple file compressions or multiple MP3 conversions, then no doubt the parallel solution would prove to be more effective. AsymLabs – Commented Oct 28, 2013 at 17:01
You can use awk:
./app | awk '/A/{ print > "A.out"}; /B/{ print > "B.out"}; /C/{ print > "C.out"}'
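If the list of patterns grows, the same dispatch can be driven from a single array instead of one clause per pattern (a sketch; a line matching several patterns is written to each matching file):

```shell
# Split pattern list once in BEGIN, then route each line to every
# matching output file; unmatched (junk) lines are simply dropped.
./app | awk 'BEGIN { n = split("A B C", pats) }
             { for (i = 1; i <= n; i++)
                   if ($0 ~ pats[i]) print > (pats[i] ".out") }'
```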
-
6 The question's title is pipe to multiple processes; this answer is about "piping" (dispatching by regex) to multiple files. Since this answer was accepted, the question's title should be changed accordingly. acelent – Commented Oct 26, 2013 at 18:50
-
@PauloMadeira You are right. What do you think would be a better title? sj755 – Commented Oct 26, 2013 at 18:58
-
I've suggested a very small edit, "Pipe to multiple files in the shell"; it's pending revision, check it out. I was expecting to remove the comment if it was accepted. acelent – Commented Oct 26, 2013 at 19:05
-
@PauloMadeira – I've changed the title. Didn't see your edit, but you're correct, the use of processes in the title was incorrect if this is the accepted answer. – Commented Oct 26, 2013 at 21:21
You could also use your shell's pattern matching abilities:
./app | while read line; do
[[ "$line" =~ A ]] && echo $line >> A.out;
[[ "$line" =~ B ]] && echo $line >> B.out;
[[ "$line" =~ C ]] && echo $line >> C.out;
done
Or even:
./app | while read line; do for foo in A B C; do
[[ "$line" =~ "$foo" ]] && echo $line >> "$foo".out;
done; done
A safer way that can deal with backslashes and lines starting with -:
./app | while IFS= read -r line; do for foo in A B C; do
[[ "$line" =~ "$foo" ]] && printf '%s\n' "$line" >> "$foo".out;
done; done
As @StephaneChazelas points out in the comments, this is not very efficient. The best solution is probably @AurélienOoms'.
-
That assumes the input doesn't contain backslashes or blanks or wildcard characters, or lines that start with -n, -e... It's also going to be terribly inefficient, as it means several system calls per line (one read(2) per character, the file being opened, written and closed for each line...). Generally, using while read loops to process text in shells is bad practice. Stéphane Chazelas – Commented Oct 27, 2013 at 8:55
-
@StephaneChazelas I edited my answer. It should work with backslashes and -n etc. now. As far as I can tell both versions work OK with blanks though, am I wrong? – Commented Oct 27, 2013 at 14:37
-
No, the first argument to printf is the format. There's no reason for leaving your variables unquoted in there. Stéphane Chazelas – Commented Oct 27, 2013 at 20:39
-
This will also break in bash (and other shells that use C strings in a similar way) if there are nulls in the input. Chris Down – Commented Oct 28, 2013 at 9:38
If you have multiple cores and you want the processes to be in parallel, you can do:
parallel -j 3 -- './app | grep A > A.out' './app | grep B > B.out' './app | grep C > C.out'
This will run the three jobs in parallel on separate cores. If you want some output on the console, or in a master file, it has the advantage of keeping each job's output together rather than mixing it.
The GNU utility parallel, by Ole Tange, can be obtained from most repositories under the name parallel or moreutils. Source can be obtained from savannah.gnu.org, and an introductory instructional video is here.
Addendum
With a more recent version of parallel (not necessarily the one in your distribution's repo), you can use the more elegant construct:
./app | parallel -j3 -k --pipe 'grep {1} >> {1}.log' ::: 'A' 'B' 'C'
This runs a single ./app and three parallel grep processes in separate cores or threads (as determined by parallel itself; the -j3 is optional, but is supplied in this example for instructive purposes), while -k keeps the output in input order.
The newer version of parallel can be obtained by doing:
wget http://ftpmirror.gnu.org/parallel/parallel-20131022.tar.bz2
Then do the usual: unpack, cd into parallel-{date}, ./configure && make, sudo make install. This installs parallel along with the man pages parallel and parallel_tutorial.
Here's one in Perl:
./app | perl -ne 'BEGIN {open(FDA, ">A.out") and
open(FDB, ">B.out") and
open(FDC, ">C.out") or die("Cannot open files: $!\n")}
print FDA $_ if /A/; print FDB $_ if /B/; print FDC $_ if /C/'
sed -ne '/A/w A.out' -e '/B/w B.out' -e '/C/p' <in >C.out
If <in is readable, all three outfiles will be truncated before anything is written to them.
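Adapted to the question's pipeline, the same sed script reads from the pipe instead of a file:

```shell
# /A/w and /B/w write matching lines to their files as a side effect;
# /C/p prints C-lines to stdout, which the shell redirects to C.out.
./app | sed -ne '/A/w A.out' -e '/B/w B.out' -e '/C/p' > C.out
```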