In what order do piped commands run?

Question 1

I've never really thought about how the shell actually executes piped commands. I've always been told that the "stdout of one program gets piped into the stdin of another," as a way of thinking about pipes. So naturally, I thought that in the case of say, A | B, A would run first, then B gets the stdout of A, and uses the stdout of A as its input.

But I've noticed that when people search for a particular process in ps, they'd include grep -v "grep" at the end of the command to make sure that grep doesn't appear in the final output.
This means that in the command ps aux | grep "bash" | grep -v "grep" it is implied that ps knew that grep was running and therefore is in the output of ps. But if ps finishes running before its output gets piped to grep, how did it know that grep was running?

flamingtoast@FTOAST-UBUNTU: ~$ ps | grep ".*"
PID TTY TIME CMD
3773 pts/0 00:00:00 bash
3784 pts/0 00:00:00 ps
3785 pts/0 00:00:00 grep

Question 2

Piped commands run concurrently. When you run ps | grep ..., it's the luck of the draw (or a matter of details of the workings of the shell combined with scheduler fine-tuning deep in the bowels of the kernel) as to whether ps or grep starts first, and in any case they continue to execute concurrently.

This is very commonly used to allow the second program to process data as it comes out from the first program, before the first program has completed its operation. For example

grep pattern very-large-file | tr a-z A-Z

begins to display the matching lines in uppercase even before grep has finished traversing the large file.

grep pattern very-large-file | head -n 1

displays the first matching line, and may stop processing well before grep has finished reading its input file.

If you read somewhere that piped programs run in sequence, flee this document. Piped programs run concurrently and always have.

Question 3

And what's cool about this example is that when head gets the one line it needs, it terminates and when grep notices this, it also terminates without doing a bunch of further work for nothing.
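
A quick way to see this early termination is to pipe a producer that never finishes on its own into head. This is only a sketch: the function name first_lines is made up for illustration, and yes stands in for any long-running command.

```shell
# Hypothetical helper: print the first COUNT lines from an endless producer.
# usage: first_lines WORD COUNT
first_lines() {
    # `yes WORD` prints WORD forever, so it can never "finish first".
    # The pipeline still ends: head exits after COUNT lines, and yes is
    # stopped when a later write hits the closed pipe.
    yes "$1" | head -n "$2"
}
```

Calling first_lines hello 3 prints hello three times and returns, even though yes on its own would never stop. If the commands ran one after the other, the call would hang forever.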

Question 4

I guess there is some kind of I/O buffer for the pipe... how do I find out its size in bytes? What should I read to learn more about it? :)

Question 5

@naxa There are two buffers, actually. There's the stdio buffer inside the grep program, and there's a buffer managed by the kernel in the pipe itself. For the latter, see "How big is the pipe buffer?"

Question 6

The grep option --line-buffered keeps grep from holding output back in a full-size buffer: it flushes after every line. Thus, grep --line-buffered First_Line very-big-file | ... will produce each matching line at once, without waiting for the output buffer to fill.

Question 7

The order in which the commands are run actually doesn't matter and isn't guaranteed. Leaving aside the arcane details of pipe(), fork(), dup() and execve(), the shell first creates the pipe, the conduit for the data that will flow between the processes, and then creates the processes with the ends of the pipe connected to them. Whichever process runs first may block: the reader waiting for input from the writer, or the writer waiting for the reader to start draining a full pipe. These waits can be arbitrarily long and don't matter. Whichever order the processes are run in, the data eventually gets transferred and everything works.
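
The sequence described here (create the conduit first, then attach a process to each end) can be imitated by hand with a named pipe. This is a sketch, not what the shell does internally: the shell uses an anonymous pipe from pipe(), and mkfifo is only a stand-in that makes the two steps visible. The function name manual_pipeline and the sample data are made up for illustration.

```shell
# Hypothetical helper: build "printf ... | tr a-z A-Z" by hand with a FIFO.
manual_pipeline() {
    dir=$(mktemp -d) || return 1
    fifo=$dir/pipe
    mkfifo "$fifo" || return 1      # step 1: create the conduit

    # step 2: start the writer in the background; opening the FIFO for
    # writing blocks until some reader opens the other end
    printf 'one\ntwo\nthree\n' > "$fifo" &
    writer=$!

    # step 3: start the reader; it sees end-of-file once the writer exits
    tr a-z A-Z < "$fifo"

    wait "$writer"
    rm -r "$dir"
}
```

Swapping steps 2 and 3 (reader in the background, writer in the foreground) gives the same output, which is the point of the paragraph above: either side simply waits for the other.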

Question 8

Nice answer, but the OP seems to think the processes run sequentially. You might make it clearer here that the processes are run concurrently, and the pipe is like.... a pipe between buckets, where water flows through all at the (approx.) same time.

Question 9

Thank you for the clarification. The sources I've been reading made it seem like piped programs ran sequentially, rather than concurrently.

Question 10

To see the processes starting in an undetermined order, try running this 1000 times: echo -n a >&2 | echo b >&2
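
Rather than eyeballing 1000 runs, the experiment can be tallied with a small loop. This is a sketch under a few assumptions: the function name count_orders is made up, printf a replaces echo -n a for portability, and the split between the two orders depends entirely on the shell and scheduler, so no particular result is promised.

```shell
# Hypothetical helper: run the two-command pipeline N times and count
# which command's output appeared first. Both commands write to stderr,
# so nothing actually travels through the pipe between them.
count_orders() {
    runs=$1
    a_first=0
    b_first=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        out=$( { printf a >&2 | echo b >&2; } 2>&1 )
        case $out in
            ab) a_first=$((a_first + 1)) ;;   # left command wrote first
            *)  b_first=$((b_first + 1)) ;;   # right command wrote first
        esac
        i=$((i + 1))
    done
    echo "a first: $a_first, b first: $b_first"
}
```

Running count_orders 1000 prints the two tallies. On a given machine one order may dominate heavily, but nothing guarantees it.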

Question 11

Maybe a naive question... Does the first process being blocked/waiting on the second process (for a piece of information) have anything to do with the pipe? But then pipes must be bidirectional?

Question 12

@stdout If the writing process is blocked it is because the pipe is full and the reading process needs to read some data out of the pipe so that there is space. If the reading process is blocked it is because the pipe is empty and the writing process needs to write some data into the pipe. The blocking happens because either the pipe is empty or the pipe is full (pipes have finite size of course).
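
The "pipe is full" case can be observed with a writer that produces more than the pipe holds and a reader that deliberately starts late. This is a sketch: the function name writer_seconds, the 200000-byte payload and the one-second delay are arbitrary choices, and it assumes the pipe capacity is below 200000 bytes and that head accepts -c.

```shell
# Hypothetical helper: report how many whole seconds the writer needed to
# push 200000 bytes into a pipe whose reader sleeps one second before
# reading. The writer logs its finish time on fd 3, because its stdout
# is the pipe under test.
writer_seconds() {
    start=$(date +%s)
    finished=$(
        {
            { head -c 200000 /dev/zero; date +%s >&3; } |
                { sleep 1; cat > /dev/null; }
        } 3>&1
    )
    echo $((finished - start))
}
```

Under those assumptions the reported value is at least 1: the writer cannot finish until the reader wakes up and makes room in the pipe.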

Question 13

At the risk of beating a dead horse, the misconception seems to be that

 A | B

is equivalent to

 A > temporary_file
 B < temporary_file
 rm temporary_file

But, back when Unix was created and children rode dinosaurs to school, disks were very small, and it was common for a rather benign command to consume all the free space in a file system. If B was something like grep some_very_obscure_string, the final output of the pipeline could be much smaller than that intermediate file. Therefore, the pipe was developed, not as a shorthand for the "run A first, and then run B with input from A’s output" model, but as a way for B to execute concurrently with A and eliminate the need for storing the intermediate file on disk.

Question 14

Typically you run this under bash. The processes start and run concurrently, even though the shell launches them one after another. How is that possible? For each command in the pipeline, the shell does the following:

1. If it isn't the last command in the pipeline, create an unnamed pipe, which yields a pair of file descriptors.
2. fork.
3. In the child, reassign stdin/stdout to the pipe's file descriptors where needed (stdin is not reassigned for the first process in the pipeline, nor stdout for the last).
4. In the child, exec the specified command with its arguments. This replaces the copy of the shell's code but keeps the file descriptors it opened. The process ID does not change, because it is still the same child process.
5. Meanwhile, in the main shell, go back to step 1 for the next command.

The system does not guarantee how quickly exec completes and the specified command starts. That is up to the system, not the shell. This is why

ps auxww| grep ps | cat

sometimes shows the grep and/or ps command and sometimes does not. It depends on how fast the kernel actually starts the processes via exec.

Question 15

Concurrent execution means that two or more processes execute within the same time frame, usually with some sort of dependency between them. Parallel execution means that two or more processes execute simultaneously (e.g. on separate CPU cores at the same time). What matters for the question is not parallelism, nor "how fast" exec() is executed, but how the exec() calls and the execution of the programs in a pipeline are interleaved.

Question 16

Although all the programs seem to start concurrently, they don't seem to execute in a truly concurrent fashion. I wrote a dummy program (below) that can write 1 or more lines, sleep a number of seconds, read 1 or more lines, or check if the head of the pipe is alive, and I get the following:

$ ./a.out w5 s1 w5 s1 w50 | ./a.out r | ./a.out r1 check r & ps -ef | grep a.out
[3] 10908
ale 10906 8066 0 10:24 pts/6 00:00:00 ./a.out w5 s1 w5 s1 w50
ale 10907 8066 0 10:24 pts/6 00:00:00 ./a.out r
ale 10908 8066 0 10:24 pts/6 00:00:00 ./a.out r1 check r
ale 10910 8066 0 10:24 pts/6 00:00:00 grep a.out
$ 2025-03-04T10:24:56.628434+01:00 pcale dummy pipe[10906]: exiting (read 0 bytes)
2025-03-04T10:24:56.628925+01:00 pcale dummy pipe[10907]: exiting (read 2769 bytes)
2025-03-04T10:24:56.629188+01:00 pcale dummy pipe[10908]: kill 10906: No such process
2025-03-04T10:24:56.629377+01:00 pcale dummy pipe[10908]: exiting (read 2769 bytes)

That is, by the time the third instance starts reading, the first one has already finished, despite the time it spent sleeping. This means that at some point all 2,769 bytes were stored together.

With more data, as Scott noted, concurrency comes into play.

$ ./a.out w5 s1 w5 s1 w150 | ./a.out r | ./a.out r1 check r & ps -ef | grep a.out
[3] 11254
ale 11252 8066 0 10:31 pts/6 00:00:00 ./a.out w5 s1 w5 s1 w150
ale 11253 8066 0 10:31 pts/6 00:00:00 ./a.out r
ale 11254 8066 0 10:31 pts/6 00:00:00 ./a.out r1 check r
ale 11256 8066 0 10:31 pts/6 00:00:00 grep a.out
$ 2025-03-04T10:31:13.002766+01:00 pcale dummy pipe[11252]: exiting (read 0 bytes)
2025-03-04T10:31:13.003068+01:00 pcale dummy pipe[11254]: kill 11252: still running
2025-03-04T10:31:13.003235+01:00 pcale dummy pipe[11253]: exiting (read 7569 bytes)
2025-03-04T10:31:13.003502+01:00 pcale dummy pipe[11254]: exiting (read 7569 bytes)

I tried to determine by binary search the amount of data at which the behavior changes, but couldn't. It varies from run to run.

The output is logged, and tail -f syslog runs in the background. closelog() is used to flush syslog, and the output appears all of a sudden.

Here's the dummy program, if you wanna play with it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <syslog.h>
#include <errno.h>
#include <limits.h>

static void do_open(void)
{
    openlog("dummy pipe", LOG_PID, LOG_USER);
}

int main(int argc, char *argv[])
{
    do_open();
    pid_t me = 1;   /* pid of the head of the pipe; 1 means "not known" */
    int wrote = 0, n;
    long total_read = 0;
    for (int i = 1; i < argc; ++i)
    {
        if (argv[i][0] == 'w' && (n = atoi(&argv[i][1])) > 0)
        {
            /* wN: write N lines, the first one announcing our pid */
            if (wrote == 0)
            {
                me = getpid();
                setsid();
                wrote = 1;
            }
            if (n-- > 0)
                printf("Pid: %d\n", (int)me);
            while (n-- > 0)
                puts("Lorem ipsum lorem ipsum lorem ipsum lorem ipsum");
        }
        else if (argv[i][0] == 's' && (n = atoi(&argv[i][1])) > 0)
            sleep(n);   /* sN: sleep N seconds */
        else if (argv[i][0] == 'r')
        {
            /* rN: read up to N lines; a bare r reads until end of file */
            if (argv[i][1] == 0)
                n = INT_MAX;
            else
                n = atoi(&argv[i][1]);
            char buf[1024];
            while (n-- > 0)
            {
                char *p = fgets(buf, sizeof buf, stdin);
                if (p == NULL)
                    break;
                if (buf[0] == 'P')
                {
                    int pid;    /* %d needs an int, not a pid_t */
                    if (sscanf(buf, "Pid: %d", &pid) == 1)
                        me = pid;
                }
                total_read += strlen(buf);
                if (!isatty(fileno(stdout)))
                    printf("%s", buf);  /* pass the data on down the pipe */
            }
        }
        else if (strcmp(argv[i], "check") == 0)
        {
            /* check: probe the head of the pipe with signal 0 */
            if (me != 1)
            {
                n = kill(me, 0);
                syslog(LOG_NOTICE, "kill %d: %s\n", (int)me,
                    n ? strerror(errno) : "still running");
                closelog();
                do_open();
            }
        }
    }
    syslog(LOG_NOTICE, "exiting (read %'ld bytes)\n", total_read);
    closelog();
    return 0;
}

Question 17

Sorry for the first edition of this post. I fixed it today.

Question 18

Run this enough times and you might encounter a case where the third one started before the first.

Question 19

Well... As an exercise, run ./a.out w5 s1 w5 s1 w5 on the command line, and then try ./a.out w5 s1 w5 s1 w5 | cat. What do you see, and why?

Question 20

The shell seems to consistently start the processes in the order given, judging by the PIDs.

Question 21

The program only prints what it read if the output is a pipe, to avoid cluttering the terminal window.

Question 22

You asked about the order and I think that this is an interesting aspect to the matter. In all the shells I've used, this is not random.

Here is a ps -ef piped to the grep command:

$ ps -ef | grep .
...
alexis 37188 55443 0 20:17 pts/4 00:00:00 ps -ef
alexis 37189 55443 0 20:17 pts/4 00:00:00 grep --color=auto .
...

Note: I removed all the other processes from the output since they are of no importance to the matter.

As we can see, there is a ps -ef and a grep --color=auto . in the output. Can you answer your question now?

Yes. The ps command has PID 37188 and the grep command has PID 37189. Clearly, they were created left to right, and I have yet to find a shell that does this the other way around.

Technically, in C, we create pipes with the pipe(2) function, which gives us two file descriptors. One is going to be used as the stdout of ps and the other as the stdin of grep. Now, it is easy enough to hold on to either file descriptor and start ps or grep in any order. However, if you start grep first, it will be stuck until ps starts and then sends data through the pipe.

Further, if you look at your system configuration like so:

$ getconf -a | grep PIPE_BUF
PIPE_BUF 4096
_POSIX_PIPE_BUF 4096

you will notice those two parameters, which define the minimum guaranteed size of the pipe in bytes. Since Linux 2.6, the default size is 64 kB. The absolute maximum number of bytes is defined in:

$ cat /proc/sys/fs/pipe-max-size
1048576

and we can see this is 1 MB. Once the pipe is full, the writer (ps in our first example) blocks until data is read by the process on the other side of the pipe (grep in our first example).

In other words, since the output of ps is much less than the size of a pipe:

$ ps -ef | wc
 1132 10819 121435

(i.e. about 120 kB of output on my computer at the moment...)

The pipeline won't get blocked at all.

For streamed data over 1 MB, it does get blocked at some point. If grep were not started immediately, it would never start, since the write() call in the first command would then be blocked.

So the processes are started back to back very quickly, and for most of the time they run in parallel (or concurrently if you have a single processor). Usually the ps command exits first. This closes its end of the pipe (the reader gets end-of-file once it has read the remaining data), and that's how the next tool knows its input is done; it exits in turn once it has processed the last few bytes it received.

Conversely, if a process on the right side of a pipe dies early (before the one on the left is done writing to the pipe), then the process on the left receives the SIGPIPE signal as soon as it tries to write to the pipe. This ensures that the pipeline dies quickly if any of the processes within it dies.
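
The effect on the left-hand process can be made visible by capturing its exit status. This is a sketch: the function name writer_status is made up, and the exact number depends on the environment. With SIGPIPE at its default disposition the status is typically 128 plus the signal number; if SIGPIPE is being ignored, the write fails with an error instead and yes exits with an ordinary non-zero status.

```shell
# Hypothetical helper: print the exit status of a writer whose reader
# quits after one line. The status is reported on fd 3 because the
# writer's stdout is the (by then closed) pipe.
writer_status() {
    {
        {
            status=0
            yes || status=$?        # runs until a write to the pipe fails
            echo "$status" >&3
        } | head -n 1 > /dev/null
    } 3>&1
}
```

Whatever the exact value, it is non-zero: yes never gets to finish normally once head has gone away.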

Question 23

No, PIPE_BUF is the maximum amount that can be guaranteed to be written atomically which is much less than the pipe size. Since Linux 2.6, the pipe size is by default 64 kB and can be increased up to the maximum as configured in /proc/sys/fs/pipe-max-size which is 1 MB by default.

Question 24

The PID only indicates the order in which the processes were created with fork(); it doesn't show the order in which the programs started. Processes are created before they get scheduled. The shell process can easily call fork() several times before anything actually starts, especially on single-CPU machines. The order in which they actually start is then up to kernel scheduling, which is not predictable. Indeed, after fork, each process still needs to call exec() to load the program into memory. The program may already be cached in memory or may require disk I/O, which further randomises the start sequence.

Question 25

Also, there's nothing in POSIX that says processes need to be forked in any particular order, so there's no basis to claim that "no shell should do it differently". The fact that few (none?) do is simply a happy accident of developer habits. When the shell parses the command it will, most likely, produce a list of commands from left to right, and developers will naturally loop through that list in order. But right-to-left parsers do exist, and there's no reason to assume one will never be used by a shell.

score 99 · Accepted Answer · 2012-04-30 01:37:29Z

