I struggle to understand the effects of the following command:
yes | tee hello | head
On my laptop, the number of lines in 'hello' is of the order of 36000, much higher than the 10 lines displayed on standard output.
My questions are:
When does yes, and more generally a command in a pipe, stop?
Why is there a mismatch between the two numbers above? Is it because tee does not pass the lines one by one to the next command in the pipe?
2 Answers
:> yes | strace tee output | head
[...]
read(0, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
write(3, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
read(0, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=5202, si_uid=1000} ---
+++ killed by SIGPIPE +++
From man 2 write:
EPIPE
fd is connected to a pipe or socket whose reading end is closed. When this happens the writing process will also receive a SIGPIPE signal.
So the processes die right to left. head exits on its own, tee gets killed when it tries to write to the pipeline the first time after head has exited. The same happens with yes after tee has died.
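The right-to-left order can be checked by recording each command's exit status. This is a minimal sketch, not part of the original answer: the scratch directory and the files yes.status and tee.status are names chosen for illustration, and it assumes SIGPIPE has its default disposition in the calling shell. Under that assumption a shell typically reports a process killed by SIGPIPE as status 141 on Linux (128 + 13), while head exits normally with 0.

```shell
#!/bin/sh
# Scratch directory for illustration only (hypothetical file names).
tmp=$(mktemp -d)

# Each brace group runs in its own subshell as one stage of the pipeline,
# so it can record the exit status of the command it wraps.
{ yes; echo "$?" > "$tmp/yes.status"; } |
  { tee "$tmp/hello"; echo "$?" > "$tmp/tee.status"; } |
  head > /dev/null
head_status=$?   # status of the last stage, i.e. head

yes_status=$(cat "$tmp/yes.status")
tee_status=$(cat "$tmp/tee.status")

echo "yes: $yes_status  tee: $tee_status  head: $head_status"
rm -rf "$tmp"
```

A non-zero status for yes and tee together with 0 for head matches the description above: only head finishes on its own, and the two writers are stopped when they next write to a pipe that no longer has a reader.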
tee can write to the pipeline until the buffers are full. But it can write as much as it likes to a file. It seems that my version of tee writes the same block to stdout and the file.
head has 8K in its (i.e. the kernel's) read buffer. It reads all of it but prints only the first 10 lines because that's its job.
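The mismatch from the question can be reproduced by counting both sides. This is a rough sketch under assumptions: the scratch directory and the variable names shown and saved are made up for illustration, and it relies on tee writing each block to stdout before the file, as in the strace output above. The exact number of saved lines depends on buffer sizes and scheduling, so it will likely differ between systems and runs.

```shell
#!/bin/sh
# Scratch directory for illustration only (hypothetical file name).
tmp=$(mktemp -d)

# Lines that reach the end of the pipeline: head prints 10 by default.
# tr strips the padding that some wc implementations add around the count.
shown=$(yes | tee "$tmp/hello" | head | wc -l | tr -d ' ')

# Lines that tee wrote to the file before it was stopped.
saved=$(wc -l < "$tmp/hello" | tr -d ' ')

echo "lines shown by head: $shown"
echo "lines saved by tee:  $saved"
rm -rf "$tmp"
```

The file receives whole blocks, so its line count is at least the 10 lines that head printed and usually far more.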
- I think the 8 k here is the internal buffer used by tee, not the kernel's buffer. If we do something like yes | strace tee output | (sleep 1; head), we'll see that tee writes more than that to the pipe before blocking on the write, 64 k on my system (that seems to be the pipe buffer size according to the man page). In the non-sleep case, it's just that head gets to run immediately, and closes the pipe. – ilkkachu, Jan 14, 2018 at 15:54
- The stdout of tee is unbuffered. There are two sets of buffers downstream of it: a kernel buffer that comprises the pipe, and the stdin buffering in the standard library of the head process. ilkkachu's command line, which I was just about to suggest, demonstrates that the pipe buffer itself can take more than 8KiB. The 8KiB gulp that head takes is the GNU C run-time library filling up the internal buffer in the stdin stream. (Run this on a BSD, and you'll see the BSD C RTL using different stdin buffer sizes.) – JdeBP, Jan 14, 2018 at 16:09
A program that writes to a pipe receives a SIGPIPE signal on its next write after the pipe's reader has terminated, and tee(1) will not terminate as long as its standard input stays open.
head(1) outputs 10 lines by default.
- "tee(1) will not terminate as long as its standard input stays open" – That is not correct; see the strace output in my answer. Different versions of tee may behave differently, though. – Hauke Laging, Jan 14, 2018 at 15:33