I know I can do this in Bash:
wc -l <<< "${string_variable}"
Basically, everything I found involved the <<< Bash operator. But in POSIX shell, <<< is undefined, and I have been unable to find an alternative approach for hours. I am quite sure there is a simple solution to this, but unfortunately, I haven't found it so far.
The simple answer is that wc -l <<< "${string_variable}" is a ksh/bash/zsh shortcut for printf "%s\n" "${string_variable}" | wc -l.
There are actually differences in the way <<< and a pipe work: <<< creates a temporary file that is passed as input to the command, whereas | creates a pipe. In bash and pdksh/mksh (but not in ksh93 or zsh), the command on the right-hand side of the pipe runs in a subshell. But these differences don't matter in this particular case.
Note that in terms of counting lines, this assumes that the variable is not empty and does not end with a newline. Not ending with a newline is the case when the variable is the result of a command substitution, so you'll get the right result in most cases, but you'll get 1 for the empty string.
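A small sketch of those edge cases, using a hypothetical helper wc_lines that is just the POSIX pipe form of wc -l <<< "$1". Some wc implementations pad the count with leading spaces, so the numbers in the comments may appear indented.

```shell
# Hypothetical helper: the POSIX spelling of wc -l <<< "$1".
# printf '%s\n' appends exactly one newline, as <<< does.
wc_lines() {
    printf '%s\n' "$1" | wc -l
}

wc_lines 'one
two'          # 2: correct
wc_lines ''   # 1: the empty string is reported as one line
wc_lines 'one
two
'             # 3: a value that already ends in a newline is overcounted
```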
There are two differences between var=$(somecommand); wc -l <<<"$var" and somecommand | wc -l: using a command substitution and a temporary variable strips away blank lines at the end, forgets whether the last line of output ended in a newline or not (it always does if the command outputs a valid nonempty text file), and overcounts by one if the output is empty. If you want to both preserve the result and count lines, you can do it by appending some known text and stripping it off at the end:
output=$(somecommand; echo .)
line_count=$(($(printf "%s\n" "$output" | wc -l) - 1))
printf "The exact output is:\n%s" "${output%.}"
@Inian Keeping wc -l is exactly equivalent to the original: <<<$foo adds a newline to the value of $foo (even if $foo was empty). I explain in my answer why this may not have been what was wanted, but it's what was asked. – Gilles 'SO- stop being evil', Nov 20, 2018 at 7:20
If you don't restrict yourself to shell built-ins, you can use external utilities such as grep and awk with POSIX-compliant options. For example, given:
string_variable="one
two
three
four"
Using grep to match the start of each line:
printf '%s' "${string_variable}" | grep -c '^'
4
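And with awk (note that the NF condition only counts lines containing at least one field, so blank lines are skipped):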
printf '%s' "${string_variable}" | awk 'BEGIN { count=0 } NF { count++ } END { print count }'
Note that some GNU tools, GNU grep in particular, do not respect POSIXLY_CORRECT=1 as a way to run the POSIX version of the tool. In grep, the only behavior affected by setting the variable is how the order of command-line flags is processed. From the GNU grep manual:
POSIXLY_CORRECT
If set, grep behaves as POSIX requires; otherwise, grep behaves more like other GNU programs. POSIX requires that options that follow file names must be treated as file names; by default, such options are permuted to the front of the operand list and are treated as options.
Surely wc -l is still viable here? – Michael Homer, Nov 20, 2018 at 7:05
@MichaelHomer: From what I've observed, wc -l needs a properly newline-delimited stream (with a trailing \n at the end) to count properly. One cannot simply pipe from printf: e.g. printf '%s' "${string_variable}" | wc -l might not work as expected, but <<< would, because of the trailing \n appended by the here-string. – Inian, Nov 20, 2018 at 7:14
That was what printf '%s\n' was doing, before you took it out... – Michael Homer, Nov 20, 2018 at 7:18
Say... suppose there is an empty last line, that is, use single quotes and press Enter after four. None of these solutions, including the wc one, will account for that, right? – user1593842, Aug 11, 2023 at 21:04
The here-string <<< is pretty much a one-line version of the here-document <<. The former isn't a standard feature, but the latter is. You can use << too in this case. These should be equivalent:
wc -l <<< "$somevar"
wc -l << EOF
$somevar
EOF
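Wrapped in a function, the here-document form gives a count that only needs POSIX sh. The helper name heredoc_lines is hypothetical; note that the body lines and the closing EOF must start at column 0 (or be tab-indented with <<-).

```shell
# Hypothetical helper: count lines using only a POSIX here-document.
# The shell appends a newline after the expanded $1, just like <<<.
heredoc_lines() {
    wc -l <<EOF
$1
EOF
}

somevar='one
two
three'
heredoc_lines "$somevar"   # 3
heredoc_lines ''           # 1: the here-document still contains one newline
```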
Though do note that both add an extra newline at the end of $somevar; e.g. this prints 6, even though the variable only has five lines:
s=$'foo\n\n\nbar\n\n'
wc -l <<< "$s"
With printf, you can decide whether you want the additional newline or not:
printf "%s\n" "$s" | wc -l # 6
printf "%s" "$s" | wc -l # 5
But then, do note that wc only counts complete lines (that is, the number of newline characters in the string). grep -c ^ should also count the final line fragment.
s='foo'
printf "%s" "$s" | wc -l # 0 !
printf "%s" "$s" | grep -c ^ # 1
(Of course you could also count the lines entirely in the shell by using the ${var%...} expansion to remove them one at a time in a loop...)
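One possible pure-shell version, as a sketch: it uses the ${var#...} counterpart, stripping lines from the front rather than the end, because that makes a trailing newline easy to handle. The function name count_lines is hypothetical. It counts a final unterminated fragment, like grep -c ^, and is quadratic in the string length, so it is only meant for modest inputs.

```shell
# Hypothetical helper: count lines with parameter expansion only, no external
# commands. The body runs in a subshell so nl, rest and count stay local.
count_lines() (
    nl='
'
    rest=$1
    count=0
    while [ -n "$rest" ]
    do
        case $rest in
            *"$nl"*) rest=${rest#*"$nl"} ;;  # drop up to and including the first newline
            *) rest= ;;                      # last fragment without a newline
        esac
        count=$((count + 1))
    done
    echo "$count"
)

count_lines 'one
two
three'            # 3
count_lines 'one
two
'                 # 2
count_lines ''    # 0
```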
In those surprisingly frequent cases where what you actually need to do is process all the non-empty lines inside a variable in some fashion (including counting them), you can set IFS to just a newline and then use the shell's word splitting mechanism to break the non-empty lines apart.
For example, here's a little shell function that totals the non-empty lines inside all supplied arguments:
lines() (
IFS='
'
set -f #disable pathname expansion
set -- $*
echo $#
)
Parentheses, rather than braces, are used here to form the compound command for the function body. This makes the function execute in a subshell so that it doesn't pollute the outside world's IFS variable and pathname expansion setting on every call.
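A quick way to try the lines function: the sample value below is made up, and includes a blank line and a * to show that blank lines are skipped and that set -f keeps the * from being globbed.

```shell
lines() (
    IFS='
'
    set -f  # disable pathname expansion
    set -- $*
    echo $#
)

string_variable='one
two

three *'
lines "$string_variable"           # 3: the blank line is skipped
lines "$string_variable" 'four'    # 4: lines are totalled across all arguments
```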
If you want to iterate over non-empty lines you can do it similarly:
IFS='
'
set -f
for line in $lines
do
printf '[%s]\n' "$line"
done
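The loop above leaves IFS and set -f changed for the rest of the script. If you can't use a subshell, one option is to save and restore both around the loop, sketched here as a hypothetical print_lines function. It assumes IFS was set on entry: an unset IFS would come back as empty, which disables splitting instead of restoring the default.

```shell
# Hypothetical helper: print each non-empty line of $1 in brackets, then put
# IFS and the noglob (set -f) setting back the way they were.
print_lines() {
    saved_ifs=$IFS
    case $- in
        *f*) noglob_was_on=1 ;;
        *) noglob_was_on=0 ;;
    esac

    IFS='
'
    set -f
    for line in $1
    do
        printf '[%s]\n' "$line"
    done

    [ "$noglob_was_on" = 1 ] || set +f
    IFS=$saved_ifs
}

print_lines 'alpha
beta gamma

*.txt'    # [alpha], [beta gamma], [*.txt] on separate lines
```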
Manipulating IFS in this way is an often-overlooked technique, also handy for doing things like parsing pathnames that could contain spaces from tab-delimited columnar input. However, you do need to be aware that deliberately removing the space character usually included in IFS's default setting of space-tab-newline can end up disabling word splitting in places where you would normally expect to see it.
For example, if you're using variables to build a complicated command line for something like ffmpeg, you might want to include -vf scale=$scale only when the variable scale is set to something non-empty. Normally you could achieve this with ${scale:+-vf scale=$scale}, but if IFS doesn't include its usual space character at the time this parameter expansion is done, the space between -vf and scale= won't be used as a word separator, and ffmpeg will be passed all of -vf scale=$scale as a single argument, which it won't understand.
To fix that, you'd either need to make sure IFS was set more normally before doing the ${scale} expansion, or do two expansions: ${scale:+-vf} ${scale:+scale=$scale}. The word splitting that the shell does in the process of initial parsing of command lines, as opposed to the splitting it does during the expansion phase of processing those command lines, doesn't depend on IFS.
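To make the argument boundaries visible without running ffmpeg, this sketch passes the expansions to a hypothetical show_args function that prints each argument it receives wrapped in <>. The value 1280:720 is just an example.

```shell
# Hypothetical helper: print each argument on its own line, wrapped in <>.
show_args() {
    printf '<%s>\n' "$@"
}

scale=1280:720
saved_ifs=$IFS

show_args ${scale:+-vf scale=$scale}              # <-vf> then <scale=1280:720>

IFS='
'
show_args ${scale:+-vf scale=$scale}              # <-vf scale=1280:720>, one argument
show_args ${scale:+-vf} ${scale:+scale=$scale}    # <-vf> then <scale=1280:720> again

IFS=$saved_ifs
```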
Something else that could be worth your while if you're going to do this kind of thing would be creating two global shell variables to hold just a tab and just a newline:
t='	'
n='
'
That way you can just include $t and $n in expansions where you need tabs and newlines, rather than littering all your code with quoted whitespace. If you'd rather avoid quoted whitespace altogether in a POSIX shell that has no other mechanism for doing so, printf can help, though you do need a bit of fiddling to work around the removal of trailing newlines in command substitutions:
nt=$(printf '\n\t')
n=${nt%?}
t=${nt#?}
Sometimes setting IFS as if it were a per-command environment variable works well. For example, here's a loop that reads a pathname that's allowed to contain spaces and a scaling factor from each line of a tab-delimited input file:
while IFS=$t read -r path scale
do
ffmpeg -i "$path" ${scale:+-vf scale=$scale} "${path%.*}.out.mkv"
done <recode-queue.txt
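A dry-run sketch combining the printf-built $t with the read loop: dry_run_queue is a hypothetical function that prints each ffmpeg argument in brackets instead of running ffmpeg, and the here-document (with $t expanding to a real tab) stands in for recode-queue.txt.

```shell
# Build tab and newline variables without any quoted whitespace.
nt=$(printf '\n\t')
n=${nt%?}
t=${nt#?}

# Hypothetical dry-run version of the loop: read tab-delimited records from
# stdin and print each would-be ffmpeg argument in brackets.
dry_run_queue() {
    while IFS=$t read -r path scale
    do
        printf '[%s]' ffmpeg -i "$path" ${scale:+-vf scale=$scale} "${path%.*}.out.mkv"
        printf '\n'
    done
}

# Sample input standing in for recode-queue.txt; the second record has no scale.
dry_run_queue <<EOF
My Movie.mp4${t}1280:720
Other Clip.avi
EOF
# [ffmpeg][-i][My Movie.mp4][-vf][scale=1280:720][My Movie.out.mkv]
# [ffmpeg][-i][Other Clip.avi][Other Clip.out.mkv]
```

Spaces in the pathnames survive because read only splits on tabs, while ${scale:+-vf scale=$scale} in the loop body is still split on the normal IFS.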
In this case the read builtin sees IFS set to just a tab, so it won't also split the input line it reads on spaces. But IFS=$t set -- $lines doesn't work: the shell expands $lines while building the set builtin's arguments, before executing the command, so a temporary IFS setting that applies only while the builtin itself runs comes too late. This is why the code snippets I've given above all set IFS in a separate step, and why they have to deal with the issue of preserving it.
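A small sketch of that ordering problem, using a made-up $lines value with a space inside its first line. Each variant runs in a subshell because set is a special built-in: in POSIX-conforming shells such as dash, a prefixed IFS assignment can persist after set finishes.

```shell
n='
'
lines='one two
three'

(
    IFS=$n set -- $lines
    echo "prefixed assignment: $# fields"   # 3: $lines was split with the old IFS
)
(
    IFS=$n
    set -- $lines
    echo "separate assignment: $# fields"   # 2: split on newlines only
)
```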