Bash scripting: When to use variable, when function?

Question 1

A basic, innocent question: in Bash scripting, why ever use a function if one can set a variable via command substitution that holds the essence of the function, that is, a command or set of commands that is supposed to output a certain value?

In other words: does it matter whether one defines a variable or a function for a certain desired output? When and why should it be implemented as a variable? When and why is it better to implement it as a function?

Example: let's say there is a directory on your system that contains a lot of first-level sub-directories, and you want a Bash script to find out which one was modified most recently.

In a bash script, you can define a variable rece_dir for it, and print out its content on demand:

#!/bin/bash
# latest-directory-displayer
# 
# Copyleft 🄯 2024
# 
# This program is free software: you can redistribute it and/or modify 
# it under the terms of the GNU Affero General Public License as published by 
# the Free Software Foundation, either version 3 of the License, or 
# (at your option) any later version. 
# 
# This program is distributed in the hope that it will be useful, 
# but WITHOUT ANY WARRANTY; without even the implied warranty of 
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 
# GNU Affero General Public License for more details. 
# 
# You should have received a copy of the GNU Affero General Public License 
# along with this program. If not, see <https://www.gnu.org/licenses/>. 
# Displays what's the most recently modified sub-directory within
# current directory.
# Tool variable set 
find="/usr/bin/find"
sort="/usr/bin/sort"
tail="/usr/bin/tail"
grep="/bin/grep"
sed="/bin/sed"
# Variable set 
rece_dir="`"$find" . -maxdepth 1 -type d -printf '%T+ %p\n' | \
 "$sort" | "$tail" -1 | "$grep" -o "/.*" | \
 "$sed" 's/ /\\\&/g;s+$+/+'`"
# Function set 
################################## Main part ################################### 
printf "$rece_dir\n"
exit

Cute. But why not do it like this?

#!/bin/bash
# latest-directory-displayer
# 
# Copyleft 🄯 2024
# 
# This program is free software: you can redistribute it and/or modify 
# it under the terms of the GNU Affero General Public License as published by 
# the Free Software Foundation, either version 3 of the License, or 
# (at your option) any later version. 
# 
# This program is distributed in the hope that it will be useful, 
# but WITHOUT ANY WARRANTY; without even the implied warranty of 
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 
# GNU Affero General Public License for more details. 
# 
# You should have received a copy of the GNU Affero General Public License 
# along with this program. If not, see <https://www.gnu.org/licenses/>. 
# Displays what's the most recently modified sub-directory within
# current directory.
# Tool variable set 
find="/usr/bin/find"
sort="/usr/bin/sort"
tail="/usr/bin/tail"
grep="/bin/grep"
sed="/bin/sed"
# Variable set 
# Function set 
display_latest_dir() {
 "$find" ./ -maxdepth 1 -type d -printf '%T+ %p\n' |
 "$sort" |
 "$tail" -1 |
 "$grep" -o "/.*" |
 "$sed" 's/ /\\&/g;s+$+/+'
}
################################## Main part ################################### 
display_latest_dir
exit

Same output, same basic inner workings, but one goes through a variable while the other goes through a function.

A minor difference I've spotted is the escaping requirements. As soon as you put your set of commands into a variable as a command substitution, it probably needs one more \ wherever you've escaped with a backslash.
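As far as I can tell, the extra backslash comes from the backtick form rather than from using a variable as such. Inside backticks the shell itself consumes a backslash that is followed by another backslash, a $ or a backtick, before the inner command ever runs; $(...) leaves the text alone. A minimal sketch (the variable names are made up for illustration):

```bash
#!/bin/bash
# Inside backticks the shell turns \\ into \ before running the inner command,
# so three backslashes are needed to hand sed the two it should see.
with_backticks=`printf '%s\n' 'a b' | sed 's/ /\\\&/g'`

# $(...) passes the text through unchanged, so sed gets exactly what is written.
with_dollar_paren=$(printf '%s\n' 'a b' | sed 's/ /\\&/g')

# Both lines print: a\ b
printf '%s\n' "$with_backticks" "$with_dollar_paren"
```

So the sed expression from the function version can be reused unchanged if the variable is set with $(...).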

Why not always use variables, instead of functions? Why not always use functions, instead of variables?

Question 2

Please read mywiki.wooledge.org/BashFAQ/050. Also copy/paste your first script into shellcheck.net and fix the issues it tells you about.

Question 3

In the first example, the find is executed when the variable is set, so the contents of the variable are static. This means that if the directory contents change then the variable will be "wrong".

You can test this with:

rmdir a b
mkdir a
printf "$rece_dir\n"
mkdir b
printf "$rece_dir\n"

The first time I ran this I got two blank lines because my test directory was empty; the second time I ran it I got two "b" results.

In the second case the find is executed each time the function is called.

Compare:

rmdir a b
mkdir a
display_latest_dir
mkdir b
display_latest_dir

This time I correctly get "a" and "b" output.
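The same effect can be reproduced in one self-contained script. This is only a sketch: newest_dir is a hypothetical, simplified stand-in for display_latest_dir, it assumes GNU find (for -printf), and it works in a throwaway directory from mktemp.

```bash
#!/bin/bash
# Work in a throwaway directory so the demo does not touch real data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical helper: name of the most recently modified sub-directory.
# %T@ is the mtime in seconds since the epoch, %f the bare directory name.
newest_dir() {
    find . -mindepth 1 -maxdepth 1 -type d -printf '%T@ %f\n' |
        sort -n | tail -n 1 | cut -d' ' -f2-
}

mkdir a
snapshot=$(newest_dir)   # the pipeline runs once, right here
sleep 1                  # make sure b gets a later timestamp than a
mkdir b

printf 'variable: %s\n' "$snapshot"       # still a
printf 'function: %s\n' "$(newest_dir)"   # re-evaluated, now b
```

The variable keeps whatever the pipeline produced at assignment time; the function reruns the pipeline on every call.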

If you only want static output, then for a simple command (and yours is sufficiently simple) I wouldn't use a function. But if there was a lot of work (e.g. loops), I might make it into one.

BTW, as a matter of coding style you may want to use $(...) instead of the older backtick notation. Note that $(...) does not need the extra backslash that the backtick form required, and the tool variables are best quoted:

rece_dir=$("$find" . -maxdepth 1 -type d -printf '%T+ %p\n' |
 "$sort" | "$tail" -1 | "$grep" -o "/.*" |
 "$sed" 's/ /\\&/g;s+$+/+')

And, of course, you can set a variable to the output of a function:

rece_dir=$(display_latest_dir)

Question 4

Variables are for data, functions are for code. (And see also How can we run a command stored in a variable?)
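A small sketch of why this matters when a whole command, not just a path, ends up in a variable. count_args and show_two_words are hypothetical helpers made up for this illustration:

```bash
#!/bin/bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
    printf '%s\n' "$#"
}

# A command stored as a string: the quotes inside are data, not syntax,
# so word splitting breaks "two words" into two arguments.
cmd_string='count_args "two words"'
string_count=$($cmd_string)

# An array keeps each argument intact.
cmd_array=(count_args "two words")
array_count=$("${cmd_array[@]}")

# A function keeps the code as code.
show_two_words() {
    count_args "two words"
}
function_count=$(show_two_words)

# Prints: string: 2, array: 1, function: 1
printf 'string: %s, array: %s, function: %s\n' \
    "$string_count" "$array_count" "$function_count"
```

If the command really has to live in a variable, an array is the usual way out; otherwise a function is simpler.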

While you could store the path to some program in a variable to make sure you get the correct instance (and I've done that too), it'd still be cleaner at the use site with a function. You wouldn't need all those quotes and dollar signs.

E.g. compare:

foo=/usr/bin/foo
"$foo" bar whatever

vs.

foo() {
 /usr/bin/foo "$@"
}
foo bar whatever

(Though /bin and /usr/bin are likely in $PATH anyway, so that shouldn't be too necessary.)

The significant difference in your example of rece_dir= vs. display_latest_dir seems to be that in one case you store the output from the pipeline in a variable, and in the other you let it get printed to the terminal. The choice between those depends on what you want to do. Function or not, there's no need to use a command substitution to store something in a variable just to print it out again. And if you do need it in a variable, you can still use a function with the command substitution too.

Both of these work:

findsomething() {
 find ... | blah | blah
}
result=$(findsomething)

and

result=$(find ... | blah | blah)

Usually the difference is that functions can be useful for reuse and splitting code into logical pieces for easier reading.
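To make the reuse point concrete, here is a sketch of a function that takes the directory as an argument, which a plain variable assignment can't do. latest_subdir and the directory names are hypothetical, and GNU find and touch are assumed:

```bash
#!/bin/bash
# Hypothetical variant of display_latest_dir that takes the directory to
# inspect as its first argument (default: current directory).
latest_subdir() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %f\n' |
        sort -n | tail -n 1 | cut -d' ' -f2-
}

# Demo data in a throwaway directory; touch -d sets explicit timestamps.
base=$(mktemp -d)
mkdir -p "$base/projects/old" "$base/projects/new" "$base/logs/monday"
touch -d '2020-01-01' "$base/projects/old"
touch -d '2021-01-01' "$base/projects/new"

latest_subdir "$base/projects"   # prints: new
latest_subdir "$base/logs"       # prints: monday
```

One definition serves any number of directories, and each call site still decides whether to print the result or capture it with $(...).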

(Note that filenames can contain newlines, so a pipeline like find -printf "...\n" | ... is not safe in general. And you could probably replace tail | grep | sed with just a single sed.)
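One possible way around the newline problem, as a sketch only: it assumes GNU find, sort, tail and cut, which can all work on NUL-terminated records. latest_dir_nul is a made-up name, and the shell-escaping step from the original sed is deliberately left out.

```bash
#!/bin/bash
# Records are terminated by NUL instead of newline, so a newline inside a
# directory name is just another character. tr strips the final NUL because
# a shell variable cannot hold one.
latest_dir_nul() {
    find . -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\0' |
        sort -zn | tail -zn 1 | cut -z -d' ' -f2- | tr -d '\0'
}

# Demo in a throwaway directory: the newest directory has a newline in its name.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir plain $'two\nlines'
touch -d '2020-01-01' plain

latest=$(latest_dir_nul)
printf '%q\n' "$latest"   # %q shows the name in a shell-quoted form
```

A name that ends in a newline would still lose it, since command substitution strips trailing newlines.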

score 3 · Accepted Answer · 2024-06-07 18:08:34Z
