I started scripting a 2 weeks ago and wrote a script which checks every sub directory until a certain level for the largest files.
My question would be if I'm using the getopts correctly and/or if I could improve the script in general
I'm using a getopt function for the specify options like:
c
= part of the start directory for the searchl
= the number of output lines -> default value is 10 if option not useds
= for the search string which is used in the grep command -> default value is today's date
I am not sure if I used the getopt correctly and would like to improve the code or completely rewrite it.
#!/usr/bin/bash
LNE=--------------------------------------------------------------------------
NUMBER_RESULT_DEFAULT=10;
#------------------------
DATE=#$(date '+%y%m%d');
INTEGER_CHECK='^[0-9]+$';
CHAR_CHECK='^[a-z]+$';
#------------------------
# [---CodeBlock Input check for valid arguments and their values---]
echo "$LNE";
HelpFunction ()
{
echo -e "\n"
echo -e "--------------------------------------\n"
echo -e "Script for tracking filegrowth on system \n"
echo -e "--------------------------------------\n"
echo -e "-h process option for user manual \n"
echo -e "-c process option for CUSTOMER - 3- Letter token for tenant directory \n"
echo -e "-l process option for setting lengh of result - INTEGER values only \n"
echo -e "-s process option for search string \n"
}
# Failsafe for mandatory option "-c" to specifiy directory starting point
if ! [[ $@ =~ '-c ' ]]; then
echo "#>> Mandatory argument missing. Exit ReturnCode 8";
HelpFunction;
exit 8;
fi
# Set default value for output length if "-l" option not used
if ! [[ $@ =~ '-l ' ]]; then
NUMBER_RESULT="$NUMBER_RESULT_DEFAULT";
echo "#>> Output length set to [ -DEFAULT- ] value [ $NUMBER_RESULT_DEFAULT ]";
echo "$LNE":
fi
if ! [[ $@ =~ '-s ' ]]; then
SEARCH_STRING="$DATE";
echo "#>> No [-s] search string specified - set [-DEFAULT-] to today's date;"
echo "$LNE";
fi
while getopts ":hc:l:s:" opt; do
case ${opt} in
h ) # process option -h for user manual
HelpFunction
exit 0
;;
c ) # process option -c for tenant
TENANT=$OPTARG
! [[ $TENANT =~ $CHAR_CHECK ]] &&
echo "#>> 3-Letter tenant directory missing - Non-Letter character are not allowed" &&
echo "#>> Exiting with ReturnCode: 8" &&
echo $LNE && exit 8;
;;
l ) # process option -l for number of output lines
NUMBER_RESULT=$OPTARG
! [[ $NUMBER_RESULT =~ $INTEGER_CHECK ]] &&
echo "#>> Value for for output length has to be an INTEGER number" &&
echo "#>> Exiting with ReturnCode: 8" &&
echo $LNE && exit 8;
;;
s ) # process option -s for search string
SEARCH_STRING=$OPTARG
;;
\? ) # undefined option used
echo -e "\n"
echo -e "#>> ERROR unknown options used \n" 1>&2
echo -e "use [ 0ドル -h ] for help\n"
exit 8
;;
esac
done
echo "# *** Starting directory set to [ $SHARED_BASE_DIR/$TENANT ]";
echo "# *** Output length set to value [ $NUMBER_RESULT ]";
echo $LNE;
#------------------------
# Get basedirecory including subdir "/data"
ARRAY_DATA_DIR=($(find $SHARED_BASE_DIR/$TENANT -maxdepth 3 -type d -name 'data' 2> /dev/null));
for DATA_DIR in "${ARRAY_DATA_DIR[@]}"; do
# Get all sub directories
ARRAY_HLQ_DIR=($(ls -1 $DATA_DIR/))
for HLQ_DIR in ${ARRAY_HLQ_DIR[@]}; do
if ls $DATA_DIR/$HLQ_DIR/*$SEARCH_STRING* 1> /dev/null 2>1ドル; then
# Get all files in directory and sort by size - output only n number of lines specified as argument
OUTPUT=$(ls -1fsh $DATA_DIR/$HLQ_DIR/*$SEARCH_STRING* 2> /dev/null | sort -rh | head -$NUMBER_RESULT)
# Split ls output get result formatted without whitespace
echo "[ SIZE ],[ --- $HLQ_DIR --- ]" | awk -F "," '{printf("%8s %s\n", 1,ドル2ドル)}'
echo "${OUTPUT[0]}" | awk '{printf("%8s %s\n", 1,ドル2ドル)}'
echo "$LNE"
fi
done
done
-
1\$\begingroup\$ For a start, shellcheck.net points out a number of common beginner errors. Probably try that before asking for human assistance. \$\endgroup\$tripleee– tripleee2022年01月13日 14:40:41 +00:00Commented Jan 13, 2022 at 14:40
1 Answer 1
A few comments about your getopts
usage:
You are duplicating some of what getopts does by checking
$@
before you even use getopts. It's more common (and, IMO, better) to set some defaults for your option variables, then use getopts to allow the user to override the defaults.Do you really want all those warnings about using default values? It's good to warn about actual errors (like not providing a mandatory option where no sensible default is possible), but most users would consider a warning about perfectly normal things like a program using default values to be annoying noise. The defaults can be printed in the
HelpFunction
message.Your script doesn't use any non-option arguments (like filenames), but in scripts that do, you will need to use
shift
to remove the options (andshift
again to remove their OPTARG(s) if any) from the argument list. It wouldn't be any use to do that in this script, but it wouldn't hurt to do it either (and it's good habit / good practice).
And general comments:
Double-quote your variable expansions. You mostly do this, which is good. Sometimes you don't, which is not. See Why does my shell script choke on whitespace or other special characters?
Use
printf
and/or heredocs instead ofecho
, especially when printing variables. See Why is printf better than echo?Don't parse the output of
ls
.ls
's output is for viewing by human eyes, not for consumption by programs. See Why not parsels
(and what to do instead)?.There are two main circumstances where you're using the output of
ls
in your script.Testing if a file exists. Bash can already do this easily for single files with
test
/[ ... ]
or[[ ... ]]
.For wildcards, it's a bit more complicated but you still shouldn't use
ls
. Usefind
instead - that's what it's for. The trouble is thatfind
always returns an exit code of 0 unless an error occurs. We can, however, usefind
andgrep
to check if at least one match was found.For example, don't do this:
if ls $DATA_DIR/$HLQ_DIR/*$SEARCH_STRING* 1> /dev/null 2>1ドル; then
(BTW, there are two bugs on that line: unquoted variables and
2>1ドル
- that's probably a typo for2>&1
)Do this instead:
if find "$DATA_DIR/$HLQ_DIR/" -maxdepth 1 -type f \ -name "*$SEARCH_STRING*" 2>/dev/null | grep -q '.' ; then
If there aren't any errors (e.g. permissions), this shouldn't produce any output, but I'm redirecting stderr to /dev/null in case there are.
BTW,
find
has-regex
and-iregex
(case-insensitive) options to search with regular expressions rather than glob patterns. Also a-regextype
option to set what "dialect" of regular expressions to use.Extracting one or more filenames into a scalar variable or an array. Using
ls
for this is never a good idea. Either use a glob or usefind
. E.g. Instead of usingls
and command substitution like this:OUTPUT=$(ls -1fsh $DATA_DIR/$HLQ_DIR/*$SEARCH_STRING* 2> /dev/null | sort -rh | head -$NUMBER_RESULT)
Use
mapfile
(AKAreadarray
) and process substitution to populate an array called OUTPUT by runningfind
.The following uses GNU find's
-printf
to print each file's size (in an 8-character wide field), another 3 spaces, its pathname, and a NUL. Output fromfind
is then piped into the GNU versionssort
,head
, andnumfmt
, all of which can handle NUL-separated input with the-z
option:mapfile -d '' -t OUTPUT < <( find "$DATA_DIR/$HLQ_DIR/" \ -maxdepth 1 \ -name "*SEARCH_STRING*" \ -printf '%8s %p0円'| sort -z -r -n -k1,1 | head -z -n "$NUMBER_RESULT" | numfmt -z --to=si)
However, see below - this entire block of code (under the
#Get basedirecory including subdir "/data"
comment) can be improved:
(extra newlines added for readability and to avoid horizontal scroll bars on this site)
# Get base directory including subdir "/data"
mapfile -d '' -t ARRAY_DATA_DIR < <(
find "$SHARED_BASE_DIR/$TENANT" \
-maxdepth 3 -type d -name 'data' -print0)
for DATA_DIR in "${ARRAY_DATA_DIR[@]}"; do
# Get all sub directories using GNU find's -printf %f (basename)
mapfile -d '' -t ARRAY_HLQ_DIR < <(
find "$DATA_DIR" -mindepth 1 -maxdepth 1 \
-type d -printf '%f0円')
for HLQ_DIR in "${ARRAY_HLQ_DIR[@]}"; do
# get $NUMBER_RESULT largest files in $HLQ_DIR
mapfile -d '' -t OUTPUT < <(
find "$DATA_DIR/$HLQ_DIR/" \
-maxdepth 1 -type f -name "*$SEARCH_STRING*" \
-printf '%8s %p0円' |
sort -z -r -n -k1,1 |
head -z -n "$NUMBER_RESULT" | numfmt -z --to=si)
# print formatted output with bash's built-in printf
if [ -n "$OUTPUT" ]; then # we have at least one file
printf "%8s %s\n" '[ Size ]' "[ --- $HLQ_DIR --- ]";
printf "%s\n" "${OUTPUT[@]}" "$LNE"
fi
done
done
This example:
- quotes all variable expansions;
- never parses
ls
; - never uses whitespace as a filename separator, always uses NUL. It is reliable and safe to use with any valid filename;
- uses
mapfile
to populate arrays with find's output.-d ''
ensures it also uses NUL as the separator; - uses Bash's built-in
printf
instead of piping to awk.
PS: I just noticed that this question is over 6 months old. Oh well, I've written my answer now and hopefully @JAK will come back to see it. If not, maybe someone else will find something useful in it.