This script is running indefinitely as the Linux background process.
I have put an enormous effort to make this POSIX shell script containing an infinite loop shut down tidily along with the operating system (TERM signal) and by me sending it a HUP signal.
Ok, we don't call it exception-handling, but I didn't find any other suitable tag for it. Also did not find an appropriate tag for the script's termination.
We have reviewed two big pieces of the puzzle already, so I cut it out.
readonly script_one_instance_lockfile="${HOME}/.PROGRAM/$(basename "${0}").lock"
# this is long and not relevant to this question, it does as its name says
# you can find a review on it here: https://codereview.stackexchange.com/q/204828/104270
print_error_and_exit() { ... }
is_number()
{
[ "${1}" -eq "${1}" ] 2> /dev/null
}
# purpose is clear, i.e. to clean up some temp, lock files
# which were created during script execution and should not be left in place
cleanup_on_exit()
{
[ -f "${script_one_instance_lockfile}" ] && rm "${script_one_instance_lockfile}"
}
# this function is merely for future expansions of the script
# it might very well be different from cleanup_on_exit
cleanup_on_signal()
{
cleanup_on_exit
}
# here we define a generic function to handle signals
# treat them as errors with appropriate messages
# example calls:
# kill -15 this_script_name # POSIX; all shells compatible
# kill -TERM this_script_name # Bash and alike; newer shells
signal_handler_generic()
# expected arguments:
# 1ドル = signal code
{
# check if exactly one argument has been passed
[ "${#}" -eq 1 ] || print_error_and_exit "signal_handler_generic()" "Exactly one argument has not been passed!\\n\\tPassed: ${*}"
# check if the argument is a number
is_number "${1}" || print_error_and_exit "signal_handler_generic()" "The argument is not a number!\\n\\Signal code expected.\\n\\tPassed: ${1}"
number_argument=${1}
signal_code=${number_argument}
case "${number_argument}" in
1 ) signal_human_friendly='HUP' ;;
2 ) signal_human_friendly='INT' ;;
3 ) signal_human_friendly='QUIT' ;;
6 ) signal_human_friendly='ABRT' ;;
15) signal_human_friendly='TERM' ;;
* ) signal_human_friendly='' ;;
esac
if [ "${signal_human_friendly}" = "" ]
then
print_error_and_exit "signal_handler_generic()" "Given number code (${signal_code}) does not correspond to supported signal codes."
else
# tidy up any temp or lock files created along the way
cleanup_on_signal
# print human friendly and number signal code that has been caught
print_error_and_exit "\\ntrap()" "Caught ${signal_human_friendly} termination signal ${signal_code}.\\n\\tClean-up finished. Exiting. Bye!"
fi
}
# use the above function for signal handling;
# note that the SIG* constants are undefined in POSIX,
# and numbers are to be used for the signals instead
trap 'signal_handler_generic 1' 1
trap 'signal_handler_generic 2' 2
trap 'signal_handler_generic 3' 3
trap 'signal_handler_generic 6' 6
trap 'signal_handler_generic 15' 15
# this is long and not relevant to this question, it does as its name says
# you can find a review on it here: https://codereview.stackexchange.com/q/213156/104270
is_java_program_running() { ... }
####################
### MAIN PROGRAM ###
####################
if [ -f "${script_one_instance_lockfile}" ]
then
# if one instance of this script is already running, quit to shell
print_error_and_exit "\\nmain()" "One instance of this script should already be running.\\n\\tLock file: ${script_one_instance_lockfile}\\n\\tMore than one instance is not allowed. Exiting."
else
# create a .lock file for one instance handling
touch "${script_one_instance_lockfile}"
fi
# keep the PROGRAM alive forever, check in 5 seconds interval
while true
do
sleep 5s
if ! is_java_program_running "${PROGRAM_java_process_identifier}"
then
my_date_time=$(date +%Y-%m-%d_%H:%M:%S)
printf '%s %s\n' "${my_date_time}" "(re-)starting PROGRAM"
( /full/path/to/program > /dev/null 2>&1 & )
fi
done
# ordinary scripts don't have infinite loops
# this code shall be unreachable, but it is good to have it here since I will be
# copying / reusing the script and I would definitely forget on this
cleanup_on_exit
2 Answers 2
Simplify with case
, and eliminate a conditional
Instead of this:
case "${number_argument}" in 1 ) signal_human_friendly='HUP' ;; 2 ) signal_human_friendly='INT' ;; 3 ) signal_human_friendly='QUIT' ;; 6 ) signal_human_friendly='ABRT' ;; 15) signal_human_friendly='TERM' ;; * ) signal_human_friendly='' ;; esac if [ "${signal_human_friendly}" = "" ] then print_error_and_exit "..." else # tidy up ... fi
I suggest to move the print_error_and_exit
in the case
,
and then you can eliminate the conditional that follows,
and "flatten" the code:
case "${number_argument}" in
1 ) signal_human_friendly='HUP' ;;
2 ) signal_human_friendly='INT' ;;
3 ) signal_human_friendly='QUIT' ;;
6 ) signal_human_friendly='ABRT' ;;
15) signal_human_friendly='TERM' ;;
* ) print_error_and_exit "..." ;;
esac
# tidy up ...
Eliminate a low-value conditional statement
Instead of this:
[ -f "${script_one_instance_lockfile}" ] && rm "${script_one_instance_lockfile}"
Why not simply:
rm -f "${script_one_instance_lockfile}"
Sharpen error messages
Instead of "Exactly one argument has not been passed! [...] Passed: ...",
I would write "Expected one argument. Received: ...".
-
\$\begingroup\$
rm -f
does produce an error message forENOENT
, so the test isn't completely useless. \$\endgroup\$Toby Speight– Toby Speight2019年02月25日 18:52:06 +00:00Commented Feb 25, 2019 at 18:52
Scripts intended to be executable should have a shebang:
#!/bin/sh
The repetitive calls to trap
could be replaced by a loop:
for i in 1 2 3 6 15
do trap "signal_handler_generic $i" $i
done
(Note: if using Shellcheck, you may need # shellcheck disable=SC2064
: we must not use single-quotes here, as it would make no sense to defer the expansion of $i
).
With some small changes, we could trap the signals by name (the names are specified in POSIX, contrary to the code comment), and not need the conversion from number in the handler.
Consider letting SIGHUP
mean "reload configuration" as is conventional in daemons.
I think the name script_one_instance_lockfile
is possibly over-long. As it's the only lock file we use, we could call it simply lockfile
and get more of our lines down to a reasonable length.
There's no need to capture the output of date
if the only thing we do with it is to immediately print it:
date '+%Y-%m-%d_%H:%M:%S (re-)starting PROGRAM'
Use test -e
(or [ -e
) to test for existence, where the file type is irrelevant. Also consider using a pidfile, so that after unclean shutdown and recovery, invocation isn't inhibited. BTW, is it intentional that the lockfile is in (per-user) $HOME
, rather than per-machine in some temporary filesystem (e.g. /run
on Linux)?
-
\$\begingroup\$ Thank you Toby for clearing some of the things I was unsure of. I just had to add that comment. \$\endgroup\$Vlastimil Burián– Vlastimil Burián2019年05月10日 03:21:18 +00:00Commented May 10, 2019 at 3:21
kill
that doesn't deal with signal names?man 7 signal
says that POSIX does define those names. \$\endgroup\$