Bash argument parser with support for concatenated flags and '=' or ' ' between arguments and values
This is my best attempt so far at a bash script argument parser written without GNU getopt or bash getopts. The first two functions, usage and err, can be more or less ignored, but I plan on adding the ability to specify an exit code when calling err.
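For the planned exit-code support, one possible shape is to take the status as the first argument of err and treat everything after it as the message. This is only a sketch, not the author's implementation: the argument order and the subshell used for the demo are assumptions made for illustration.

```bash
#!/bin/bash

# Hypothetical variant of err: the first argument is the exit status,
# everything after it is the message printed to stderr.
err() {
    local status="$1"
    shift
    echo "$*" >&2
    exit "${status}"
}

# Run in a subshell so the demo itself keeps going after err exits.
( err 2 "Invalid option: -q" )
echo "err exited with status $?"
```

Existing callers such as err "Invalid option: ${arg}" would need a status added as the first argument, so this trades backward compatibility for a simpler function body.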
Now for the first section of code in the main function:
shopt -s extglob
args=()
for (( i = 1; i <= "$#"; i++ )); do
    arg="${!i}"
    case "${arg}" in
        -[[:alpha:]?]+([[:alpha:]?]))
            for (( j = 1; j < "${#arg}"; j++ )); do
                args+=("-${arg:j:1}")
            done ;;
        -[[:alpha:]]=*|--*=*)
            args+=("${arg%%=*}")
            args+=("${arg#*=}") ;;
        *)
            args+=("${arg}") ;;
    esac
done
set -- "${args[@]}"
shopt -u extglob
This section de-concatenates flags and separates flags joined to their values with an =. As an example, ./script -xYz --type=value would pop out as ./script -x -Y -z --type value. Most of this could likely have been combined with the second part, but I think there's at least a little value in being able to access/save the intermediate form, and it made debugging easier. Single-letter flags with = can also be processed, but as I understand it, this isn't an extremely common thing to see anyway. Flags that are already in the correct format, flags that could not be reformatted due to user error (./script -xYz=value, for instance), and positional parameters are all passed to the second part without any modification. This hasn't caused any issues yet, but I am considering trying to further differentiate between good/bad input. At the very end, the reformatted args are installed as the new positional parameters with set -- for later use.
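To see the intermediate form directly, the same three case patterns can be wrapped in a small function that prints one normalized argument per line. This is a sketch for experimenting, not part of the original script: the name normalize_args is hypothetical, and it prints instead of rebuilding the args array.

```bash
#!/bin/bash
shopt -s extglob   # must already be on when the function body is parsed

# normalize_args (hypothetical name) prints one normalized argument per line.
normalize_args() {
    local arg j
    for arg in "$@"; do
        case "${arg}" in
            -[[:alpha:]?]+([[:alpha:]?]))
                # Two or more letters after a single dash: split them up.
                for (( j = 1; j < ${#arg}; j++ )); do
                    printf '%s\n' "-${arg:j:1}"
                done ;;
            -[[:alpha:]]=*|--*=*)
                # Flag joined to its value with "=": emit flag, then value.
                printf '%s\n' "${arg%%=*}" "${arg#*=}" ;;
            *)
                # Everything else passes through unchanged.
                printf '%s\n' "${arg}" ;;
        esac
    done
}

normalize_args -xYz --type=value -xYz=value file.txt
```

The run prints -x, -Y, -z, --type, value, then -xYz=value and file.txt unchanged, which matches the pass-through behavior described above for malformed flags and positional parameters.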
Part two of the main function:
args=()
for (( i = 1; i <= "$#"; i++ )); do
    arg="${!i}"
    case "${arg}" in
        --)
            break ;;
        -*)
            case "${arg}" in
                -h|--help|-\?)
                    usage ;;
                -x|-Y|-z) ;;
                -t|--type)
                    # "${!i+1}" would parse as "${!i}" with the "+1" operator
                    # and always yield "1", so compute the next index first.
                    next=$(( i + 1 ))
                    arg2="${!next}"
                    if [[ -n "${arg2}" ]] && [[ "${arg2:0:1}" != "-" ]]; then
                        type="${arg2}"
                        (( i++ ))
                    else
                        err "Invalid option: ${arg} requires an argument"
                    fi ;;
                -*)
                    err "Invalid option: ${arg}" ;;
            esac ;;
        *)
            args+=("${arg}")
    esac
done
set -- "${args[@]}"
This is where flags and their values actually get processed. Flags (and their arguments, if applicable) are checked one at a time, but only unused positional parameters are put back in the array to be set once again. For example, ./script -x -Y -z --type value hello world would pop out as ./script hello world.
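One detail worth checking in a loop like this is how the argument after the current one is fetched. In bash, "${!i+1}" is not "positional parameter i+1": it is the indirect expansion "${!i}" combined with the "+word" operator, so it yields the literal 1 whenever parameter i is set. A minimal demonstration, with demo and next as illustrative names only:

```bash
#!/bin/bash

demo() {
    local i=1 next
    # Indirect expansion plus the "+1" alternate-value operator.
    echo "wrong: ${!i+1}"
    # Compute the index first, then expand indirectly.
    next=$(( i + 1 ))
    echo "right: ${!next}"
}

demo --type value
```

Called as above, the first line prints wrong: 1 and the second prints right: value.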
I did what I could to prevent any special cases slipping through, and to account for as many common formats as possible, but I couldn't find a list of either, so I'd really appreciate advice on both of those issues.
I also have very little experience writing bash scripts, so general bash scripting advice would also be greatly appreciated.
Full code:
#!/bin/bash
usage() {
    echo "help me"
    exit 0
}
err() {
    echo "$*" >&2
    exit 1
}
shopt -s extglob
main() {
    shopt -s extglob
    args=()
    for (( i = 1; i <= "$#"; i++ )); do
        arg="${!i}"
        case "${arg}" in
            -[[:alpha:]?]+([[:alpha:]?]))
                for (( j = 1; j < "${#arg}"; j++ )); do
                    args+=("-${arg:j:1}")
                done ;;
            -[[:alpha:]]=*|--*=*)
                args+=("${arg%%=*}")
                args+=("${arg#*=}") ;;
            *)
                args+=("${arg}") ;;
        esac
    done
    set -- "${args[@]}"
    shopt -u extglob
    echo "$0 $@"
    args=()
    for (( i = 1; i <= "$#"; i++ )); do
        arg="${!i}"
        case "${arg}" in
            --)
                break ;;
            -*)
                case "${arg}" in
                    -h|--help|-\?)
                        usage ;;
                    -x|-Y|-z) ;;
                    -t|--type)
                        # "${!i+1}" would parse as "${!i}" with the "+1" operator
                        # and always yield "1", so compute the next index first.
                        next=$(( i + 1 ))
                        arg2="${!next}"
                        if [[ -n "${arg2}" ]] && [[ "${arg2:0:1}" != "-" ]]; then
                            type="${arg2}"
                            (( i++ ))
                        else
                            err "Invalid option: ${arg} requires an argument"
                        fi ;;
                    -*)
                        err "Invalid option: ${arg}" ;;
                esac ;;
            *)
                args+=("${arg}")
        esac
    done
    set -- "${args[@]}"
    echo "$0 ${@:-No positional parameters set}"
    echo "type: ${type:-Type not set}"
}
shopt -u extglob
main "$@"
1 Answer
I think this is pretty nicely written Bash.
I see the parsing happens in two passes:
- Convert the argument list to some sort of canonical form
- Validate the argument list
This is easy to understand and I think it makes sense.
Handling arguments after --
As written, the program ignores all further arguments after --.
The common practice is to take all arguments after -- verbatim, without further parsing.
For example, this behavior makes it possible to use the rm command to delete a file named -f if you ever need it. You would do that with rm -- -f instead of rm -f (which usually does nothing).
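As a rough illustration of that convention, here is a minimal sketch of a parse loop that stops interpreting options at -- and keeps everything after it as positional parameters. The function name parse, the positional array, and the "option:" output are made up for the demo and are not taken from the script under review.

```bash
#!/bin/bash

# parse (hypothetical name) reports options and collects positional parameters.
parse() {
    local positional=()
    while (( $# > 0 )); do
        case "$1" in
            --)
                # Drop the "--" itself, keep the rest verbatim, stop parsing.
                shift
                positional+=("$@")
                break ;;
            -*)
                echo "option: $1" ;;
            *)
                positional+=("$1") ;;
        esac
        shift
    done
    printf 'positional: %s\n' "${positional[@]}"
}

parse -x -- -f file
```

Here -x is reported as an option, while -f and file both end up as positional parameters even though -f looks like a flag.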
Keeping things "simple"
I'm not a fan of advanced features of Bash. I think they push the limits of the language, are a common source of bugs, and lead to code that's difficult to understand.
Look at what extglob forces you to do:
- shopt -s extglob before the declaration of main, and cleaning up with shopt -u extglob after it, so that main can be parsed
- Then inside main, again shopt -s extglob before you need it, and cleaning up with shopt -u extglob when you no longer need it
I find this double activation / deactivation dirty.
If you gotta use it, you gotta use it. If I have a chance to do without it, I would. And here I see an opportunity. By reorganizing the conditions, you could achieve something similar:
args=()
for (( i = 1; i <= $#; i++ )); do
    arg="${!i}"
    case "${arg}" in
        -[[:alpha:]]=*|--*=*)
            args+=("${arg%%=*}")
            args+=("${arg#*=}") ;;
        --)
            for (( j = i; j <= $#; j++ )); do
                args+=("${!j}")
            done
            break ;;
        --*)
            args+=("${arg}") ;;
        -*)
            for (( j = 1; j < "${#arg}"; j++ )); do
                args+=("-${arg:j:1}")
            done ;;
        *)
            args+=("${arg}") ;;
    esac
done
set -- "${args[@]}"
The difference from your original is that -[[:alpha:]?]+([[:alpha:]?]) is replaced with simply -*. To put it simply, I think the practical implication is that an argument like -c9 would be converted to -c -9 instead of keeping it as -c9.
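To make that implication concrete, here is a small sketch that isolates just the --* and -* branches and prints the result one argument per line. The function name split_flags is hypothetical, and the = and -- handling from the loop above is left out to keep the demo short.

```bash
#!/bin/bash

# split_flags (hypothetical name) splits every single-dash argument
# into one flag per character, without needing extglob.
split_flags() {
    local arg j
    for arg in "$@"; do
        case "${arg}" in
            --*)
                printf '%s\n' "${arg}" ;;
            -*)
                for (( j = 1; j < ${#arg}; j++ )); do
                    printf '%s\n' "-${arg:j:1}"
                done ;;
            *)
                printf '%s\n' "${arg}" ;;
        esac
    done
}

split_flags -c9 -xYz
```

This prints -c, -9, -x, -Y and -z on separate lines. One more edge case to be aware of: with a bare -* pattern, a lone - produces no output at all, because the inner loop never runs for a one-character argument.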
I don't know if this would be acceptable to you. If yes, then you could get rid of all the shopt, and I think that would be a good thing.
shellcheck
As you are new to Bash, it's probably good to point out shellcheck.net (also available as a command line tool), a nice tool to check Bash code against common mistakes and bad practices.
It finds just a minor issue about echo "$0 $@", where the recommended usage would be echo "$0 $*".
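The reason this matters is that "$@" inside a longer string still expands to several words, while "$*" yields a single one. With echo the visible output happens to be the same, which is why the issue is minor, but the difference shows up as soon as word boundaries are made visible. A small sketch, with show and demo as illustrative names:

```bash
#!/bin/bash

# show prints each argument it receives in brackets, one per line.
show() {
    printf '[%s]\n' "$@"
}

demo() {
    show "args: $@"   # several words: "args: a" "b" "c"
    show "args: $*"   # one word: "args: a b c"
}

demo a b c
```

The first call prints three bracketed lines, the second prints a single [args: a b c].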
- So instead of just ending the parse loop, everything after -- should just be passed without modification to be used as a positional parameter? That makes sense, and explains why set -- behaves the way it does. That should also let me remove the nested case in part 2. Are there any other special cases (like --) I may have missed? – Kestrel_, Oct 29, 2021 at 19:04