Bash argument parser with support for concatenated flags and '=' or ' ' between arguments and values

Question

This is my best attempt so far at a Bash script argument parser written without GNU getopt or the Bash getopts builtin.

The first two functions, usage and err, can be more or less ignored, but I plan to add the ability to specify an exit code when calling err.

Now for the first section of code in the main function:

shopt -s extglob
args=()
for (( i = 1; i <= "$#"; i++ )); do
  arg="${!i}"
  case "${arg}" in
    -[[:alpha:]?]+([[:alpha:]?]))
      for (( j = 1; j < "${#arg}"; j++ )); do
        args+=("-${arg:j:1}")
      done ;;
    -[[:alpha:]]=*|--*=*)
      args+=("${arg%%=*}")
      args+=("${arg#*=}") ;;
    *)
      args+=("${arg}") ;;
  esac
done
set -- "${args[@]}"
shopt -u extglob

This section splits concatenated flags and separates flags that are joined to their values with an =. For example, ./script -xYz --test=value would come out as ./script -x -Y -z --test value. Most of this could probably have been combined with the second part, but I think there's at least a little value in being able to access or save the intermediate form, and it made debugging easier. Single-letter flags with = (such as -t=value) are also processed, although as I understand it this isn't a very common form. Three kinds of argument are passed to the second part unchanged: flags that are already in the correct format, flags that could not be reformatted because of user error (./script -xYz=value, for instance), and positional parameters. This hasn't caused any issues yet, but I am considering trying to differentiate further between good and bad input. At the very end, the reformatted arguments are set with set -- for later use.
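Because the intermediate form is useful for debugging, one option is to wrap the first pass in a function that stores its result in an array. The sketch below assumes a hypothetical normalize_args function and a global array named normalized. Its case patterns are the same as in the first section. extglob is enabled before the function is defined because the +(...) pattern has to be recognized when the function body is parsed, not only when it runs.

```bash
#!/bin/bash
# extglob must be on when normalize_args is *defined*, because of the +(...) pattern.
shopt -s extglob

# Hypothetical helper: runs only the first pass and stores the result in the
# global array "normalized", so the intermediate form can be printed or saved.
normalize_args() {
  normalized=()
  local arg j
  for arg in "$@"; do
    case "$arg" in
      -[[:alpha:]?]+([[:alpha:]?]))   # -xYz -> -x -Y -z
        for (( j = 1; j < ${#arg}; j++ )); do
          normalized+=("-${arg:j:1}")
        done ;;
      -[[:alpha:]]=*|--*=*)          # -t=value / --test=value -> flag, value
        normalized+=("${arg%%=*}" "${arg#*=}") ;;
      *)                             # everything else passes through unchanged
        normalized+=("$arg") ;;
    esac
  done
}

normalize_args -xYz --test=value -xYz=value hello
printf '%s\n' "${normalized[@]}"
```

The last two inputs show the pass-through behavior described above. -xYz=value matches none of the patterns, so it is passed along unchanged, and so is the positional parameter hello.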

Part two of the main function:

args=()
for (( i = 1; i <= "$#"; i++ )); do
  arg="${!i}"
  case "${arg}" in
    --)
      break ;;
    -*)
      case "${arg}" in
        -h|--help|-\?)
          usage ;;
        -x|-Y|-z) ;;
        -t|--type)
          next=$(( i + 1 ))
          arg2="${!next}"   # "${!i+1}" would expand to the literal 1, not argument i+1
          if [[ -n "${arg2}" ]] && [[ "${arg2:0:1}" != "-" ]]; then
            type="${arg2}"
            (( i++ ))
          else
            err "Invalid option: ${arg} requires an argument"
          fi ;;
        -*)
          err "Invalid option: ${arg}" ;;
      esac ;;
    *)
      args+=("${arg}")
  esac
done
set -- "${args[@]}"

This is where flags and their values actually get processed. Flags (and their arguments, where applicable) are checked one at a time, and only the unused positional parameters are put back into the array to be set again. For example, ./script -x -Y -z --type value hello world would come out as ./script hello world.
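The value lookup for -t/--type is the subtle part of this loop. In Bash, "${!i+1}" does not mean "argument i+1". The indirection applies to i, and +1 is then the ${parameter+word} operator, so the expression expands to the literal string 1 whenever argument i is set. Below is a minimal sketch that computes the index first. It uses a hypothetical parse_type function that looks only for -t/--type and ignores all other arguments.

```bash
#!/bin/bash
# Hypothetical helper: scans its arguments for -t/--type and stores the value
# in the global variable "type". All other arguments are ignored in this sketch.
parse_type() {
  local i next arg value
  type=
  for (( i = 1; i <= $#; i++ )); do
    arg=${!i}
    case "$arg" in
      -t|--type)
        next=$(( i + 1 ))
        value=${!next}              # argument i+1, or empty if there is none
        if [[ -n $value && $value != -* ]]; then
          type=$value
          i=$next                   # skip the value that was just consumed
        else
          echo "Invalid option: $arg requires an argument" >&2
          return 1
        fi ;;
    esac
  done
}

parse_type -x --type value hello && echo "type=$type"
```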

I did what I could to keep special cases from slipping through and to account for as many common formats as possible, but I couldn't find a list of either, so I'd really appreciate advice on both.

I also have very little experience writing bash scripts, so general bash scripting advice would also be greatly appreciated.

Full code:

#!/bin/bash
usage() {
  echo "help me"
  exit 0
}
err() {
  echo "$*" >&2
  exit 1
}
shopt -s extglob
main() {
  shopt -s extglob
  args=()
  for (( i = 1; i <= "$#"; i++ )); do
    arg="${!i}"
    case "${arg}" in
      -[[:alpha:]?]+([[:alpha:]?]))
        for (( j = 1; j < "${#arg}"; j++ )); do
          args+=("-${arg:j:1}")
        done ;;
      -[[:alpha:]]=*|--*=*)
        args+=("${arg%%=*}")
        args+=("${arg#*=}") ;;
      *)
        args+=("${arg}") ;;
    esac
  done
  set -- "${args[@]}"
  shopt -u extglob
  echo "0ドル $@"
  args=()
  for (( i = 1; i <= "$#"; i++ )); do
    arg="${!i}"
    case "${arg}" in
      --)
        break ;;
      -*)
        case "${arg}" in
          -h|--help|-\?)
            usage ;;
          -x|-Y|-z) ;;
          -t|--type)
            next=$(( i + 1 ))
            arg2="${!next}"   # "${!i+1}" would expand to the literal 1, not argument i+1
            if [[ -n "${arg2}" ]] && [[ "${arg2:0:1}" != "-" ]]; then
              type="${arg2}"
              (( i++ ))
            else
              err "Invalid option: ${arg} requires an argument"
            fi ;;
          -*)
            err "Invalid option: ${arg}" ;;
        esac ;;
      *)
        args+=("${arg}")
    esac
  done
  set -- "${args[@]}"
  echo "0ドル ${@:-No positional parameters set}"
  echo "type: ${type:-Type not set}"
}
shopt -u extglob
main "$@"

Answer

I think this is pretty nicely written Bash.

I see the parsing happens in two passes:

1. Convert the argument list to some sort of canonical form
2. Validate the argument list

This is easy to understand and I think it makes sense.

Handling arguments after `--`

As written, the program ignores all arguments that follow --.

The common practice is to take all arguments after -- verbatim, without further parsing. For example, this is what makes it possible to use rm to delete a file named -f if you ever need to: you would run rm -- -f instead of rm -f (which usually does nothing).
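Here is a minimal sketch of that convention in a second pass. It assumes a hypothetical collect_positionals function that recognizes only -x and -f. When the function reaches --, it copies every remaining argument into the positionals array verbatim. So a -f that comes after -- ends up as a positional parameter, just as in rm -- -f.

```bash
#!/bin/bash
# Hypothetical second pass: only -x and -f are recognized options here.
# Everything after "--" is copied into "positionals" without further parsing.
collect_positionals() {
  positionals=()
  local i arg
  for (( i = 1; i <= $#; i++ )); do
    arg=${!i}
    case "$arg" in
      --)
        positionals+=("${@:i+1}")   # all arguments after "--", verbatim
        break ;;
      -x|-f) ;;                     # recognized flags are consumed
      -*)
        echo "Invalid option: $arg" >&2
        return 1 ;;
      *)
        positionals+=("$arg") ;;
    esac
  done
}

collect_positionals -x file -- -f -x
printf '%s\n' "${positionals[@]}"
```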

Keeping things "simple"

I'm not a fan of Bash's advanced features. I think they push the limits of the language and are a common source of bugs and hard-to-understand code.

Look at what extglob forces you to do:

- shopt -s extglob before the declaration of main, and shopt -u extglob after it to clean up, so that main can be parsed
- then, inside main, shopt -s extglob again before you need it, and shopt -u extglob when you no longer need it

I find this double activation / deactivation dirty.

If you gotta use it, you gotta use it. If I have a chance to do without it, I would. And here I see an opportunity. By reorganizing the conditions, you could achieve something similar:

args=()
for (( i = 1; i <= $#; i++ )); do
  arg="${!i}"
  case "${arg}" in
    -[[:alpha:]]=*|--*=*)
      args+=("${arg%%=*}")
      args+=("${arg#*=}") ;;
    --)
      for (( j = i; j <= $#; j++ )); do
        args+=("${!j}")
      done
      break ;;
    --*)
      args+=("${arg}") ;;
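    -)
      # a lone "-" (often meaning stdin) would otherwise be dropped by the -* branch
      args+=("${arg}") ;;
    -*)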
      for (( j = 1; j < "${#arg}"; j++ )); do
        args+=("-${arg:j:1}")
      done ;;
    *)
      args+=("${arg}") ;;
  esac
done
set -- "${args[@]}"

The main difference from your original is that -[[:alpha:]?]+([[:alpha:]?]) is replaced with simply -*. The --) and --*) cases are checked before it so that long options are not split. In practice, I think this means an argument like -c9 would be converted to -c -9 instead of being kept as -c9.

I don't know if this would be acceptable to you. If it is, then you could get rid of all the shopt calls, and I think that would be a good thing.
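To make the practical effect concrete, here is the extglob-free first pass wrapped in a function that prints the normalized arguments one per line. The name normalize is hypothetical. The function includes a -) arm so that a lone - (often used to mean standard input) is kept. Without that arm, the -* branch would produce nothing for that argument.

```bash
#!/bin/bash
# Hypothetical wrapper around the extglob-free first pass.
# Prints the normalized arguments one per line; no shopt is needed.
normalize() {
  local args=() arg i j
  for (( i = 1; i <= $#; i++ )); do
    arg=${!i}
    case "$arg" in
      -[[:alpha:]]=*|--*=*)
        args+=("${arg%%=*}" "${arg#*=}") ;;
      --)
        args+=("${@:i}")          # "--" and everything after it, verbatim
        break ;;
      --*)
        args+=("$arg") ;;
      -)
        args+=("$arg") ;;         # keep a lone "-" instead of dropping it
      -*)
        for (( j = 1; j < ${#arg}; j++ )); do
          args+=("-${arg:j:1}")
        done ;;
      *)
        args+=("$arg") ;;
    esac
  done
  printf '%s\n' "${args[@]}"
}

normalize -c9 -xYz --type=value - -- -ab
```

With these inputs it prints -c, -9, -x, -Y, -z, --type, value, -, -- and -ab, one per line. -c9 is split, while everything from -- onward is kept as is.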

shellcheck

As you are new to Bash, it's worth pointing out shellcheck.net (also available as a command-line tool), a nice tool for checking Bash code against common mistakes and bad practices. On your code it finds only a minor issue with echo "0ドル $@", where the recommended form is echo "0ドル $*".
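The recommendation comes from how the two expand inside a single quoted string. "0ドル $@" still expands to one word per argument, with 0ドル glued onto the first one. "0ドル $*" joins all arguments into a single string. Here is a small sketch with two hypothetical functions that print each word they receive in brackets:

```bash
#!/bin/bash
# Each function passes "prefix" plus its arguments to printf, which prints
# every resulting word inside brackets.
with_at()   { printf '[%s]' "prefix $@"; echo; }
with_star() { printf '[%s]' "prefix $*"; echo; }

with_at a b     # two words: "prefix a" and "b"  -> [prefix a][b]
with_star a b   # one word: "prefix a b"          -> [prefix a b]
```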

Comment from the question's author

So instead of just ending the parse loop, everything after -- should be passed through unmodified to be used as positional parameters? That makes sense, and it explains why set -- behaves the way it does. It should also let me remove the nested case in part 2. Are there any other special cases (like --) that I may have missed?

Accepted answer by janos, 2021-10-29 14:39:59Z
