Explanation
The reason I post this is that I am worried this code is clever, which is usually a bad sign.
The command-line parser considers the following synopsis.
[--help | -h | -? | /?] | [--version] [<command> [[--help | (...)] | <args>]]
The idea of the parser is to match the following behaviour, which is based on git.
If the help option is set, but no command is given, then the help message is printed. If the help option is set and a command is given, then the command's help message is printed. If the version option is set, and the help option unset, then the version message is printed. If neither option is given, the <command>
is executed.
But there is a catch. If a bad option is found, the rest cannot be parsed, since the option may or may not take an argument (e.g. git log --tree green
). But if until then the help option or the version option were given, the program is executed only considering those options. So --help --cheese --version
would print the help option.
Code
There are two files: the header file (args.h
), and the implementation (args.c
). An example (example.c
) is also given.
args.h
#ifndef ARGS_H
#define ARGS_H
enum args_result {
ARGS_RESULT_SUCCESS,
ARGS_RESULT_BAD_OPTION,
ARGS_RESULT_NO_COMMAND,
ARGS_RESULT_NULL_ARG,
ARGS_RESULT_HELP,
ARGS_RESULT_VERSION,
ARGS_RESULT_HELP_COMMAND
};
enum args_result args_parse(int argc, char **argv, int *r);
#endif /* ARGS_H */
args.c
#include "args.h"
#include <string.h>
#include <stddef.h>
enum arg_type {
ARG_TYPE_NON,
ARG_TYPE_COMMAND,
ARG_TYPE_NULL,
ARG_TYPE_UNKNOWN_OPT
};
static int is_option(const char *str)
{
return str[0] == '-';
}
static int is_help_option(const char *str)
{
return strcmp(str, "--help") == 0 ||
strcmp(str, "-h") == 0 ||
strcmp(str, "-?") == 0 ||
strcmp(str, "/?") == 0;
}
enum args_result args_parse(int argc, char **argv, int *r)
{
int i;
int help = 0, ver = 0;
int argI = 0;
enum arg_type argT = ARG_TYPE_NON;
for (i = 1; i < argc; i++) {
if (argv[i] == NULL) {
argI = i;
argT = ARG_TYPE_NULL;
break;
}
if (is_help_option(argv[i])) {
help = 1;
continue;
}
if (strcmp(argv[i], "--version") == 0) {
ver = 1;
continue;
}
argI = i;
if (is_option(argv[i]))
argT = ARG_TYPE_UNKNOWN_OPT;
else
argT = ARG_TYPE_COMMAND;
break;
}
if (argT == ARG_TYPE_COMMAND && !help) {
/* --help after the <command> */
if (argI + 1 == argc);
else if (argv[argI + 1] == NULL);
else if (is_help_option(argv[argI + 1]))
help = 1;
}
/* --help and --version ignoring the correctness of the syntax. */
if (help) {
if (argT != ARG_TYPE_COMMAND)
return ARGS_RESULT_HELP;
*r = argI;
return ARGS_RESULT_HELP_COMMAND;
}
if (ver && argT != ARG_TYPE_COMMAND)
return ARGS_RESULT_VERSION;
if (argT == ARG_TYPE_NON)
return ARGS_RESULT_NO_COMMAND;
if (argT == ARG_TYPE_NULL) {
*r = argI;
return ARGS_RESULT_NULL_ARG;
}
if (argT == ARG_TYPE_UNKNOWN_OPT) {
*r = argI;
return ARGS_RESULT_BAD_OPTION;
}
/* argT = ARG_TYPE_COMMAND */
*r = argI;
return ARGS_RESULT_SUCCESS;
}
example.c
#include <stdio.h>
#include "args.h"
int main(int argc, char **argv)
{
enum args_result r;
int argI;
r = args_parse(argc, argv, &argI);
if (r == ARGS_RESULT_SUCCESS) {
printf("Command = '%s'\n", argv[argI]);
return 0;
}
if (r == ARGS_RESULT_BAD_OPTION) {
fprintf(stderr, "Fatal: Bad option: '%s'\n", argv[argI]);
return 1;
}
if (r == ARGS_RESULT_NO_COMMAND) {
fprintf(stderr, "Fatal: No command given\n");
return 1;
}
if (r == ARGS_RESULT_NULL_ARG) {
fprintf(stderr, "Fatal: Bad arg: NULL\n");
return 1;
}
if (r == ARGS_RESULT_HELP) {
printf(
"Usage:\n"
" %s --help",
argv[0]);
return 0;
}
if (r == ARGS_RESULT_VERSION) {
printf("0.1\n");
return 0;
}
if (r == ARGS_RESULT_HELP_COMMAND) {
printf("Help command = '%s'\n", argv[argI]);
return 0;
}
return 0;
}
1 Answer 1
odd semantics
I don't understand this:
static int is_option(const char *str)
{
return str[0] == '-';
}
That is, if we try is_option("/?")
it reports false,
which seems wrong, if I correctly understood the problem statement.
Anyway, kudos on the naming, so code will be self explanatory.
use conjuncts
I'm not crazy about how you phrased this. It doesn't seem like the best way to communicate a technical idea to a colleague.
if (argI + 1 == argc);
else if (argv[argI + 1] == NULL);
else if (is_help_option(argv[argI + 1]))
help = 1;
Consider phrasing it as if (A && B && C) { help = 1; }
.
If you do keep it this way, at least introduce {
}
braces
to emphasize the empty statements.
As written, I worry that some hapless maintenance engineer
will add a suitably indented help = 1; foo = bar;
and be surprised that foo
is unconditionally assigned.
manifest constant
if (r == ARGS_RESULT_VERSION) {
printf("0.1\n");
Clearly this is correct.
But presumably this code is teaching folks how to use your API,
and they will be doing some copy-n-pasting.
Encourage them to #define APP_VERSION "0.1"
,
by doing that yourself.
Then it will be available for various bits of app code
that may want to implement backward compatibility checks.
Consider working with a
SemVer
triple of integers, (major, minor, patch).
Then range comparisons like 9 < 10
work better,
avoiding the "10" < "9"
surprise.
wrong identifier
enum args_result args_parse(int argc, char **argv, int *r)
The first two args are conventional and are perfectly clear.
But r
is the wrong name to use here,
especially without any explanatory comments.
Find something more descriptive to name it.
Identifiers in a Public API carry a much greater documentation burden.
On the other hand naming a local variable r
is perfectly fine;
the documentation burden is much lower there.
design of Public API
Thank you for including example.c in the submission, that's helpful.
It seems like most callers of the args_parse()
library routine
would need a fair amount of boilerplate code,
resembling what we see in example.c.
Consider adding convenience method(s) for common default processing.
Also consider adding a fancier variant that does arg parsing
and performs some common default processing before returning.
-
\$\begingroup\$ Thank you for the awesome reviewing, J_H! The fact you mostly talked the interface seems like the implementation is readable enough, so that's great. About on "use conjuncts", it is because apparently C does not guarantee the order of evaluations, so
is_help()
could be executed first, and if it isNULL
, the program could crash -- but I could just changeis_help()
, too. About thatis_help("/?") = 0
, it makes sense. I will change that! \$\endgroup\$Schilive– Schilive2024年04月12日 20:18:42 +00:00Commented Apr 12, 2024 at 20:18 -
2\$\begingroup\$ "C does not guarantee the order of evaluations". I don't understand that remark. Section 6.5.13 Logical AND operator of the C99 specification speaks of a sequence point, and explains "If the first operand compares equal to 0, the second operand is not evaluated,", cf section 1.9.18 of the C++ standard. So
return x != 0 && reciprocal(x)
is identical toreturn x != 0
and will never trigger a 1/0 div-by-zero error. I routinely rely on similar short-circuiting idioms in Common Lisp and Python. \$\endgroup\$J_H– J_H2024年04月12日 20:28:39 +00:00Commented Apr 12, 2024 at 20:28 -
1
-
\$\begingroup\$ @Schilive If it helps, the C23 asserts the same too. \$\endgroup\$Madagascar– Madagascar2024年04月12日 21:23:37 +00:00Commented Apr 12, 2024 at 21:23
getopt_long()
to parse their arguments. This is a GNU extension of the POSIXgetopt()
function, but it's supported on many operating systems out of the box. Unless you really want to reinvent the wheel, I recommend you use that instead, and then use a drop-in library for Windows if you also need to support that operating system. \$\endgroup\$getopt_long()
myself or takes someone else's implementation. \$\endgroup\$