I have built a very small program that is a command line utility for generating alphanumeric characters of a certain length (up to a max. length) called randchars
.
$ randchars 12
Prints twelve pseudo-random upper-/lower-case alphanumeric characters to stdout
. The command line options -l
and -u
can be given, if somebody wishes to have only lower- or uppercase alphanumeric characters.
Here is the code, this is my first project while reading K&R. I know that in C99 I do not need to declare the variables on top of the functions, but I somehow grew to like it this way^^
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#define MAX_CHARS 32
int is_lowercase(char c);
int is_uppercase(char c);
int is_digit(char c);
enum OPTS {
MIXED, // default
UPPERCASE, // -u
LOWERCASE, // -l
};
int main(int argc, char *argv[])
{
unsigned i, size, char_ok;
srand(time(0));
char chars[MAX_CHARS + 1];
char rchar, c;
enum OPTS mode;
// get options
if (argc < 2 || 3 < argc)
goto error_argc;
// size is last arg
size = atoi(argv[argc - 1]);
if (size == 0)
goto usage;
if (size > MAX_CHARS)
goto error_size;
mode = MIXED;
if (argc == 3) {
if (strcmp(argv[1], "-l") == 0)
mode = LOWERCASE;
else if (strcmp(argv[1], "-u") == 0)
mode = UPPERCASE;
else
goto error_opt;
}
// do actual computation of random chars
for (i = 0; i < size; i++) {
do {
c = rand() % ('z' + 1);
chars[i] = c;
switch(mode) {
case MIXED:
char_ok = is_digit(c) || is_uppercase(c) || is_lowercase(c);
break;
case LOWERCASE:
char_ok = is_digit(c) || is_lowercase(c);
break;
case UPPERCASE:
char_ok = is_digit(c) || is_uppercase(c);
break;
}
} while (!char_ok); // repeat if char is not ok
}
chars[size] = '0円';
printf("%s\n", chars);
return 0;
error_size:
printf("error: maximum sequence length: %u\n", MAX_CHARS);
goto usage;
error_opt:
printf("error: invalid option: %s\n", argv[1]);
goto usage;
error_argc:
printf("error: invalid amount of arguments: %u\n", argc);
goto usage;
usage:
printf("usage:\n\t%s [ -u | -l ] <n>\n", argv[0]);
return 1;
}
int is_lowercase(char c)
{
if ('a' <= c && c <='z')
return 1;
else
return 0;
}
int is_uppercase(char c)
{
if ('A' <= c && c <='Z')
return 1;
else
return 0;
}
int is_digit(char c)
{
if ('0' <= c && c <='9')
return 1;
else
return 0;
}
2 Answers 2
First of all, this is well written and organized code. Except for one part. A dreadful part. A part, that was and is the cause for many issues, even today.
Don't use goto
Using goto
brings multiple problems. While I won't repeat the statements from the linked question, I can certainly tell that it slowed me down: to understand your code, I had to repeatedly jump between lines 32+ and lines 72+.
Compare this
// size is last arg
size = atoi(argv[argc - 1]);
if (size == 0)
goto usage;
if (size > MAX_CHARS)
goto error_size;
to this:
// size is last arg
size = atoi(argv[argc - 1]);
if (size == 0) {
usage();
return 1;
}
if (size > MAX_CHARS) {
fprintf(stderr, "error: maximum sequence length: %u\n", MAX_CHARS);
usage();
return 1;
}
The latter one does not force me to read other parts of the code, I immediately know what will happen, and I also know all effects around the used variables. goto
does not yield any benefit here.
Don't use rand() % NUM
for random numbers
That's covered by the C FAQ. Instead, use something along
c = '0' + rand() / (RAND_MAX / ('z' - '0' + 1) + 1)
This has the nice effect that all except 13 of the possible c
s are already alphanumeric.
Use stderr
for error messages
For errors, you want to use stderr
instead of stdout
.
And that's all. As I said, the code itself is well-written. I would prefer late declared variables instead of the K&R-style early one, but that's personal preference. For argument parsing, you might want to have a look at getopt
.
-
\$\begingroup\$ Thank you for the detailed review. Maybe another question about
goto
, since this seems to be very controversial. Looking at the second reply in the link that you posted, the use of goto here seems also fine. I have read some of the Linux kernel code and they use goto very often, so I figured for error handling it should be fine. Isn't it? \$\endgroup\$EvenDance– EvenDance2021年12月26日 16:51:01 +00:00Commented Dec 26, 2021 at 16:51 -
\$\begingroup\$ Disagree that
'0' + rand() / (RAND_MAX / ('z' - '0' + 1) + 1)
is much preferable torand() % NUM
. Each has its pros and cons. "return numbers from 0 to N-1) is poor, because the low-order bits of many random number generators are distressingly non-random." does not well apply whenN
is'z'+1
(123) as the results depends on all bits returned byrand()
. Issue deeper than comments can flesh out. \$\endgroup\$chux– chux2021年12月27日 19:05:03 +00:00Commented Dec 27, 2021 at 19:05
Alphanumeric generation
Rather than generate a random value that might be non-alphanumeric and require a re-do, consider forming only acceptable values.
const char *set = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
int count = 26 + 10 + 26;
int offset = 0;
switch (mode) {
case MIXED:
break;
case LOWERCASE:
count = 10 + 26;
offset = 26;
break;
case UPPERCASE:
count = 10 + 26;
break;
}
for (i = 0; i < size; i++) {
chars[size] = set[offset + some_good_rand() % count];
}
chars[size] = '0円';
It is extremely common that
'A'-'Z'
and'a'-'z'
are consecutive. OP's code takes advantage of that. Still C does not require consecutiveness. Example. The above code does not rely on that.rand()
has its strength and weakness. It is loosely specified. Ifrand()
is insufficient, look to Mersenne Twister.
Initial randomization
srand(time(0));
is OK. Yet sometimes, say in a script, the program is called multiple times in the same second. Code needs other distinguishing data to start distinctively.
If getpid()
available, a simple improvement:
srand((unsigned) time(0) ^ getpid());
On select systems, look to even better random sources.
Length limitation
MAX_CHARS 32
seems small. Could use a larger constant, yet any limitation less than INT_MAX
seems unnecessarily small. Perhaps:
Print one character at a time, no limit
Print in groups of
MAX_CHARS
. Form a loop.