I wrote the following C program to compare software version strings. Please provide any general feedback you'd like, but with special focus on error handling decisions, user input validation/sanitation, and robustness. My end-goal is for the program to properly handle user input and not have non-graceful crashes, etc... A couple of known issues/concerns I already have (but haven't decided on a solution for yet) is, where should I be handling errors? Should I be printing to stderr
only in the driver (main) program and leave all of that out of the main "library" code? Secondly, I have an enum
definitionn for FAIL
but I never actually use it because when I wrote the switch
clause, I thought it may be a better idea to just default
to error rather than select it explicitly. I'm not sure if this is the best choice. This was compiled with GCC 7.5.0 for Linux 64-bit on my end.
The program is only intended to work with inputs with 4 version numbers concatenated with a .
for now. I also need to build out a better driver which accepts actual user input eventually of course.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
enum VERSION_RESULT
{
FAIL = -1,
LEFT = 1,
RIGHT = 2,
EQUAL = 3
};
typedef struct _version
{
unsigned int major;
unsigned int minor;
unsigned int build;
unsigned int revision;
} VERSION;
VERSION *version_new(char *str);
int version_parse(VERSION *pVer, char *str);
bool validate_version_string(char *str);
void version_print(VERSION *pVer);
void version_compare_print(VERSION *pL, VERSION *pR);
int main() {
char ver_string[] = {'5','.','2','5','.','5','.','0','0円'}; // CANNOT use read-only memory due to strtok
char ver_string2[] = {'5','.','5','.','5','.','1','0円'};
VERSION *ver = version_new(ver_string);
VERSION *ver2 = version_new(ver_string2);
version_compare_print(ver, ver2);
free(ver);
free(ver2);
ver = NULL;
ver2 = NULL;
return EXIT_SUCCESS;
}
VERSION *version_new(char *str)
{
assert(str);
VERSION *pVer = malloc(sizeof(VERSION));
assert(pVer);
int result = version_parse(pVer, str);
if(result == -1)
{
fprintf(stderr, "Failed to parse version from string.\n");
free(pVer);
return NULL;
}
return pVer;
}
int version_parse(VERSION *pVer, char * str)
{
assert(pVer && str);
if(!validate_version_string(str))
{
fprintf(stderr, "Invalid version #. Format xxx.xxx.xxx.xxx\n");
return -1;
}
char *token;
char *delim = ".";
token = strtok(str, delim);
pVer->major = (unsigned int) atoi(token);
token = strtok(NULL, delim);
pVer->minor = (unsigned int) atoi(token);
token = strtok(NULL, delim);
pVer->build = (unsigned int) atoi(token);
token = strtok(NULL, delim);
pVer->revision = (unsigned int) atoi(token);
}
bool validate_version_string(char *str)
{
assert(str);
unsigned int count = 0;
char *copy;
char tmp;
for(copy = str; *copy != '0円'; ++copy)
{
tmp = *copy;
if(tmp == '.') ++count;
if((tmp > '9' || tmp < '0') && tmp != '.') return false;
}
if(count == 3) return true;
return false;
}
void version_print(VERSION *pVer)
{
assert(pVer);
printf("%u.%u.%u.%u", pVer->major,pVer->minor,pVer->build,pVer->revision);
return;
}
/*
* Compare 2 versions. Return -1 if error, 1 if left is largest, 2 if right is largest, 3 if equal
*/
enum VERSION_RESULT version_compare(VERSION *pL, VERSION *pR)
{
assert(pL && pR);
//Major checks
if(pL->major > pR->major) return LEFT;
if(pR->major > pL->major) return RIGHT;
//Majors are equal
if(pL->minor > pR->minor) return LEFT;
if(pR->minor > pL->minor) return RIGHT;
//Minors are equal
if(pL->build > pR->build) return LEFT;
if(pR->build > pL->build) return RIGHT;
//Builds are equal
if(pL->revision > pR->revision) return LEFT;
if(pR->revision > pL->revision) return RIGHT;
//Revisions are equal
return EQUAL;
}
void version_compare_print(VERSION *pL, VERSION *pR)
{
assert(pL && pR);
switch(version_compare(pL, pR))
{
case LEFT:
version_print(pL);
printf(" > ");
version_print(pR);
putchar('\n');
break;
case RIGHT:
version_print(pL);
printf(" < ");
version_print(pR);
putchar('\n');
break;
case EQUAL:
version_print(pL);
printf(" == ");
version_print(pR);
putchar('\n');
break;
default:
fprintf(stderr, "An error occurred in the version compare function.\n");
}
return;
}
4 Answers 4
Don't
strtok
+atoi
. Usestrtol
, which (a) doesn't need a mutable input, (b) has much better error handling and reporting, and (c) eliminates the need for independent validation. An example of use would bechar * end; pVer->major = strtol(str, &end, 0); if (*end != '.') { // major is not a number. return suitable_failure; } str = end + 1; pVer->minor = strtol(str, &end, 0); // etc, and keep going with other pieces of a version. // After revision is computed, *end must be 0
Instead of spelling out the field names, have an array. Then you could wrap the repeated number parsing in a loop.
I don't see the need for a dynamic allocation.
VERSION
is simplestruct
, and c would be happy to return it:VERSION version_new(char * str) { VERSION v; .... return v; } int main() { .... VERSION ver1 = version_new(str); // etc
-
\$\begingroup\$ @vnp to clarify, you mean don't use the struct fields but just use an array? Re: " Instead of spelling out the field names, have an array. Then you could wrap the repeated number parsing in a loop." \$\endgroup\$the_endian– the_endian2020年03月16日 06:06:11 +00:00Commented Mar 16, 2020 at 6:06
-
4\$\begingroup\$ As
major
is unsigned,strtoul()
makes more sense thenstrtol()
here. \$\endgroup\$chux– chux2020年03月16日 12:41:51 +00:00Commented Mar 16, 2020 at 12:41
Instead of spelling out every single element of the array,
char ver_string[] = {'5','.','2','5','.','5','.','0','0円'};
just use a string literal. It's the same thing:
char ver_string[] = "5.25.5.0";
Should I be printing to stderr only in the driver (main) program and leave all of that out of the main "library" code?
Yes, exactly. You can return error codes and set errno
to be more specific.
Secondly, I have an
enum
definitionn forFAIL
but I never actually use it because when I wrote the switch clause, I thought it may be a better idea to just default to error rather than select it explicitly.
That's fine. Anything other than a valid input is an invalid input.
You don't return a value from version_parse
even though it looks like you even inserted a newline where it was supposed to go.
Instead of using LEFT
and RIGHT
, I would use GREATER
and LESS
. Those names convey more meaning to me.
Usually, functions that don't take values have an unnamed void
parameter. I suggest doing that with main
, because it might be undefined behavior if you don't:
int main(void)
{
(I also fixed up the formatting to make it more consistent.)
In validate_version_string
, tmp
and copy
are unnecessary. Just use str
and *str
, like so:
for(; *str != '0円'; ++str)
{
if(*str == '.') ++count;
if((*str > '9' || *str < '0') && *str != '.') return false;
}
You may think that removing tmp
is bad for performance, but it's not; in any nonzero optimization level, these are equivalent.
The validation of the string seems to leave some gaps. For instance, the version numbers seem to be valid whatever size they have. They could be zero digits large, or contain so many digits that they would not fit into an integer.
I'd try not to print to standard error if this is supposed to be used as a library. In that case you might want to use separate header files, c code and test code of course as well.
I'd keep version_compare_print
possibly out of the library; at the very least it should otherwise just populate a string pointer rather than print it to standard out. version_print
makes more sense, but it should also populate a string. Then the application can print it to standard output. Probably best to call it version_sprint
.
Coding remarks
main
I'm not sure if setting:
ver = NULL;
does any good for a local variable, after you've already freed it.
version_new
I'm not sure if this function should be there at all, just let the caller perform the malloc
and call version_parse
. They'll have to dealloc
anyway.
version_parse
unsigned int count = 0;
Count of what? You should make that explicit in the variable name (delimiter_count
seems logical).
char *copy;
This doesn't seem to operate on a copy - just on a copy of the reference; just call it position
or something similar.
return -1;
You're hiding the error code. Either you should probably upgrade to a more application specific code or just return the code you're given. Currently it is possible to read the error from standard error stream, but the calling method will be left in the dark.
version_compare
Compare almost always use a negative value if the leftmost parameter is smaller than the second, zero if they are equal and a positive value if the leftmost parameter is great than the second.
As you don't seem to use FAIL value anyway (and since you use pointers to the already parsed versions), I think it is best to keep to that.
version_compare_print
I'm not sure if this should be a special function; better keep it with main
or at least explain what it does.
There is a lot of repetition in the code. You could just set a variable to "<"
, "=="
or ">"
and then perform the print after the switch.
The final return
statement seems rather useless; nothing is returned and the method is at the end.
-
3\$\begingroup\$ Setting freed variables to null instead of letting them dangle makes it much easier to find use after free errors. It's not that uncommon, although usually someone writes a macro for it sooner than later. \$\endgroup\$Voo– Voo2020年03月16日 10:14:02 +00:00Commented Mar 16, 2020 at 10:14
-
1\$\begingroup\$ It seems to me that for local variables it just adds to the clutter; I can see this spill over from pointers in other locations though. Maybe I'm too used to Java where null pointers are commonly the cause of problems, not part of the solution to problems. \$\endgroup\$Maarten Bodewes– Maarten Bodewes2020年03月16日 10:28:39 +00:00Commented Mar 16, 2020 at 10:28
-
1\$\begingroup\$ No not GCed objects (which yes I guess with resurrection that would be a problem), but .NET got me. I meant the Closeable interfaces (try with resources and so) Say using a file reader after it has been closed. Also with try-with-resources not a problem in general because you limit the scope to its lifetime, but if you can't (say it's a member) setting those to null is a good idea too. \$\endgroup\$Voo– Voo2020年03月16日 10:37:43 +00:00Commented Mar 16, 2020 at 10:37
-
1\$\begingroup\$ Well that's the problem trying to find analogies for constructs in other languages (my point wasn't to speed up GC, but to make it harder for someone to access a closed resource afterwards). The main point is that there's no downside to setting the variable to NULL, while it can help avoid a common programming error (use after free errors are a very common security exploit after all) \$\endgroup\$Voo– Voo2020年03月16日 10:53:34 +00:00Commented Mar 16, 2020 at 10:53
-
1\$\begingroup\$ Maarten Bodewes, "I'm not sure if setting:
ver = NULL;
does any good". This is an OK C idiom. It has marginal cost yet marginal value, so a such if fine to have, fine to not have. Follow group's coding guidelines here. @Voo, In cases wherever
is not used later, it may get optimized out, so provides no improved security. Agree about ease for debugging. \$\endgroup\$chux– chux2020年03月16日 12:50:29 +00:00Commented Mar 16, 2020 at 12:50
A small version compare improvement would perform up to n +1 rather than 2n compares.
Instead of
if(pL->major > pR->major) return LEFT;
if(pR->major > pL->major) return RIGHT;
if(pL->minor > pR->minor) return LEFT;
if(pR->minor > pL->minor) return RIGHT;
...
Compare for equality first:
if(pL->major != pR->major) {
return pL->major > pR->major ? LEFT : RIGHT;
}
if(pL->minor != pR->minor) {
return pL->minor > pR->minor ? LEFT : RIGHT;
}
...
Yet since such code is not likely to be in a critical path, coding for clarity likely overrides.