One of the K&R exercises is to write a small piece of software to compare two files and print the first line in which they differ.
Here is my attempt:
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#define SET 1
#define UNSET 0
#define BUFFERSIZE 100
void error(char *fmt, ...);
int main(int argc, char *argv[])
{
if (argc != 3)
error("Wrong argument size. Usage: %s FILE1 FILE2", *argv);
char *prog = *argv++;
char *fname1 = *argv++;
char *fname2 = *argv;
FILE *fp1 = fopen(fname1, "r");
FILE *fp2 = fopen(fname2, "r");
if(fp1 == NULL)
error("Failed to open \"%s\"");
if(fp2 == NULL)
error("Failed to open \"%s\"");
char *b1 = malloc(BUFFERSIZE * sizeof(char));
char *b2 = malloc(BUFFERSIZE * sizeof(char));
char EOF_FLAG1 = UNSET;
char EOF_FLAG2 = UNSET;
do {
EOF_FLAG1 = fgets(b1, BUFFERSIZE, fp1) == NULL ? SET : UNSET;
EOF_FLAG2 = fgets(b2, BUFFERSIZE, fp2) == NULL ? SET : UNSET;
} while(strcmp(b1, b2) == 0 && EOF_FLAG1 == UNSET && EOF_FLAG2 == UNSET);
if(strcmp(b1, b2) == 0 && EOF_FLAG1 == EOF_FLAG2)
printf("The contents of \"%s\" and \"%s\" are equal.\n", fname1, fname2);
else {
printf("Difference found: \n\n");
printf("%10s:\t%s\n", fname1, b1);
printf("%10s:\t%s\n", fname2, b2);
}
free(b1);
free(b2);
exit(0);
}
void error(char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
fprintf(stderr, "error: ");
vfprintf(stderr, fmt, ap);
putc('\n', stderr);
va_end(ap);
exit(1);
}
2 Answers 2
Things I like about your program include
- The use of pointer arithmetic to parse your file arguments. I have never seen that done, and I think that is a good use of it.
- I also like how your error function works, It gives you a good way to exit your program in a hurry, consider adding the ability to give warning levels to this.
Your output statements are very clear, and user friendly. However, it is not very unixy, in that you could pipe the output of this to another tool easily. Consider fixing this, just because it violates the rule of silence.
Rule of Silence Developers should design programs so that they do not print unnecessary output. This rule aims to allow other programs and developers to pick out the information they need from a program's output without having to parse verbosity. ~ Wikipedia
Things I dislike:
- You don't close your file handles using
fclose
, this is good practice and should be done. - if statements should always be curly bracketed, as it makes them easier to expand if you need to.
if(){ .... } else { .... }
. You don't necessarily need to wrap conditionals in parenthesis, but I feel that code is made clearer when you wrap each conditional in a compound conditional in parenthesis.
if((COND1) && (COND2)){ ... }
Things you could improve
You could return a code to the operating system to indicate that the files do not match, this would make it more useful, as you could add it to a shell script workflow. See the snippet below.
YourProg f1 f2
if [ $? -eq 0 ]
then
echo File match
else
echo Files do not match
fi
-
\$\begingroup\$ Thank you really much. I really like your "I like/I dislike/things you could improve" approach to review my code. \$\endgroup\$LastSecondsToLive– LastSecondsToLive2015年11月17日 17:52:54 +00:00Commented Nov 17, 2015 at 17:52
Corner case holes
Code uses the contents of
b1
,b2
before knowing they are valid. Whenfgets()
returnsNULL
, the state of the buffer contents are undefined if an I/O error occurred. Best to not inspect the buffer whenNULL
returneddo { EOF_FLAG1 = fgets(b1, BUFFERSIZE, fp1) == NULL ? SET : UNSET; EOF_FLAG2 = fgets(b2, BUFFERSIZE, fp2) == NULL ? SET : UNSET; // Bad - using b1, b2 // } while(strcmp(b1, b2) == 0 && EOF_FLAG1 == UNSET && EOF_FLAG2 == UNSET); } while(EOF_FLAG1 == UNSET && EOF_FLAG2 == UNSET && strcmp(b1, b2) == 0)'
Does code care if the null character
'0円'
was read? If so, do not usefgets()
. Afterfgets()
the end of the buffer is often determine by the first null chacter as implicitly done instrcmp()
. But this fails to find the end if a'0円'
was read. Typically .txt files will not have a null character in them, (unless encoded UTF-16). Code could usefgetc()
, but that does slow things down.
Minor
Avoid magic numbers. Why 10 in
printf("%10s:\t%s\n", fname1, b1);
? Suggestprintf("\"%s\":\t%s\n", fname1, b1);
I favor
pointer = malloc(sizeof *poiner * n)
.* sizeof(char)
serve little purpose as that is always* ((size_t)1)
. By usingsizeof()
first, the product is certainly calculated withsize_t
math. Makes a difference withint row, col; ptr = malloc(sizeof *ptr * row * col)
vs.ptr = malloc(row * col * sizeof *ptr)
// char *b1 = malloc(BUFFERSIZE * sizeof(char)); char *b1 = malloc(sizeof *b1 * BUFFERSIZE);
Pedantic code would inspect
feof()
andferror()
as code really as 3 outcomes: Same, Different, Failed to compare. Shouldferrror()
occur, the comparison is uncertain.I would expect return codes from
main()
to reflect the above 0:Same, 1:Different, Negative values: various open/read/close errors.
-
1\$\begingroup\$ Thank you for your review. The idea with a useful return value is great and I'll definitely stick to it. \$\endgroup\$LastSecondsToLive– LastSecondsToLive2015年11月17日 17:53:30 +00:00Commented Nov 17, 2015 at 17:53