Write a function that a receives file name of a txt
file, which contains unknown number of words of max length 10
each. Print the word with the most occurrences. Assume the input is correct and contains at least one word.
Code
#define MAX_CHARS 11
int mostFrequentWord(char* filename) {
char ch;
int i, j, k, occurrence, counter = 1, maximum = 0;
char *index_max = NULL;
char **arr;
FILE *file = fopen(filename, "r");
while((ch = fgetc(file)) != EOF) {
if(ch == ' ') {
counter++;
}
}
rewind(file);
arr = (char**) calloc(counter, sizeof(char*));
for (k = 0; k < counter; k++) {
arr[k] = (char*) calloc(MAX_CHARS, sizeof(char));
}
while ((ch = fgetc(file)) != EOF) {
if (ch == ' '){
i++;
j = 0;
} else {
arr[i][j] = ch;
j++;
}
}
for (i= 0; i < counter; i++) {
occurrence = 0;
for (j = i; j < counter; j++) {
if (!strcmp(*(arr + i), *(arr + j))){
occurrence++;
}
}
if (occurrence > maximum) {
maximum = occurrence;
index_max = *(arr+i);
}
}
if (index_max != NULL){
printf("The most frequent word is \"%s\" with %d occurrences.\n", index_max, maximum);
}
for ( i = 0; i < counter; i++ ) {
free(arr[i]);
}
free(arr);
return 0;
}
Example
Input (test.txt
):
test one two test three test
Output:
The most frequent word is "test" with 3 occurrences.
One thing I think I'm missing (and not sure what's the way to do it) is to check the success of calling functions that use system resources.
Any review is appreciated.
2 Answers 2
Clearly employ contract requirements
"Write a function that ... contains unknown number of words of max length 10 each." should echo in code someplace a 10
named something like "words of max length". Use "size" for the size of the array. With strings, best to use "length" in the context of the number of characters before the '0円'
.
//#define MAX_CHARS 11
#define WORD_MAX_LENGTH 10
#define WORD_SIZE (WORD_MAX_LENGTH + 1)
...
// calloc(MAX_CHARS, ...
calloc(WORD_SIZE, ...
Use int
with fgetc()
fgetc()
returns values in the unsigned char
range and the negative EOF
. These typical 257 different values will not distinctively fit in an 8-bit char
. With original code, should char
be an unsigned char
, your loop is infinite.
// char ch;
int ch;
...
while((ch = fgetc(file)) != EOF) {
Check return value of fopen()
and calloc()
Should the open fail, detect it.
FILE *file = fopen(filename, "r");
if (file == NULL) {
fprintf(stderr, "Unable to open file <%s>\n", filename);
return 0;
}
Should the allocation of more than 0 bytes returnNULL
,detect it.
arr = (char**) calloc(counter, sizeof(char*));
if (arr == NULL && counter != 0) {
fprintf(stderr, "Out of memory\n");
return 0;
}
Improve *alloc()
style
Cast not needed.
Allocate to the size of the object, not the type. Easier to code right, review and maintain.
// arr = (char**) calloc(counter, sizeof(char*));
arr = calloc(counter, sizeof *arr);
Good use of sentinels in printing strings
// v v
printf("The most frequent word is \"%s\" ...
Lack of bounds checking
Code should ensure i, j
are not to big.
if (i >= counter) Handle_Error_Somehow();
i++;
if (j >= MAX_CHARS) Handle_Error_Somehow();
else {
arr[i][j] = ch;
}
Missing fclose()
fclose(file);
Little bits.
I'd expect size_t counter, i, j, k
for array indexing as size_t
is neither too narrow nor too wide. Be mindful it is an unsigned type.
Style: if (index_max != NULL){
is replaceable with if (index_max) {
. I find the the 2nd clearer.
Because I can't comment chuxs answer I have to use a new answer. Like him I have multiple problems with the call of the calloc-function but I will show them step by step.
1. Introduce variables where you need them
char **arr;
//... many lines ...
arr = (char**) calloc(counter, sizeof(char*));
should be changed to:
char **arr = (char**) calloc(counter, sizeof(char*));
By introducing them right where you need them you clarify the code a lot. A possible reader does not need to go to the top to find out what kind of variable you are assigning a value to.
2. Casting of malloc
/calloc
/realloc
You don't need to cast the result of the functions malloc
, calloc
or realloc
, because you might hide errors. See question for more details.
Better version:
char **arr = calloc(counter, sizeof(char*));
3. Changing sizeof()
parameter
You can change the line to this:
char **arr = calloc(counter, sizeof(*arr));
If you would change the type of arr to a different one the sizeof()
is updated automatically since it is evaluated at compile time.