This loads a dictionary text file into memory to be used as part of a spell checker. It's part of a larger program, but I wanted general comments so I can clean it up further.
#define TABLESIZE 500
#define LENGTH 45
bool load(const char* dictionary)
{
//initiate hash table
node* hashtable[TABLESIZE];
//open dictionary and check
FILE* dict = fopen(dictionary, "r");
if (dict == NULL)
{
printf("Could not open file.");
return false;
}
//initiate variable to store current word
char* dword = calloc(LENGTH+ 1,sizeof(char));
//read the file
while(fscanf(dict, "%s", dword) != EOF)
{
//if there is a word, create node and put word in it
node* new_node = malloc(sizeof(node));
strcpy(new_node->word,dword);
//find spot in hash table and put it in that bucket
unsigned int hashkey= 0;
for (int counter = 0; dword[counter]!= '0円'; counter++)
{
hashkey = (hashkey*dword[counter] + dword[counter] + counter)%TABLESIZE;
}
//check if spot in table exists; if not, start the linked list
if (hashtable[hashkey] == NULL)
{
hashtable[hashkey] = new_node;
new_node->next = NULL;
}
//otherwise made current node the first, shift rest over
else
{
new_node->next = hashtable[hashkey];
hashtable[hashkey] = new_node;
}
//count words for later use
nwords++;
}
fclose(dict);
return true;
}
3 Answers 3
I see some problems that you'll probably want to address, and some other general comments that you may or may not want to use.
Provide missing pieces
It's not in what you've posted, but I'm inferring that your node
is defined something like this:
typedef struct node
{
char *word;
node *next;
} node;
Also, your code refers to nwords
which I'm assuming is defined like this:
static unsigned nwords;
Provide required #include
s
What would be useful is for you to provide your own #include
file, say, something like "mydict.h" with the following contents:
#ifndef MYDICT_H
#define MYDICT_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
typedef struct node
{
char *word;
struct node *next;
} node;
bool load(const char* dictionary);
#endif
The code implementation would then go into a corresponding mydict.c
file.
Provide a way to actually use the dictionary
This code carefully constructs a dictionary from a file, and then throws the whole thing away because there is no way to access the dictionary after this function has returned! Clearly, to be useful, at least thehashtable
must be made available outside the function.
Check return values and handle errors
The code calls calloc
and malloc
but never checks for error return values. This is a serious problem that must be addressed. Also, fscanf
can return EOF
if there was an error, but that condition is neither checked nor handled. Note too, that even fclose
can fail.
Think about return values
The code is defined as returning a bool
but only actually false
if the file failed to open. However, there are a number of errors that are not handled (and should be), such as running out of memory. It would make more sense to return true
only if there were no errors.
Allocate memory before using it
The code currently includes these three lines:
//if there is a word, create node and put word in it
node* new_node = malloc(sizeof(node));
strcpy(new_node->word,dword);
However, only the new_node
is actually allocated. No memory has been allocated for new_node->word
and so this is doomed to fail! This should be strdup
(if your compiler is Posix compliant) instead as:
new_node->word = strdup(dword);
Don't use raw fscanf
The code that reads in a word is currently this:
while(fscanf(dict, "%s", dword) != EOF)
However, what happens if the word is longer than the allocated space? The result is a classic buffer overrun. You can easily prevent it using the length modifier:
while(fscanf(dict, "%45s", dword) != EOF)
The code should also make sure that the string is properly terminated.
Don't assume variables are magically initialized
If you need a variable initialized in C, you must initialize it yourself. The only exception is code that is declared in static or global scope, and hashtable
is not. That means that code that checks for specific values, such as this line:
if (hashtable[hashkey] == NULL)
is only going to work by chance. If you initialize hashtable
to contain NULL
s when it's created, you can collapse this code:
//check if spot in table exists; if not, start the linked list
if (hashtable[hashkey] == NULL)
{
hashtable[hashkey] = new_node;
new_node->next = NULL;
}
//otherwise made current node the first, shift rest over
else
{
new_node->next = hashtable[hashkey];
hashtable[hashkey] = new_node;
}
to this:
// add new word to hashtable
new_node->next = hashtable[hashkey];
hashtable[hashkey] = new_node;
-
\$\begingroup\$ Thanks Edward. Regarding your comment about allocating memory for new_node->word using strdup, if my global struct is:
typedef struct node { char word[LENGTH + 1]; struct node* next; } node;
Do I still need to allocate sinceLENGTH
is defined as 45? Same question with regard to your comment on using the length modifier in thefscanf
call \$\endgroup\$Andy– Andy2014年07月28日 01:19:03 +00:00Commented Jul 28, 2014 at 1:19 -
1\$\begingroup\$ Because your
node
is achar
array, no, you don't need to allocate it. However, you might find that you save some space if you do sincestrdup
only uses just enough memory to store the string. You do still need to use the length modifier withfscanf
either way. \$\endgroup\$Edward– Edward2014年07月28日 01:35:54 +00:00Commented Jul 28, 2014 at 1:35
- Your constants could be named slightly differently; consider
TABLE_SIZE
andMAX_WORD_LENGTH
. - You could also rename
dictionary
to better indicate that it's a file path and not a dictionary object. - Consider printing error output to
stderr
:fputs(stderr, ...)
orfprintf(stderr, ...)
. - You're using a global
nwords
, but a localhashtable
; this looks to me like a scoping error. You could consider just making a new struct.
Since the function will terminate right away if the file fails to open,
hashtable
doesn't need to be declared before the check. Do it sometime afterwards, preferably the closest point at which it's used for the first time. This will keep it in the lowest scope possible, which is good for maintenance.Instead of using
printf()
to output a simple message, just useputs()
:puts("Could not open file.");
-
\$\begingroup\$ Thanks for the response. Is it preferred to put
hashtable
declaration close to its use and then initialize with{NULL}
or put it as a global up top without the initialization and so I can use the table outside of theload
function? \$\endgroup\$Andy– Andy2014年07月28日 01:23:44 +00:00Commented Jul 28, 2014 at 1:23 -
1\$\begingroup\$ @Andy: The first one is preferred, and definitely not the second. Having global access can just cause more problems. If another function needs it, then have it passed to that function. \$\endgroup\$Jamal– Jamal2014年07月28日 01:25:28 +00:00Commented Jul 28, 2014 at 1:25