This is a simple hash table I've written in C. It is a closed-addressing (separate chaining) hash table with `TABLE_SIZE` buckets. The hash function is extremely simple: it only sums up the characters of the key string. I've also implemented a REPL so I can test it.
I'd appreciate any suggestions for improving this code.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024

typedef unsigned int HashResult;

typedef struct _node {
    char *key;
    void *value;
    struct _node* next;
} Node;

typedef struct _hashtable {
    Node* cells[TABLE_SIZE];
} HashTable;

HashTable* new_hashtable() {
    return (HashTable*)calloc(1, sizeof(HashTable));
}

HashResult hash(char* string) {
    HashResult result = 0;
    while(*string != '\0')
        result += *(string++);
    return result % TABLE_SIZE;
}

void set(HashTable* ht, char *key, void *value) {
    HashResult hash_result = hash(key);
    Node* node = (Node*)malloc(sizeof(Node));
    node->key = key;
    node->value = value;
    node->next = NULL;
    if (ht->cells[hash_result] == NULL) {
        ht->cells[hash_result] = node;
    } else {
        Node* root = ht->cells[hash_result];
        while (root->next != NULL)
            root = root->next;
        root->next = node;
    }
}

void* get(HashTable* ht, char* key) {
    HashResult hash_result = hash(key);
    if (ht->cells[hash_result] == NULL)
        return NULL;
    Node* node = ht->cells[hash_result];
    if (!strcmp(key, node->key))
        return node->value;
    do {
        if (!strcmp(key, node->key))
            return node->value;
        node = node->next;
    }
    while (node != NULL);
    return NULL;
}

void delete(HashTable* ht, char* key) {
    void *i = get(ht, key);
    if (i == NULL)
        return;
    HashResult hash_result = hash(key);
    Node *current, *prev, *temp;
    current = ht->cells[hash_result];
    if (!strcmp(current->key, key)) {
        free(current);
        current = NULL;
        ht->cells[hash_result] = NULL;
        return;
    }
    while (current->next != NULL) {
        temp = current;
        current = current->next;
        prev = temp;
        if (!strcmp(current->key, key)) {
            free(current);
            current = NULL;
            prev->next = NULL;
            return;
        }
    }
}

int
main(int argc, char* argv[]) {
    HashTable* ht = new_hashtable();
    while (!feof(stdin)) {
        fprintf(stdout, ">>> ");
        char input[256];
        fgets(input, 256, stdin);
        char *command;
        command = strtok(input, " ");
        if (!strcmp(command, "SET")) {
            char *sub = &input[(int)strlen(command) + 1];
            char *key = strtok(sub, " ");
            sub = &input[(int)strlen(command) + (int)strlen(key) + 2];
            char *value = strtok(sub, "\n");
            key = strncpy(malloc(strlen(key)), key, strlen(key));
            value = strncpy(malloc(strlen(value)), value, strlen(value));
            set(ht, key, (void*)value);
        } else if (!strcmp(command, "GET")) {
            char *sub = &input[(int)strlen(command) + 1];
            char *key = strtok(sub, "\n");
            char *value = get(ht, key);
            fprintf(stdout, "%s\n", (char*)value);
        } else if (!strcmp(command, "DEL") || !strcmp(command, "DELETE")) {
            char *sub = &input[(int)strlen(command) + 1];
            char *key = strtok(sub, "\n");
            delete(ht, key);
        } else if (input[0] != '\n') {
            puts("Invalid command!");
        }
    }
    free(ht);
    return 0;
}
```
Unnecessary casts
You cast a lot of values unnecessarily, sometimes even to the type they already have! Casting should only be done when really necessary. Some things that don't need casting:

- In `return (HashTable*)calloc(1, sizeof(HashTable));` the cast is not necessary, since the function's return type is already `HashTable*`. C allows implicit conversions from `void*` to any other object pointer type. The same goes for casts of the return value of `malloc()`.
- In `set(ht, key, (void*)value);` you similarly don't need to explicitly cast from another pointer type to `void*`.
- In `sub = &input[(int)strlen(command) + (int)strlen(key) + 2];` there is no need to cast from `size_t` to `int`, and in fact this can lead to incorrect truncation of the length if you ever have really large strings.
- In `fprintf(stdout, "%s\n", (char*)value);`, why the cast at all? `value` is already a `char*`.
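A minimal sketch of the allocation code with the casts dropped might look like this. It keeps your `Node` and `HashTable` layouts, but renames the struct tags to `node` and `hashtable` (identifiers starting with an underscore are reserved at file scope), and introduces a hypothetical `new_node()` helper just to isolate the `malloc()` call from `set()`. Note that this only applies to C; C++ does not allow the implicit conversion from `void*`.

```c
#include <stdlib.h>

#define TABLE_SIZE 1024

typedef struct node {
    char *key;
    void *value;
    struct node *next;
} Node;

typedef struct hashtable {
    Node *cells[TABLE_SIZE];
} HashTable;

/* (void) states explicitly that the function takes no arguments. */
HashTable *new_hashtable(void) {
    /* calloc() returns void*, which converts implicitly to HashTable*. */
    return calloc(1, sizeof(HashTable));
}

/* Hypothetical helper: allocates a node the way set() does, minus the cast.
   sizeof *node stays correct even if the type of node changes later. */
Node *new_node(char *key, void *value) {
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->key = key;
    node->value = value;   /* any object pointer converts to void* implicitly */
    node->next = NULL;
    return node;
}
```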
Use strdup()
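Use `strdup()` to copy strings. Your current `strncpy(malloc(strlen(key)), key, strlen(key))` allocates one byte too few and never writes the terminating NUL character, so the copies are not valid C strings; `strdup()` allocates `strlen(s) + 1` bytes and copies the terminator as well. While officially it has only been part of C since C23, it is part of POSIX and most C libraries have provided it for decades.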
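Here is a sketch of a small wrapper around `strdup()` that also checks for allocation failure. `xstrdup` is a hypothetical name (not a standard function), and the sketch assumes that printing an error and exiting is an acceptable reaction to running out of memory in a small REPL like this one.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strdup() on pre-C23 C libraries */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies s with strdup() and exits on failure,
   so callers never see a NULL result. */
char *xstrdup(const char *s) {
    char *copy = strdup(s);   /* allocates strlen(s) + 1 bytes, copies the NUL */
    if (copy == NULL) {
        perror("strdup");
        exit(EXIT_FAILURE);
    }
    return copy;
}
```

In the `SET` branch, the two `strncpy()` lines could then become `key = xstrdup(key);` and `value = xstrdup(value);`.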
Missing error checking everywhere
There are many things that can fail in your code. For example:

- If you run out of memory, `calloc()` and `malloc()` will return `NULL`. If you don't check for that, your program will most likely crash with a segmentation fault.
- There could be an issue reading from `stdin`, for example if the user redirected a file to `stdin` and there is an error reading that file somewhere halfway. Checking for `feof()` is not enough: on a read error `fgets()` returns `NULL` without setting the end-of-file indicator, so you have to check the return value of `fgets()` (and `ferror()`).
- There could also be an error writing to `stdout`. Again, this might be important to check for if it is redirected to a file.
- Your input might not be a valid command. Consider what happens if `strtok()` cannot find the separator.
- An input line might be longer than 256 characters. What happens with the result from `fgets()`? What happens with the next call to `fgets()`?
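One way to address the `fgets()` points above is a small line-reading helper that distinguishes a complete line, the end of the input, a read error and an over-long line. This is only a sketch: `read_line()` and its return codes are my own invention, and it assumes over-long lines should be rejected rather than silently truncated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (size n) and
   strips the newline. Returns 1 on success, 0 on clean end-of-file,
   -1 on a read error, and -2 if the line did not fit in buf
   (the rest of that line is then discarded). */
int read_line(FILE *in, char *buf, size_t n) {
    if (fgets(buf, (int)n, in) == NULL)
        return ferror(in) ? -1 : 0;

    char *newline = strchr(buf, '\n');
    if (newline != NULL) {
        *newline = '\0';
        return 1;
    }

    /* No newline in buf: the line either filled buf exactly, ended at
       end-of-file without a newline, or is too long. Peek to find out. */
    int c = getc(in);
    if (c == '\n' || c == EOF)
        return ferror(in) ? -1 : 1;

    /* Too long: skip to the end of the line so the next call starts fresh. */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return ferror(in) ? -1 : -2;
}
```

In `main()`, the `while (!feof(stdin))` loop could then call `read_line(stdin, input, sizeof input)` and stop on 0, report the error and exit on -1, and print an error message and continue on -2.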
If an error happens, ideally you print an error message to `stderr` (you can use helper functions like `perror()`, or the nicer but less standard `err()` from `<err.h>`, available on the BSDs and with glibc) and then exit the program with `EXIT_FAILURE`. The error message informs a potential human looking at the output, and the exit status ensures that scripts calling your program can detect that something went wrong.
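A sketch of how that could look, using two hypothetical helpers, `die()` and `xmalloc()`, plus a final check of `stdout` before `main()` returns. The last check matters because output is buffered: a write error (for example a full disk when the output is redirected to a file) may only show up when the buffer is flushed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report a failed operation on stderr and exit. */
void die(const char *what) {
    perror(what);          /* prints "what: <description of errno>" */
    exit(EXIT_FAILURE);
}

/* Hypothetical wrapper: malloc() that never returns NULL to the caller. */
void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        die("malloc");
    return p;
}

/* Call once right before returning from main(): flushes stdout and
   exits with EXIT_FAILURE if any write to it failed. */
void finish_output(void) {
    if (fflush(stdout) == EOF || ferror(stdout))
        die("writing to stdout");
}
```

For example, `HashTable* ht = new_hashtable();` would be followed by `if (ht == NULL) die("calloc");`, and `finish_output();` would go right before `return 0;`.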
Error handling is one of the less fun things to do when programming, and you have to do a lot of that if you write in C, but it is important; it ensures your program will not have unexpected behavior when things go wrong.
Hardcoded number of buckets
You have hardcoded the number of buckets to use for the hash table. What if you only need to store a few entries? Then you have wasted memory. What if you need to store a million entries? Then your hash table becomes very slow, as it is just scanning linked lists all the time (with 1024 buckets, that is about a thousand nodes per chain on average, even with a perfect hash function).
Only if you know up front how many items you will store in the hash table can you make a good decision about the number of buckets to use. Consider making the number of buckets configurable at runtime, and/or make it so the hash table will dynamically grow to avoid too many items being stored in the same bucket.
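As a sketch of the second option, the table below stores its bucket count in the struct, takes the initial count as a parameter, and doubles it whenever the number of entries reaches the number of buckets (a maximum load factor of 1, which is an arbitrary choice). All names are hypothetical, the FNV-1a hash is just a placeholder, and like your `set()` it does not check for duplicate keys; it also leaves ownership of keys and values with the caller.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char *key;
    void *value;
    struct node *next;
} Node;

typedef struct {
    Node **cells;     /* array of nbuckets chain heads */
    size_t nbuckets;  /* chosen at runtime instead of TABLE_SIZE */
    size_t count;     /* number of stored entries */
} HashTable;

/* Placeholder hash (32-bit FNV-1a); any decent string hash works here. */
static uint32_t hash_string(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

HashTable *hashtable_new(size_t nbuckets) {
    HashTable *ht = malloc(sizeof *ht);
    if (ht == NULL)
        return NULL;
    ht->nbuckets = nbuckets > 0 ? nbuckets : 1;
    ht->count = 0;
    ht->cells = calloc(ht->nbuckets, sizeof *ht->cells);
    if (ht->cells == NULL) {
        free(ht);
        return NULL;
    }
    return ht;
}

/* Rehash every node into a bucket array twice as large.
   On allocation failure the old (slower) table is kept. */
static void hashtable_grow(HashTable *ht) {
    size_t new_n = ht->nbuckets * 2;
    Node **new_cells = calloc(new_n, sizeof *new_cells);
    if (new_cells == NULL)
        return;
    for (size_t i = 0; i < ht->nbuckets; i++) {
        Node *node = ht->cells[i];
        while (node != NULL) {
            Node *next = node->next;
            size_t b = hash_string(node->key) % new_n;
            node->next = new_cells[b];   /* push onto the new chain */
            new_cells[b] = node;
            node = next;
        }
    }
    free(ht->cells);
    ht->cells = new_cells;
    ht->nbuckets = new_n;
}

/* Returns 0 on success, -1 if the node could not be allocated. */
int hashtable_set(HashTable *ht, char *key, void *value) {
    if (ht->count >= ht->nbuckets)   /* one more entry would exceed load 1 */
        hashtable_grow(ht);
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    size_t b = hash_string(key) % ht->nbuckets;
    node->key = key;
    node->value = value;
    node->next = ht->cells[b];       /* O(1) insert at the chain head */
    ht->cells[b] = node;
    ht->count++;
    return 0;
}

void *hashtable_get(const HashTable *ht, const char *key) {
    for (Node *n = ht->cells[hash_string(key) % ht->nbuckets]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->value;
    return NULL;
}

/* Frees the nodes and the bucket array, not the keys or values. */
void hashtable_free(HashTable *ht) {
    for (size_t i = 0; ht != NULL && i < ht->nbuckets; i++)
        for (Node *n = ht->cells[i], *next; n != NULL; n = next) {
            next = n->next;
            free(n);
        }
    if (ht != NULL)
        free(ht->cells);
    free(ht);
}
```

Doubling keeps the average cost of an insert constant, because each rehash touches every node once but happens only after the table size has doubled.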
Your hash function is of very poor quality
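Just summing the ASCII values of the characters in a string is a very bad hash function, because it puts a lot of keys in the same bucket. For example, `abc`, `cba` and `aad` all sum to 294, and every permutation of the same characters always collides. Short keys also have small sums, so they only ever land in the lower-numbered buckets. Reducing modulo a power of two is also not great: `% 1024` simply keeps the lowest 10 bits of the hash and throws away whatever information is in the higher bits.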
A real hash table implementation would use a more sophisticated hash function that doesn't suffer from these problems.
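For example, here is your `hash()` rewritten to use 32-bit FNV-1a, a simple and well-known non-cryptographic hash. It is one option among many (MurmurHash, xxHash and SipHash are others), chosen here only because it fits in a few lines; the signature is changed to take a `const char *` and return a fixed-width `uint32_t`, which is my assumption rather than anything your code requires.

```c
#include <stdint.h>

#define TABLE_SIZE 1024

typedef uint32_t HashResult;

/* 32-bit FNV-1a: XOR each byte into the state, then multiply by the FNV
   prime, so both the byte values and their order affect the result. */
HashResult hash(const char *string) {
    uint32_t h = 2166136261u;                  /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)string; *p != '\0'; p++) {
        h ^= *p;
        h *= 16777619u;                        /* FNV prime, wraps mod 2^32 */
    }
    return h % TABLE_SIZE;
}
```

With this version, `abc`, `cba` and `aad` no longer end up in the same bucket, because the multiplication after every byte makes the result depend on the order of the characters.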
Deleting a hash table
At the end of your `main()`, you call `free(ht)`. However, that only deletes the `HashTable` object; it doesn't free any of the `Node`s that were created, nor the keys and values you allocated for them. You should implement a `free_hashtable()` function that ensures all the nodes are deleted before deleting the `HashTable` object.
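A sketch of such a function, assuming (as in your REPL) that every key and value was allocated with `malloc()` and is owned by the table. If callers keep ownership of their keys or values instead, drop the corresponding `free()` calls.

```c
#include <stdlib.h>

#define TABLE_SIZE 1024

typedef struct node {
    char *key;
    void *value;
    struct node *next;
} Node;

typedef struct hashtable {
    Node *cells[TABLE_SIZE];
} HashTable;

/* Frees every node in every bucket, then the table itself.
   Assumption: keys and values are heap-allocated and owned by the table. */
void free_hashtable(HashTable *ht) {
    if (ht == NULL)
        return;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        Node *node = ht->cells[i];
        while (node != NULL) {
            Node *next = node->next;   /* save before node is freed */
            free(node->key);
            free(node->value);
            free(node);
            node = next;
        }
    }
    free(ht);
}
```

`main()` would then end with `free_hashtable(ht);` instead of `free(ht);`.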