Simple hash table written in C with REPL

Question 1

This is a simple hash table I've written in C. It is a closed-addressing hash table with TABLE_SIZE buckets. The hash function is extremely simple and only sums up the characters of the key string. I've also implemented a REPL to test it.

I appreciate any type of suggestion to improve this code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLE_SIZE 1024
typedef unsigned int HashResult;
typedef struct _node {
    char *key;
    void *value;
    struct _node* next;
} Node;

typedef struct _hashtable {
    Node* cells[TABLE_SIZE];
} HashTable;

HashTable* new_hashtable() {
    return (HashTable*)calloc(1, sizeof(HashTable));
}

HashResult hash(char* string) {
    HashResult result = 0;
    while(*string != '\0')
        result += *(string++);
    return result % TABLE_SIZE;
}

void set(HashTable* ht, char *key, void *value) {
    HashResult hash_result = hash(key);
    Node* node = (Node*)malloc(sizeof(Node));
    node->key = key;
    node->value = value;
    node->next = NULL;
    if (ht->cells[hash_result] == NULL) {
        ht->cells[hash_result] = node;
    } else {
        Node* root = ht->cells[hash_result];
        while (root->next != NULL)
            root = root->next;
        root->next = node;
    }
}

void* get(HashTable* ht, char* key) {
    HashResult hash_result = hash(key);
    if (ht->cells[hash_result] == NULL)
        return NULL;
    Node* node = ht->cells[hash_result];
    if (!strcmp(key, node->key))
        return node->value;
    do {
        if (!strcmp(key, node->key))
            return node->value;
        node = node->next;
    }
    while (node != NULL);
    return NULL;
}

void delete(HashTable* ht, char* key) {
    void *i = get(ht, key);
    if (i == NULL)
        return;
    HashResult hash_result = hash(key);
    Node *current, *prev, *temp;
    current = ht->cells[hash_result];
    if (!strcmp(current->key, key)) {
        free(current);
        current = NULL;
        ht->cells[hash_result] = NULL;
        return;
    }
    while (current->next != NULL) {
        temp = current;
        current = current->next;
        prev = temp;
        if (!strcmp(current->key, key)) {
            free(current);
            current = NULL;
            prev->next = NULL;
            return;
        }
    }
}

int
main(int argc, char* argv[]) {
    HashTable* ht = new_hashtable();
    while (!feof(stdin)) {
        fprintf(stdout, ">>> ");
        char input[256];
        fgets(input, 256, stdin);
        char *command;
        command = strtok(input, " ");
        if (!strcmp(command, "SET")) {
            char *sub = &input[(int)strlen(command) + 1];
            char *key = strtok(sub, " ");
            sub = &input[(int)strlen(command) + (int)strlen(key) + 2];
            char *value = strtok(sub, "\n");
            key = strncpy(malloc(strlen(key)), key, strlen(key));
            value = strncpy(malloc(strlen(value)), value, strlen(value));
            set(ht, key, (void*)value);
        } else if (!strcmp(command, "GET")) {
            char *sub = &input[(int)strlen(command) + 1];
            char *key = strtok(sub, "\n");
            char *value = get(ht, key);
            fprintf(stdout, "%s\n", (char*)value);
        } else if (!strcmp(command, "DEL") || !strcmp(command, "DELETE")) {
            char *sub = &input[(int)strlen(command) + 1];
            char *key = strtok(sub, "\n");
            delete(ht, key);
        } else if (input[0] != '\n') {
            puts("Invalid command!");
        }
    }
    free(ht);
    return 0;
}

Answer 1

Unnecessary casts

You cast a lot of values unnecessarily, sometimes even to the type they already have! Casting should only be done if really necessary. Some things that don't need casting:

return (HashTable*)calloc(1, sizeof(HashTable));
Since the function's return type is already HashTable*, the cast here is not necessary. C implicitly converts void* to any other object pointer type. The same goes for casts of the return value of malloc().
set(ht, key, (void*)value);
Similarly, you don't need to explicitly cast from another pointer type to void*.
sub = &input[(int)strlen(command) + (int)strlen(key) + 2];
There is no need to cast from size_t to int here, and in fact this can truncate the length incorrectly if you have really large strings.
fprintf(stdout, "%s\n", (char*)value);
Why the cast at all? value is already a char*.

Use `strdup()`

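Use strdup() to copy strings. The current strncpy(malloc(strlen(key)), key, strlen(key)) allocates no room for the terminating null byte and never writes one, so the copies are not valid C strings. While strdup() has officially only been part of C since C23, it has been in POSIX for much longer and most compilers have supported it for decades.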
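As a minimal sketch of what the SET branch could call instead, the helper below wraps strdup() and reports a failed allocation. The name copy_string() is hypothetical and only used for illustration; the feature-test macro is there for toolchains that predate C23.

```c
#define _POSIX_C_SOURCE 200809L /* exposes strdup() on pre-C23 toolchains */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, null-terminated copy of
 * `source`, or NULL if memory could not be allocated. The caller owns the
 * result and must free() it. */
char *copy_string(const char *source) {
    char *copy = strdup(source);
    if (copy == NULL)
        perror("strdup");
    return copy;
}
```

In the REPL this would replace both strncpy(malloc(...)) lines, for example key = copy_string(key), followed by a NULL check before calling set().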

Missing error checking everywhere

There are many things that can fail in your code. For example:

If you run out of memory, calloc() and malloc() will return NULL. If you don't check for that, your program will most likely crash with a segmentation fault.
There could also be an issue reading from stdin, for example if the user redirected a file to stdin, and there is an error reading that file somewhere halfway. Checking for feof() is not enough.
There could also be an error writing to stdout. Again, this might be important to check for if that is redirected to another file.
Your input might not be a valid command. Consider what happens if strtok() cannot find the separator.
An input line might be longer than 256 characters. What happens with the result from fgets()? What happens with the next call to fgets()?
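To make the input-related points concrete, here is one possible sketch of a line reader that checks the result of fgets(), distinguishes end-of-file from a read error, and discards the remainder of an over-long line so that the next call starts at a fresh line. The LineStatus enum and read_line() are hypothetical names, not part of the original code.

```c
#include <stdio.h>
#include <string.h>

typedef enum { LINE_OK, LINE_TOO_LONG, LINE_EOF, LINE_ERROR } LineStatus;

/* Hypothetical helper: reads one line into `buffer` (capacity `size`) and
 * strips the trailing newline. */
LineStatus read_line(FILE *stream, char *buffer, int size) {
    if (fgets(buffer, size, stream) == NULL)
        return ferror(stream) ? LINE_ERROR : LINE_EOF;

    size_t length = strlen(buffer);
    if (length > 0 && buffer[length - 1] == '\n') {
        buffer[length - 1] = '\0';
        return LINE_OK;
    }
    if (feof(stream))
        return LINE_OK; /* final line without a trailing newline */

    /* The buffer filled up before a newline was seen. */
    int c = getc(stream);
    if (c == '\n' || (c == EOF && !ferror(stream)))
        return LINE_OK; /* the text fit exactly; only the newline was left */
    while (c != EOF && c != '\n')
        c = getc(stream); /* discard the rest of the over-long line */
    return ferror(stream) ? LINE_ERROR : LINE_TOO_LONG;
}
```

A REPL loop could call this instead of testing feof() up front and stop on LINE_EOF or LINE_ERROR. The result of every strtok() call needs a similar NULL check before it is passed to strcmp() or strlen().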

If an error happens, you ideally print an error message to stderr (you can use helper functions like perror() or the nicer but less standard err()), and then exit the program with EXIT_FAILURE. The error message informs a potential human looking at the output, and the error code will ensure that scripts calling your program can detect that something went wrong.
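A small sketch of that pattern, assuming you are happy to terminate the program on failure: checked_calloc() and flush_or_exit() are hypothetical helper names, and only perror(), fflush(), ferror() and exit() come from the standard library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: like calloc(), but prints a message to stderr and
 * exits with EXIT_FAILURE instead of returning NULL on failure. A NULL
 * result for a zero-sized request is allowed and passed through. */
void *checked_calloc(size_t count, size_t size) {
    void *memory = calloc(count, size);
    if (memory == NULL && count != 0 && size != 0) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    return memory;
}

/* Hypothetical helper: flushes `stream` and exits if any write to it has
 * failed, for example because stdout was redirected to a full disk. */
void flush_or_exit(FILE *stream) {
    if (fflush(stream) == EOF || ferror(stream)) {
        fputs("error writing output\n", stderr);
        exit(EXIT_FAILURE);
    }
}
```

With these, new_hashtable() could return checked_calloc(1, sizeof(HashTable)), and main() could call flush_or_exit(stdout) before returning.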

Error handling is one of the less fun things to do when programming, and you have to do a lot of that if you write in C, but it is important; it ensures your program will not have unexpected behavior when things go wrong.

Hardcoded number of buckets

You have hardcoded the number of buckets to use for the hash table. What if you only need to store a few entries? Then you have wasted memory. What if you need to store a million entries? Then your hash table becomes very slow, as it is just scanning linked lists all the time.

Only if you know up front how many items you will store in the hash table can you make a good decision about the number of buckets to use. Consider making the number of buckets configurable at runtime, and/or make it so the hash table will dynamically grow to avoid too many items being stored in the same bucket.
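One way this could look, as a sketch rather than a finished design: the table stores its own bucket count and an item counter, and the bucket array is allocated separately. The fields bucket_count and item_count and the function load_factor() are assumptions introduced here. set() and delete() would have to maintain item_count, and the code that grows the table and rehashes the nodes is left out.

```c
#include <stdlib.h>

typedef struct Node {
    char *key;
    void *value;
    struct Node *next;
} Node;

typedef struct HashTable {
    size_t bucket_count; /* chosen by the caller at runtime */
    size_t item_count;   /* number of stored entries */
    Node **cells;        /* array of bucket_count list heads */
} HashTable;

/* Returns a new empty table, or NULL on a zero bucket count or if memory
 * could not be allocated. */
HashTable *new_hashtable(size_t bucket_count) {
    if (bucket_count == 0)
        return NULL;
    HashTable *ht = malloc(sizeof *ht);
    if (ht == NULL)
        return NULL;
    ht->cells = calloc(bucket_count, sizeof *ht->cells);
    if (ht->cells == NULL) {
        free(ht);
        return NULL;
    }
    ht->bucket_count = bucket_count;
    ht->item_count = 0;
    return ht;
}

/* Average number of entries per bucket. A growing table would typically
 * allocate more buckets once this passes a chosen threshold. */
double load_factor(const HashTable *ht) {
    return (double)ht->item_count / (double)ht->bucket_count;
}
```

The hash function would then reduce its result with % ht->bucket_count instead of % TABLE_SIZE.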

Your hash function is of very poor quality

Just summing the ASCII values of the characters in a string is a very bad hash function, because it puts a lot of keys in the same bucket. For example, abc, cba and aad all sum to 294 and so land in the same bucket. Using modulo a power of two is also not great, since it keeps only the lowest bits of the sum.

A real hash table implementation would use a more sophisticated hash function that doesn't suffer from these problems.
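As one example of a better choice, here is a sketch of the 32-bit FNV-1a hash. This is a technique swapped in for illustration, not something the original code uses. It mixes in each byte with an XOR followed by a multiplication, so the position of a character affects the result. The function names fnv1a_hash() and sum_hash() are hypothetical; sum_hash() reproduces the question's hash for comparison.

```c
#include <stdint.h>

#define TABLE_SIZE 1024

/* 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime. */
uint32_t fnv1a_hash(const char *string) {
    uint32_t result = 2166136261u; /* FNV offset basis */
    while (*string != '\0') {
        result ^= (unsigned char)*string++;
        result *= 16777619u; /* FNV prime */
    }
    return result;
}

/* The additive hash from the question, kept only for comparison. */
unsigned int sum_hash(const char *string) {
    unsigned int result = 0;
    while (*string != '\0')
        result += *(string++);
    return result % TABLE_SIZE;
}
```

The caller reduces the 32-bit result to a bucket index itself, for example fnv1a_hash(key) % TABLE_SIZE.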

Deleting a hash table

At the end of your main(), you call free(ht). However, that only deletes the HashTable object, but it doesn't free any of the Nodes that were created. You should implement a free_hashtable() function that ensures all the nodes are deleted before deleting the HashTable object.
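A minimal sketch of such a function, using the same layout as the question's types. It assumes the table owns every key and value, which matches the REPL, where both are heap-allocated copies. If callers keep ownership of their values, the two inner free() calls would have to go.

```c
#include <stdlib.h>

#define TABLE_SIZE 1024

typedef struct Node {
    char *key;
    void *value;
    struct Node *next;
} Node;

typedef struct HashTable {
    Node *cells[TABLE_SIZE];
} HashTable;

/* Frees every node in every bucket, then the table itself.
 * Assumption: keys and values were allocated with malloc() or a related
 * function and belong to the table. */
void free_hashtable(HashTable *ht) {
    if (ht == NULL)
        return;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        Node *node = ht->cells[i];
        while (node != NULL) {
            Node *next = node->next; /* read before the node is freed */
            free(node->key);
            free(node->value);
            free(node);
            node = next;
        }
    }
    free(ht);
}
```

main() would then end with free_hashtable(ht) instead of free(ht).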

G. Sliepen · Accepted Answer · 2023-11-29 20:21:56Z
