While reading a data structure book, I have implemented a chained hash table. I want to share the code with you and get your opinion about it.
chained_hash_table.h
#ifndef CHAINED_HASH_TABLE_H_INCLUDED
#define CHAINED_HASH_TABLE_H_INCLUDED
#include <stdlib.h>
#include "../linked_list/linked_list.h"
typedef size_t hkey_t;
typedef struct ChainedHashTable_ {
size_t buckets;
List **table;
hkey_t (*hash)(const void *key);
void (*destroy)(void *data);
int (*compare)(const void *fdata, const void *sdata);
}ChainedHashTable;
ChainedHashTable *chtb_init(size_t buckets,
hkey_t (*hash)(const void *key),
void (*destroy)(void *data),
int (*compare)(const void *fdata, const void *sdata));
void chtb_destroy(ChainedHashTable *chtb);
int chtb_insert(ChainedHashTable *chtb, void *data);
int chtb_remove(ChainedHashTable *chtb, void **data);
int chtb_lookup(const ChainedHashTable *chtb, const void *data);
#endif
chained_hash_table.c
#include <string.h>
#include "chained_hash_table.h"
#define chtb_hash_key(chtb, data) chtb->hash(data) % chtb->buckets
ChainedHashTable *chtb_init(size_t buckets,
hkey_t (*hash)(const void *key),
void (*destroy)(void *data),
int (*compare)(const void *fdata, const void *sdata)) {
ChainedHashTable *chtb;
if( (chtb = malloc(sizeof *chtb)) != NULL ) {
chtb->buckets = buckets;
chtb->hash = hash;
chtb->destroy = destroy;
chtb->compare = compare;
if( (chtb->table = malloc(buckets * sizeof *chtb->table)) == NULL) {
free(chtb);
return NULL;
}
int i;
for(i = 0; i < buckets; ++i) {
if( (chtb->table[i] = list_init(destroy, compare)) == NULL) {
for(--i; i >= 0; --i)
free(chtb->table[i]);
free(chtb);
return NULL;
}
}
}
return chtb;
}
void chtb_destroy(ChainedHashTable *chtb) {
size_t i;
for(i = 0; i < chtb->buckets; ++i)
if(chtb->table[i] != NULL)
list_destroy(chtb->table[i]);
free(chtb->table);
free(chtb);
}
int chtb_insert(ChainedHashTable *chtb, void *data) {
size_t bucket = chtb_hash_key(chtb, data);
if(list_find(chtb->table[bucket], data) != NULL)
return 1;
if(list_prepend(chtb->table[bucket], data) != 0)
return -1;
return 0;
}
int chtb_remove(ChainedHashTable *chtb, void **data) {
size_t bucket = chtb_hash_key(chtb, data);
if(chtb->table[bucket] == NULL)
return -1;
return list_remove(chtb->table[bucket], data);
}
int chtb_lookup(const ChainedHashTable *chtb, const void *data) {
size_t bucket = chtb_hash_key(chtb, data);
return list_find(chtb->table[bucket], data) == NULL ? 0 : 1;
}
2 Answers 2
typedef size_t hkey_t;
doesn't seem right. Sincehash
has to be implemented by the client, the client must know exactly whathkey_t
is. It would however make certain sense to usehkey_t
throughout your code.It is very suspicious that
data
argument toremove
has an extra level of indirection, compared toinsert
andlookup
. Is it a typo?I'd recommend to allocate buckets with
calloc
instead ofmalloc
. Then a cleanup loop ofchtb_init
can be replaced with a call tochtb_destroy
, that is stating your intention explicitly.Since the
linked_list.h
code is not posted, it is impossible to reason about the rest of the functions (specifically, return codes). In general, it make sense to return more than just failure code (a pointer to an inserted node, maybe? - cf STL inserts).
The function pointers hash
, destroy
and compare
are repeatedly used in a of couple places. It is time to turn them into typedefs:
typedef hkey_t (*hash_func) (const void *key);
typedef void (*destroy_func) (void *data);
typedef int (*compare_func) (const void *fdata, const void *sdata);
Integer return codes can easily become a source of confusion and error. Take chtb_insert()
for instance. It return 1, -1, and 0, where 0 seems to be the "successful" return. Now suppose someone (even yourself a few months from now) forgets about this and assumes the function is returning a boolean 0|1 value for failure and success, respectively. This person would probably use the function like this:
if (chtb_insert( htbl, ... ))
{
// Yey! Insertion was successful
}
Only to later find out that his/her logic is going to behave the exact opposite way it was expected.
A better interface would be return a bool
true
or false
(Include <stdbool.h>
for that). Or, if you need more specific error handling, then define and enum type for error codes that are more involving than just success and failure. E.g.:
enum ht_error
{
HT_ERROR_DUPLICATE_KEY,
HT_ERROR_OUT_OF_MEMORY,
... etc ...
};
ht_error ht_function( ... );
Prefer to return early instead of nesting. This simplifies logic for the reader. For example, in the first lines of chtb_init()
you have:
if( (chtb = malloc(sizeof *chtb)) != NULL ) {
chtb->buckets = buckets;
chtb->hash = hash;
chtb->destroy = destroy;
chtb->compare = compare;
You can get rid of a level of indentation for the whole function by just doing this:
if ( (chtb = malloc(sizeof *chtb)) == NULL ) {
return NULL;
}
With C99, you can declare variables anywhere. This includes loop counters in for()
loops. This is generally preferred, as it reduces scope. A variable with full function scope is somewhat like a mini-global.
I recommend checking your ChainedHashTable *chtb
pointers with assert
before dereferencing them. A valid hash table pointer is a precondition of every function, so you should enforce that.