Chained hash table implementation

Question 1

While reading a data structure book, I have implemented a chained hash table. I want to share the code with you and get your opinion about it.

chained_hash_table.h

#ifndef CHAINED_HASH_TABLE_H_INCLUDED
#define CHAINED_HASH_TABLE_H_INCLUDED
#include <stdlib.h>
#include "../linked_list/linked_list.h"
typedef size_t hkey_t;
typedef struct ChainedHashTable_ {
 size_t buckets;
 List **table;
 hkey_t (*hash)(const void *key);
 void (*destroy)(void *data);
 int (*compare)(const void *fdata, const void *sdata);
}ChainedHashTable;
ChainedHashTable *chtb_init(size_t buckets,
 hkey_t (*hash)(const void *key),
 void (*destroy)(void *data),
 int (*compare)(const void *fdata, const void *sdata));
void chtb_destroy(ChainedHashTable *chtb);
int chtb_insert(ChainedHashTable *chtb, void *data);
int chtb_remove(ChainedHashTable *chtb, void **data);
int chtb_lookup(const ChainedHashTable *chtb, const void *data);
#endif

chained_hash_table.c

#include <string.h>
#include "chained_hash_table.h"
#define chtb_hash_key(chtb, data) chtb->hash(data) % chtb->buckets
ChainedHashTable *chtb_init(size_t buckets,
 hkey_t (*hash)(const void *key),
 void (*destroy)(void *data),
 int (*compare)(const void *fdata, const void *sdata)) {
 ChainedHashTable *chtb;
 if( (chtb = malloc(sizeof *chtb)) != NULL ) {
 chtb->buckets = buckets;
 chtb->hash = hash;
 chtb->destroy = destroy;
 chtb->compare = compare;
 if( (chtb->table = malloc(buckets * sizeof *chtb->table)) == NULL) {
 free(chtb);
 return NULL;
 }
 int i;
 for(i = 0; i < buckets; ++i) {
 if( (chtb->table[i] = list_init(destroy, compare)) == NULL) {
 for(--i; i >= 0; --i) 
 free(chtb->table[i]);
 free(chtb);
 return NULL;
 }
 }
 }
 return chtb;
}
void chtb_destroy(ChainedHashTable *chtb) {
 size_t i;
 for(i = 0; i < chtb->buckets; ++i)
 if(chtb->table[i] != NULL)
 list_destroy(chtb->table[i]);
 free(chtb->table);
 free(chtb);
}
int chtb_insert(ChainedHashTable *chtb, void *data) {
 size_t bucket = chtb_hash_key(chtb, data); 
 if(list_find(chtb->table[bucket], data) != NULL)
 return 1;
 if(list_prepend(chtb->table[bucket], data) != 0)
 return -1;
 return 0;
}
int chtb_remove(ChainedHashTable *chtb, void **data) {
 size_t bucket = chtb_hash_key(chtb, data); 
 if(chtb->table[bucket] == NULL)
 return -1;
 return list_remove(chtb->table[bucket], data);
}
int chtb_lookup(const ChainedHashTable *chtb, const void *data) {
 size_t bucket = chtb_hash_key(chtb, data); 
 return list_find(chtb->table[bucket], data) == NULL ? 0 : 1;
}

Question 2

typedef size_t hkey_t; doesn't seem right. Since hash has to be implemented by the client, the client must know exactly what hkey_t is. It would however make certain sense to use hkey_t throughout your code.
It is very suspicious that data argument to remove has an extra level of indirection, compared to insert and lookup. Is it a typo?
I'd recommend to allocate buckets with calloc instead of malloc. Then a cleanup loop of chtb_init can be replaced with a call to chtb_destroy, that is stating your intention explicitly.
Since the linked_list.h code is not posted, it is impossible to reason about the rest of the functions (specifically, return codes). In general, it make sense to return more than just failure code (a pointer to an inserted node, maybe? - cf STL inserts).

Question 3

The function pointers hash, destroy and compare are repeatedly used in a of couple places. It is time to turn them into typedefs:

typedef hkey_t (*hash_func) (const void *key);
typedef void (*destroy_func) (void *data);
typedef int (*compare_func) (const void *fdata, const void *sdata);

Integer return codes can easily become a source of confusion and error. Take chtb_insert() for instance. It return 1, -1, and 0, where 0 seems to be the "successful" return. Now suppose someone (even yourself a few months from now) forgets about this and assumes the function is returning a boolean 0|1 value for failure and success, respectively. This person would probably use the function like this:

if (chtb_insert( htbl, ... ))
{
 // Yey! Insertion was successful
}

Only to later find out that his/her logic is going to behave the exact opposite way it was expected.

A better interface would be return a bool true or false (Include <stdbool.h> for that). Or, if you need more specific error handling, then define and enum type for error codes that are more involving than just success and failure. E.g.:

enum ht_error
{
 HT_ERROR_DUPLICATE_KEY,
 HT_ERROR_OUT_OF_MEMORY,
 ... etc ...
};
ht_error ht_function( ... );

Prefer to return early instead of nesting. This simplifies logic for the reader. For example, in the first lines of chtb_init() you have:

if( (chtb = malloc(sizeof *chtb)) != NULL ) {
 chtb->buckets = buckets;
 chtb->hash = hash;
 chtb->destroy = destroy;
 chtb->compare = compare;

You can get rid of a level of indentation for the whole function by just doing this:

if ( (chtb = malloc(sizeof *chtb)) == NULL ) {
 return NULL;
}

With C99, you can declare variables anywhere. This includes loop counters in for() loops. This is generally preferred, as it reduces scope. A variable with full function scope is somewhat like a mini-global.

I recommend checking your ChainedHashTable *chtb pointers with assert before dereferencing them. A valid hash table pointer is a precondition of every function, so you should enforce that.

vnp vnp 58.7k4 gold badges55 silver badges144 bronze badges · Answer 1 · 2014-10-30 22:07:26Z

typedef size_t hkey_t; doesn't seem right. Since hash has to be implemented by the client, the client must know exactly what hkey_t is. It would however make certain sense to use hkey_t throughout your code.
It is very suspicious that data argument to remove has an extra level of indirection, compared to insert and lookup. Is it a typo?
I'd recommend to allocate buckets with calloc instead of malloc. Then a cleanup loop of chtb_init can be replaced with a call to chtb_destroy, that is stating your intention explicitly.
Since the linked_list.h code is not posted, it is impossible to reason about the rest of the functions (specifically, return codes). In general, it make sense to return more than just failure code (a pointer to an inserted node, maybe? - cf STL inserts).

glampert glampert 17.3k4 gold badges31 silver badges89 bronze badges · Answer 2 · 2014-10-31 01:19:45Z

The function pointers hash, destroy and compare are repeatedly used in a of couple places. It is time to turn them into typedefs:

typedef hkey_t (*hash_func) (const void *key);
typedef void (*destroy_func) (void *data);
typedef int (*compare_func) (const void *fdata, const void *sdata);

Integer return codes can easily become a source of confusion and error. Take chtb_insert() for instance. It return 1, -1, and 0, where 0 seems to be the "successful" return. Now suppose someone (even yourself a few months from now) forgets about this and assumes the function is returning a boolean 0|1 value for failure and success, respectively. This person would probably use the function like this:

if (chtb_insert( htbl, ... ))
{
 // Yey! Insertion was successful
}

Only to later find out that his/her logic is going to behave the exact opposite way it was expected.

A better interface would be return a bool true or false (Include <stdbool.h> for that). Or, if you need more specific error handling, then define and enum type for error codes that are more involving than just success and failure. E.g.:

enum ht_error
{
 HT_ERROR_DUPLICATE_KEY,
 HT_ERROR_OUT_OF_MEMORY,
 ... etc ...
};
ht_error ht_function( ... );

Prefer to return early instead of nesting. This simplifies logic for the reader. For example, in the first lines of chtb_init() you have:

if( (chtb = malloc(sizeof *chtb)) != NULL ) {
 chtb->buckets = buckets;
 chtb->hash = hash;
 chtb->destroy = destroy;
 chtb->compare = compare;

You can get rid of a level of indentation for the whole function by just doing this:

if ( (chtb = malloc(sizeof *chtb)) == NULL ) {
 return NULL;
}

With C99, you can declare variables anywhere. This includes loop counters in for() loops. This is generally preferred, as it reduces scope. A variable with full function scope is somewhat like a mini-global.

I recommend checking your ChainedHashTable *chtb pointers with assert before dereferencing them. A valid hash table pointer is a precondition of every function, so you should enforce that.

Stack Exchange Network

Chained hash table implementation

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Chained hash table implementation

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions