Quick & Simple Hash Table Implementation in C

Question 1

First time implementing a hash table.

I resolve collisions using the separate chaining method (closed addressing), i.e with linked lists.
The hash function used is: murmurhash3 (please tell me why this could be a bad choice or why it is a good choice (briefly)).
If you want to investigate it more, or even test it, all dependencies and a test_main.c are ready made in the following Github repo: click here. Litterally just download and press make.

Please be ruthless, relentless and blunt in your review... like seriously embarrass me, humiliate me. And THANK you in advance for your time.

Also if you have any advice, tips, any good habit I should adopt/start doing, please let me know, I am relatively speaking still a beginner !

Oh and is this kind of commenting of the code encouraged or annoying ? Also should I comment in functions ? Some say its bad practice...

Note: I implement my own standard library functions, so just don't worry about that it is intentional, just assume they work perfectly.

Here's the code:

Header: hashtable.h

/* * * * * * * * * * * *
========================
 HASH TABLE HEADER
========================
 * * * * * * * * * * * */
#ifndef FT_HASHTABLE_H
# define FT_HASHTABLE_H
# define HASHCODE(key, buckets) (hash(key, ft_strlen(key)) % buckets)
# define MIN_LOAD_FACTOR 0.0
# define MAX_LOAD_FACTOR 0.7
typedef struct s_entry
{
 char *key;
 void *value;
 struct s_entry *successor;
} t_entry;
typedef struct s_hashtable
{
 unsigned int entries;
 unsigned int num_buckets;
 t_entry **bucket_list;
} t_hashtable;
t_hashtable *hashtable_alloc_table(unsigned int num_entries);
int hashtable_grow_table(t_hashtable **table);
int hashtable_shrink_table(t_hashtable **table);
t_entry *hashtable_fetch_entry(t_hashtable *table, char *key);
int hashtable_insert_entry(t_hashtable **table,
 char *key,
 void *value);
int hashtable_delete_entry(t_hashtable **table, char *key);
int hashtable_rehash_entry(t_hashtable **table_to,
 t_entry **entry);
int hashtable_rehash_table(t_hashtable **table_from,
 t_hashtable **table_to);
int hashtable_destroy_table(t_hashtable **table);
int hashtable_set_appropriate_load_factor(t_hashtable **table);
t_entry *entry_create(char *key, void *value);
void entry_free(t_entry **entry);
void bucket_free(t_entry **head);
#endif

Functions: hashtable.c

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
// ======================================================================= //
// HASH TABLE FUNCTION LIBRARY //
// ======================================================================= //
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
#include <stdlib.h>
#include "hashtable.h"
#include "utils.h"
#include "murmurhash3/murmurhash3.h"
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Creates/allocates an empty hash table of size 'num_entries'
 and then some (inorder to get to the nearest prime
 number).
RETURN VALUES: If successful, returns a pointer to the
 hash table. If an error occurs the function
 will return a NULL pointer.
 — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
t_hashtable *hashtable_alloc_table(unsigned int num_entries)
{
 t_hashtable *table;
 unsigned int i;
 if (num_entries < 1)
 return (NULL);
 if (!(table = malloc(sizeof(t_hashtable))))
 return (NULL);
 num_entries = (unsigned int)ft_find_next_prime(num_entries);
 if (!(table->bucket_list =
 malloc(sizeof(t_entry*) * num_entries)))
 {
 free(table);
 return (NULL);
 }
 table->num_buckets = num_entries;
 table->entries = 0;
 i = 0;
 while (i < num_entries)
 (table->bucket_list)[i++] = NULL;
 return (table);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Inserts a key-value pair into the hash table.
 NOTE: in this implementation ownership of 'value' is
 taken, that is to say that free'ing of 'value' will be
 taken care of, but 'value' MUST be allocated before hand
 somewhere in the code, if you fail to do so, upon free'ing
 of an entry or the hash table you WILL get a 'bad free'
 error.
 As for the 'key', a duplicate of it will be made (i.e
 memory will be allocated inorder to make a duplicate of
 it). The memory will then be free'd.
RETURN VALUES: If successful, returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_insert_entry(t_hashtable **table,
 char *key,
 void *value)
{
 t_entry *entry;
 unsigned int index;
 if (table && *table && key && value)
 {
 if (hashtable_set_appropriate_load_factor(table) == -1)
 return (-1);
 if (!(entry = entry_create(key, value)))
 return (-1);
 index = HASHCODE(key, (*table)->num_buckets);
 entry->successor = ((*table)->bucket_list)[index];
 ((*table)->bucket_list)[index] = entry;
 (*table)->entries += 1;
 return (0);
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Finds and returns (retrieves) an entry.
RETURN VALUES: If the entry is found, a pointer to the entry is
 returned; otherwise a NULL pointer is returned.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
t_entry *hashtable_fetch_entry(t_hashtable *table, char *key)
{
 t_entry *cur_entry;
 unsigned int index;
 if (table && key)
 {
 index = HASHCODE(key, table->num_buckets);
 cur_entry = (table->bucket_list)[index];
 while (cur_entry)
 {
 if (ft_strcmp(cur_entry->key, key) == 0)
 return (cur_entry);
 cur_entry = cur_entry->successor;
 }
 }
 return (NULL);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Finds and deletes/frees an entry in the hash
 table.
RETURN VALUES: If the entry is found, and is successfully
 deleted/free'd, the function returns 0;
 otherwise the function returns -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_delete_entry(t_hashtable **table, char *key)
{
 t_entry *prev_entry;
 t_entry *cur_entry;
 unsigned int index;
 if (table && key)
 {
 index = HASHCODE(key, (*table)->num_buckets);
 cur_entry = ((*table)->bucket_list)[index];
 while (cur_entry)
 {
 if (ft_strcmp(cur_entry->key, key) == 0)
 {
 if (cur_entry == ((*table)->bucket_list)[index])
 ((*table)->bucket_list)[index] = cur_entry->successor;
 else
 prev_entry->successor = cur_entry->successor;
 entry_free(&cur_entry);
 (*table)->entries -= 1;
 return (0);
 }
 prev_entry = cur_entry;
 cur_entry = cur_entry->successor;
 }
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Deletes/frees the entire hash table
 and all the entries contained in it.
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_destroy_table(t_hashtable **table)
{
 unsigned int i;
 if (table)
 {
 if (*table)
 {
 if ((*table)->bucket_list)
 {
 i = 0;
 while (i < (*table)->num_buckets)
 {
 if (((*table)->bucket_list)[i])
 bucket_free(&((*table)->bucket_list)[i]);
 i++;
 }
 free((*table)->bucket_list);
 (*table)->bucket_list = NULL;
 }
 free(*table);
 (*table) = NULL;
 }
 return (0);
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Checks that the current load factor is
 neither greater than nor smaller than
 the desired max load factor and desired
 minimum load factor respectively.
 If either is the case, a procedure to
 realloc (grow) or dealloc (shrink) the
 table will ensue.
 If neither is the case, nothing happens.
RETURN VALUES: If nothing happens, or a successful
 reallocation or deallocation happens,
 0 is returned. If an error occurs -1
 is returned.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_set_appropriate_load_factor(t_hashtable **table)
{
 if (table && *table)
 {
 if ((float)(*table)->entries / (float)(*table)->num_buckets
 > MAX_LOAD_FACTOR)
 {
 if (hashtable_grow_table(table) == -1)
 return (-1);
 return (0);
 }
 if ((float)(*table)->entries / (float)(*table)->num_buckets
 < MIN_LOAD_FACTOR)
 {
 if (hashtable_shrink_table(table) == -1)
 return (-1);
 return (0);
 }
 else
 {
 return (0);
 }
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Grows the hash table by a factor of 2 and then some
 (inorder to get to the nearest prime number).
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_grow_table(t_hashtable **table)
{
 t_hashtable *new_table;
 if (table && *table)
 {
 new_table = hashtable_alloc_table((*table)->num_buckets * 2);
 if (new_table == NULL)
 return (-1);
 if (hashtable_rehash_table(table, &new_table) == -1)
 return (-1);
 hashtable_destroy_table(table);
 (*table) = new_table;
 return (0);
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Shrinks the hash table by half and then some
 (inorder to get to the nearest prime number).
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_shrink_table(t_hashtable **table)
{
 t_hashtable *new_table;
 if (table && *table)
 {
 if ((*table)->num_buckets > 1)
 {
 new_table = hashtable_alloc_table((*table)->num_buckets / 2);
 if (new_table == NULL)
 return (-1);
 if (hashtable_rehash_table(table, &new_table) == -1)
 return (-1);
 hashtable_destroy_table(table);
 (*table) = new_table;
 return (0);
 }
 return (0);
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Rehashs one entry in the 'table_to' hashtable.
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_rehash_entry(t_hashtable **table_to, t_entry **entry)
{
 unsigned int index;
 if (table_to && *table_to && entry && *entry)
 {
 index = HASHCODE((*entry)->key, (*table_to)->num_buckets);
 (*entry)->successor = ((*table_to)->bucket_list)[index];
 ((*table_to)->bucket_list)[index] = (*entry);
 (*table_to)->entries += 1;
 return (0);
 }
 return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Rehashs all the entries in the hashtable
 'table_from' into the hashtable 'table_to'.
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_rehash_table(t_hashtable **table_from,
 t_hashtable **table_to)
{
 t_entry *cur_entry;
 t_entry *temp;
 unsigned int i;
 if (table_from && *table_from && table_to && *table_to)
 {
 i = 0;
 while (i < (*table_from)->num_buckets)
 {
 if (((*table_from)->bucket_list)[i])
 {
 cur_entry = ((*table_from)->bucket_list)[i];
 while (cur_entry)
 {
 temp = cur_entry->successor;
 if (hashtable_rehash_entry(table_to, &cur_entry) == -1)
 return (-1);
 cur_entry = temp;
 }
 ((*table_from)->bucket_list)[i] = NULL;
 }
 i++;
 }
 }
 return (((*table_to)->entries == (*table_from)->entries) ? 0 : -1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Takes a key and a value and creates an entry out
 of them.
 NOTE: 'value' MUST have previously been
 allocated, otherwise in the free'ing of
 an entry, you WILL get a 'bad free' error.
RETURN VALUES: If successful, the function returns a pointer
 to the new entry; if an error occurs, it returns
 a NULL pointer.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
t_entry *entry_create(char *key, void *value)
{
 t_entry *new_entry;
 if (key && value)
 {
 if (!(new_entry = malloc(sizeof(t_entry))))
 return (NULL);
 new_entry->key = ft_strdup(key);
 new_entry->value = value;
 new_entry->successor = NULL;
 return (new_entry);
 }
 return (NULL);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Deletes/frees an entry (this is for use with entries that
 have members that were allocated only).
 NOTE: if 'entry->value' was not allocated somewhere in
 the code, you WILL get a 'bad free' error.
RETURN VALUES: none.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
void entry_free(t_entry **entry)
{
 if (entry && *entry)
 {
 if ((*entry)->key)
 free((*entry)->key);
 if ((*entry)->value)
 free((*entry)->value);
 free(*entry);
 (*entry) = NULL;
 }
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Deletes/frees the entire bucket (linked
 list).
RETURN VALUES: none.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
void bucket_free(t_entry **head)
{
 t_entry *temp;
 if (head)
 {
 while (*head)
 {
 temp = (*head);
 (*head) = (*head)->successor;
 entry_free(&temp);
 }
 }
}

Testing: click here!

Thank you for your time!

Question 2

"Please be ruthless, relentless and blunt in your review... like seriously embarrass me, humiliate me. " Code Reviews usually aren't meant to satisfy a masochist's lust :3

Question 3

@πάνταῥεῖ ῥεῖ ... Lol I didn't mean it that way. I meant like I really am curious to know about any little thing I could improve on, fix. I'm just super eager to improve..

Question 4

It would be easier for someone to give you a thorough review if they were able to run your code. Many of your dependencies are not included in the code you have posted.

Question 5

@MikeBorkland Borkland i gave a link to the github repo, all the dependencies are present, as well as a ready made test_main.c, you can litterally download the directory, then just hit make and you can play around with the test_main.c file. Again here is the github repo

Question 6

Not that easy:

test_main.c:42:9: error: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘void *’ [-Werror=format=]

. But yes, that's a minor problem.

Question 7

Overall, this seems fairly well written. But you said to be relentless and blunt, so here's what I really think! 😉

If you want to investigate it more, or even test it, all dependencies and a test_main.c are ready made in the following Github repo: click here. Litterally just download and press make.

Ugh. This seems to me to straddle the line in the rules for the site:

Be sure to embed the code you want reviewed in the question itself; you can leave supporting, but non-essential, code in links to other sites.

I wouldn't consider missing #includes to be "supporting but non-essential code". If your actual code is separated into a header and a source file, you should post them both separately here rather than inlining the header and omitting the includes.

Style

I have to say that while I am one of these people who likes to line things up to some degree, you've gone overboard with aligning things, in my opinion. For me, it's actually harder to read with your formatting. I don't think that every variable and function declaration throughout the code needs the same spacing. Furthermore, the spacing on functions that go longer than you like makes it look like another function declaration on the next line, but an invalid one. If you're going to wrap the arguments, it's tradition to either indent them one level, or line them up with the function arguments on the previous line.

You should avoid declaring variables until you need them. If a person reading your code wants to know what type a variable is, it's easier to find if it's close to its first use. Likewise, if you want to change a type, it's easier to do if it's near where it's used.

Naming

I think your function names could be shorter. There's a lot of redundant words in their names. For example, hashtable_alloc_table() could just be hashtable_alloc(), and hashtable_insert_entry() could just be hashtable_insert(). (What else would you be inserting into a hash table? A dinner plate?)

Adding t_ to the front of every type is unnecessary. Types are obviously types from the context in which they're used. If you must add something, make it a suffix of _t like every other C programmer so it's consistent. Also, what's the purpose of giving struct names a prefix of s_? Are you ever going to write the type as struct s_whatever instead of t_whatever? If not, just give it the same name, so it's:

typedef struct foo {
 // fields
} foo;

In hashtable_alloc_table(), the argument is called size, but the units aren't clear. My assumption on reading the declaration was that it was going to be in bytes. But it's actually the number of entries to hold. As such, I would name it numEntries rather than size because size is ambiguous.

The difference between hashtable_dealloc_table() and hashtable_destroy_table() is surprising given their names. It seems like hashtable_dealloc_table() should be renamed to hashtable_shrink_table() or something more in line with what it's doing.

The name hashtable_check_load_factor() is also misleading. I wouldn't expect a function which is named "check " to change anything. I would call it something like hashtable_set_appropriate_load_factor() or something like that so that a caller knows that it may change the hash table.

Finally, what does the prefix ft stand for? It's not at all clear from the code you posted. A comment about its meaning somewhere might be appropriate.

Memory Leak in `hashtable_alloc_table()`

There's a memory leak in 'hashtable_alloc_table(). If the table is allocated, but the bucket list isn't, it returns NULL`, but it never frees the table. That memory is now considered in-use by the OS making it unavailable to be re-used.

Redundancy

Why have the caller pass in a pointer to a pointer to hashtable_realloc_table(), and then return a pointer to a hash table? You should do one or the other. A pointer to a pointer allows you to change the value of the pointer the caller uses, so you don't need to also return the new one. You can simply delete the old one and replace it with the new one.

Comments

I think you have too much info in your function comments. Why should I care which functions you call from that function? I shouldn't need to know that info. I shouldn't need to know which headers include the functions that any given function depends on. They should simply be included at the top of the file (which they aren't here).

The problem with comments is that they can get out of date with the code. That has happened with your hashtable_realloc_table() function. It says that it "Grows the hash table by half", but it actually doubles the hash table size. Perhaps it used to only grow it by half, but now it doubles it?

I have no idea what "search tags" are in this context, and it seems unnecessary. I either already know what to search for or I don't. If I search for the function name, I'll find the function definition, so I don't need the tag.

Standard Library

You say:

Note: I implement my own standard library functions

Oh Heck no! The original implementors of the standard library made all kinds of mistakes in how they designed the standard library functions. Your implementations likely have all those same mistakes plus a whole bunch more due to the fact that you're a beginner at this. It's an interesting exercise to learn the language, but you shouldn't use them in real code.

Question 8

First of all, @user1118321 , God bless you, this is a great review! For the first remark, I separated the files, and add the includes, was lazy on my part. For remark #2, the "Style", I did not mention but, I am following (42's) my school's mandatory style, that's why everything is insanely aligned, also why all variables are at the top of functions and separated by 1 line, although I totally see what your saying and it absolutely makes sense; as for the wrapping of function arguments, I changed it to the way you suggested, I actually prefer it, let me know if I did it wrong.

Question 9

Remark #4, "Naming"; your right, my function names are terribly long, I was thinking of just 'ht_' instead of 'hashtable_' and also, what you said makes sense, this is just a preference of mine to be terribly explicity, even tho, yes it is pretty dumb obvious what a hashtable inserts, I did leave the names for this one. Next, the 't_' & 's_' are just my school's mandatory style for structs and typedefs, so yeah, would defently do the '_t' and ommit the 's_', in my own code.

Question 10

Next, the hashtable_alloc_table() variable size is yes pretty damn ambigious, changed it to num_entries, thank you. Next, the hashtable_dealloc_table() makes no sense your right, + not very explicative of what it does different from _destroy_table(), renamed both the hashtable_realloc_table() & hashtable_dealloc_table() to hashtable_grow_table() & hashtable_shrink_table(). Next, the "ft_" is again just (42's) my school's convention to name functions, I should mention that in the post commentary up top.

Question 11

oops, Naming was remark 3, i'll just keep going with it; Remark #5, "Memory Leak", yep that was a mistake on my part, big time, thank you, fixed it.

Question 12

Remark #6: "Comments". Got rid of the dependencies and the search tags & fixed the outdated comment. And yes it's pretty damn easy to forget to update one of the comments. Will take into consideration for the future.

Question 13

Few observations:

Long comments stating mostly obvious but completely ommits for example information about the hashtable_insert_entry takes ownership of the value, but it makes it's own copy of the key. And you've got memory leak in your test program because of it. It's noted for ft_entry_create, but it's little bit hidden.

Making the copy of key is fine, it can be local string, or even some internal static buffer (many standard functions without reentrancy support has it), or constant. None of those should be freed.

Freeing the value poointer, that is the question. For example in the C++ none of the STL containers are releasing raw pointers.
Those comments should be in header file. Usual scenario is you'll get header and precompiled library (if it's not the open source).
Reimplementing wheel? Why do you need the own dup, strlen and so on?
The prime number computation - so many "optimizations" and then you are using that weird condition inside loop with the result. You can bet the int nb never will be more than that constant, therefore you cant get i*i biger than that (well, you can, but it'll never get into the loop, as it fails condition i*i < nb). Not to mention something like INT_MAX would be much better than magically loking 2147483647.
Dependency on HASHCODE macro? But nothing about it's dependency on hash function.

Question 14

Couldn't ask for better, thank you for taking the time. I do want to address your observations. So #3, yeah no i would never actually do that, but '42' the school im in makes us redo those funcs, so no worries about those. Obs #5, i dont consider the hash function part of my code so.. yea left it out. About obs #2, that's interesting i didnt know that thank you, so basically your saying to place that info above every prototype of the functions, in the header file ? Obs #1, i see, so your saying, that i need to mention that piece of info, i didnt think of that, and ur right its pretty important

Question 15

Obs #4: Yeah that was pretty stupid on my part :33 and the 2147483647 was really bad and sloppy, should defently be macro'd. @KIIV

Question 16

For observation #1, i added the missing info & in test_main.c, fixed the memory leak, thanks for that. I'll take note of observation #2 for next time, and for the future. Observation #4, got rid of the if (... || result == 2147483647), and will macro from now on those things and stop being lazy. For observation #5, i added below every HASHCODE() the hash() dep. And on top of all that, a big thanks, again. :D

Question 17

I have a question, you know how my implementation does this: "the hashtable_insert_entry takes ownership of the value, but it makes it's own copy of the key" ; should it be like that or should it be different ? Is that the correct way of doing it, how would you make it ? or what's the convention; from what i've seen, people make duplicates of the key and some make a copy of the value if it is a char *, but obviously if its a void * you can't do that, so they just take ownership of it.. right ?

Question 18

@AymenTM How do I know? I used the valgrind to check memory leaks and it showed to me some lost memory related to the ft_strdup and hashtable_insert_entry, so I take a look to the code and the value is directly assigned into the item, but key is duplicated by another ft_strdup. The taking ownership means your library takes over the responsibility for freeing it.

user1118321 user1118321 11.9k1 gold badge20 silver badges46 bronze badges · Answer 1 · 2018-12-02 01:24:51Z

Overall, this seems fairly well written. But you said to be relentless and blunt, so here's what I really think! 😉

If you want to investigate it more, or even test it, all dependencies and a test_main.c are ready made in the following Github repo: click here. Litterally just download and press make.

Ugh. This seems to me to straddle the line in the rules for the site:

Be sure to embed the code you want reviewed in the question itself; you can leave supporting, but non-essential, code in links to other sites.

I wouldn't consider missing #includes to be "supporting but non-essential code". If your actual code is separated into a header and a source file, you should post them both separately here rather than inlining the header and omitting the includes.

Style

I have to say that while I am one of these people who likes to line things up to some degree, you've gone overboard with aligning things, in my opinion. For me, it's actually harder to read with your formatting. I don't think that every variable and function declaration throughout the code needs the same spacing. Furthermore, the spacing on functions that go longer than you like makes it look like another function declaration on the next line, but an invalid one. If you're going to wrap the arguments, it's tradition to either indent them one level, or line them up with the function arguments on the previous line.

You should avoid declaring variables until you need them. If a person reading your code wants to know what type a variable is, it's easier to find if it's close to its first use. Likewise, if you want to change a type, it's easier to do if it's near where it's used.

Naming

I think your function names could be shorter. There's a lot of redundant words in their names. For example, hashtable_alloc_table() could just be hashtable_alloc(), and hashtable_insert_entry() could just be hashtable_insert(). (What else would you be inserting into a hash table? A dinner plate?)

Adding t_ to the front of every type is unnecessary. Types are obviously types from the context in which they're used. If you must add something, make it a suffix of _t like every other C programmer so it's consistent. Also, what's the purpose of giving struct names a prefix of s_? Are you ever going to write the type as struct s_whatever instead of t_whatever? If not, just give it the same name, so it's:

typedef struct foo {
 // fields
} foo;

In hashtable_alloc_table(), the argument is called size, but the units aren't clear. My assumption on reading the declaration was that it was going to be in bytes. But it's actually the number of entries to hold. As such, I would name it numEntries rather than size because size is ambiguous.

The difference between hashtable_dealloc_table() and hashtable_destroy_table() is surprising given their names. It seems like hashtable_dealloc_table() should be renamed to hashtable_shrink_table() or something more in line with what it's doing.

The name hashtable_check_load_factor() is also misleading. I wouldn't expect a function which is named "check " to change anything. I would call it something like hashtable_set_appropriate_load_factor() or something like that so that a caller knows that it may change the hash table.

Finally, what does the prefix ft stand for? It's not at all clear from the code you posted. A comment about its meaning somewhere might be appropriate.

Memory Leak in `hashtable_alloc_table()`

There's a memory leak in 'hashtable_alloc_table(). If the table is allocated, but the bucket list isn't, it returns NULL`, but it never frees the table. That memory is now considered in-use by the OS making it unavailable to be re-used.

Redundancy

Why have the caller pass in a pointer to a pointer to hashtable_realloc_table(), and then return a pointer to a hash table? You should do one or the other. A pointer to a pointer allows you to change the value of the pointer the caller uses, so you don't need to also return the new one. You can simply delete the old one and replace it with the new one.

Comments

I think you have too much info in your function comments. Why should I care which functions you call from that function? I shouldn't need to know that info. I shouldn't need to know which headers include the functions that any given function depends on. They should simply be included at the top of the file (which they aren't here).

The problem with comments is that they can get out of date with the code. That has happened with your hashtable_realloc_table() function. It says that it "Grows the hash table by half", but it actually doubles the hash table size. Perhaps it used to only grow it by half, but now it doubles it?

I have no idea what "search tags" are in this context, and it seems unnecessary. I either already know what to search for or I don't. If I search for the function name, I'll find the function definition, so I don't need the tag.

Standard Library

You say:

Note: I implement my own standard library functions

Oh Heck no! The original implementors of the standard library made all kinds of mistakes in how they designed the standard library functions. Your implementations likely have all those same mistakes plus a whole bunch more due to the fact that you're a beginner at this. It's an interesting exercise to learn the language, but you shouldn't use them in real code.

First of all, @user1118321 , God bless you, this is a great review! For the first remark, I separated the files, and add the includes, was lazy on my part. For remark #2, the "Style", I did not mention but, I am following (42's) my school's mandatory style, that's why everything is insanely aligned, also why all variables are at the top of functions and separated by 1 line, although I totally see what your saying and it absolutely makes sense; as for the wrapping of function arguments, I changed it to the way you suggested, I actually prefer it, let me know if I did it wrong.
Remark #4, "Naming"; your right, my function names are terribly long, I was thinking of just 'ht_' instead of 'hashtable_' and also, what you said makes sense, this is just a preference of mine to be terribly explicity, even tho, yes it is pretty dumb obvious what a hashtable inserts, I did leave the names for this one. Next, the 't_' & 's_' are just my school's mandatory style for structs and typedefs, so yeah, would defently do the '_t' and ommit the 's_', in my own code.
Next, the hashtable_alloc_table() variable size is yes pretty damn ambigious, changed it to num_entries, thank you. Next, the hashtable_dealloc_table() makes no sense your right, + not very explicative of what it does different from _destroy_table(), renamed both the hashtable_realloc_table() & hashtable_dealloc_table() to hashtable_grow_table() & hashtable_shrink_table(). Next, the "ft_" is again just (42's) my school's convention to name functions, I should mention that in the post commentary up top.
oops, Naming was remark 3, i'll just keep going with it; Remark #5, "Memory Leak", yep that was a mistake on my part, big time, thank you, fixed it.
Remark #6: "Comments". Got rid of the dependencies and the search tags & fixed the outdated comment. And yes it's pretty damn easy to forget to update one of the comments. Will take into consideration for the future.

KIIV KIIV 2662 silver badges7 bronze badges · Answer 2 · 2018-12-01 14:11:19Z

Few observations:

Long comments stating mostly obvious but completely ommits for example information about the hashtable_insert_entry takes ownership of the value, but it makes it's own copy of the key. And you've got memory leak in your test program because of it. It's noted for ft_entry_create, but it's little bit hidden.

Making the copy of key is fine, it can be local string, or even some internal static buffer (many standard functions without reentrancy support has it), or constant. None of those should be freed.

Freeing the value poointer, that is the question. For example in the C++ none of the STL containers are releasing raw pointers.
Those comments should be in header file. Usual scenario is you'll get header and precompiled library (if it's not the open source).
Reimplementing wheel? Why do you need the own dup, strlen and so on?
The prime number computation - so many "optimizations" and then you are using that weird condition inside loop with the result. You can bet the int nb never will be more than that constant, therefore you cant get i*i biger than that (well, you can, but it'll never get into the loop, as it fails condition i*i < nb). Not to mention something like INT_MAX would be much better than magically loking 2147483647.
Dependency on HASHCODE macro? But nothing about it's dependency on hash function.

Couldn't ask for better, thank you for taking the time. I do want to address your observations. So #3, yeah no i would never actually do that, but '42' the school im in makes us redo those funcs, so no worries about those. Obs #5, i dont consider the hash function part of my code so.. yea left it out. About obs #2, that's interesting i didnt know that thank you, so basically your saying to place that info above every prototype of the functions, in the header file ? Obs #1, i see, so your saying, that i need to mention that piece of info, i didnt think of that, and ur right its pretty important
Obs #4: Yeah that was pretty stupid on my part :33 and the 2147483647 was really bad and sloppy, should defently be macro'd. @KIIV
For observation #1, i added the missing info & in test_main.c, fixed the memory leak, thanks for that. I'll take note of observation #2 for next time, and for the future. Observation #4, got rid of the if (... || result == 2147483647), and will macro from now on those things and stop being lazy. For observation #5, i added below every HASHCODE() the hash() dep. And on top of all that, a big thanks, again. :D
I have a question, you know how my implementation does this: "the hashtable_insert_entry takes ownership of the value, but it makes it's own copy of the key" ; should it be like that or should it be different ? Is that the correct way of doing it, how would you make it ? or what's the convention; from what i've seen, people make duplicates of the key and some make a copy of the value if it is a char *, but obviously if its a void * you can't do that, so they just take ownership of it.. right ?
@AymenTM How do I know? I used the valgrind to check memory leaks and it showed to me some lost memory related to the ft_strdup and hashtable_insert_entry, so I take a look to the code and the value is directly assigned into the item, but key is duplicated by another ft_strdup. The taking ownership means your library takes over the responsibility for freeing it.

Stack Exchange Network

Quick & Simple Hash Table Implementation in C

2 Answers 2

Style

Naming

Memory Leak in `hashtable_alloc_table()`

Redundancy

Comments

Standard Library

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Quick & Simple Hash Table Implementation in C

2 Answers 2

Style

Naming

Memory Leak in hashtable_alloc_table()

Redundancy

Comments

Standard Library

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Memory Leak in `hashtable_alloc_table()`