First time implementing a hash table.
I resolve collisions using the separate chaining method (closed addressing), i.e with linked lists.
The hash function used is:
murmurhash3
(please tell me why this could be a bad choice or why it is a good choice (briefly)).If you want to investigate it more, or even test it, all dependencies and a
test_main.c
are ready made in the following Github repo: click here. Litterally just download and pressmake
.
Please be ruthless, relentless and blunt in your review... like seriously embarrass me, humiliate me. And THANK you in advance for your time.
Also if you have any advice, tips, any good habit I should adopt/start doing, please let me know, I am relatively speaking still a beginner !
Oh and is this kind of commenting of the code encouraged or annoying ? Also should I comment in functions ? Some say its bad practice...
Note: I implement my own standard library functions, so just don't worry about that it is intentional, just assume they work perfectly.
Here's the code:
Header: hashtable.h
/* * * * * * * * * * * *
========================
HASH TABLE HEADER
========================
* * * * * * * * * * * */
#ifndef FT_HASHTABLE_H
# define FT_HASHTABLE_H
# define HASHCODE(key, buckets) (hash(key, ft_strlen(key)) % buckets)
# define MIN_LOAD_FACTOR 0.0
# define MAX_LOAD_FACTOR 0.7
typedef struct s_entry
{
char *key;
void *value;
struct s_entry *successor;
} t_entry;
typedef struct s_hashtable
{
unsigned int entries;
unsigned int num_buckets;
t_entry **bucket_list;
} t_hashtable;
t_hashtable *hashtable_alloc_table(unsigned int num_entries);
int hashtable_grow_table(t_hashtable **table);
int hashtable_shrink_table(t_hashtable **table);
t_entry *hashtable_fetch_entry(t_hashtable *table, char *key);
int hashtable_insert_entry(t_hashtable **table,
char *key,
void *value);
int hashtable_delete_entry(t_hashtable **table, char *key);
int hashtable_rehash_entry(t_hashtable **table_to,
t_entry **entry);
int hashtable_rehash_table(t_hashtable **table_from,
t_hashtable **table_to);
int hashtable_destroy_table(t_hashtable **table);
int hashtable_set_appropriate_load_factor(t_hashtable **table);
t_entry *entry_create(char *key, void *value);
void entry_free(t_entry **entry);
void bucket_free(t_entry **head);
#endif
Functions: hashtable.c
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
// ======================================================================= //
// HASH TABLE FUNCTION LIBRARY //
// ======================================================================= //
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
#include <stdlib.h>
#include "hashtable.h"
#include "utils.h"
#include "murmurhash3/murmurhash3.h"
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Creates/allocates an empty hash table of size 'num_entries'
and then some (inorder to get to the nearest prime
number).
RETURN VALUES: If successful, returns a pointer to the
hash table. If an error occurs the function
will return a NULL pointer.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
t_hashtable *hashtable_alloc_table(unsigned int num_entries)
{
t_hashtable *table;
unsigned int i;
if (num_entries < 1)
return (NULL);
if (!(table = malloc(sizeof(t_hashtable))))
return (NULL);
num_entries = (unsigned int)ft_find_next_prime(num_entries);
if (!(table->bucket_list =
malloc(sizeof(t_entry*) * num_entries)))
{
free(table);
return (NULL);
}
table->num_buckets = num_entries;
table->entries = 0;
i = 0;
while (i < num_entries)
(table->bucket_list)[i++] = NULL;
return (table);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Inserts a key-value pair into the hash table.
NOTE: in this implementation ownership of 'value' is
taken, that is to say that free'ing of 'value' will be
taken care of, but 'value' MUST be allocated before hand
somewhere in the code, if you fail to do so, upon free'ing
of an entry or the hash table you WILL get a 'bad free'
error.
As for the 'key', a duplicate of it will be made (i.e
memory will be allocated inorder to make a duplicate of
it). The memory will then be free'd.
RETURN VALUES: If successful, returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_insert_entry(t_hashtable **table,
char *key,
void *value)
{
t_entry *entry;
unsigned int index;
if (table && *table && key && value)
{
if (hashtable_set_appropriate_load_factor(table) == -1)
return (-1);
if (!(entry = entry_create(key, value)))
return (-1);
index = HASHCODE(key, (*table)->num_buckets);
entry->successor = ((*table)->bucket_list)[index];
((*table)->bucket_list)[index] = entry;
(*table)->entries += 1;
return (0);
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Finds and returns (retrieves) an entry.
RETURN VALUES: If the entry is found, a pointer to the entry is
returned; otherwise a NULL pointer is returned.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
t_entry *hashtable_fetch_entry(t_hashtable *table, char *key)
{
t_entry *cur_entry;
unsigned int index;
if (table && key)
{
index = HASHCODE(key, table->num_buckets);
cur_entry = (table->bucket_list)[index];
while (cur_entry)
{
if (ft_strcmp(cur_entry->key, key) == 0)
return (cur_entry);
cur_entry = cur_entry->successor;
}
}
return (NULL);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Finds and deletes/frees an entry in the hash
table.
RETURN VALUES: If the entry is found, and is successfully
deleted/free'd, the function returns 0;
otherwise the function returns -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_delete_entry(t_hashtable **table, char *key)
{
t_entry *prev_entry;
t_entry *cur_entry;
unsigned int index;
if (table && key)
{
index = HASHCODE(key, (*table)->num_buckets);
cur_entry = ((*table)->bucket_list)[index];
while (cur_entry)
{
if (ft_strcmp(cur_entry->key, key) == 0)
{
if (cur_entry == ((*table)->bucket_list)[index])
((*table)->bucket_list)[index] = cur_entry->successor;
else
prev_entry->successor = cur_entry->successor;
entry_free(&cur_entry);
(*table)->entries -= 1;
return (0);
}
prev_entry = cur_entry;
cur_entry = cur_entry->successor;
}
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Deletes/frees the entire hash table
and all the entries contained in it.
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_destroy_table(t_hashtable **table)
{
unsigned int i;
if (table)
{
if (*table)
{
if ((*table)->bucket_list)
{
i = 0;
while (i < (*table)->num_buckets)
{
if (((*table)->bucket_list)[i])
bucket_free(&((*table)->bucket_list)[i]);
i++;
}
free((*table)->bucket_list);
(*table)->bucket_list = NULL;
}
free(*table);
(*table) = NULL;
}
return (0);
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Checks that the current load factor is
neither greater than nor smaller than
the desired max load factor and desired
minimum load factor respectively.
If either is the case, a procedure to
realloc (grow) or dealloc (shrink) the
table will ensue.
If neither is the case, nothing happens.
RETURN VALUES: If nothing happens, or a successful
reallocation or deallocation happens,
0 is returned. If an error occurs -1
is returned.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_set_appropriate_load_factor(t_hashtable **table)
{
if (table && *table)
{
if ((float)(*table)->entries / (float)(*table)->num_buckets
> MAX_LOAD_FACTOR)
{
if (hashtable_grow_table(table) == -1)
return (-1);
return (0);
}
if ((float)(*table)->entries / (float)(*table)->num_buckets
< MIN_LOAD_FACTOR)
{
if (hashtable_shrink_table(table) == -1)
return (-1);
return (0);
}
else
{
return (0);
}
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Grows the hash table by a factor of 2 and then some
(inorder to get to the nearest prime number).
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_grow_table(t_hashtable **table)
{
t_hashtable *new_table;
if (table && *table)
{
new_table = hashtable_alloc_table((*table)->num_buckets * 2);
if (new_table == NULL)
return (-1);
if (hashtable_rehash_table(table, &new_table) == -1)
return (-1);
hashtable_destroy_table(table);
(*table) = new_table;
return (0);
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Shrinks the hash table by half and then some
(inorder to get to the nearest prime number).
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_shrink_table(t_hashtable **table)
{
t_hashtable *new_table;
if (table && *table)
{
if ((*table)->num_buckets > 1)
{
new_table = hashtable_alloc_table((*table)->num_buckets / 2);
if (new_table == NULL)
return (-1);
if (hashtable_rehash_table(table, &new_table) == -1)
return (-1);
hashtable_destroy_table(table);
(*table) = new_table;
return (0);
}
return (0);
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Rehashs one entry in the 'table_to' hashtable.
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_rehash_entry(t_hashtable **table_to, t_entry **entry)
{
unsigned int index;
if (table_to && *table_to && entry && *entry)
{
index = HASHCODE((*entry)->key, (*table_to)->num_buckets);
(*entry)->successor = ((*table_to)->bucket_list)[index];
((*table_to)->bucket_list)[index] = (*entry);
(*table_to)->entries += 1;
return (0);
}
return (-1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Rehashs all the entries in the hashtable
'table_from' into the hashtable 'table_to'.
RETURN VALUES: If successful returns 0; otherwise -1.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
int hashtable_rehash_table(t_hashtable **table_from,
t_hashtable **table_to)
{
t_entry *cur_entry;
t_entry *temp;
unsigned int i;
if (table_from && *table_from && table_to && *table_to)
{
i = 0;
while (i < (*table_from)->num_buckets)
{
if (((*table_from)->bucket_list)[i])
{
cur_entry = ((*table_from)->bucket_list)[i];
while (cur_entry)
{
temp = cur_entry->successor;
if (hashtable_rehash_entry(table_to, &cur_entry) == -1)
return (-1);
cur_entry = temp;
}
((*table_from)->bucket_list)[i] = NULL;
}
i++;
}
}
return (((*table_to)->entries == (*table_from)->entries) ? 0 : -1);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Takes a key and a value and creates an entry out
of them.
NOTE: 'value' MUST have previously been
allocated, otherwise in the free'ing of
an entry, you WILL get a 'bad free' error.
RETURN VALUES: If successful, the function returns a pointer
to the new entry; if an error occurs, it returns
a NULL pointer.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
t_entry *entry_create(char *key, void *value)
{
t_entry *new_entry;
if (key && value)
{
if (!(new_entry = malloc(sizeof(t_entry))))
return (NULL);
new_entry->key = ft_strdup(key);
new_entry->value = value;
new_entry->successor = NULL;
return (new_entry);
}
return (NULL);
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Deletes/frees an entry (this is for use with entries that
have members that were allocated only).
NOTE: if 'entry->value' was not allocated somewhere in
the code, you WILL get a 'bad free' error.
RETURN VALUES: none.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
void entry_free(t_entry **entry)
{
if (entry && *entry)
{
if ((*entry)->key)
free((*entry)->key);
if ((*entry)->value)
free((*entry)->value);
free(*entry);
(*entry) = NULL;
}
}
/* — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
DESCRIPTION: Deletes/frees the entire bucket (linked
list).
RETURN VALUES: none.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — */
void bucket_free(t_entry **head)
{
t_entry *temp;
if (head)
{
while (*head)
{
temp = (*head);
(*head) = (*head)->successor;
entry_free(&temp);
}
}
}
Testing: click here!
Thank you for your time!
2 Answers 2
Overall, this seems fairly well written. But you said to be relentless and blunt, so here's what I really think! 😉
If you want to investigate it more, or even test it, all dependencies and a test_main.c are ready made in the following Github repo: click here. Litterally just download and press make.
Ugh. This seems to me to straddle the line in the rules for the site:
Be sure to embed the code you want reviewed in the question itself; you can leave supporting, but non-essential, code in links to other sites.
I wouldn't consider missing #includes
to be "supporting but non-essential code". If your actual code is separated into a header and a source file, you should post them both separately here rather than inlining the header and omitting the includes.
Style
I have to say that while I am one of these people who likes to line things up to some degree, you've gone overboard with aligning things, in my opinion. For me, it's actually harder to read with your formatting. I don't think that every variable and function declaration throughout the code needs the same spacing. Furthermore, the spacing on functions that go longer than you like makes it look like another function declaration on the next line, but an invalid one. If you're going to wrap the arguments, it's tradition to either indent them one level, or line them up with the function arguments on the previous line.
You should avoid declaring variables until you need them. If a person reading your code wants to know what type a variable is, it's easier to find if it's close to its first use. Likewise, if you want to change a type, it's easier to do if it's near where it's used.
Naming
I think your function names could be shorter. There's a lot of redundant words in their names. For example, hashtable_alloc_table()
could just be hashtable_alloc()
, and hashtable_insert_entry()
could just be hashtable_insert()
. (What else would you be inserting into a hash table? A dinner plate?)
Adding t_
to the front of every type is unnecessary. Types are obviously types from the context in which they're used. If you must add something, make it a suffix of _t
like every other C programmer so it's consistent. Also, what's the purpose of giving struct
names a prefix of s_
? Are you ever going to write the type as struct s_whatever
instead of t_whatever
? If not, just give it the same name, so it's:
typedef struct foo {
// fields
} foo;
In hashtable_alloc_table()
, the argument is called size
, but the units aren't clear. My assumption on reading the declaration was that it was going to be in bytes. But it's actually the number of entries to hold. As such, I would name it numEntries
rather than size
because size
is ambiguous.
The difference between hashtable_dealloc_table()
and hashtable_destroy_table()
is surprising given their names. It seems like hashtable_dealloc_table()
should be renamed to hashtable_shrink_table()
or something more in line with what it's doing.
The name hashtable_check_load_factor()
is also misleading. I wouldn't expect a function which is named "check " to change anything. I would call it something like hashtable_set_appropriate_load_factor()
or something like that so that a caller knows that it may change the hash table.
Finally, what does the prefix ft
stand for? It's not at all clear from the code you posted. A comment about its meaning somewhere might be appropriate.
Memory Leak in hashtable_alloc_table()
There's a memory leak in 'hashtable_alloc_table(). If the table is allocated, but the bucket list isn't, it returns
NULL`, but it never frees the table. That memory is now considered in-use by the OS making it unavailable to be re-used.
Redundancy
Why have the caller pass in a pointer to a pointer to hashtable_realloc_table()
, and then return a pointer to a hash table? You should do one or the other. A pointer to a pointer allows you to change the value of the pointer the caller uses, so you don't need to also return the new one. You can simply delete the old one and replace it with the new one.
Comments
I think you have too much info in your function comments. Why should I care which functions you call from that function? I shouldn't need to know that info. I shouldn't need to know which headers include the functions that any given function depends on. They should simply be included at the top of the file (which they aren't here).
The problem with comments is that they can get out of date with the code. That has happened with your hashtable_realloc_table()
function. It says that it "Grows the hash table by half", but it actually doubles the hash table size. Perhaps it used to only grow it by half, but now it doubles it?
I have no idea what "search tags" are in this context, and it seems unnecessary. I either already know what to search for or I don't. If I search for the function name, I'll find the function definition, so I don't need the tag.
Standard Library
You say:
Note: I implement my own standard library functions
Oh Heck no! The original implementors of the standard library made all kinds of mistakes in how they designed the standard library functions. Your implementations likely have all those same mistakes plus a whole bunch more due to the fact that you're a beginner at this. It's an interesting exercise to learn the language, but you shouldn't use them in real code.
-
\$\begingroup\$ First of all, @user1118321 , God bless you, this is a great review! For the first remark, I separated the files, and add the includes, was lazy on my part. For remark #2, the "Style", I did not mention but, I am following (42's) my school's mandatory style, that's why everything is insanely aligned, also why all variables are at the top of functions and separated by 1 line, although I totally see what your saying and it absolutely makes sense; as for the wrapping of function arguments, I changed it to the way you suggested, I actually prefer it, let me know if I did it wrong. \$\endgroup\$AymenTM– AymenTM2018年12月02日 06:27:24 +00:00Commented Dec 2, 2018 at 6:27
-
\$\begingroup\$ Remark #4, "Naming"; your right, my function names are terribly long, I was thinking of just '
ht_
' instead of 'hashtable_
' and also, what you said makes sense, this is just a preference of mine to be terribly explicity, even tho, yes it is pretty dumb obvious what a hashtable inserts, I did leave the names for this one. Next, the 't_
' & 's_
' are just my school's mandatory style for structs and typedefs, so yeah, would defently do the '_t
' and ommit the 's_
', in my own code. \$\endgroup\$AymenTM– AymenTM2018年12月02日 06:35:36 +00:00Commented Dec 2, 2018 at 6:35 -
\$\begingroup\$ Next, the
hashtable_alloc_table()
variablesize
is yes pretty damn ambigious, changed it tonum_entries
, thank you. Next, thehashtable_dealloc_table()
makes no sense your right, + not very explicative of what it does different from_destroy_table()
, renamed both thehashtable_realloc_table()
&hashtable_dealloc_table()
tohashtable_grow_table()
&hashtable_shrink_table()
. Next, the "ft_
" is again just (42's) my school's convention to name functions, I should mention that in the post commentary up top. \$\endgroup\$AymenTM– AymenTM2018年12月02日 06:44:45 +00:00Commented Dec 2, 2018 at 6:44 -
\$\begingroup\$ oops, Naming was remark 3, i'll just keep going with it; Remark #5, "Memory Leak", yep that was a mistake on my part, big time, thank you, fixed it. \$\endgroup\$AymenTM– AymenTM2018年12月02日 06:48:42 +00:00Commented Dec 2, 2018 at 6:48
-
\$\begingroup\$ Remark #6: "Comments". Got rid of the dependencies and the search tags & fixed the outdated comment. And yes it's pretty damn easy to forget to update one of the comments. Will take into consideration for the future. \$\endgroup\$AymenTM– AymenTM2018年12月02日 06:51:08 +00:00Commented Dec 2, 2018 at 6:51
Few observations:
Long comments stating mostly obvious but completely ommits for example information about the
hashtable_insert_entry
takes ownership of thevalue
, but it makes it's own copy of thekey
. And you've got memory leak in your test program because of it. It's noted forft_entry_create
, but it's little bit hidden.
Making the copy of key is fine, it can be local string, or even some internal static buffer (many standard functions without reentrancy support has it), or constant. None of those should be freed.
Freeing thevalue
poointer, that is the question. For example in the C++ none of the STL containers are releasing raw pointers.Those comments should be in header file. Usual scenario is you'll get header and precompiled library (if it's not the open source).
Reimplementing wheel? Why do you need the own dup, strlen and so on?
The prime number computation - so many "optimizations" and then you are using that weird condition inside loop with the
result
. You can bet theint nb
never will be more than that constant, therefore you cant get i*i biger than that (well, you can, but it'll never get into the loop, as it fails conditioni*i < nb
). Not to mention something like INT_MAX would be much better than magically loking2147483647
.Dependency on
HASHCODE
macro? But nothing about it's dependency onhash
function.
-
\$\begingroup\$ Couldn't ask for better, thank you for taking the time. I do want to address your observations. So #3, yeah no i would never actually do that, but '42' the school im in makes us redo those funcs, so no worries about those. Obs #5, i dont consider the hash function part of my code so.. yea left it out. About obs #2, that's interesting i didnt know that thank you, so basically your saying to place that info above every prototype of the functions, in the header file ? Obs #1, i see, so your saying, that i need to mention that piece of info, i didnt think of that, and ur right its pretty important \$\endgroup\$AymenTM– AymenTM2018年12月01日 17:41:21 +00:00Commented Dec 1, 2018 at 17:41
-
\$\begingroup\$ Obs #4: Yeah that was pretty stupid on my part :33 and the
2147483647
was really bad and sloppy, should defently be macro'd. @KIIV \$\endgroup\$AymenTM– AymenTM2018年12月01日 17:54:23 +00:00Commented Dec 1, 2018 at 17:54 -
\$\begingroup\$ For observation #1, i added the missing info & in
test_main.c
, fixed the memory leak, thanks for that. I'll take note of observation #2 for next time, and for the future. Observation #4, got rid of theif (... || result == 2147483647)
, and will macro from now on those things and stop being lazy. For observation #5, i added below everyHASHCODE()
thehash()
dep. And on top of all that, a big thanks, again. :D \$\endgroup\$AymenTM– AymenTM2018年12月01日 18:17:51 +00:00Commented Dec 1, 2018 at 18:17 -
\$\begingroup\$ I have a question, you know how my implementation does this: "the
hashtable_insert_entry
takes ownership of the value, but it makes it's own copy of the key" ; should it be like that or should it be different ? Is that the correct way of doing it, how would you make it ? or what's the convention; from what i've seen, people make duplicates of the key and some make a copy of the value if it is achar *
, but obviously if its avoid *
you can't do that, so they just take ownership of it.. right ? \$\endgroup\$AymenTM– AymenTM2018年12月01日 18:22:30 +00:00Commented Dec 1, 2018 at 18:22 -
\$\begingroup\$ @AymenTM How do I know? I used the valgrind to check memory leaks and it showed to me some lost memory related to the ft_strdup and hashtable_insert_entry, so I take a look to the code and the value is directly assigned into the item, but key is duplicated by another ft_strdup. The taking ownership means your library takes over the responsibility for freeing it. \$\endgroup\$KIIV– KIIV2018年12月01日 18:25:39 +00:00Commented Dec 1, 2018 at 18:25
test_main.c
, you can litterally download the directory, then just hitmake
and you can play around with thetest_main.c
file. Again here is the github repo \$\endgroup\$test_main.c:42:9: error: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘void *’ [-Werror=format=]
. But yes, that's a minor problem. \$\endgroup\$