The functions
The functionsmalloc
andrealloc
might fail if there is not enough memory, and in this case they would returnNULL
, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.malloc
andrealloc
might fail if there is not enough memory, and in this case they would returnNULL
, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.
Update: a comment was skeptical about this recommendation, noting that on some operating systems, trying to write through a null pointer will result in a segmentation fault, and for some programs this is acceptable behaviour. This might indeed be fine in some cases, but when writing a reusable component like a hash table library, it is important to do better than that. The problems are: (i) Writing through a null pointer does not result in a segmentation fault in all situations, for example the code might be running on an embedded controller using an operating system that does not protect memory at address zero, or on hardware that lacks memory protection at all. The world does not consist only of Windows and Linux. (ii) In the presence of an optimizing compiler, there is no guarantee that you will merely get a segmentation fault. Writing through a null pointer is undefined behaviour, and might lead to data corruption or a security vulnerability instead of (or as well as) a crash. See this article by Harry Lee for some examples. (iii) In many cases it's not acceptable to crash, for example in safety-critical software. There are techniques for coping with memory exhaustion such as refusing requests until memory pressure abates, or garbage collection.
If the hash table is full, and you call
insert
with a key that hashes to zero (modulo the size of the table) then thewhile
loop will never terminate and the program will hang.If the hash table is full, and you call
insert
with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.In
Prime_num
, ifold_size
is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after2*old_size
(114) is 127, which is more thanold_size * 2 + 10
(124). (If you compiled with warnings turned on then your compiler would have complained about this.)In
get_item
, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.In
get_item
, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.In
resize_HashTable
, thetmp_array
is allocated but never freed. This is a memory leak.In
resize_HashTable
, the original bucket structures are not freed. This is a memory leak.In
contains
, there is no check against theoccupied
flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.The algorithm in
delete
is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.
The functions
malloc
andrealloc
might fail if there is not enough memory, and in this case they would returnNULL
, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.If the hash table is full, and you call
insert
with a key that hashes to zero (modulo the size of the table) then thewhile
loop will never terminate and the program will hang.If the hash table is full, and you call
insert
with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.In
Prime_num
, ifold_size
is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after2*old_size
(114) is 127, which is more thanold_size * 2 + 10
(124). (If you compiled with warnings turned on then your compiler would have complained about this.)In
get_item
, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.In
get_item
, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.In
resize_HashTable
, thetmp_array
is allocated but never freed. This is a memory leak.In
resize_HashTable
, the original bucket structures are not freed. This is a memory leak.In
contains
, there is no check against theoccupied
flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.The algorithm in
delete
is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.
- The functions
malloc
andrealloc
might fail if there is not enough memory, and in this case they would returnNULL
, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.
Update: a comment was skeptical about this recommendation, noting that on some operating systems, trying to write through a null pointer will result in a segmentation fault, and for some programs this is acceptable behaviour. This might indeed be fine in some cases, but when writing a reusable component like a hash table library, it is important to do better than that. The problems are: (i) Writing through a null pointer does not result in a segmentation fault in all situations, for example the code might be running on an embedded controller using an operating system that does not protect memory at address zero, or on hardware that lacks memory protection at all. The world does not consist only of Windows and Linux. (ii) In the presence of an optimizing compiler, there is no guarantee that you will merely get a segmentation fault. Writing through a null pointer is undefined behaviour, and might lead to data corruption or a security vulnerability instead of (or as well as) a crash. See this article by Harry Lee for some examples. (iii) In many cases it's not acceptable to crash, for example in safety-critical software. There are techniques for coping with memory exhaustion such as refusing requests until memory pressure abates, or garbage collection.
If the hash table is full, and you call
insert
with a key that hashes to zero (modulo the size of the table) then thewhile
loop will never terminate and the program will hang.If the hash table is full, and you call
insert
with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.In
Prime_num
, ifold_size
is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after2*old_size
(114) is 127, which is more thanold_size * 2 + 10
(124). (If you compiled with warnings turned on then your compiler would have complained about this.)In
get_item
, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.In
get_item
, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.In
resize_HashTable
, thetmp_array
is allocated but never freed. This is a memory leak.In
resize_HashTable
, the original bucket structures are not freed. This is a memory leak.In
contains
, there is no check against theoccupied
flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.The algorithm in
delete
is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.
I spotted some bugs:
The functions
malloc
andrealloc
might fail if there is not enough memory, and in this case they would returnNULL
, and since the code does not check the result, this would lead to undefined behaviour. It's necessary to check the results of these functions.If the hash table is full, and you call
insert
with a key that hashes to zero (modulo the size of the table) then thewhile
loop will never terminate and the program will hang.If the hash table is full, and you call
insert
with a key that hashes to something other than zero (modulo the size of the table) then the new key will overwrite an old key without any kind of warning or error.In
Prime_num
, ifold_size
is 57, then the function does not find a suitable new size, and so control runs off the end of the function and nothing is returned. This is because the next prime number after2*old_size
(114) is 127, which is more thanold_size * 2 + 10
(124). (If you compiled with warnings turned on then your compiler would have complained about this.)In
get_item
, there is no way for the caller to tell if the key was found or not. If it was found, then a bucket structure from the hash table is returned. But if it was not found, then a newly allocated bucket structure is returned. There is no way for the caller to tell these apart—the newly allocated bucket structure could contain any data whatsoever.In
get_item
, if the key is found, then a bucket structure is allocated and neither freed nor returned to the caller. This is a memory leak.In
resize_HashTable
, thetmp_array
is allocated but never freed. This is a memory leak.In
resize_HashTable
, the original bucket structures are not freed. This is a memory leak.In
contains
, there is no check against theoccupied
flag. This means that if a key is not present in the table then it will be compared against all the keys in the table, including keys that were deleted or never initialized.The algorithm in
delete
is incorrect: if the deleted key is in the middle of a chain of keys with identical hashes, then deleting it leaves all the following keys in the chain unfindable.
This is a common mistake! In The Art of Computer Programming, Knuth comments, "Many computer programmers have great faith in algorithms, and they are surprised to find that the obvious way to delete records from a hash table doesn't work." (Vol. III, p. 533.)
If you want to get this right, Knuth gives an algorithm (pp. 533–4) for deletion in a open hash table with linear probing.
But a simpler approach is to mark the deleted key with a flag meaning "there was a key here but it was deleted" which you later remove when growing or shrinking the table.