I've implemented a separate chaining hash table in C++ and would appreciate any feedback or suggestions for improvement.
Code:
#include <iostream>
#include <bitset>
#include <functional> // for hash<class template> class
using namespace std;
const int M = 97; // no of chains
template <typename Key, typename Value>
struct Node
{
Key key;
Value val;
Node<Key, Value> *next;
Node() : next(nullptr) {}
Node(Key key, Value val, Node<Key, Value> *next) : key(key), val(val), next(next){};
};
template <typename Key, typename Value>
class SeparateChainingHashST
{
Node<Key, Value> *st[M]; // array of chain
int Hash(Key);
public:
SeparateChainingHashST();
~SeparateChainingHashST();
bool contain(Key);
void put(Key, Value);
Value get(Key);
bool remove(Key);
void display();
};
template <typename Key, typename Value>
SeparateChainingHashST<Key, Value>::SeparateChainingHashST() {
for (int i = 0; i < M; i++) {
st[i] = nullptr;
}
}
template <typename Key, typename Value>
SeparateChainingHashST<Key, Value>::~SeparateChainingHashST() {
for (int i = 0; i < M; i++) {
Node<Key, Value>* current = st[i];
while (current != nullptr) {
Node<Key, Value>* toDelete = current;
current = current->next;
delete toDelete;
}
}
}
template <typename Key, typename Value>
int SeparateChainingHashST<Key, Value>::Hash(Key key)
{
hash<Key> hashCode; // An int between -2^31 and 2^31 - 1.
return (hashCode(key) & 0x7fffffff) % M; // 0x7fffffff = 2147483647.
}
template <typename Key, typename Value>
bool SeparateChainingHashST<Key, Value>::contain(Key key)
{
int i = Hash(key);
for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
{
if (x->key == key)
return true;
}
return false;
}
template <typename Key, typename Value>
Value SeparateChainingHashST<Key, Value>::get(Key key)
{
int i = Hash(key);
for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
{
if (x->key == key)
return x->val;
}
throw runtime_error("\n Key not present.");
}
template <typename Key, typename Value>
void SeparateChainingHashST<Key, Value>::put(Key key, Value val)
{
int i = Hash(key);
for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
{
if (x->key == key)
{
x->val = val;
return;
}
}
st[i] = new Node<Key, Value>(key, val, st[i]);
}
template <typename Key, typename Value>
bool SeparateChainingHashST<Key, Value>::remove(Key key)
{
if (!contain(key))
return false;
int i = Hash(key);
if (st[i]->key == key)
{
st[i] = st[i]->next;
return true;
}
Node<Key, Value> *temp = st[i];
Node<Key, Value> *x = st[i]->next;
while (x->next != nullptr)
{
if (x->key == key)
{
temp->next = x->next;
return true;
}
temp = x;
x = x->next;
}
if (x->next == nullptr)
temp->next = nullptr;
return true;
}
template <typename Key, typename Value>
void SeparateChainingHashST<Key, Value>::display()
{
for (int i = 0; i < M; i++)
{
if (st[i] != nullptr)
cout << "\n table[" << i << "] -> ";
for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
cout << x->key << ": " << x->val << ", ";
}
}
int main()
{
SeparateChainingHashST<string, int> *st = new SeparateChainingHashST<string, int>();
st->put("Nafees", 5);
st->put("Zaid", 10);
st->put("Tauseef", 9);
st->put("Ali", 2);
st->put("Awais", 3);
st->put("Muhammad", 1);
if (st->remove("Zaid"))
cout << " Pair Deleted.";
else
cout << " Key Not Found!";
st->display();
try
{
cout << "\n Value associated with 'Zaid': " << st->get("Zaid") << endl;
}
catch (const runtime_error &e)
{
cout << e.what() << endl;
}
}
Are there any improvements or optimizations I can make in my implementation and the hash function that I implemented is working well?
I want to improve my 'hash' and 'remove' the function.
1 Answer 1
Missing include of <stdexcept>
Compiling and running this program reveals two of the memory leaks:
==1399969== HEAP SUMMARY:
==1399969== in use at exit: 1,064 bytes in 7 blocks
==1399969== total heap usage: 11 allocs, 4 frees, 76,003 bytes allocated
==1399969=わ=わ
=わ=わ1399969=わ=わ 48 bytes in 1 blocks are definitely lost in loss record 6 of 7
==1399969== at 0x4842F83: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1399969== by 0x10AF99: SeparateChainingHashST<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) (293098.cpp:102)
==1399969== by 0x10A34F: main (293098.cpp:152)
==1399969=わ=わ
=わ=わ1399969=わ=わ 1,016 (776 direct, 240 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
==1399969== at 0x4842F83: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1399969== by 0x10A2A0: main (293098.cpp:150)
Fixing the first requires us to delete
the node we unlink in remove()
.
The second is trivially fixed by adding delete st
to main()
- or better, using an object directly rather than via pointer.
The other memory problem not revealed by the demo program is the double-delete when a hash-table is copied - the default generated copy-constructor and copy-assignment operator both copy the pointer values, leaving the new object trying to share ownership with the existing. That never ends well, which is why resource-owning objects such as this need to implement the Rule of Five.
The alternative is not to own the nodes directly, but instead use a smart pointer to manage the ownership. This can simplify the code, but naively using smart-pointer linked lists can lead to very deep recursion, so it's not a panacea.
using namespace std
is harmful - don't drag the huge (and growing) standard-library into the global namespace, especially in a library (which then inflicts that poor decision onto all its users).
M
is a very uninformative name for a global identifier (and being all-caps is misleading, as conventionally that's reserved for macros). If it really needs to be global (I don't believe it does), then it needs a better name. It also appears to be a candidate for constexpr
. In any case, a fixed number of buckets is likely to be wasteful for some uses, and inadequate for others.
I don't think that Node
needs to be a global type - why can't it be a private member type of SeparateChainingHashST
?
Its default constructor doesn't initialise key
or val
. I don't recommend leaving them uninitialised, as that often leads to undefined behaviour.
The three-argument constructor can copy more efficiently from its arguments:
Node(Key key, Value val, Node<Key,Value> *next)
: key{std::move(key)},
val{std::move(val)},
next{std::move(next)}
{}
contain()
and get()
shouldn't be modifying the hash table; I think they should both be declared const
.
for (int i = 0; i < M; i++) { st[i] = nullptr; }
looks like a simple std::fill()
operation.
This comment is misleading:
hash<Key> hashCode; // An int between -2^31 and 2^31 - 1.
hashCode
is a std::hash
object, not an int
value. And certainly not an integer in that particular range.
hashCode(key)
does return an integer value, but it's a std::size_t
rather than (signed) int
.
It's not normal for an exception message to begin with whitespace. I wouldn't expect a library to do this, for example:
throw std::runtime_error("\n Key not present.");
Element access functions have a lot of commonality. Think about how you could separate the list traversal from the subsequent operations, and refactor that into a private helper function.
display()
hard-codes the stream to output to. Consider rewriting in the usual form as a <<
operator that takes a std::ostream&
and (a const-ref to) a hash table, so we can write to any stream. We'll need this for it to be testable by writing to a std::ostringstream
, for example.