Separate Chaining Hash Table Implementation in C++

Question 1

I've implemented a separate chaining hash table in C++ and would appreciate any feedback or suggestions for improvement.

Code:

#include <iostream>
#include <bitset>
#include <functional> // for hash<class template> class
using namespace std;
const int M = 97; // no of chains
template <typename Key, typename Value>
struct Node
{
 Key key;
 Value val;
 Node<Key, Value> *next;
 Node() : next(nullptr) {}
 Node(Key key, Value val, Node<Key, Value> *next) : key(key), val(val), next(next){};
};
template <typename Key, typename Value>
class SeparateChainingHashST
{
 Node<Key, Value> *st[M]; // array of chain
 int Hash(Key);
public:
 SeparateChainingHashST();
 ~SeparateChainingHashST();
 bool contain(Key);
 void put(Key, Value);
 Value get(Key);
 bool remove(Key);
 void display();
};
template <typename Key, typename Value>
SeparateChainingHashST<Key, Value>::SeparateChainingHashST() {
 for (int i = 0; i < M; i++) {
 st[i] = nullptr;
 }
}
template <typename Key, typename Value>
SeparateChainingHashST<Key, Value>::~SeparateChainingHashST() {
 for (int i = 0; i < M; i++) {
 Node<Key, Value>* current = st[i];
 while (current != nullptr) {
 Node<Key, Value>* toDelete = current;
 current = current->next;
 delete toDelete;
 }
 }
}
template <typename Key, typename Value>
int SeparateChainingHashST<Key, Value>::Hash(Key key)
{
 hash<Key> hashCode; // An int between -2^31 and 2^31 - 1.
 return (hashCode(key) & 0x7fffffff) % M; // 0x7fffffff = 2147483647. 
}
template <typename Key, typename Value>
bool SeparateChainingHashST<Key, Value>::contain(Key key)
{
 int i = Hash(key);
 for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
 {
 if (x->key == key)
 return true;
 }
 return false;
}
template <typename Key, typename Value>
Value SeparateChainingHashST<Key, Value>::get(Key key)
{
 int i = Hash(key);
 for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
 {
 if (x->key == key)
 return x->val;
 }
 throw runtime_error("\n Key not present.");
}
template <typename Key, typename Value>
void SeparateChainingHashST<Key, Value>::put(Key key, Value val)
{
 int i = Hash(key);
 for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
 {
 if (x->key == key)
 {
 x->val = val;
 return;
 }
 }
 st[i] = new Node<Key, Value>(key, val, st[i]);
}
template <typename Key, typename Value>
bool SeparateChainingHashST<Key, Value>::remove(Key key)
{
 if (!contain(key))
 return false;
 int i = Hash(key);
 if (st[i]->key == key)
 {
 st[i] = st[i]->next;
 return true;
 }
 Node<Key, Value> *temp = st[i];
 Node<Key, Value> *x = st[i]->next;
 while (x->next != nullptr)
 {
 if (x->key == key)
 {
 temp->next = x->next;
 return true;
 }
 temp = x;
 x = x->next;
 }
 if (x->next == nullptr)
 temp->next = nullptr;
 return true;
}
template <typename Key, typename Value>
void SeparateChainingHashST<Key, Value>::display()
{
 for (int i = 0; i < M; i++)
 {
 if (st[i] != nullptr)
 cout << "\n table[" << i << "] -> ";
 for (Node<Key, Value> *x = st[i]; x != nullptr; x = x->next)
 cout << x->key << ": " << x->val << ", ";
 }
}
int main()
{
 SeparateChainingHashST<string, int> *st = new SeparateChainingHashST<string, int>();
 st->put("Nafees", 5);
 st->put("Zaid", 10);
 st->put("Tauseef", 9);
 st->put("Ali", 2);
 st->put("Awais", 3);
 st->put("Muhammad", 1);
 if (st->remove("Zaid"))
 cout << " Pair Deleted.";
 else
 cout << " Key Not Found!";
 st->display();
 try
 {
 cout << "\n Value associated with 'Zaid': " << st->get("Zaid") << endl;
 }
 catch (const runtime_error &e)
 {
 cout << e.what() << endl;
 }
}

Are there any improvements or optimizations I can make in my implementation and the hash function that I implemented is working well?

I want to improve my 'hash' and 'remove' the function.

Question 2

Missing include of <stdexcept>

Compiling and running this program reveals two of the memory leaks:

==1399969== HEAP SUMMARY:
==1399969== in use at exit: 1,064 bytes in 7 blocks
==1399969== total heap usage: 11 allocs, 4 frees, 76,003 bytes allocated
==1399969=わ=わ 
=わ=わ1399969=わ=わ 48 bytes in 1 blocks are definitely lost in loss record 6 of 7
==1399969== at 0x4842F83: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1399969== by 0x10AF99: SeparateChainingHashST<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) (293098.cpp:102)
==1399969== by 0x10A34F: main (293098.cpp:152)
==1399969=わ=わ 
=わ=わ1399969=わ=わ 1,016 (776 direct, 240 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
==1399969== at 0x4842F83: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1399969== by 0x10A2A0: main (293098.cpp:150)

Fixing the first requires us to delete the node we unlink in remove().

The second is trivially fixed by adding delete st to main() - or better, using an object directly rather than via pointer.

The other memory problem not revealed by the demo program is the double-delete when a hash-table is copied - the default generated copy-constructor and copy-assignment operator both copy the pointer values, leaving the new object trying to share ownership with the existing. That never ends well, which is why resource-owning objects such as this need to implement the Rule of Five.

The alternative is not to own the nodes directly, but instead use a smart pointer to manage the ownership. This can simplify the code, but naively using smart-pointer linked lists can lead to very deep recursion, so it's not a panacea.

using namespace std is harmful - don't drag the huge (and growing) standard-library into the global namespace, especially in a library (which then inflicts that poor decision onto all its users).

M is a very uninformative name for a global identifier (and being all-caps is misleading, as conventionally that's reserved for macros). If it really needs to be global (I don't believe it does), then it needs a better name. It also appears to be a candidate for constexpr. In any case, a fixed number of buckets is likely to be wasteful for some uses, and inadequate for others.

I don't think that Node needs to be a global type - why can't it be a private member type of SeparateChainingHashST?

Its default constructor doesn't initialise key or val. I don't recommend leaving them uninitialised, as that often leads to undefined behaviour.

The three-argument constructor can copy more efficiently from its arguments:

 Node(Key key, Value val, Node<Key,Value> *next)
 : key{std::move(key)},
 val{std::move(val)},
 next{std::move(next)}
 {}

contain() and get() shouldn't be modifying the hash table; I think they should both be declared const.

for (int i = 0; i < M; i++) { st[i] = nullptr; } looks like a simple std::fill() operation.

This comment is misleading:

 hash<Key> hashCode; // An int between -2^31 and 2^31 - 1.

hashCode is a std::hash object, not an int value. And certainly not an integer in that particular range.

hashCode(key) does return an integer value, but it's a std::size_t rather than (signed) int.

It's not normal for an exception message to begin with whitespace. I wouldn't expect a library to do this, for example:

 throw std::runtime_error("\n Key not present.");

Element access functions have a lot of commonality. Think about how you could separate the list traversal from the subsequent operations, and refactor that into a private helper function.

display() hard-codes the stream to output to. Consider rewriting in the usual form as a << operator that takes a std::ostream& and (a const-ref to) a hash table, so we can write to any stream. We'll need this for it to be testable by writing to a std::ostringstream, for example.

score 1 · Answer 1 · 2024-07-30 12:32:39Z

Missing include of <stdexcept>

Compiling and running this program reveals two of the memory leaks:

==1399969== HEAP SUMMARY:
==1399969== in use at exit: 1,064 bytes in 7 blocks
==1399969== total heap usage: 11 allocs, 4 frees, 76,003 bytes allocated
==1399969=わ=わ 
=わ=わ1399969=わ=わ 48 bytes in 1 blocks are definitely lost in loss record 6 of 7
==1399969== at 0x4842F83: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1399969== by 0x10AF99: SeparateChainingHashST<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) (293098.cpp:102)
==1399969== by 0x10A34F: main (293098.cpp:152)
==1399969=わ=わ 
=わ=わ1399969=わ=わ 1,016 (776 direct, 240 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
==1399969== at 0x4842F83: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1399969== by 0x10A2A0: main (293098.cpp:150)