Hash table using open addressing with double hashing probing

Question 1

Looking for a general review:

#ifndef HASH_TABLE_HASH_TABLE_H
#define HASH_TABLE_HASH_TABLE_H
#include <utility>
#include <stdexcept>
namespace mds {
 template<typename K, typename V >
 class hash_table {
 std::size_t M;
 std::pair<K, V> *table;
 bool *mark;
 std::size_t N;
 std::size_t hash(std::size_t h, std::size_t i) const {
 auto hash2 = (std::hash<std::size_t>()(h) % (M - 1));
 return ((h % M) + i * (hash2 | 0x1)) % M;
 }
 std::size_t power_of_2_ceiling(std::size_t x) const {
 auto bits = sizeof(std::size_t) << 3;
 --x;
 for (auto i = 1u; i != bits; i <<= 1) {
 x |= x >> i;
 }
 return x + 1;
 }
 public:
 hash_table() : hash_table(0) { }
 hash_table(std::size_t m)
 : M(power_of_2_ceiling(m)),
 table(new std::pair<K, V>[M]),
 mark(new bool[M]), N(0)
 {
 for (std::size_t i = 0; i < M; ++i) {
 mark[i] = false;
 }
 }
 hash_table(hash_table<K, V>&& map)
 : M(map.M), table(map.table), 
 mark(map.mark), N(map.N) {
 map.M = map.N = 0;
 map.mark = nullptr;
 map.table = nullptr;
 }
 hash_table(const hash_table<K, V>& map)
 : M(map.M), table(new std::pair<K, V>[map.M]),
 mark(new bool[map.M]), N(map.N) {
 for (std::size_t i = 0; i < M; ++i) {
 mark[i] = map.mark[i];
 table[i] = map.table[i];
 }
 }
 hash_table(std::unordered_map<K, V>&& map)
 : M(power_of_2_ceiling(map.size())),
 table(new std::pair<K, V>[M]),
 mark(new bool[M]), N(0)
 {
 for (std::size_t i = 0; i < M; ++i) {
 mark[i] = false;
 }
 for (auto &p : map) {
 insert(std::move(p));
 }
 }
 hash_table<K, V>& operator=(hash_table<K, V> other) {
 swap(*this, other);
 return *this;
 }
 friend void swap(hash_table<K, V>& first, hash_table<K, V>& second) noexcept {
 using std::swap;
 swap(first.M, second.M);
 swap(first.N, second.N);
 swap(first.table, second.table);
 swap(first.mark, second.mark);
 }
 ~hash_table() {
 delete[] table;
 delete[] mark;
 }
 const std::pair<K, V>* find(const K& key) const {
 auto hash_code = std::hash<K>()(key);
 auto i = 0u;
 auto h = hash(hash_code, i);
 while (i < M && mark[h]) {
 if (table[h].first == key ) {
 return &table[h];
 }
 h = hash(hash_code, ++i);
 }
 return nullptr;
 }
 bool insert(std::pair<K, V>&& p) {
 auto hash_code = std::hash<K>()(p.first);
 for (std::size_t i = 0; i < M; ++i) {
 auto h = hash(hash_code, i);
 if (!mark[h]) {
 table[h] = std::move(p);
 ++N;
 return mark[h] = true;
 }
 else if (table[h].first == p.first) {
 return false;
 }
 }
 throw std::overflow_error("overflow");
 }
 bool constains(const K& key) const {
 return find(key) != nullptr;
 }
 std::size_t size() {
 return N;
 }
 };
}
#endif // !HASH_TABLE_HASH_TABLE_H

Question 2

Here are a few things I noticed reading your code:

You should probably use more descriptive identifiers for M and N here.
Instead of dynamic allocation you could also use std::vector<...> for table and mark. That would allow to simplify your code considerably. For example you could drop the custom copy/move constructors/assignment operators and the destructor and use the implicitly defined ones instead. You could also use the provided .size() method of std::vector to get M instead of saving it explicitly.

If you didn't use std::vector for performance considerations, then I doubt that is a problem. Element access is through one indirection anyway and you do not have resize operations. Simply use reserve in your constructor to allocate the proper size immediately. Only the required memory for std::vector might be a bit larger (additional integer for current size/reserved size and doubling of length M).

Even if you still don't want to use std::vector, then you should consider writting an additional class (heap_array?) to handle the memory management or you could use std::unique_ptr, which would at least make your custom move constructor/assignment operator and the destructor redundant.

In hash_table(std::size_t m): The body is simply zero-initialization of the mark array. The same can be achieved with:

hash_table(std::size_t m)
 : M(power_of_2_ceiling(m)),
 table(new std::pair<K, V>[M]),
 mark(new bool[M]{}), N(0)
{ }

In hash_table(hash_table<K, V>&& map): Is there a reason to set M and N to zero? A moved object only needs to be properly destructable, but the destructor does not use M or N.
You can refer to the current instantiation of a template class without specifying the arguments, i.e. hash_table(hash_table&& map) would be fine for the move constructor, too.
To use std::unordered_map you need to include <unordered_map>.
hash_table(std::unordered_map<K, V>&& map): You have a rvalue reference constructor here, but none for const references. That seems inconsistent.

hash_table(std::unordered_map<K, V>&& map) initially does the same as hash_table(std::size_t), you could simply call it:

hash_table(std::unordered_map<K, V>&& map)
 : hash_table(map.size())
{
 for (auto &p : map) {
 insert(std::move(p));
 }
}

(dropped)
(dropped)
You could use just one array of type std::tuple<K,V,bool> (or a custom struct) instead of two arrays table and mark. Then you would not have to repeat the memory management code twice, and with the access pattern related data is closer in memory. Together with my point 2. this would reduce overhead even more.
Your only method to return elements find returns a const std::pair<K,V>*, implying that it is impossible to modify the value of an entry in-place. I think that is a rather strong limitation. Of course one may not change the key in-place, but the value should be modifiable. Therefore the type should be std::pair<const K,V>* and table also should be of type std::pair<const K,V>.
The maximal size of your hash_table is kind of arbitarily chosen and once it is full there will be an exception thrown. This might not have been the scope of your intentions but really the size should grow when a certain load is reached and rehashing should happen.
Your insert takes a rvalue reference. This means that only temporaries or moved elements can be inserted. There should also be an overload for const references (and without move).
size() should be const.

user116966user116966user116966 · Accepted Answer · 2016-09-25 19:40:08Z

Here are a few things I noticed reading your code:

You should probably use more descriptive identifiers for M and N here.
Instead of dynamic allocation you could also use std::vector<...> for table and mark. That would allow to simplify your code considerably. For example you could drop the custom copy/move constructors/assignment operators and the destructor and use the implicitly defined ones instead. You could also use the provided .size() method of std::vector to get M instead of saving it explicitly.

If you didn't use std::vector for performance considerations, then I doubt that is a problem. Element access is through one indirection anyway and you do not have resize operations. Simply use reserve in your constructor to allocate the proper size immediately. Only the required memory for std::vector might be a bit larger (additional integer for current size/reserved size and doubling of length M).

Even if you still don't want to use std::vector, then you should consider writting an additional class (heap_array?) to handle the memory management or you could use std::unique_ptr, which would at least make your custom move constructor/assignment operator and the destructor redundant.

In hash_table(std::size_t m): The body is simply zero-initialization of the mark array. The same can be achieved with:

hash_table(std::size_t m)
 : M(power_of_2_ceiling(m)),
 table(new std::pair<K, V>[M]),
 mark(new bool[M]{}), N(0)
{ }

In hash_table(hash_table<K, V>&& map): Is there a reason to set M and N to zero? A moved object only needs to be properly destructable, but the destructor does not use M or N.
You can refer to the current instantiation of a template class without specifying the arguments, i.e. hash_table(hash_table&& map) would be fine for the move constructor, too.
To use std::unordered_map you need to include <unordered_map>.
hash_table(std::unordered_map<K, V>&& map): You have a rvalue reference constructor here, but none for const references. That seems inconsistent.

hash_table(std::unordered_map<K, V>&& map) initially does the same as hash_table(std::size_t), you could simply call it:

hash_table(std::unordered_map<K, V>&& map)
 : hash_table(map.size())
{
 for (auto &p : map) {
 insert(std::move(p));
 }
}

(dropped)
(dropped)
You could use just one array of type std::tuple<K,V,bool> (or a custom struct) instead of two arrays table and mark. Then you would not have to repeat the memory management code twice, and with the access pattern related data is closer in memory. Together with my point 2. this would reduce overhead even more.
Your only method to return elements find returns a const std::pair<K,V>*, implying that it is impossible to modify the value of an entry in-place. I think that is a rather strong limitation. Of course one may not change the key in-place, but the value should be modifiable. Therefore the type should be std::pair<const K,V>* and table also should be of type std::pair<const K,V>.
The maximal size of your hash_table is kind of arbitarily chosen and once it is full there will be an exception thrown. This might not have been the scope of your intentions but really the size should grow when a certain load is reached and rehashing should happen.
Your insert takes a rvalue reference. This means that only temporaries or moved elements can be inserted. There should also be an overload for const references (and without move).
size() should be const.

Stack Exchange Network

Hash table using open addressing with double hashing probing

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Hash table using open addressing with double hashing probing

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions