Looking for a general review:
#ifndef HASH_TABLE_HASH_TABLE_H
#define HASH_TABLE_HASH_TABLE_H
#include <utility>
#include <stdexcept>
namespace mds {
template<typename K, typename V >
class hash_table {
std::size_t M;
std::pair<K, V> *table;
bool *mark;
std::size_t N;
std::size_t hash(std::size_t h, std::size_t i) const {
auto hash2 = (std::hash<std::size_t>()(h) % (M - 1));
return ((h % M) + i * (hash2 | 0x1)) % M;
}
std::size_t power_of_2_ceiling(std::size_t x) const {
auto bits = sizeof(std::size_t) << 3;
--x;
for (auto i = 1u; i != bits; i <<= 1) {
x |= x >> i;
}
return x + 1;
}
public:
hash_table() : hash_table(0) { }
hash_table(std::size_t m)
: M(power_of_2_ceiling(m)),
table(new std::pair<K, V>[M]),
mark(new bool[M]), N(0)
{
for (std::size_t i = 0; i < M; ++i) {
mark[i] = false;
}
}
hash_table(hash_table<K, V>&& map)
: M(map.M), table(map.table),
mark(map.mark), N(map.N) {
map.M = map.N = 0;
map.mark = nullptr;
map.table = nullptr;
}
hash_table(const hash_table<K, V>& map)
: M(map.M), table(new std::pair<K, V>[map.M]),
mark(new bool[map.M]), N(map.N) {
for (std::size_t i = 0; i < M; ++i) {
mark[i] = map.mark[i];
table[i] = map.table[i];
}
}
hash_table(std::unordered_map<K, V>&& map)
: M(power_of_2_ceiling(map.size())),
table(new std::pair<K, V>[M]),
mark(new bool[M]), N(0)
{
for (std::size_t i = 0; i < M; ++i) {
mark[i] = false;
}
for (auto &p : map) {
insert(std::move(p));
}
}
hash_table<K, V>& operator=(hash_table<K, V> other) {
swap(*this, other);
return *this;
}
friend void swap(hash_table<K, V>& first, hash_table<K, V>& second) noexcept {
using std::swap;
swap(first.M, second.M);
swap(first.N, second.N);
swap(first.table, second.table);
swap(first.mark, second.mark);
}
~hash_table() {
delete[] table;
delete[] mark;
}
const std::pair<K, V>* find(const K& key) const {
auto hash_code = std::hash<K>()(key);
auto i = 0u;
auto h = hash(hash_code, i);
while (i < M && mark[h]) {
if (table[h].first == key ) {
return &table[h];
}
h = hash(hash_code, ++i);
}
return nullptr;
}
bool insert(std::pair<K, V>&& p) {
auto hash_code = std::hash<K>()(p.first);
for (std::size_t i = 0; i < M; ++i) {
auto h = hash(hash_code, i);
if (!mark[h]) {
table[h] = std::move(p);
++N;
return mark[h] = true;
}
else if (table[h].first == p.first) {
return false;
}
}
throw std::overflow_error("overflow");
}
bool constains(const K& key) const {
return find(key) != nullptr;
}
std::size_t size() {
return N;
}
};
}
#endif // !HASH_TABLE_HASH_TABLE_H
1 Answer 1
Here are a few things I noticed reading your code:
You should probably use more descriptive identifiers for
M
andN
here.Instead of dynamic allocation you could also use
std::vector<...>
fortable
andmark
. That would allow to simplify your code considerably. For example you could drop the custom copy/move constructors/assignment operators and the destructor and use the implicitly defined ones instead. You could also use the provided.size()
method ofstd::vector
to getM
instead of saving it explicitly.If you didn't use
std::vector
for performance considerations, then I doubt that is a problem. Element access is through one indirection anyway and you do not have resize operations. Simply usereserve
in your constructor to allocate the proper size immediately. Only the required memory forstd::vector
might be a bit larger (additional integer for current size/reserved size and doubling of lengthM
).Even if you still don't want to use
std::vector
, then you should consider writting an additional class (heap_array
?) to handle the memory management or you could usestd::unique_ptr
, which would at least make your custom move constructor/assignment operator and the destructor redundant.In
hash_table(std::size_t m)
: The body is simply zero-initialization of themark
array. The same can be achieved with:hash_table(std::size_t m) : M(power_of_2_ceiling(m)), table(new std::pair<K, V>[M]), mark(new bool[M]{}), N(0) { }
In
hash_table(hash_table<K, V>&& map)
: Is there a reason to setM
andN
to zero? A moved object only needs to be properly destructable, but the destructor does not useM
orN
.You can refer to the current instantiation of a template class without specifying the arguments, i.e.
hash_table(hash_table&& map)
would be fine for the move constructor, too.To use
std::unordered_map
you need to include<unordered_map>
.hash_table(std::unordered_map<K, V>&& map)
: You have a rvalue reference constructor here, but none for const references. That seems inconsistent.hash_table(std::unordered_map<K, V>&& map)
initially does the same ashash_table(std::size_t)
, you could simply call it:hash_table(std::unordered_map<K, V>&& map) : hash_table(map.size()) { for (auto &p : map) { insert(std::move(p)); } }
(dropped)
(dropped)
You could use just one array of type
std::tuple<K,V,bool>
(or a custom struct) instead of two arraystable
andmark
. Then you would not have to repeat the memory management code twice, and with the access pattern related data is closer in memory. Together with my point 2. this would reduce overhead even more.Your only method to return elements
find
returns aconst std::pair<K,V>*
, implying that it is impossible to modify the value of an entry in-place. I think that is a rather strong limitation. Of course one may not change the key in-place, but the value should be modifiable. Therefore the type should bestd::pair<const K,V>*
andtable
also should be of typestd::pair<const K,V>
.The maximal size of your
hash_table
is kind of arbitarily chosen and once it is full there will be an exception thrown. This might not have been the scope of your intentions but really the size should grow when a certain load is reached and rehashing should happen.Your
insert
takes a rvalue reference. This means that only temporaries ormove
d elements can be inserted. There should also be an overload for const references (and withoutmove
).size()
should beconst
.