I am learning how to write generic code in C++ and I decided to write a hash table. Its key needs to be a primitive type and value can be any type. The hash function needs to be provided by the user. Type of the function is enforced to the user.
I am looking for tips on efficiency of the code and whether forwarding of the arguments are working as intended (perfect forwarding). Also in the get method, would it be better to return value type pointer instead of std::optional?
It only contains emplace back and get methods keep things tidy.
#pragma once
#include <list>
#include <optional>
#include <type_traits>
template< class KeyType > using hashFunction_t = size_t(const KeyType& key);
template < class KeyType, class ValueType, size_t nHashGroups, hashFunction_t<KeyType>* hashFunction>
class BasicHash
{
public:
template< class... ValueArgTypes >
void EmplaceBack(KeyType&& key, ValueArgTypes&&... args)
{
size_t hash = Hash(std::forward<KeyType>(key));
for (auto& pair : hashGroups[hash])
if (pair.key == key)
return;
hashGroups[hash].emplace_back(
std::forward<KeyType>(key),
std::forward<ValueType>(ValueType{ args... })
);
}
decltype(auto) Get(KeyType&& key)
{
size_t hash = Hash(std::forward<KeyType>(key));
for (auto& pair : hashGroups[hash])
if (pair.key == key)
return std::optional<ValueType>{ pair.value };
return std::optional<ValueType>{};
}
private:
size_t Hash(KeyType&& key)
{
return (*hashFunction)(static_cast<const KeyType&>(key)) % nHashGroups;
}
struct KeyValuePair
{
KeyValuePair(KeyType&& key, ValueType&& value) :
key(key), value(value)
{
}
KeyType key;
ValueType value;
};
std::list<KeyValuePair> hashGroups[nHashGroups];
};
Minimal usage of the hash table:
#include "BasicHash.h"
#include <string>
#include <iostream>
//djb2
size_t hashFunction(const std::string& key)
{
unsigned long hash = 5381;
int c;
const char* str = key.c_str();
while (c = *str++)
hash = ((hash << 5) + hash) + c;
return hash;
}
class TwoNumberStore
{
public:
TwoNumberStore(int a, int b) :
n1{ a }, n2{ b }
{
}
int n1 = 0;
int n2 = 0;
};
int main()
{
BasicHash<std::string, TwoNumberStore, 50, hashFunction> table;
table.EmplaceBack("First Two", 1, 2);
auto firstTwo = table.Get("First Two");
if (firstTwo.has_value())
{
std::cout << firstTwo.value().n1 << '\n';
std::cout << firstTwo.value().n2;
}
}
I just realised I was copying value in std::optional. Changed get to return a optional reference_wrapper instead.
decltype(auto) Get(KeyType&& key)
{
size_t hash = Hash(std::forward<KeyType>(key));
for (auto& pair : hashGroups[hash])
if (pair.key == key)
return std::optional<std::reference_wrapper<ValueType>>{ pair.value };
return std::optional<std::reference_wrapper<ValueType>>{};
}
This makes retrieving data even messier.
if (firstTwo.has_value())
{
std::cout << firstTwo.value().get().n1 << '\n';
std::cout << firstTwo.value().get().n2;
}
1 Answer 1
Make it look like std::unordered_map
When in Rome, do as the Romans do. In C++, if you are writing a container, it would be very nice if it has the same API as any other STL container. This avoids having to learn a different interface, and also makes it easier to use your container as a drop-in for an STL container without having to rewrite a lot of code that uses it. This is also part of making code more generic.
In particular, make sure the public member functions look like those of the closest matching STL container, which would be a std::unordered_map
. So:
- Change
EmplaceBack()
toemplace()
- Consider changing
Get()
tooperator[]
which looks up or inserts. - Add other typical STL container functions, like
clear()
,size()
,erase()
.
Note that the STL containers don't return std::optional
s. Of course, the STL containers themselves were designed way before std::optional
was a thing. Sometimes you know the item already exists, or maybe if you don't and it doesn't, you just want a default-constructed one. In those cases, operator[]()
and at()
do exactly what you want. A different function that returns a std::optional
might be nice though, since the standard way to do the same with STL containers:
auto it = container.find(key);
if (it != container.end()) {
auto &value = it->second;
...
}
...is quite annoying. Perhaps a better name than Get()
could be used though, which hints that an optional value will be returned.
Passing a hash function
Passing a function as a template parameter limits the types of functions you can pass. The STL containers that take a hash function take this as a parameter to the constructor. You can do the same thing:
template < class KeyType, class ValueType, size_t nHashGroups, class HashFunction>
class BasicHash
{
public:
BasicHash(const HashFunction& hashFunction = HashFunction()) :
hashFunction(hashFunction) {}
...
private:
size_t Hash(const KeyType& key)
{
return hashFunction(key) % nHashGroups;
}
...
HashFunction hashFunction;
};
The only drawback is that you have to pass the hash function as a function object, but that's not a problem if you use lambdas:
auto hashFunction = [](const std::string& key) -> size_t {
...
return hash;
}
...
BasicHash<std::string, TwoNumberStore, 50, decltype(hashFunction)> table(hashFunction);
Another option would be to not make the hash function type a template parameter, but to pass the hash function using a std::function<size_t(const KeyType&)>
. This has pros and cons; the advantage is that you don't need to specify the hash function type explicitly anymore, and you can pass a regular function to the constructor. The con is that there might be some performance overhead associated with std::function
.
Pass by const
reference where appropriate
You pass key
to Get()
and Hash()
using an r-value reference. However, this is not necessary, since you are not going to move from it. It also might prevent some optimizations. Prefer using const
reference whenever you want to pass a parameter you are not going to modify in any way.
One issue with r-value references is that they cannot bind to a const
l-value. Consider writing this:
std::string key = "First Two";
auto firstTwo = table.Get(key);
This would fail to compile if Get()
takes an r-value reference. A const
reference parameter will take anything.
Missing std::move()
in the constructor of KeyValuePair
The constructor of KeyValuePair
takes r-value references, but it just copies the values of the parameters into the member variables. You have to explicitly use std::move()
to ensure the move constructor is used if possible:
KeyValuePair(KeyType&& key, ValueType&& value) :
key(std::move(key)), value(std::move(value)) {}
Efficient storage of keys and values
Your hash table works well if you provide a good hash function and now up front how many elements you are going to store. However, it is usually hard to know exactly how much you are going to store up front, and therefore requiring nHashGroups
to be specified as a template constant is either going to make the hash table too small, leading to large buckets and eventually \$O(N)\$ performance, or it is too large and you waste a lot of memory on empty buckets.
Consider making the size of the hash table dynamic. Let the user optionally reserve()
a certain size if they do have a good estimate.
Furthermore, an empty std::list
uses 24 bytes on a 64-bit architecture, and the moment you start using it, it will have to allocate memory to store the actual contents. Instead of using buckets, consider just using a single array of KeyValuePair
s, and use open addressing to handle collisions. This does require that you can resize your hash table though.
Fix your hash function
GCC and Clang will warn about using the result of the assignment in the while
-loop in hashFunction()
as the condition. You could solve it by adding some parentheses, but I would rewrite it to proper C++ instead:
for (auto c: key)
hash = ((hash << 5) + hash) + c;
Also note that std::string
s can contain an arbitrary number of NUL-characters.
-
\$\begingroup\$ Thank you for your valuable input. I agree with all the points but I am a little confused by "Pass by const reference where appropriate". Since
KeyType
is a template r-value reference, it would expand to aconst KeyType&
if l-value argument is passed to it andKeyType
if anything else is passed. This is where I got that information from. Am I misunderstanding something? \$\endgroup\$fbi open_up– fbi open_up2022年01月29日 00:58:15 +00:00Commented Jan 29, 2022 at 0:58 -
\$\begingroup\$ I have addressed it in the answer. \$\endgroup\$G. Sliepen– G. Sliepen2022年01月29日 12:21:10 +00:00Commented Jan 29, 2022 at 12:21
size_t
come from? Is it some alias forstd::size_t
? \$\endgroup\$