Optimizing Hash Table

Question 1

I am trying to write my own implementation of a hash table (hash map) in C++. It turns out that my code is unoptimized as I can't pass performance tests. Can you please give some advice for optimization of the program?

I am using open addressing for resolving collisions with quadratic probing. Hash function for string: value of polynom, coefs of polynom - every single char of string. Only strings are inserted into hash table

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cmath>
//HashTable class
/*
 Using open adressing for resolving collisions with quadratic probing
*/
class HashTable {
public:
 HashTable();
 bool Set(std::string key);
 bool Remove(std::string key);
 bool Get(std::string key);
private:
 int GetHash(std::string text, int table_size);
 std::vector<std::pair<std::string, bool>> table;
 void ExtendTable();
 size_t fullness = 0;
};
HashTable::HashTable() {
 //first element of pair - key, second - is the cell free
 for(int i = 0; i < 8; i++) {
 table.push_back(std::make_pair("",true));
 }
}
int HashTable::GetHash(std::string text, int table_size) {
 //Hash function for string. Calculates polynom value using Horner method
 int b = text[0];
 int point = (int)sqrt(table_size); //x value for polynom
 for(int i = 1; i < text.size(); i++) {
 b = (text[i] + b*point) % table_size;
 }
 return b % table_size;
}
void HashTable::ExtendTable() {
 //Extends table if fullness == 3/4 * size of table
 //creates new table
 std::vector<std::pair<std::string, bool>> new_table;
 for(int i = 0; i < 2*this->table.size(); i++) {
 new_table.push_back(std::make_pair("", true));
 }
 //copying old table to a new one
 for(int i = 0; i < this->table.size(); i++) {
 int new_index = GetHash(this->table[i].first, 2*this->table.size());
 int step = 1;
 while(!new_table[new_index].second) {
 new_index += pow(step, 2);
 new_index %= 2*this->table.size();
 step++;
 }
 new_table[new_index] = this->table[i];
 }
 this->table = new_table;
}
bool HashTable::Set(std::string key) {
 //Adding new element to hash table. Dublicates are ignored
 int index = GetHash(key, this->table.size());
 int step = 1;
 while(!this->table[index].second) {
 if(this->table[index].first == key) return false;
 index += pow(step, 2);
 index %= this->table.size();
 step++;
 }
 this->table[index] = std::make_pair(key, false);
 this->fullness++;
 if(this->fullness*4 >= this->table.size()*3)
 ExtendTable();
 return true;
}
bool HashTable::Get(std::string key) {
 //Checks if element is in hash table
 int index = GetHash(key, this->table.size());
 int step = 1;
 int counter = 0;
 while(counter < this->fullness+1) {
 if(this->table[index].first == key) return true;
 index += pow(step, 2);
 index %= this->table.size();
 step++;
 counter++;
 }
 return false;
}
bool HashTable::Remove(std::string key) {
 //removes hash table
 int index = GetHash(key, this->table.size());
 int step = 0;
 int counter = 0;
 while(counter < this->fullness+1) {
 if(this->table[index].first == key) {
 this->fullness--;
 this->table[index].first = "";
 this->table[index].second = true;
 return true;
 }
 index += pow(step, 2);
 index %= this->table.size();
 step++;
 counter++;
 }
 return false;
}
int main() {
 HashTable table;
 std::ios_base::sync_with_stdio(0),std::cin.tie(0),std::cout.tie(0);
 std::string input = "";
 while(std::cin >> input) {
 if(input == "+") {
 std::cin >> input;
 bool flag = table.Set(input);
 if(flag) std::cout << "OK" << std::endl;
 else std::cout << "FAIL" << std::endl;
 }
 else if(input == "?") {
 std::cin >> input;
 bool flag = table.Get(input);
 if(flag) std::cout << "OK" << std::endl;
 else std::cout << "FAIL" << std::endl;
 }
 else if(input == "-") {
 std::cin >> input;
 bool flag = table.Remove(input);
 if(flag) std::cout << "OK" << std::endl;
 else std::cout << "FAIL" << std::endl;
 }
 }
 return 0;
}

Question 2

The Hash

Your table looks to have a power-of-two size, starting at 8 and then doubling if needed. That's fine, good even, depending on how you use it.

But then this happens:

int HashTable::GetHash(std::string text, int table_size) {
 //Hash function for string. Calculates polynom value using Horner method
 int b = text[0];
 int point = (int)sqrt(table_size); //x value for polynom
 for(int i = 1; i < text.size(); i++) {
 b = (text[i] + b*point) % table_size;
 }
 return b % table_size;
}

The slow square root is a problem of its own (how big of a problem depends on the size of the strings), but a very bad effect happens if table_size is a power of four (which it is half the time): point would be a power of two. So the multiplication (modulo a power of two) just shifts bits out of the top and loses them, deleting bits in a first-in-first-out fashion: the final hash is only affected by the last couple of characters, the bits from the first characters get shifted out. The effect gets worse as the table gets bigger, eventually only the very last character would be part of the hash.

The overall effect on your program is that as the table gets bigger, performance fluctuates between OK (probably) for odd power sizes and Increasingly Bad for even power sizes, getting worse and worse for bigger tables and long strings that share a suffix.

This wouldn't have been an issue for prime size tables, but that comes with a significant downside of its own.

What to use instead: std::hash<std::string> probably, or write a hash that does not suffer from this problem, there are many string hashing algorithms that don't have this issue.

Also b should really be some unsigned type, both to avoid the scary UB nature of signed integer overflow and also the more practical concern of avoiding a negative value as result (as a reminder, % on signed types returns the signed remainder, the result can be negative depending on the inputs). Which leads to:

The Types

A lot of variables and return types here are of type int. Many of them should be something else, such as size_t. Using int results in many unexpected type conversions, for example in index %= this->table.size(); which actually converts index to size_t first, then does the remainder, then converts back to an int again. Having a signed index risks overflowing it if the step gets big, and often costs an explicit sign-extension operation.

The first index, which comes from GetHash, could be negative, which would be bad (indexing the vector at a negative index).

The Quadratic step

You wrote:

new_index += pow(step, 2);

That's a common thing for beginners to write, but unfortunately, that's a floating point square. The resulting code on x64 with Clang 9 and -O2 is:

 xorps xmm0, xmm0
 cvtsi2sd xmm0, r14d
 xorps xmm1, xmm1
 cvtsi2sd xmm1, ebx
 mulsd xmm0, xmm0
 addsd xmm1, xmm0
 cvttsd2si eax, xmm1

A lot of converting and other floating point operations.

Writing it as new_index += step * step; results in:

 mov eax, edi
 imul eax, edi
 add eax, ebx

But it turns out you don't even need this, see further below..

The Modulo

You wrote:

index %= this->table.size();

Which does not use that the table size is a power of two, so for example on x64 with Clang 9 and -O2 again, that results in:

 cdqe
 xor edx, edx
 div rsi
 ; use rdx (the remainder)

A 64-bit div ranges from slow to very slow. The time depends on the processor model and on the values being divided and on whether we're measuring latency or throughput (or a bit of both?), so it's difficult to pin a single number to it, but as a ballpark number let's say it's around 50x as slow as integer addition. Division (and therefore remainder) is a difficult operation, so this issue is not restricted to x64 processors.

Having a power-of-two sized table is perfect to avoid that operation, you can use (and it's a waste of the opportunity to not use this):

index &= this->table.size() - 1;

Unfortunately compilers are not yet so sophisticated that they can discover and track the property of the size being a power of two through the program. Such optimizations do happen locally, if the divisor is obviously a power of two which is much easier for a compiler to discover than a more "global" invariant of your data structure.

Non-termination of `Set`

It's unlikely, but Set could loop forever. What that takes is a probe sequence that visits only filled slots. You might expect the 75% fullness bound to prevent that, but this probe sequence (cumulatively adding i2 modulo a power of two), while not too bad, does not guarantee visiting 75% of the slots. There is a probe sequence that does guarantee that:

index += step;

Doesn't look quadratic? It is! step goes up by 1 every step, so the sequence formed by index has the closed form formula index(i) = initialHash + (i2 + i) / 2: the famous triangular numbers based quadratic probing. That will visit every slot (if necessary, of course escaping from the loop early is encouraged!) with no duplicates, so there would be no accidental infinite loop.

Well this may all look pretty negative, but there are a couple of things I definitely liked in that code: firstly using std::vector for the storage, so this class does not need to concern itself with memory management. And this trick, this->fullness*4 >= this->table.size()*3, rather than multiplying by a floating point number.

Question 3

Wow! Thanks for an amazing code review!

Question 4

By the way, can I replace int point = (int)sqrt(table_size) with int point = 31 or another prime number?

Question 5

@ДанилаМишин yes that would work

user555045 12.4k1 gold badge19 silver badges39 bronze badges · Accepted Answer · 2019-12-06 17:02:33Z

The Hash

Your table looks to have a power-of-two size, starting at 8 and then doubling if needed. That's fine, good even, depending on how you use it.

But then this happens:

int HashTable::GetHash(std::string text, int table_size) {
 //Hash function for string. Calculates polynom value using Horner method
 int b = text[0];
 int point = (int)sqrt(table_size); //x value for polynom
 for(int i = 1; i < text.size(); i++) {
 b = (text[i] + b*point) % table_size;
 }
 return b % table_size;
}

The slow square root is a problem of its own (how big of a problem depends on the size of the strings), but a very bad effect happens if table_size is a power of four (which it is half the time): point would be a power of two. So the multiplication (modulo a power of two) just shifts bits out of the top and loses them, deleting bits in a first-in-first-out fashion: the final hash is only affected by the last couple of characters, the bits from the first characters get shifted out. The effect gets worse as the table gets bigger, eventually only the very last character would be part of the hash.

The overall effect on your program is that as the table gets bigger, performance fluctuates between OK (probably) for odd power sizes and Increasingly Bad for even power sizes, getting worse and worse for bigger tables and long strings that share a suffix.

This wouldn't have been an issue for prime size tables, but that comes with a significant downside of its own.

What to use instead: std::hash<std::string> probably, or write a hash that does not suffer from this problem, there are many string hashing algorithms that don't have this issue.

Also b should really be some unsigned type, both to avoid the scary UB nature of signed integer overflow and also the more practical concern of avoiding a negative value as result (as a reminder, % on signed types returns the signed remainder, the result can be negative depending on the inputs). Which leads to:

The Types

A lot of variables and return types here are of type int. Many of them should be something else, such as size_t. Using int results in many unexpected type conversions, for example in index %= this->table.size(); which actually converts index to size_t first, then does the remainder, then converts back to an int again. Having a signed index risks overflowing it if the step gets big, and often costs an explicit sign-extension operation.

The first index, which comes from GetHash, could be negative, which would be bad (indexing the vector at a negative index).

The Quadratic step

You wrote:

new_index += pow(step, 2);

That's a common thing for beginners to write, but unfortunately, that's a floating point square. The resulting code on x64 with Clang 9 and -O2 is:

 xorps xmm0, xmm0
 cvtsi2sd xmm0, r14d
 xorps xmm1, xmm1
 cvtsi2sd xmm1, ebx
 mulsd xmm0, xmm0
 addsd xmm1, xmm0
 cvttsd2si eax, xmm1

A lot of converting and other floating point operations.

Writing it as new_index += step * step; results in:

 mov eax, edi
 imul eax, edi
 add eax, ebx

But it turns out you don't even need this, see further below..

The Modulo

You wrote:

index %= this->table.size();

Which does not use that the table size is a power of two, so for example on x64 with Clang 9 and -O2 again, that results in:

 cdqe
 xor edx, edx
 div rsi
 ; use rdx (the remainder)

A 64-bit div ranges from slow to very slow. The time depends on the processor model and on the values being divided and on whether we're measuring latency or throughput (or a bit of both?), so it's difficult to pin a single number to it, but as a ballpark number let's say it's around 50x as slow as integer addition. Division (and therefore remainder) is a difficult operation, so this issue is not restricted to x64 processors.

Having a power-of-two sized table is perfect to avoid that operation, you can use (and it's a waste of the opportunity to not use this):

index &= this->table.size() - 1;

Unfortunately compilers are not yet so sophisticated that they can discover and track the property of the size being a power of two through the program. Such optimizations do happen locally, if the divisor is obviously a power of two which is much easier for a compiler to discover than a more "global" invariant of your data structure.

Non-termination of `Set`

It's unlikely, but Set could loop forever. What that takes is a probe sequence that visits only filled slots. You might expect the 75% fullness bound to prevent that, but this probe sequence (cumulatively adding i2 modulo a power of two), while not too bad, does not guarantee visiting 75% of the slots. There is a probe sequence that does guarantee that:

index += step;

Doesn't look quadratic? It is! step goes up by 1 every step, so the sequence formed by index has the closed form formula index(i) = initialHash + (i2 + i) / 2: the famous triangular numbers based quadratic probing. That will visit every slot (if necessary, of course escaping from the loop early is encouraged!) with no duplicates, so there would be no accidental infinite loop.

Well this may all look pretty negative, but there are a couple of things I definitely liked in that code: firstly using std::vector for the storage, so this class does not need to concern itself with memory management. And this trick, this->fullness*4 >= this->table.size()*3, rather than multiplying by a floating point number.

By the way, can I replace int point = (int)sqrt(table_size) with int point = 31 or another prime number?

Stack Exchange Network

Optimizing Hash Table

1 Answer 1

The Hash

The Types

The Quadratic step

The Modulo

Non-termination of `Set`

You must log in to answer this question.

Hot Network Questions

Optimizing Hash Table

1 Answer 1

The Hash

The Types

The Quadratic step

The Modulo

Non-termination of Set

You must log in to answer this question.

Related

Hot Network Questions

Non-termination of `Set`