Resize the hash table and rehash every element

Question 1

I'm trying to increase my hash table size by two if the table is filled more than half of it. I tried to rehash every single element from the old table and put it back to my new table, then assign my old table as the new table again. What do you guys think of my own created function? I'm practicing for my final exam and only creating this in paper. I haven't tried yet using compiler and so forth.

I'm using quadratic probing to insert the keys, and once more than half of the table is filled, resize the hash table by calling the function resize().

//find index
int HashingTable::hash(int x, int size){
 return x % size;
}
//insert value to the array, but using quadratic probing
void HashingTable::insertQuadratic(int x, int T[]) {
 int index = hash(x, size);
 int i = 1;
 //the quadratic probing
 while (T[index] != empty) {
 index = (index + (i*i)) % size; //increment quadraticly
 i = i + 1;
 }
 T[index] = x;
 numOfItems = numOfItems + 1;
 double loadFactor = numOfItems / size;
 if (loadFactor >= 0.5) {
 resize(T);
 }
}
void HashingTable::resize(int T[]) {
 int newSize = 2 * size;
 int* newTable = new int[newSize];
 for (int i = 0; i < newSize; i++) {
 newTable[i] = empty;
 }
 for (int i = 0; i < newSize; i++) {
 if (T[i] != empty) {
 int index = hash(T[i], newSize);
 int i = 1;
 while (newTable[index] != empty) {
 index = (index + (i*i)) % newSize;
 }
 newTable[index] = T[i];
 }
 }
 T = newTable;
 delete newTable;
}

Question 2

Could you include the header with the class definition?

Question 3

STL

I suppose using a dynamic array is mandatory for the exercise? Otherwise you should use one from the standard library.

If you can use the STL but need a dynamic array, you should use std::fill_n(array, newSize, empty); to initialize your array with empty everywhere.

Modularity

index = (index + (i*i)) % size; should be in a function, not copy pasted.

Memory

This part of the code actually delete the memory you want to use.

T = newTable;
delete newTable;

Another problem is that you allocate memory with new[] and free it with delete. Altogether you should write:

delete[] T;
T = newTable;

Question 4

Its easier to review code if its complete, so please include the class declaration and full definition. However, working from the top I'll make some suggestions.

Parameter lists

If your HashingTable declaration (header file) has a data member T, then member functions don't need it passed in as a parameter, namely

void HashingTable::insertQuadratic(int x, int T[])

could simply be

void HashingTable::insertQuadratic(int x)

similarly

void HashingTable::resize(int T[])

could be simplified as

void HashingTable::resize()

This works because T is already scoped to HashingTable so every member function already has access to T. Then there's the bonus of not having to make a copy of the array pointer each time, which speeds up your code a bit.

Variable Names

More of an aesthetic point than a functional one, butT would be easier for user's of your HashingTable to understand if it were table or the like. Two reasons for this: 1) The name T is typically used as for template parameter types, and 2) T itself doesn't really express what the variable is used for. This means well named variables add the benefit of making your code self-documenting.

Storage Space

Only one decimal place precision is checked for in loadFactor >= 0.5, so a double is more storage space than is necessary. float would work just as well and saves some memory (on most machines).

Memory Management

Your code deletes the block of memory it intends to use for T after its been assigned to T, namely:

T = newTable;
delete newTable;

That will result in undefined behavior once T is accessed again, and it also leaks the original T's memory. So just switch it around:

delete[] T;
T = newTable;

Note the [] in delete[] as well. You need [] because newTable is allocated as an array:

int* newTable = new int[newSize];

so it must be freed as an array withdelete[] (not delete) otherwise newTable will also be a memory leak because only its first element would have actually been deallocated.

Be sure to assign size = newSize; after everything is done copying, otherwise indexing through the changed T will be troublesome, and resize won't actually grow the array the second time its called.

Re-inventing the wheel

If your empty variable is one-byte in length (EG: 0x00 or 0xff), then the loop where the array of integers is set to a specific value:

for (int i = 0; i < newSize; i++) {
 newTable[i] = empty;
}

could be replaced by the more efficient (and easier to code) C standard library function:

memset(newTable, empty, sizeof(int) * newSize);

The first parameter indicates the starting address of the memory block you want to assign values to. The second parameter indicates the value each byte of the memory block will be filled with. The third parameter indicates the number of bytes forward of the starting address to affect. For example, if empty == 0xff then all your signed ints will be -1 thanks to twos-complement. Beyond memset there are functions that allow values longer than one byte to be assigned to each index of an array, as other answers have mentioned.

Copy-Pasta Bugs

Copy-paste is a useful tool for repeated segments of code. However, I always ask myself if the code I'm about to copy-paste would be better off in its own function. One of the goals of writing efficient code is to reduce the amount of repeated code. The HashingTable copy operation in resize:

for (int i = 0; i < newSize; i++) {
 if (T[i] != empty) {
 int index = hash(T[i], newSize);
 int i = 1;
 while (newTable[index] != empty) {
 index = (index + (i*i)) % newSize;
 }
 newTable[index] = T[i];
 }
}

Made me wonder if your insertQuadratic function could be called instead of copy-pasting its while loop. In this case you made the right choice to just inline the loop, but it's something to keep in mind.

The dark side of copy-paste is the potential bugs it introduces. The variable i gets redefined halfway into your copy operation. That will confuse the compiler and throws a monkey wrench into your copy-probing. Namely, which i should the compiler use use here?

index = (index + (i*i)) % newSize;

The one defined at for or the one defined just before the while? The quick-fix is to rename one of the variables, however another thing I ask myself when I'm about to copy-paste is if there are variables I'd have to hunt down and rename like that. If so the code is better off typed from scratch, plus you get to think about your logic a bit more as you type it out.

Indexing

The difference between newSize and size is significant, but the data in that gap is not. Also, going past size will give an index-out-of-bounds/read-access error on T, so:

for (int i = 0; i < newSize; i++)

should be

for (int i = 0; i < size; i++)

Performance

Bitwise operators are significantly more efficient compared to modulo % and other binary/unary operations. For example, since size is class-scoped a second variable int tableMask; could be declared as part of HashingTable. So when a HashingTable is constructed, and after size = newSize; in resize just set

tableMask = size - 1;

Then hash keys can quickly be calculated using the bitwise-and &:

return x & tableMask;

This prevents the indexing from going beyond size automatically (no bounds checking required). The downside is it will cause more collisions on average compared to %, but its something to think about given that you're already handling collisions.

To solve the excessive collision issue just ensure size is always a power of two by using the bitshift-left bitwise operator << and bitshift-right bitwise operator >>:

int power = 0;
while (size >> ++power)
 ;
size = 1 << power;

Bitwise operators are certainly something worth learning.

Question 5

Advice 1

for (int i = 0; i < newSize; i++) {
 if (T[i] != empty) {

Did you intend to iterate only over size entries from T? If so, you need:

for (int i = 0; i < size; i++) {
 if (T[i] != empty) {

Advice 2

T = newTable;
delete newTable;

You deallocate the actual new table array. Also, since you don't delete the table pointed to by T, you leak memory. I think your intention was to write:

delete[] T;
T = newTable;

Advice 3

T is not a good name for a hash table, since people can confuse it with a type parameter. Consider using something like storage_array, storage_table, table, or similar.

Question 6

Did you mean delete[] T?

Stud Stud 7863 silver badges13 bronze badges · Answer 1 · 2016-12-11 08:16:55Z

STL

I suppose using a dynamic array is mandatory for the exercise? Otherwise you should use one from the standard library.

If you can use the STL but need a dynamic array, you should use std::fill_n(array, newSize, empty); to initialize your array with empty everywhere.

Modularity

index = (index + (i*i)) % size; should be in a function, not copy pasted.

Memory

This part of the code actually delete the memory you want to use.

T = newTable;
delete newTable;

Another problem is that you allocate memory with new[] and free it with delete. Altogether you should write:

delete[] T;
T = newTable;

TOM__ TOM__ 1057 bronze badges · Answer 2 · 2016-12-11 10:39:00Z

Its easier to review code if its complete, so please include the class declaration and full definition. However, working from the top I'll make some suggestions.

Parameter lists

If your HashingTable declaration (header file) has a data member T, then member functions don't need it passed in as a parameter, namely

void HashingTable::insertQuadratic(int x, int T[])

could simply be

void HashingTable::insertQuadratic(int x)

similarly

void HashingTable::resize(int T[])

could be simplified as

void HashingTable::resize()

This works because T is already scoped to HashingTable so every member function already has access to T. Then there's the bonus of not having to make a copy of the array pointer each time, which speeds up your code a bit.

Variable Names

More of an aesthetic point than a functional one, butT would be easier for user's of your HashingTable to understand if it were table or the like. Two reasons for this: 1) The name T is typically used as for template parameter types, and 2) T itself doesn't really express what the variable is used for. This means well named variables add the benefit of making your code self-documenting.

Storage Space

Only one decimal place precision is checked for in loadFactor >= 0.5, so a double is more storage space than is necessary. float would work just as well and saves some memory (on most machines).

Memory Management

Your code deletes the block of memory it intends to use for T after its been assigned to T, namely:

T = newTable;
delete newTable;

That will result in undefined behavior once T is accessed again, and it also leaks the original T's memory. So just switch it around:

delete[] T;
T = newTable;

Note the [] in delete[] as well. You need [] because newTable is allocated as an array:

int* newTable = new int[newSize];

so it must be freed as an array withdelete[] (not delete) otherwise newTable will also be a memory leak because only its first element would have actually been deallocated.

Be sure to assign size = newSize; after everything is done copying, otherwise indexing through the changed T will be troublesome, and resize won't actually grow the array the second time its called.

Re-inventing the wheel

If your empty variable is one-byte in length (EG: 0x00 or 0xff), then the loop where the array of integers is set to a specific value:

for (int i = 0; i < newSize; i++) {
 newTable[i] = empty;
}

could be replaced by the more efficient (and easier to code) C standard library function:

memset(newTable, empty, sizeof(int) * newSize);

The first parameter indicates the starting address of the memory block you want to assign values to. The second parameter indicates the value each byte of the memory block will be filled with. The third parameter indicates the number of bytes forward of the starting address to affect. For example, if empty == 0xff then all your signed ints will be -1 thanks to twos-complement. Beyond memset there are functions that allow values longer than one byte to be assigned to each index of an array, as other answers have mentioned.

Copy-Pasta Bugs

Copy-paste is a useful tool for repeated segments of code. However, I always ask myself if the code I'm about to copy-paste would be better off in its own function. One of the goals of writing efficient code is to reduce the amount of repeated code. The HashingTable copy operation in resize:

for (int i = 0; i < newSize; i++) {
 if (T[i] != empty) {
 int index = hash(T[i], newSize);
 int i = 1;
 while (newTable[index] != empty) {
 index = (index + (i*i)) % newSize;
 }
 newTable[index] = T[i];
 }
}

Made me wonder if your insertQuadratic function could be called instead of copy-pasting its while loop. In this case you made the right choice to just inline the loop, but it's something to keep in mind.

The dark side of copy-paste is the potential bugs it introduces. The variable i gets redefined halfway into your copy operation. That will confuse the compiler and throws a monkey wrench into your copy-probing. Namely, which i should the compiler use use here?

index = (index + (i*i)) % newSize;

The one defined at for or the one defined just before the while? The quick-fix is to rename one of the variables, however another thing I ask myself when I'm about to copy-paste is if there are variables I'd have to hunt down and rename like that. If so the code is better off typed from scratch, plus you get to think about your logic a bit more as you type it out.

Indexing

The difference between newSize and size is significant, but the data in that gap is not. Also, going past size will give an index-out-of-bounds/read-access error on T, so:

for (int i = 0; i < newSize; i++)

should be

for (int i = 0; i < size; i++)

Performance

Bitwise operators are significantly more efficient compared to modulo % and other binary/unary operations. For example, since size is class-scoped a second variable int tableMask; could be declared as part of HashingTable. So when a HashingTable is constructed, and after size = newSize; in resize just set

tableMask = size - 1;

Then hash keys can quickly be calculated using the bitwise-and &:

return x & tableMask;

This prevents the indexing from going beyond size automatically (no bounds checking required). The downside is it will cause more collisions on average compared to %, but its something to think about given that you're already handling collisions.

To solve the excessive collision issue just ensure size is always a power of two by using the bitshift-left bitwise operator << and bitshift-right bitwise operator >>:

int power = 0;
while (size >> ++power)
 ;
size = 1 << power;

Bitwise operators are certainly something worth learning.

coderodde coderodde 31.8k15 gold badges77 silver badges202 bronze badges · Answer 3 · 2016-12-11 08:10:06Z

Advice 1

for (int i = 0; i < newSize; i++) {
 if (T[i] != empty) {

Did you intend to iterate only over size entries from T? If so, you need:

for (int i = 0; i < size; i++) {
 if (T[i] != empty) {

Advice 2

T = newTable;
delete newTable;

You deallocate the actual new table array. Also, since you don't delete the table pointed to by T, you leak memory. I think your intention was to write:

delete[] T;
T = newTable;

Advice 3

T is not a good name for a hash table, since people can confuse it with a type parameter. Consider using something like storage_array, storage_table, table, or similar.

1

\$\begingroup\$ Did you mean delete[] T? \$\endgroup\$

TOM__
– TOM__

2016年12月13日 08:26:34 +00:00
Commented Dec 13, 2016 at 8:26

Stack Exchange Network

Resize the hash table and rehash every element

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Resize the hash table and rehash every element

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions