I'm trying to increase my hash table size by two if the table is filled more than half of it. I tried to rehash every single element from the old table and put it back to my new table, then assign my old table as the new table again. What do you guys think of my own created function? I'm practicing for my final exam and only creating this in paper. I haven't tried yet using compiler and so forth.
I'm using quadratic probing to insert the keys, and once more than half of the table is filled, resize the hash table by calling the function resize().
//find index
int HashingTable::hash(int x, int size){
return x % size;
}
//insert value to the array, but using quadratic probing
void HashingTable::insertQuadratic(int x, int T[]) {
int index = hash(x, size);
int i = 1;
//the quadratic probing
while (T[index] != empty) {
index = (index + (i*i)) % size; //increment quadraticly
i = i + 1;
}
T[index] = x;
numOfItems = numOfItems + 1;
double loadFactor = numOfItems / size;
if (loadFactor >= 0.5) {
resize(T);
}
}
void HashingTable::resize(int T[]) {
int newSize = 2 * size;
int* newTable = new int[newSize];
for (int i = 0; i < newSize; i++) {
newTable[i] = empty;
}
for (int i = 0; i < newSize; i++) {
if (T[i] != empty) {
int index = hash(T[i], newSize);
int i = 1;
while (newTable[index] != empty) {
index = (index + (i*i)) % newSize;
}
newTable[index] = T[i];
}
}
T = newTable;
delete newTable;
}
-
4\$\begingroup\$ Could you include the header with the class definition? \$\endgroup\$200_success– 200_success2016年12月11日 07:43:58 +00:00Commented Dec 11, 2016 at 7:43
3 Answers 3
STL
I suppose using a dynamic array is mandatory for the exercise? Otherwise you should use one from the standard library.
If you can use the STL but need a dynamic array, you should use std::fill_n(array, newSize, empty);
to initialize your array with empty
everywhere.
Modularity
index = (index + (i*i)) % size;
should be in a function, not copy pasted.
Memory
This part of the code actually delete the memory you want to use.
T = newTable;
delete newTable;
Another problem is that you allocate memory with new[] and free it with delete. Altogether you should write:
delete[] T;
T = newTable;
Its easier to review code if its complete, so please include the class declaration and full definition. However, working from the top I'll make some suggestions.
Parameter lists
If your HashingTable
declaration (header file) has a data member T
, then member functions don't need it passed in as a parameter, namely
void HashingTable::insertQuadratic(int x, int T[])
could simply be
void HashingTable::insertQuadratic(int x)
similarly
void HashingTable::resize(int T[])
could be simplified as
void HashingTable::resize()
This works because T
is already scoped to HashingTable
so every member function already has access to T
. Then there's the bonus of not having to make a copy of the array pointer each time, which speeds up your code a bit.
Variable Names
More of an aesthetic point than a functional one, butT
would be easier for user's of your HashingTable
to understand if it were table
or the like. Two reasons for this: 1) The name T
is typically used as for template parameter types, and 2) T
itself doesn't really express what the variable is used for. This means well named variables add the benefit of making your code self-documenting.
Storage Space
Only one decimal place precision is checked for in loadFactor >= 0.5
, so a double
is more storage space than is necessary. float
would work just as well and saves some memory (on most machines).
Memory Management
Your code delete
s the block of memory it intends to use for T
after its been assigned to T
, namely:
T = newTable;
delete newTable;
That will result in undefined behavior once T
is accessed again, and it also leaks the original T
's memory. So just switch it around:
delete[] T;
T = newTable;
Note the []
in delete[]
as well. You need []
because newTable
is allocated as an array:
int* newTable = new int[newSize];
so it must be freed as an array withdelete[]
(not delete
) otherwise newTable
will also be a memory leak because only its first element would have actually been deallocated.
Be sure to assign size = newSize;
after everything is done copying, otherwise indexing through the changed T
will be troublesome, and resize
won't actually grow the array the second time its called.
Re-inventing the wheel
If your empty
variable is one-byte in length (EG: 0x00
or 0xff
), then the loop where the array of integers is set to a specific value:
for (int i = 0; i < newSize; i++) {
newTable[i] = empty;
}
could be replaced by the more efficient (and easier to code) C standard library function:
memset(newTable, empty, sizeof(int) * newSize);
The first parameter indicates the starting address of the memory block you want to assign values to. The second parameter indicates the value each byte of the memory block will be filled with. The third parameter indicates the number of bytes forward of the starting address to affect. For example, if empty == 0xff
then all your signed int
s will be -1
thanks to twos-complement. Beyond memset
there are functions that allow values longer than one byte to be assigned to each index of an array, as other answers have mentioned.
Copy-Pasta Bugs
Copy-paste is a useful tool for repeated segments of code. However, I always ask myself if the code I'm about to copy-paste would be better off in its own function. One of the goals of writing efficient code is to reduce the amount of repeated code. The HashingTable
copy operation in resize
:
for (int i = 0; i < newSize; i++) {
if (T[i] != empty) {
int index = hash(T[i], newSize);
int i = 1;
while (newTable[index] != empty) {
index = (index + (i*i)) % newSize;
}
newTable[index] = T[i];
}
}
Made me wonder if your insertQuadratic
function could be called instead of copy-pasting its while
loop. In this case you made the right choice to just inline the loop, but it's something to keep in mind.
The dark side of copy-paste is the potential bugs it introduces. The variable i
gets redefined halfway into your copy operation. That will confuse the compiler and throws a monkey wrench into your copy-probing. Namely, which i
should the compiler use use here?
index = (index + (i*i)) % newSize;
The one defined at for
or the one defined just before the while
? The quick-fix is to rename one of the variables, however another thing I ask myself when I'm about to copy-paste is if there are variables I'd have to hunt down and rename like that. If so the code is better off typed from scratch, plus you get to think about your logic a bit more as you type it out.
Indexing
The difference between newSize
and size
is significant, but the data in that gap is not. Also, going past size
will give an index-out-of-bounds/read-access error on T
, so:
for (int i = 0; i < newSize; i++)
should be
for (int i = 0; i < size; i++)
Performance
Bitwise operators are significantly more efficient compared to modulo %
and other binary/unary operations. For example, since size
is class
-scoped a second variable int tableMask;
could be declared as part of HashingTable
. So when a HashingTable
is constructed, and after size = newSize;
in resize
just set
tableMask = size - 1;
Then hash keys can quickly be calculated using the bitwise-and &
:
return x & tableMask;
This prevents the indexing from going beyond size
automatically (no bounds checking required). The downside is it will cause more collisions on average compared to %
, but its something to think about given that you're already handling collisions.
To solve the excessive collision issue just ensure size
is always a power of two by using the bitshift-left bitwise operator <<
and bitshift-right bitwise operator >>
:
int power = 0;
while (size >> ++power)
;
size = 1 << power;
Bitwise operators are certainly something worth learning.
Advice 1
for (int i = 0; i < newSize; i++) {
if (T[i] != empty) {
Did you intend to iterate only over size
entries from T
? If so, you need:
for (int i = 0; i < size; i++) {
if (T[i] != empty) {
Advice 2
T = newTable;
delete newTable;
You deallocate the actual new table array. Also, since you don't delete the table pointed to by T
, you leak memory. I think your intention was to write:
delete[] T;
T = newTable;
Advice 3
T
is not a good name for a hash table, since people can confuse it with a type parameter. Consider using something like storage_array
, storage_table
, table
, or similar.
-
1\$\begingroup\$ Did you mean
delete[] T
? \$\endgroup\$TOM__– TOM__2016年12月13日 08:26:34 +00:00Commented Dec 13, 2016 at 8:26