Remove a key from a hash table efficiently

Question 1

I am trying to remove a key from a hash table, which is built using linear probing. Removing a key works fine but I know I need to rehash a portion of the table after the removal but my method down below is not quite efficient. It basically rehashes the whole table that is unnecessary. Is there anyway that I could improve the performance while rehashing.

bool remove(int x, int H[], int m)
{
 int hashVal = hash(x, m);
 for (int i = 0; H[(hashVal + i) % m] != EMPTY; i++)
 {
 if (H[(hashVal + i) % m] == x)
 {
 H[(hashVal + i) % m] = EMPTY;
 //Rehash the table
 //I think the problem is here. This for loop condition can be better
 for (int k = 1; H[(hashVal + i + k) % m] != EMPTY ; k++)
 {
 int keyShiftUp = H[(hashVal + i + k) % m];
 H[(hashVal + i + k) % m] = EMPTY;
 insertLP(keyShiftUp, H, m);
 cout << "keyShiftUp: " << keyShiftUp << ends;
 }
 return true;
 }
 }
return false;
}

Question 2

Welcome to Code Review! Good job on your first question.

Question 3

C++, not C

First things first, I note that remove is a free function. But that's not how we would do something like this in C++. We would prefer to have a type HashTable in which we'd write:

class HashTable {
public:
 ...
 bool remove(int key);
 ...
private:
 int hash(int );
 std::vector<int> table; // vector is better than int*
 // table is a better name than H
};

Your naming also could use some work - what is x and what is m? I'm guessing m is the capacity and x is the key that we're erasing, but where are the values? We just have int H[], so is this more like a hash set than a hash table? It's unclear.

Looping and separation of concerns

The hash function should just do the hashing (and thus shouldn't take in the capacity). The loop itself should loop over slots, not indices - this will localize your modulo operation so it doesn't creep in everywhere:

int next(int i) {
 return (i+1) % table.size();
}
bool remove(int key) {
 int i = hash(key) % table.size(); 
 if (table[i] == EMPTY) { // break early
 return false;
 }
 for (; table[i] != EMPTY; i = next(i)) {
 if (table[i] == key) {
 table[i] = EMPTY;
 for (int k = next(i); table[k] != EMPTY; k = next(k)) {
 ...
 }
 return true;
 }
 }
 return false;
}

Once you reorient looping over additive indices to looping over actual indices, the inner body becomes more straightforward too:

int old_key = table[k];
table[k] = EMPTY;
insert(old_key);

Find the slot

Your i loop is all about finding the slot where the key is at. The body of that loop isn't really a loop, it's a find-type algorithm. So let's rewrite it like one:

bool remove(int key) {
 int i = hash(key) % table.size(); 
 for (; table[i] != key; i = next(i)) {
 if (table[i] == EMPTY) { // break early
 return false;
 }
 }
 // now, table[i] == key
 table[i] = EMPTY;
 for (int k = next(i); table[k] != EMPTY; k = next(k)) {
 int old_key = table[k];
 table[k] = EMPTY;
 insert(old_key);
 }
 return true;
}

EMPTY

In addition to not really being a hash table, you're seemingly limiting the possible valid keys to exclude something like 0 or -1 or whatever the value of EMPTY is. May be worth separating the key from the empty concept.

Barry Barry 18.6k1 gold badge40 silver badges92 bronze badges · Answer 1 · 2015-12-06 16:58:35Z

C++, not C

First things first, I note that remove is a free function. But that's not how we would do something like this in C++. We would prefer to have a type HashTable in which we'd write:

class HashTable {
public:
 ...
 bool remove(int key);
 ...
private:
 int hash(int );
 std::vector<int> table; // vector is better than int*
 // table is a better name than H
};

Your naming also could use some work - what is x and what is m? I'm guessing m is the capacity and x is the key that we're erasing, but where are the values? We just have int H[], so is this more like a hash set than a hash table? It's unclear.

Looping and separation of concerns

The hash function should just do the hashing (and thus shouldn't take in the capacity). The loop itself should loop over slots, not indices - this will localize your modulo operation so it doesn't creep in everywhere:

int next(int i) {
 return (i+1) % table.size();
}
bool remove(int key) {
 int i = hash(key) % table.size(); 
 if (table[i] == EMPTY) { // break early
 return false;
 }
 for (; table[i] != EMPTY; i = next(i)) {
 if (table[i] == key) {
 table[i] = EMPTY;
 for (int k = next(i); table[k] != EMPTY; k = next(k)) {
 ...
 }
 return true;
 }
 }
 return false;
}

Once you reorient looping over additive indices to looping over actual indices, the inner body becomes more straightforward too:

int old_key = table[k];
table[k] = EMPTY;
insert(old_key);

Find the slot

Your i loop is all about finding the slot where the key is at. The body of that loop isn't really a loop, it's a find-type algorithm. So let's rewrite it like one:

bool remove(int key) {
 int i = hash(key) % table.size(); 
 for (; table[i] != key; i = next(i)) {
 if (table[i] == EMPTY) { // break early
 return false;
 }
 }
 // now, table[i] == key
 table[i] = EMPTY;
 for (int k = next(i); table[k] != EMPTY; k = next(k)) {
 int old_key = table[k];
 table[k] = EMPTY;
 insert(old_key);
 }
 return true;
}

EMPTY

In addition to not really being a hash table, you're seemingly limiting the possible valid keys to exclude something like 0 or -1 or whatever the value of EMPTY is. May be worth separating the key from the empty concept.

Stack Exchange Network

Remove a key from a hash table efficiently

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Remove a key from a hash table efficiently

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions