A ChainedHashTable data structure uses hashing with chaining to store data as an array, $ \mathtt{t}$, of lists. An integer, $ \mathtt{n}$, keeps track of the total number of items in all lists (see Figure 5.1):
List<T>[] t;
int n;
To add an element, $ \mathtt{x}$, to the hash table, we first check if the length of $ \mathtt{t}$ needs to be increased and, if so, we grow $ \mathtt{t}$. With this out of the way we hash $ \mathtt{x}$ to get an integer, $ \mathtt{i}$, in the range $ \{0,\ldots,\ensuremath{\mathtt{t.length}}-1\}$ and we append $ \mathtt{x}$ to the list $ \mathtt{t[i]}$:
boolean add(T x) {
    if (find(x) != null) return false;  // x is already in the table
    if (n+1 > t.length) resize();       // grow so that n <= t.length
    t[hash(x)].add(x);                  // append x to its list
    n++;
    return true;
}
Growing the table, if necessary, involves doubling the length of $\mathtt{t}$ and reinserting all elements into the new table. This is exactly the same strategy used in the implementation of ArrayStack, and the same result applies: the cost of growing is only constant when amortized over a sequence of insertions (see Lemma 2.1).
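One possible implementation of the growing step is sketched below. This is an illustrative assumption, not the book's exact code: it assumes the table size is $2^{\mathtt{d}}$ and that $\mathtt{hash(x)}$ reads the dimension $\mathtt{d}$, so doubling the table amounts to incrementing $\mathtt{d}$; it also assumes java.util.ArrayList for the per-slot lists.

void resize() {
    d++;                           // table length goes from 2^(d-1) to 2^d
    List<T>[] newTable = (List<T>[]) new List[2*t.length];
    for (int i = 0; i < newTable.length; i++)
        newTable[i] = new ArrayList<T>();
    List<T>[] oldTable = t;
    t = newTable;                  // hash(x) now uses the new dimension
    for (List<T> list : oldTable)  // reinsert every element
        for (T x : list)
            t[hash(x)].add(x);
}

Each element is touched once, so a single call costs $O(\mathtt{n} + \mathtt{t.length})$ time, which is exactly what the amortized argument above charges to the preceding insertions.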
Besides growing, the only other work done when adding $ \mathtt{x}$ to a ChainedHashTable involves appending $ \mathtt{x}$ to the list $ \mathtt{t[hash(x)]}$. For any of the list implementations described in Chapters 2 or 3, this takes only constant time.
To remove an element $ \mathtt{x}$ from the hash table we iterate over the list $ \mathtt{t[hash(x)]}$ until we find $ \mathtt{x}$ so that we can remove it:
T remove(T x) {
    Iterator<T> it = t[hash(x)].iterator();
    while (it.hasNext()) {
        T y = it.next();
        if (y.equals(x)) {
            it.remove();  // remove y through the iterator
            n--;
            return y;
        }
    }
    return null;          // x was not found
}
This takes $O(\mathtt{n}_{\mathtt{hash(x)}})$ time, where $\mathtt{n}_{\mathtt{i}}$ denotes the length of the list stored at $\mathtt{t[i]}$.
Searching for the element $ \mathtt{x}$ in a hash table is similar. We perform a linear search on the list $ \mathtt{t[hash(x)]}$:
T find(Object x) {
    for (T y : t[hash(x)])  // linear search through one list
        if (y.equals(x))
            return y;
    return null;
}
Again, this takes time proportional to the length of the list $\mathtt{t[hash(x)]}$.
The performance of a hash table depends critically on the choice of the hash function. A good hash function will spread the elements evenly among the $\mathtt{t.length}$ lists, so that the expected size of the list $\mathtt{t[hash(x)]}$ is $O(\mathtt{n}/\mathtt{t.length}) = O(1)$. On the other hand, a bad hash function will hash all values (including $\mathtt{x}$) to the same table location, in which case the size of the list $\mathtt{t[hash(x)]}$ will be $\mathtt{n}$. In the next section we describe a good hash function.
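To make the contrast concrete, the following deliberately bad hash function (purely illustrative) sends every element to the same slot, so the list $\mathtt{t[0]}$ grows to length $\mathtt{n}$ and every operation degrades to a linear scan:

int hash(Object x) {
    return 0;  // all n elements collide in t[0]; find(x) and remove(x) take O(n) time
}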
Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic (discussed in Section 2.3) and integer division. It uses the $\operatorname{div}$ operator, which calculates the integral part of a quotient while discarding the remainder. Formally, for any integers $a\ge 0$ and $b\ge 1$, $a\operatorname{div} b = \lfloor a/b\rfloor$.
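In Java, integer division of non-negative operands already computes $\operatorname{div}$, and division by a power of two can be done with a right shift, a fact exploited by the hash function below. A small illustration:

int q = 23 / 5;   // 23 div 5 = 4
int r = 23 / 8;   // 23 div 8 = 2
int s = 23 >> 3;  // same as 23 div 2^3, also 2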
In multiplicative hashing, we use a hash table of size $2^{\mathtt{d}}$ for some integer $\mathtt{d}$ (called the dimension). The formula for hashing an integer $\mathtt{x}\in\{0,\ldots,2^{\mathtt{w}}-1\}$ is

$\mathtt{hash(x)} = ((\mathtt{z}\cdot\mathtt{x})\bmod 2^{\mathtt{w}}) \operatorname{div} 2^{\mathtt{w}-\mathtt{d}} \enspace .$
Here, $\mathtt{z}$ is a randomly chosen odd integer in $\{1,\ldots,2^{\mathtt{w}}-1\}$. This hash function can be realized very efficiently by observing that, by default, operations on integers are already done modulo $2^{\mathtt{w}}$, where $\mathtt{w}$ is the number of bits in an integer. (See Figure 5.2.) Furthermore, integer division by $2^{\mathtt{w}-\mathtt{d}}$ is equivalent to dropping the rightmost $\mathtt{w}-\mathtt{d}$ bits of the binary representation (which is implemented by shifting the bits right by $\mathtt{w}-\mathtt{d}$). In this way, the code that implements the above formula is simpler than the formula itself:
int hash(Object x) {
    return (z * x.hashCode()) >>> (w-d);  // unsigned shift keeps the d highest-order bits
}
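As a concrete (illustrative) example with $\mathtt{w}=32$ and $\mathtt{d}=8$: the hash value is just the 8 highest-order bits of the 32-bit product $\mathtt{z}\cdot\mathtt{x}$, so it always lies in $\{0,\ldots,255\}$. The multiplier below is arbitrary; any odd value works.

int w = 32, d = 8;
int z = 0x2545f491;          // an arbitrary odd multiplier (low bit is 1)
int x = 42;
int h = (z * x) >>> (w - d); // keep the top 8 bits; h is in {0,...,255}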
The following lemma, whose proof is deferred until later in this section, shows that multiplicative hashing does a good job of avoiding collisions:

Lemma 5.1: Let $\mathtt{x}$ and $\mathtt{y}$ be any two values in $\{0,\ldots,2^{\mathtt{w}}-1\}$ with $\mathtt{x}\neq\mathtt{y}$. Then $\Pr\{\mathtt{hash(x)}=\mathtt{hash(y)}\}\le 2/2^{\mathtt{d}}$.
With Lemma 5.1, the performance of $\mathtt{remove(x)}$ and $\mathtt{find(x)}$ is easy to analyze:

Lemma 5.2: For any data value $\mathtt{x}$, the expected length of the list $\mathtt{t[hash(x)]}$ is at most $\mathtt{n}_{\mathtt{x}} + 2$, where $\mathtt{n}_{\mathtt{x}}$ is the number of occurrences of $\mathtt{x}$ in the hash table.
Now, we want to prove Lemma 5.1, but first we need a result from number theory. In the following proof, we use the notation $ (b_r,\ldots,b_0)_2$ to denote [画像:$ \sum_{i=0}^r b_i2^i$], where each $ b_i$ is a bit, either 0 or 1. In other words, $ (b_r,\ldots,b_0)_2$ is the integer whose binary representation is given by $ b_r,\ldots,b_0$. We use $ \star$ to denote a bit of unknown value.
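For example, $(1,0,1,1)_2 = 1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 1\cdot 2^0 = 11$.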
Lemma 5.3: Let $S$ be the set of odd integers in $\{1,\ldots,2^{\mathtt{w}}-1\}$, and let $q$ and $i$ be any two elements of $S$. Then there is exactly one value $\mathtt{z}\in S$ such that $\mathtt{z}q\bmod 2^{\mathtt{w}}=i$.

Proof: Since the number of choices of $\mathtt{z}$ equals the number of choices of $i$, it suffices to show that there is at most one such value. Suppose, for the sake of contradiction, that there are two such values $\mathtt{z}$ and $\mathtt{z}'$, with $\mathtt{z}>\mathtt{z}'$. Then

$\mathtt{z}q\bmod 2^{\mathtt{w}} = \mathtt{z}'q\bmod 2^{\mathtt{w}} = i \enspace ,$

so $(\mathtt{z}-\mathtt{z}')q\bmod 2^{\mathtt{w}} = 0$. But this means that

$(\mathtt{z}-\mathtt{z}')q = k2^{\mathtt{w}}$

for some integer $k$.
Furthermore $k\neq 0$, since $q\neq 0$ and $\mathtt{z}-\mathtt{z}'\neq 0$. Since $q$ is odd, it has no trailing 0's in its binary representation:

$q = (\star,\ldots,\star,1)_2 \enspace .$

Since $\mathtt{z}-\mathtt{z}' < 2^{\mathtt{w}}$, the difference $\mathtt{z}-\mathtt{z}'$ has fewer than $\mathtt{w}$ trailing 0's in its binary representation, and therefore so does the product $(\mathtt{z}-\mathtt{z}')q$. But $k2^{\mathtt{w}}$ with $k\neq 0$ has at least $\mathtt{w}$ trailing 0's, so $(\mathtt{z}-\mathtt{z}')q\neq k2^{\mathtt{w}}$, a contradiction. $\qedsymbol$
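For small word sizes, Lemma 5.3 can also be checked by brute force. The following sketch (illustrative, with $\mathtt{w}=4$) verifies that multiplication by any odd $q$ permutes the odd residues modulo $2^{\mathtt{w}}$:

int w = 4, m = 1 << w;               // m = 2^w = 16
for (int q = 1; q < m; q += 2) {     // every odd q
    boolean[] seen = new boolean[m];
    for (int z = 1; z < m; z += 2) { // every odd z
        int i = (z * q) % m;         // zq mod 2^w is again odd
        assert !seen[i];             // no value repeats, so the map
        seen[i] = true;              // z -> zq mod 2^w is a bijection on S
    }
}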
The utility of Lemma 5.3 comes from the following observation: if $\mathtt{z}$ is chosen uniformly at random from $S$, then, for any fixed odd $q$, the product $\mathtt{z}q\bmod 2^{\mathtt{w}}$ is uniformly distributed over $S$. In the following proof, it helps to think of the binary representation of $\mathtt{z}$, which consists of $\mathtt{w}-1$ random bits followed by a 1.
Proof of Lemma 5.1: First note that $\mathtt{hash(x)}=\mathtt{hash(y)}$ means that the $\mathtt{d}$ highest-order bits of $\mathtt{z}\mathtt{x}\bmod 2^{\mathtt{w}}$ and $\mathtt{z}\mathtt{y}\bmod 2^{\mathtt{w}}$ are the same. A necessary condition for this is that the $\mathtt{d}$ highest-order bits of $\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}}$ are either all 0's or all 1's:

$\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}} = (\underbrace{0,\ldots,0}_{\mathtt{d}},\star,\ldots,\star)_2 \quad (5.2)$

or

$\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}} = (\underbrace{1,\ldots,1}_{\mathtt{d}},\star,\ldots,\star)_2 \enspace . \quad (5.3)$

Let $q$ be the unique odd integer such that $(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}}=q2^r$ for some integer $r\ge 0$. By Lemma 5.3, the binary representation of $\mathtt{z}q\bmod 2^{\mathtt{w}}$ has $\mathtt{w}-1$ random bits, followed by a 1:

$\mathtt{z}q\bmod 2^{\mathtt{w}} = (b_{\mathtt{w}-1},\ldots,b_1,1)_2 \enspace .$
Therefore, the binary representation of $\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}}=\mathtt{z}q2^r\bmod 2^{\mathtt{w}}$ has $\mathtt{w}-r-1$ random bits, followed by a 1, followed by $r$ 0's:

$\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}} = \mathtt{z}q2^{r}\bmod 2^{\mathtt{w}} = (b_{\mathtt{w}-r-1},\ldots,b_1,1,\underbrace{0,\ldots,0}_{r})_2 \enspace .$

We can now finish the proof: If $r > \mathtt{w}-\mathtt{d}$, then the $\mathtt{d}$ higher-order bits of $\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}}$ contain both 0's and 1's, so the probability that $\mathtt{z}(\mathtt{x}-\mathtt{y})\bmod 2^{\mathtt{w}}$ looks like (5.2) or (5.3) is 0. If $r=\mathtt{w}-\mathtt{d}$, then the probability of looking like (5.2) is 0, but the probability of looking like (5.3) is $1/2^{\mathtt{d}-1}=2/2^{\mathtt{d}}$ (since we must have $b_1,\ldots,b_{\mathtt{d}-1}=1,\ldots,1$). If $r < \mathtt{w}-\mathtt{d}$, then we must have $b_{\mathtt{w}-r-1},\ldots,b_{\mathtt{w}-r-\mathtt{d}}=0,\ldots,0$ or $b_{\mathtt{w}-r-1},\ldots,b_{\mathtt{w}-r-\mathtt{d}}=1,\ldots,1$. The probability of each of these cases is $1/2^{\mathtt{d}}$ and they are mutually exclusive, so the probability of either of these cases is $2/2^{\mathtt{d}}$. This completes the proof. $\qedsymbol$
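For small parameters, the bound of Lemma 5.1 can be verified exhaustively. The sketch below (illustrative, with $\mathtt{w}=8$ and $\mathtt{d}=3$) counts, for one fixed pair $\mathtt{x}\neq\mathtt{y}$, how many of the $2^{\mathtt{w}-1}$ odd multipliers $\mathtt{z}$ cause a collision; the lemma guarantees the fraction is at most $2/2^{\mathtt{d}} = 1/4$:

int w = 8, d = 3, m = 1 << w;           // m = 2^w = 256
int x = 37, y = 201;                    // any fixed pair with x != y
int collisions = 0, total = 0;
for (int z = 1; z < m; z += 2) {        // all 2^(w-1) = 128 odd multipliers
    int hx = ((z * x) % m) >> (w - d);  // hash(x) under this z
    int hy = ((z * y) % m) >> (w - d);  // hash(y) under this z
    if (hx == hy) collisions++;
    total++;
}
assert collisions * (1 << d) <= 2 * total;  // collisions/total <= 2/2^d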
The following theorem summarizes the performance of the ChainedHashTable data structure:

Theorem 5.1: A ChainedHashTable implements the USet interface. Ignoring the cost of calls to $\mathtt{grow()}$, a ChainedHashTable supports the operations $\mathtt{add(x)}$, $\mathtt{remove(x)}$, and $\mathtt{find(x)}$ in $O(1)$ expected time per operation. Furthermore, beginning with an empty ChainedHashTable, any sequence of $m$ $\mathtt{add(x)}$ and $\mathtt{remove(x)}$ operations results in a total of $O(m)$ time spent during all calls to $\mathtt{grow()}$.
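Assuming a class ChainedHashTable<T> that wraps the methods shown in this section (the class name and generic signature are illustrative), typical usage looks like this:

ChainedHashTable<Integer> table = new ChainedHashTable<>();
table.add(5);                   // true: 5 was not present
table.add(5);                   // false: duplicates are rejected
Integer five = table.find(5);   // returns 5 in O(1) expected time
Integer gone = table.remove(5); // returns 5 and removes it
// table.find(5) now returns null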