The following question is from MIT-OCW 6.006, Spring-2008, Problem-Set 2, Q-3.c.
Suppose you have a hash table where the load-factor $\alpha$ is related to the number $n$ of elements in the table by the following formula: $$ \alpha = 1 − \frac{1}{\log n} $$ If you resolve collisions by chaining, what is the expected time for an unsuccessful search in terms of n? (assume simple uniform hashing)
My reasoning: the expected-time for an unsuccessful search would be $\Theta(1 + E[X])$, where $E[X]$ is the expected number of elements in any slot after the insertion of n-elements.
Now, by definition, we have $\alpha = \frac{n}{m}$, which combined with the above formula we get $\frac{1}{m} = \frac{1}{n}\Bigl(1 - \frac{1}{\log n}\Bigr)$. And based on simple-uniform hashing, this is the probability with which the $(n+1)^{th}$ element will hash to any particular slot.
Let $X_{i}$ be an indicator random-variable, where $i = 1, 2, ..., n$, which indicates that the $i^{th}$ element added to the hash-table (which at this piont contains $(i-1)$ elements) hashes to a particular slot, i.e. $$ X_{i} = \begin{cases} 1, & \frac{1}{m} = \frac{1}{(i-1)}\Bigl(1 - \frac{1}{\log (i - 1)}\Bigr) \\ 0, & 1 - \frac{1}{m} \end{cases} $$ Then, the total number of elements $X$ in any particular slot , is given by $$ X = \sum_{i=1}^{n}X_{i} $$ And the expected number of elements in any particular slot becomes: $$ \begin{align} E[X] & = \sum_{i=1}^{n} E[X_{i}] \\ & = \sum_{i=1}^{n} \biggl\{\frac{1}{(i-1)}\Bigl(1 - \frac{1}{\log (i - 1)}\Bigr)\biggr\} \end{align} $$
But the $|E[X_i]| = \infty$, when $i = 1, 2$. This is where I got stuck!
So, is my reasoning correct? And if not, how to solve this problem?
1 Answer 1
Didn’t read your solution.
You have n slots, alpha x n are occupied, (1-alpha) n are empty. Every hash code defines a sequence of indices that will be checked until either the item or an empty slot is found. Let E(Alpha) be the expected number of slots visited, then E(alpha) = 1 x (1-alpha) + alpha (1 + E(alpha)) or E(alpha) = 1 / (1-alpha). With your alpha = 1 - 1/log n, E = log n.
PS. I have never seen an implementation that doesn’t store all items in the table. In general it’s better to generate locations based on the hash code, and not each location based on the previous location, so items with different hash codes can hit the same first slot but then follow different paths.
There is a complication when items can be deleted. Deleted slots cannot be changed to "empty" but must be changed to "deleted" to stop chains from being broken. So load factor would take used and deleted slots into account.
-
$\begingroup$ The solution you gave deals with open-addressing, but the question above is only concerned with chaining. $\endgroup$x.projekt– x.projekt2021年09月25日 14:36:45 +00:00Commented Sep 25, 2021 at 14:36
-
$\begingroup$ As I said, never seen chaining used outside theory. $\endgroup$gnasher729– gnasher7292021年09月26日 15:17:48 +00:00Commented Sep 26, 2021 at 15:17
-
$\begingroup$ I agree, but this is a theoretical question. $\endgroup$x.projekt– x.projekt2021年09月28日 04:41:17 +00:00Commented Sep 28, 2021 at 4:41
-
$\begingroup$ Well, if it is never ever used, then the theoretical question is quite useless, isn’t it? $\endgroup$gnasher729– gnasher7292024年02月12日 18:27:15 +00:00Commented Feb 12, 2024 at 18:27
Explore related questions
See similar questions with these tags.