I was going throught the text Introduction to Algorithm by Cormen et. al. where I came across the two alternative definitions of Universal Class of Hash Function.
The versions are as follows:
Definition 1: Let $\mathscr{H}$ be a finite collection of hash functions that map a given universe $U$ of keys into the range ${0,1,...,m-1}$. Such a collection is said to be universal if for each pair of distinct keys $k,l\in U$, the number of hash functions $h\in\mathscr{H}$ for which $h(k)= h(l)$ is at most $|\mathscr{H}|/m$.
Now in the above definition the authors are using "for each pair of distinct keys $k,l\in U$" to mean that probably we do not leave out any possible key pair for which collision occurs.
Definition 2: With a hash function randomly chosen from $\mathscr{H}$, the chance of a collision between distinct keys $k$ and $l$ is no more than the chance 1ドル/m$ which is the chance of a collision if $h(k)$ and $h(l)$ were randomly and independently chosen from the set $\{0,1,...,m-1\}$.
Through-out the rest of the book the author uses the second definition. I felt like finding the equivalence of the two definitions.
Definition 1 $\implies$ Definition 2
Definition 1ドル$ says that for a pair of distinct keys says $k,l$ for the set $A=\{h \in \mathscr{H} : h(k)=h(l)\}$ ,
$$|A|\leqslant|\mathscr{H}|/m \tag 1$$
Now if we chose a hash function $h\in \mathscr{H}$ at random then for the said pair of distinct keys $k,l$ we have,
$$ Pr\{h(k)=h(l)\} = \frac{\text{no. of functions in A}}{\text{no. of functions in $\mathscr{H}$}}=\frac{|A|}{|\mathscr{H}|} \leqslant\frac{1}{m} \quad\text{using (1)}$$
I could not quite find a way to show that: $$\text{Definition 2 $\implies$ Definition 1}$$
Or the way I proved the first implication actually proves the equivalent...
Previously I made an attempt but it was wrong as it was pointed out and so that wrong portion has been edited out.
I had found two similar questions 1 and 2 but it did not quite seem to answer my query.
-
1$\begingroup$ Your $X_i$'s and $X$ are constants, since $h_i$ is a fixed function, so there is no random choice going on. Try starting with definition 2, and explicitly write what is the probability of collision. $\endgroup$Ariel– Ariel2020年06月26日 08:10:38 +00:00Commented Jun 26, 2020 at 8:10
-
$\begingroup$ @Ariel Thanks for pointing out my flaw. I am trying to rectify it.. $\endgroup$Abhishek Ghosh– Abhishek Ghosh2020年06月26日 08:43:30 +00:00Commented Jun 26, 2020 at 8:43
-
$\begingroup$ @Ariel Can you please help me out . I seem to sort have got stuck ... Using (1) where $h$ is randomly chosen how to get the number hash functions where collisions occurs. That $h$ is randomly chosen is fine but I am unable to make out any thing from it. Can you please give me some hint or a very short proof strategy... $\endgroup$Abhishek Ghosh– Abhishek Ghosh2020年06月26日 09:27:15 +00:00Commented Jun 26, 2020 at 9:27
-
$\begingroup$ @Ariel please check it now and comment about the correct-ness. Could the other way implication be simply found by multiplying the probability with $|\mathscr{H}|$ and by the definition of the classical probability we shall get the number of functions in $A$. $\endgroup$Abhishek Ghosh– Abhishek Ghosh2020年06月26日 14:53:47 +00:00Commented Jun 26, 2020 at 14:53
-
1$\begingroup$ The probability is $p=\frac{|A|}{|\mathcal{H}|},ドル so $|A|\le\frac{|\mathcal{H}|}{m}\iff p\le\frac{1}{m}$. $\endgroup$Ariel– Ariel2020年06月26日 15:27:16 +00:00Commented Jun 26, 2020 at 15:27
1 Answer 1
Both definitions are equivalent, and I think you have almost specified the reason behind it.
The below equivalence work in both ways.
From Definition 2:
"the chance of a collision between distinct keys $k$ and $l$" $\le$ 1ドル/m$
$\Leftrightarrow$
{note that the "chance" here is about the process of picking $h$ from the class $\mathscr{H}$}
(Number of functions in the class where $k$ and $l$ collide) $/$ (Total functions in the class) $\le$ 1ドル/m$
$|A|/|\mathscr{H}| \le 1/m$
$\Leftrightarrow$
$|A| \le |\mathscr{H}|/m$
(which is the wording used in Definition 1)