
I'm having trouble understanding two things from some notes about uniform hashing. Here's the copy-pasted part of the notes:

Let us first argue, by a counting argument, why the uniformity property we require of good hash functions is computationally hard to guarantee. Recall that we are interested in hash functions which map keys in $U$ to integers in $\{0, 1, \ldots, m-1\}$. The total number of such hash functions is $m^{|U|}$, given that each of the $|U|$ keys can be mapped into any of the $m$ slots of the hash table. In order to guarantee uniform distribution of the keys and independence among them, **our hash function should be any one of those**. But, in this case, its representation would need $\Omega(\log_2 m^{|U|}) = \Omega(|U| \log_2 m)$ bits, which is really too much in terms of space occupancy and in terms of computing time (i.e. it would take at least $\Omega(\frac{|U|\log_2 m}{\log_2 |U|})$ time just to read the hash encoding).

The part I put in bold is the first thing confusing me.

Why should the function be any one of those? Shouldn't we avoid a good part of them, like the ones sending every element of the universe $U$ to the same number, thus not distributing the elements at all?

The second thing is the last "$\Omega$". Why would it take $\Omega(\frac{|U|\log_2 m}{\log_2 |U|})$ time just to read the hash encoding?

The numerator is the number of bits needed to index every hash function in the space of such functions, and the denominator is the size in bits of a key. Why does this ratio give a lower bound on the time needed to read the encoding? And what hash encoding?
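To make sure I follow the counting itself, here is how I picture it (my own toy sketch, not from the notes): a truly random hash function from $U$ into $m$ slots is nothing more than a lookup table holding one independent uniform value per key, so storing it costs $|U|\log_2 m$ bits.

```python
import math
import random

# Toy illustration (my own numbers): a truly random hash function over a
# universe U into m slots is just a lookup table with one independent
# uniform value in {0, ..., m-1} per key.
U = list(range(16)) # toy universe, |U| = 16
m = 8 # number of hash-table slots

# Sampling the table uniformly picks one of the m^|U| possible functions.
h = {x: random.randrange(m) for x in U}

# Storing this table takes |U| * log2(m) bits, matching the Omega bound.
bits_needed = len(U) * math.log2(m)
print(bits_needed) # 48.0
```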

asked Jun 24, 2020 at 16:50

1 Answer


Suppose that we sample a hash function $h$ at random from a collection $\mathcal{H}\subseteq [m]^U$ (we denote by $[m]$ the set $\{0,1,\ldots,m-1\}$). The best possible thing we can require from this choice is that $\left(h(x_1),h(x_2),\ldots,h(x_{|U|})\right)$ is jointly uniformly distributed over $[m]^{|U|}$, where $U=\{x_1,\ldots,x_{|U|}\}$ (this in turn implies independence). Fix some $h_0\in [m]^U$; then under the previous requirement we have:

$\Pr\limits_{h\sim\mathcal{H}}[h=h_0]=\Pr\limits_{h\sim\mathcal{H}}\left[h(x_1)=h_0(x_1)\land...\land h(x_{|U|})=h_0(x_{|U|})\right]=\frac{1}{m^{|U|}}$.

It follows that $h$ is uniformly distributed over all possible functions in $[m]^U$ (in any other case we would fail either uniformity or independence). This is what is meant by "$h$ should be any one of these functions".
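The argument above can be checked by brute force on a tiny instance (a sketch of my own, with $|U|=2$ and $m=2$): drawing $h$ uniformly from all $m^{|U|}=4$ functions makes every value tuple equally likely, while restricting to a smaller family, e.g. the constant functions, misses some tuples entirely.

```python
from itertools import product
from collections import Counter

# Toy check: with |U| = 2 keys and m = 2 slots there are m^|U| = 4 functions
# in [m]^U. Drawing h uniformly over all of them makes the value tuple
# (h(x1), h(x2)) uniform, i.e. each of the 4 tuples has probability 1/4.
m, keys = 2, ['x1', 'x2']
all_functions = [dict(zip(keys, vals))
 for vals in product(range(m), repeat=len(keys))]

counts = Counter(tuple(h[k] for k in keys) for h in all_functions)
print(counts) # each of the 4 tuples appears exactly once

# Restricting to a smaller family H, e.g. only the constant functions,
# breaks joint uniformity: the tuples (0, 1) and (1, 0) never occur.
constant_only = [h for h in all_functions if len(set(h.values())) == 1]
print(len(constant_only)) # 2 of the 4 functions
```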

answered Jun 24, 2020 at 19:38
 • The thing I don't get is why we are talking about the time it takes to read a function and not the time it takes to compute it. Commented Jun 25, 2020 at 15:02
 • I'm removing the second part of my answer, as it's obviously false (encoding an element of $U$ requires $\log |U|$ bits, but repeatedly asking for $h(x)$ for all $x\in U,ドル where each value requires time $T$ to compute, takes $|U|T$ time and not $T\log |U|$). Since we can generate a truly random $h$ by sampling a uniform binary string of length $|U|\log m,ドル where $h(x_i)$ is given by the $i$'th block of size $\log m,ドル which makes reading $h(x)$ easy, I'm not sure what is meant here. Perhaps you can link the full text to clarify. Commented Jun 25, 2020 at 16:24
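The bit-string encoding this comment describes can be sketched concretely (a toy illustration of my own, assuming $m$ is a power of two so each value fits in exactly $\log_2 m$ bits):

```python
import random

# Sketch of the encoding from the comment: a truly random h is a uniform
# bit string of length |U| * log2(m), and h(x_i) is read off as the i-th
# block of log2(m) bits. Toy sizes; m is assumed to be a power of two.
U_size, m = 8, 4
block = m.bit_length() - 1 # log2(m) = 2 bits per hash value

bits = ''.join(random.choice('01') for _ in range(U_size * block))

def h(i):
 # Reading h(x_i) is a simple slice into the encoding.
 return int(bits[i * block:(i + 1) * block], 2)

print([h(i) for i in range(U_size)]) # |U| values, each in {0, ..., m-1}
```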
