I'm stuck with the following problem by Skiena (The Algorithm Design Manual, p. 106):
Problem: Give an efficient algorithm to determine whether two sets (of size $m$ and $n$, respectively) are disjoint. Analyze the worst-case complexity in terms of $m$ and $n$, considering the case where $m$ is substantially smaller than $n$.
Solution: First sort the big set – The big set can be sorted in $O(n\log n)$ time. We can now do a binary search with each of the m elements in the second, looking to see if it exists in the big set. The total time will be $O((n + m)\log n)$.
My question: Why is the running time $O((n + m)\log n)$? I would have to perform m binary searches in total (one binary search for every element in $m$) - as one binary search has a running time of $O(\log n)$, I would have to perform $m \cdot \log n$ operations in the worst case. How does this - if at all - translate into $O((n + m)\log n)$?
-
1$\begingroup$ It takes $O(n\log n)$ to sort the bigger set, and to $O(m\log n)$ to do the binary search, for a total of $O(n\log n + m\log n) = O((m+n)\log n)$. Note that you can do even better if you sort the smaller set instead, replacing $\log n$ with $\log m$. $\endgroup$Yuval Filmus– Yuval Filmus2020年05月27日 18:07:17 +00:00Commented May 27, 2020 at 18:07
-
$\begingroup$ Thanks, I got it know! $\endgroup$Moritz Loritz– Moritz Loritz2020年05月28日 09:44:02 +00:00Commented May 28, 2020 at 9:44
1 Answer 1
Sorting the big set takes time $O(n\log n)$. You perform $m$ binary searches, each taking $O(\log n)$, for a total of $O(m\log n)$ time spent on binary search. The total running time of the algorithm is thus $$ O(n\log n + m\log n) = O((n + m)\log n) = O(n\log n), $$ assuming $m \leq n$.
Note that it is even better to sort the smaller list, since then the running time improves to $O(n\log m)$.