We have two sets of vectors of positive numbers, $X$ and $Y$ where for $x\in X$ we write $x=(x_1,x_2,\ldots,x_k)$ and similarly for $y\in Y$ we write $y=(y_1,y_2,\ldots,y_k$).
We are given two vectors $l=(l_1,l_2,\ldots,l_k),ドル and $u=(u_1,u_2,\ldots,u_k)$ such that $l_i\le u_i$ for all $i$.
We want to find all pairs $(x,y),ドル $x\in X,ドル and $y\in Y$ such that $l_i\le x_i+y_i\le u_i$
Handwaving a bit, we can do this in a divide and conquer sort of way, separating each set into pieces that are larger and smaller than $l_i/2$ and throwing away the pairs where both are in the smaller half. This gives a recurrence $$T(m,n) = T(m,n/2) + T(m/2, n/2) + c(m+n)$$ where $m=|X|$ and $n=|Y|$. For equal size sets, this gives $T(n,n) = O(n^{1.7}),ドル which is better than quadratic, but less than I would hope for.
A similar question could be asked for three or more sets.
-
1$\begingroup$ That recurrence does not look adequately justified to me. You seem to be assuming that half of the numbers will be $\ge l_i/2$ and half $< l_i/2,ドル but this is not guaranteed (it could be false). Therefore, I don't think your claim of a $O(n^{1.7})$ time bound is valid. $\endgroup$D.W.– D.W. ♦2013年10月09日 04:49:23 +00:00Commented Oct 9, 2013 at 4:49
-
1$\begingroup$ Also, you don't seem to be taking the dependence on $k$ into account, in your recurrence. Merely testing whether $x,y$ are a valid solution takes $O(k)$ time. $\endgroup$D.W.– D.W. ♦2013年10月09日 05:05:10 +00:00Commented Oct 9, 2013 at 5:05
1 Answer 1
One approach would be to solve this using orthogonal range trees. Orthogonal range trees allow range queries in $k$ dimensions.
Store the $n$ values of $Y$ in an orthogonal range tree. The space complexity is $O(n (\lg n)^{k-1})$.
Next, for each $x=(x_1,\dots,x_k) \in X,ドル we're going to search for a matching $y \in Y$. Notice that for $y$ to match this particular $x,ドル we must have $l_i - x_i \le y_i \le u_i - x_i$ for each $i$. This means that $y$ must fall into the $k$-dimensional box $[l_1-x_1,u_1-x_1] \times \cdots \times [l_k-x_k,u_k-x_k]$. So, we want to know if there is any element of $Y$ that falls within this box. That's a $k$-dimension range query, which an orthogonal range tree can handle.
The time complexity is $O((\lg n)^k)$ for each range query. We do $m$ range queries, so the total running time of this algorithm is $O(n (\lg n)^{k-1} + m (\lg n)^k)$. Of course, there is a symmetry between $X$ and $Y,ドル so you can swap $m,n$ if that reduces the running time.
It is possible to reduce the asymptotic running time slightly further by using more sophisticated data structures and algorithms, but I don't know if they will actually help in practice.
Bottom line: A slightly pessimistic bound is that the running time is $O((n+m) (\lg n)^k)$ (but the preceding discussion shows you how to get a slightly tighter bound). If $k$ is small enough this might be better than the $O(nmk)$ running time of the naive ("quadratic") algorithm.
Independently: Here is a heuristic that might help you in practice (even though it might not offer any guarantees in worst-case complexity).
Pick some value $k'$ such that $k'< k$. Now randomly select a subset of $k'$ of the $k$ dimensions, and project everything to those $k'$ dimensions (so we temporarily ignore all of the other $k-k'$ coordinates). That reduces the problem from a $k$-dimensional problem to a $k'$-dimensional problem. In lower dimensions, the problem becomes easier, so you can solve the $k'$-dimensional problem to find all pairs $x\in X, y \in Y$ such that $x+y$ is correct in those particular $k'$ dimensions. Then, you can test each such pair to see whether $x+y$ is correct in all $k$ dimensions. The total running time will be the running time to solve the $k'$-dimensional problem, plus the running time to examine all pairs that survive the first phase. You can then try to choose $k'$ to balance these two contributions to the total running time.
I suspect that if you experiment with this sort of idea, then in practice this might significantly improve performance (at least if you start with a large value of $k$).