Considering two sets $A, B$ containing some $p$-dimensional points $x \in \mathbb{R}^p$. Let $d_x^S = \min_{x' \in S \setminus \{x\}} \lVert \mathbf{x} - \mathbf{x'} \rVert$ denote the Euclidean distance from $x$ to its nearest point in $S$. We have a very simple algorithm:
- $\forall x \in A,ドル if $d_x^A > d_x^B$ then move $x$ from $A$ to $B$.
- $\forall x \in B,ドル if $d_x^A < d_x^B$ then move $x$ from $B$ to $A$.
- Repeat (1) and (2) until convergence
Convergence is when there is no more $x \in A$ such that $d_x^A > d_x^B,ドル and there is no more $x \in B$ such that $d_x^A < d_x^B$.
How could I figure out which function does this algorithm minimize or maximize at each iteration ? The function $\Phi(A)+\Phi(B) = \sum_{x \in A} d_x^A + \sum_{x \in B} d_x^B$ does not seem to decrease at each iteration.
Note: another version of this algorithm is when we define $d_x^S$ as the mean distance from $x$ to its $k$ nearest points in $S,ドル instead of the distance to its nearest point in $S$. I don't know if $k > 1$ would make the proof more complicated or not.
-
2$\begingroup$ So the question is: why does the algorithm converge? $\endgroup$Yuval Filmus– Yuval Filmus2013年11月26日 09:38:28 +00:00Commented Nov 26, 2013 at 9:38
-
1$\begingroup$ (Or rather: why are there no cycles.) Note that $A = \emptyset$ and $B = \emptyset$ are always local optima, so the algorithm probably doesn't minimize any global objective function. At most you stop at some local minimum. $\endgroup$Yuval Filmus– Yuval Filmus2013年11月26日 09:49:26 +00:00Commented Nov 26, 2013 at 9:49
-
$\begingroup$ @YuvalFilmus yes the question is why are there no cycles, how to prove that. $\endgroup$shn– shn2013年11月26日 14:31:00 +00:00Commented Nov 26, 2013 at 14:31
1 Answer 1
It takes at most $|A \cup B|$ iterations until convergence. Consider the directed graph $D(V,E)$ with a vertex $v \in V$ for each point $p_v \in (A \cup B)$ and a directed edge $(u,v) \in E$ such that $p_v, v \neq u$ is the nearest point to $p_u$ (connect each point to its nearest neighbour point).
If points $p_u$ and $p_v$ are each other's nearest then $(u,v),(v,u) \in E$. On the other hand, there is no cycle consisting of more than two edges. For contradiction assume the corresponding points in such a cycle as $p_0,p_1,...,p_k,p_{0}$. Since each $p_{((i+1) \bmod k)}$ is the nearest point to $p_i$ the distances decrease as we go along the cycle. This means that $p_k$ is closer to $p_0$ than $p_1$ and by construction this is not possible. A cycle of length three or more could cause an infinite chain of label changes (switching between labels $A$ and $B$).
The out degree of vertices in $D$ is exactly one, and in any path the last vertex have an out edge to its previous vertex in that path (because they are mutually each other's nearest). So the last vertex change label at most once, in the first iteration. Then its label propagates backward along all the path ending at this vertex until all the vertices in those paths receive this label.
Therefore, at each iteration, at least one vertex in any path gets its final label. So the number of iterations is the length of the longest path in $D$ which is at most $|V|-1$.
-
$\begingroup$ Do this reasoning still exactly the same if we consider $d_x^S$ to be the mean distance from $x$ to its $k$ nearest points in $S$ ? That is $x$ is moved from $A$ to $B$ if the distance from $x$ to its $k$ nearest points in $A$ is higher than the distance to its $k$ nearest points in $B$. Actually, in the case that you described (and that I mentionel in my question) $k$ is 1ドル,ドル and at each iteration a given point $x$ takes the label of its nearest point (linked to it by an edge), so what would this situation becomes if $k > 1$ ? Would that reasoning be the same ? Thank you. $\endgroup$shn– shn2013年11月26日 16:25:24 +00:00Commented Nov 26, 2013 at 16:25
-
$\begingroup$ In this case we can have cycles. Consider four points on corners of a square with edge length 5. For $k=2$ this instance creates a cycle. It's possible to alter the four distances to be unequal and still the cycle remains. Even if it was the $k$th nearest point (without "mean") in the problem then this bad example still holds. $\endgroup$Parham– Parham2013年11月26日 19:14:57 +00:00Commented Nov 26, 2013 at 19:14
-
$\begingroup$ If you have 4 points, you cannot compute the mean distance from a given point to the 2 nearest points of each set because there is no sufficient number of points. Tale two points labelled with A and two points labelled with B: if you take one point x from A, then there is only one nearest point to x in A. K should be chosen > min(|A|,|B|). Moreover, you cannot give any example where the algorithm does not terminate even for $k > 1$. If you ave a conter example please give it (I tried but did not found). $\endgroup$shn– shn2013年11月26日 19:45:16 +00:00Commented Nov 26, 2013 at 19:45
-
$\begingroup$ I didn't take into account the labels! But you can add two new points far from these four. Does this work? $\endgroup$Parham– Parham2013年11月26日 20:12:18 +00:00Commented Nov 26, 2013 at 20:12
-
1$\begingroup$ Yes it does works for $k > 1$. I didn't found manually any counter example where the algorithm can stuck inside an infinite loop. I also confirmed that experimentally (by running it for a long time on randomly generated points). It only remains for me to prove that it terminates, but I can't find which function or quantity is minimised at each iteration. $\endgroup$shn– shn2013年11月26日 20:31:14 +00:00Commented Nov 26, 2013 at 20:31
Explore related questions
See similar questions with these tags.