I am seeking help understanding Floyd's cycle detection algorithm. I have gone through the explanation on wikipedia (http://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_hare)
I can see how the algorithm detects cycle in O(n) time. However, I am unable to visualise the fact that once the tortoise and hare pointers meet for the first time, the start of the cycle can be determined by moving tortoise pointer back to start and then moving both tortoise and hare one step at a time. The point where they first meet is the start of the cycle.
Can someone help by providing an explanation, hopefully different from the one on wikipedia, as I am unable to understand/visualise it?
8 Answers 8
You can refer to "Detecting start of a loop in singly linked list", here's an excerpt:
Distance travelled by slowPointer
before meeting $= x+y$
Distance travelled by fastPointer
before meeting $=(x + y + z) + y = x + 2y + z$
Since fastPointer
travels with double the speed of slowPointer
, and time is constant for both when both pointers reach the meeting point. So by using simple speed, time and distance relation (slowPointer
traveled half the distance):
\begin{align*} 2*\operatorname{dist}(\text{slowPointer}) &= \operatorname{dist}(\text{fastPointer})\\ 2(x+y) &= x+2y+z\\ 2x+2y &= x+2y+z\\ x &= z \end{align*}
Hence by moving slowPointer
to start of linked list, and making both slowPointer
and fastPointer
to move one node at a time, they both have same distance to cover.
They will reach at the point where the loop starts in the linked list.
-
13$\begingroup$ Here you have made an assumption that they will meet after one rotation. There can be cases (where cycle is small) where they might meet after certain no. of rotations. $\endgroup$ Commented Jul 19, 2018 at 9:34
-
2$\begingroup$ @JotWaraich the image is not representative of all cases; the logic however still holds $\endgroup$– denis631Commented Dec 3, 2018 at 18:51
-
7$\begingroup$ this is the most straight forward answer about this algorithm in the whole Internet $\endgroup$ Commented Feb 16, 2019 at 15:20
-
5$\begingroup$ I'm not very sure if this proof is correct. Distance traveled by slow pointer should be (x + (y + z) * α + y) and the distance traveled by the fast pointer should be (x + (y + z) * β + y) where α and β are the number of cycles traversed by the slow and fast pointer. The proof seems to fail with this addition, $\endgroup$ Commented Dec 22, 2019 at 8:44
-
$\begingroup$ @hashcode55 yes i agree with you this isnt clear to me $\endgroup$– TrajanCommented Jan 27, 2021 at 14:32
I have seen the accepted answer as proof elsewhere too. However, while its easy to grok, it is incorrect. What it proves is
$x = z$ (which is obviously wrong, and the diagram just makes it seem plausible due to the way it is sketched).
What you really want to prove is (using the same variables as described in the diagram in the accepted answer above):
$z = x\ mod\ (y + z)$
$(y + z)$ is the loop length, $L$
so, what we want to prove is:
$z = x\ mod\ L$
Or that z is congruent to x (modulo L)
Following proof makes more sense to me:
Meeting point, $M = x + y$
2ドル(x + y) = M + kL,ドル where $k$ is some constant. Basically, distance travelled by the fast pointer is $x + y$ plus some multiple of loop length, $L$
$x + y = kL$
$x = kL - y$
The above equation proves that $x$ is the same as some multiple of loop length, $L$ minus $y$. So, if the fast pointer starts at the meeting point, $M$ or at $x + y,ドル then it will end up at the start of the loop.
-
2$\begingroup$ I don't know why is this answer underrated. This seems to be the most formal proof. $\endgroup$– StuxenCommented Dec 25, 2019 at 16:09
-
$\begingroup$ @I8Again if x=z is wrong then then logic in moving fast pointer back to the starting point & move both fast & slow pointer to find the start of the loop wouldn't work. i dont understand what you meant by x=z is wrong, please explain how that is? $\endgroup$– divineCommented Sep 16, 2020 at 5:21
-
2$\begingroup$ @divine x=z is wrong because it implies that both are equal, but it’s acceptable to have a very long "tail" (big x) and a very small circle (small z). By using the modulo we are formally correct, we just want to prove that x % L == z, in other words the pointer that travels in the circle waiting to join the other one can make as many loops as it wants, as long as the last steps (I.e. the remainder after looping) will lead to the start of the circle itself. $\endgroup$ Commented Nov 20, 2020 at 21:42
-
-
2$\begingroup$ For those of you who still didn't understand why x = k * L - y leads to z = x mod L. From x = k * L - y ---> x = (k - 1) * L + L - y and then now you can tell L - y has to be z. Now, you get x = (k - 1) * L + z $\endgroup$ Commented Feb 15, 2021 at 4:37
Say there are $l$ elements before the loop starts and $n$ elements in the loop. And $e_l$ is the first element of the loop which is seen when we traverse the linked-list. When we will say "an element $x$ steps ahead of $e$", that will mean, we can reach that element taking $x$ steps from $e$.
Now, when Tr (tortoise) reaches $e_l$, after $l$ iterations, say, Hr (Hare) is $x$ steps ahead of $e_l$. Since Hr has taken total 2ドルl$ steps by then ($l$ steps prior to the loop), so:
$x = l \bmod n$.
In each future iteration, Tr and Hr will progress by 1 and 2 steps respectively, and so each iteration will increase their "gap" by 1. So they will meet after $n-x$ further iterations, when their gap will become $x + (n-x) = n$. So, the meeting element $M$ will be $n-x$ steps ahead of $e_l$. Now that means, stepping $x$ steps after $M$ will again bring us to $e_l$. Our goal is to locate $e_l$.
So, when we start with one reference Tr at $M$ and another reference Hr at the head of the linked-list, and progress both of them 1 step at a time, after $l$ iterations:
Hr will be $l$ steps ahead of the head, which is $e_l$.
Tr will be $(l \bmod n)$ steps ahead of $M$, that is, $x$ steps ahead of $M$, which is $e_l$.
Thus when they have met, we know it is $e_l$.
See this article (written by me) for details. There you will find a slightly modified method to locate $e_l$.
I try to draw function curves to see what happen when two pointers velocities are not 2:1,and verified the correctness of finding the start of cycle.
I found the answer on stackoverflow. Thanks if anyone was looking into this for me. And for those who like me wanted an explanation, please refer to: https://stackoverflow.com/questions/3952805/proof-of-detecting-the-start-of-cycle-in-linked-list The chosen answer to the question, explains it!
I also think the top answer is incomplete, and while some of the other explanation are also good, I have another version without MOD to show the same thing, which is perhaps easier for some people.
The same setup, consider $X\ge 0$ being the distance before the loop, $N \le \text{size of list}$ being the size of the loop (the quantity $y+z$ in the picture), and the meeting point be $Y$ in the Loop. At this point, notice that we have also made $X+Y \le N$.
Now, the slow pointer travels $X+Y+pN$, where $p$ is an arbitrary integer. Then fast pointer travelled 2ドル(X+Y+pN)$.
Now, we claim that they meet at $Y$.
Proof:
If they meet at $Y$, it must be that the faster pointer travelled exactly $qN$ more than the slower pointer, where $q$ is an integer. Then we have: $$\text{distance difference = }2(X+Y+pN)-(X+Y+pN)=X+Y+pN$$. But we had $Y$ being an arbitrary meeting point in the loop, so we can choose simply $X+Y=N$, which holds as $X+Y \le N$.
Therefore, we have: $$ X+Y+pN=N+pN=(p+1)N = qN $$ which is true if $p+1=q$. Now, since both $p$ and $q$ are arbitrary integers, we can just choose the minimum so that $p=0,q=1$, which corresponds to: $$ \text{distance travelled by slow pointer}=X+Y+pN=X+Y $$ and $$ \text{distance travelled by fast pointer}=(X+Y+pN)+qN=X+Y + N $$ so the fast pointer, at the first time meeting the slow pointer, travelled exactly $N$ more.
Assume there are l steps to enter the loop, and the loop has length n. Using Floyd’s algorithm, tortoise 🐢 and hare 🐇 meet at the smallest m >= 1 where $x_m = x_{2m}$. m steps are one or more complete cycles.
If we compare $x_i$ and $x_{i+m}$ then they are the same if and only if $x_i$ is within the cycle, that is i >= l. We set tortoise = $x_0$ and hare = $x_m$ and as long as they are not the same increase them both by one step, counting the steps. They are the same for the first time when the tortoise enters the cycle, and that is after i = l steps.
PS. To find n, iterate the hare from $x_m$ until it reaches $x_m$ again after n steps. You would do that at the same time as finding the steps until the cycle. If n <= l then we find n at the same time or before l, otherwise at least we have done l steps of the search already.
PS. We know that n divides m. So if we haven’t found n after x steps where x < m is the largest divisor of m, then n = m.
PS. Brent’s loop finding algorithm usually runs faster, determines the cycle length immediately, and often knows an element $x_{m-l}$ shortly before entering the loop, makeing it faster to determine the number of steps until the loop.
Let t = h = start, m = 0
Repeat
t = t -> next, h = h -> next -> next, m = m+1
Until t = h
Let t = start, i = 0, x = h
While t ≠ h
t = t -> next, h = h-> next, i = i + 1
If h = x and n unknown then n = i
l = i
The tortoise and the hare are two pointers that are initialized with the value of the "top" of the list. In each cycle of the loop, the hare increases by 2 incremental items, while the tortoise increases by one. If at any point, the hare equals the hare, a loop has been detected. If the hare equals the end, a loop is not found.
This post explains it better line-by-line:
fast
variable, or the "hare" needs to move at twice the speed as tortoise, rather than just one ahead? $\endgroup$