Longest Collatz sequence in Python

Question 1

I came across this problem:

The following iterative sequence is defined for the set of positive integers:
 n → n/2 (n is even)
 n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

and to this I gave this solution:

def number(n):
 ctr = 1
 if n % 2 == 0:
 k = n / 2
 ctr += 1
 else:
 k = 3 * n + 1
 ctr += 1
 while k > 1:
 if k % 2 == 0:
 k /= 2
 ctr += 1
 else:
 k = k * 3 + 1
 ctr += 1
 return ctr
def k(upto):
 largest_number=0
 last_ctr=0
 for num in range(1,upto):
 if number(num)>last_ctr:
 last_ctr=number(num)
 largest_number=num
 return largest_number
print k(999999)

I am new to Python and don't think this is a good solution to this problem and maybe its complexity is \$\mathcal{O}(n^2)\$. So what is the best code for this problem if the number is really huge? How do I reduce the complexity?

And yes I read other questions related to this topic but they were not satisfactory answers to this in Python.

Question 2

Note: This is Project Euler's 14th problem

Question 3

Your current solution is a faithful representation of the "brute force" approach. While there's room for improvement programming-wise, any radical change can only be achieved by doing the maths to find a better approach, I'm afraid.

Question 4

If you don't mind using lots of memory, you could go for a sort of dynamic programming approach, and store the values for each number as you find them, so once you reach the start of a chain you've already calculated, you don't have to do the work again.

Question 5

related: stackoverflow.com/q/13219320/674039

Question 6

The key to solving this problem (and many others) is using memoization: The numbers of the collatz sequence converge in a kind of tree-like structure. Once you have calculated the value of, say collatz(100) on one path, you do not need to calculate it again when you encounter it again on another path.

One way to do this would be to have (a) a cache and (b) a list of all the numbers on the current path. As soon as you encounter a number that is already in the cache, go your path backwards and add all the numbers to the cache.

cache = {1: 1}
def collatz(n):
 path = [n]
 while n not in cache:
 if n % 2:
 n = 3 * n + 1
 else:
 n = n / 2
 path.append(n)
 for i, m in enumerate(reversed(path)):
 cache[m] = cache[n] + i
 return cache[path[0]]

Another, maybe more elegant way would be using recursion and a decorator function. You can define a @memo decorator and use it again and again for all sorts of function you want to apply memoization to. This decorator automatically replaces your function with a version that looks in the cache, first.

def memo(f):
 f.cache = {}
 def _f(*args):
 if args not in f.cache:
 f.cache[args] = f(*args)
 return f.cache[args]
 return _f
@memo
def collatz(n):
 if n == 1:
 return 1
 if n % 2 == 0:
 return 1 + collatz(n / 2)
 if n % 2 == 1:
 return 1 + collatz(3 * n + 1)

In both cases, call like this and after a few seconds you should get your result.

print(max(range(1, 10**6), key=collatz))

Question 7

Like some of the comments said, you can use dynamic approach:

def k(upto):
 #save the reults you already have
 results=[0 for i in range(upto+1)]
 results[0]=1
 results[1]=1
 #for every other number
 for num in range(2,upto+1):
 col=num
 ctr=0
 shortCut=0
 #if we don't already know how long the sequence is
 while(shortCut==0):
 #one round of collatz
 if col%2==0:
 ctr+=1
 col=col >> 1
 else:
 ctr+=2
 col=(3*col+1)>>1
 #if our number is small enough to be in the list
 if col<upto:
 #try to take a shortcut
 shortCut=results[col]
 results[num]=results[col]+ctr
 return results[1:]
print max(k(1000000))

the idea is that if you start with 16 for example

You get to 8 after one iteration

But we have already calculated the sequence length for 8, so why do it again?

So we save all lengths and return the steps it took to get to 8 + the length of the sequence from 8 on.

This should run in around O(n) but it is hard to say with Collatz

(If Collatz is wrong it doesn't ever finish, but I don't think you have enough memory to reach those numbers)

Question 8

Faster thanks to the help of dynamic programming, at the cost of some extra memory usage. We save the lengths of each chain we calculate, and reuse them so we only need to calculate them once. This implementation only saves lengths for values below the number we're going up to, but since the collatz function will go above this, we could raise the range of values we save if we don't mind increasing space used

def k(upto):
 def collatz(n):
 if n < upto and lst[n] > 0:
 return lst[n]
 if n % 2 == 0:
 val = collatz(n/2) + 1
 else:
 val = collatz((3*n + 1)/2) + 2
 if n < upto:
 lst[n] = val
 return val
 lst = [0]*upto
 lst[1] = 1
 lst[0] = 1
 for i in range(upto):
 collatz(i)
 return max(lst)

Question 9

Wow that's beautifully short. Might I suggest using return collatz((3*n+1)/2) +2 instead of return collatz(3*n + 1) + 1, as 3n+1 is always divisible by 2? @StephenTG

Question 10

@JeD I suppose. Since if 3n+1 is in our list, so is 3n+1 / 2, right?

Question 11

Yep, It's even more probable that 3n+1/2 is in the list since it is smaller. @StephenTG

tobias_k tobias_k 1,69914 silver badges18 bronze badges · Answer 1 · 2016-03-03 21:19:01Z

The key to solving this problem (and many others) is using memoization: The numbers of the collatz sequence converge in a kind of tree-like structure. Once you have calculated the value of, say collatz(100) on one path, you do not need to calculate it again when you encounter it again on another path.

One way to do this would be to have (a) a cache and (b) a list of all the numbers on the current path. As soon as you encounter a number that is already in the cache, go your path backwards and add all the numbers to the cache.

cache = {1: 1}
def collatz(n):
 path = [n]
 while n not in cache:
 if n % 2:
 n = 3 * n + 1
 else:
 n = n / 2
 path.append(n)
 for i, m in enumerate(reversed(path)):
 cache[m] = cache[n] + i
 return cache[path[0]]

Another, maybe more elegant way would be using recursion and a decorator function. You can define a @memo decorator and use it again and again for all sorts of function you want to apply memoization to. This decorator automatically replaces your function with a version that looks in the cache, first.

def memo(f):
 f.cache = {}
 def _f(*args):
 if args not in f.cache:
 f.cache[args] = f(*args)
 return f.cache[args]
 return _f
@memo
def collatz(n):
 if n == 1:
 return 1
 if n % 2 == 0:
 return 1 + collatz(n / 2)
 if n % 2 == 1:
 return 1 + collatz(3 * n + 1)

In both cases, call like this and after a few seconds you should get your result.

print(max(range(1, 10**6), key=collatz))

JeD JeD 1666 bronze badges · Answer 2 · 2016-03-03 21:00:09Z

Like some of the comments said, you can use dynamic approach:

def k(upto):
 #save the reults you already have
 results=[0 for i in range(upto+1)]
 results[0]=1
 results[1]=1
 #for every other number
 for num in range(2,upto+1):
 col=num
 ctr=0
 shortCut=0
 #if we don't already know how long the sequence is
 while(shortCut==0):
 #one round of collatz
 if col%2==0:
 ctr+=1
 col=col >> 1
 else:
 ctr+=2
 col=(3*col+1)>>1
 #if our number is small enough to be in the list
 if col<upto:
 #try to take a shortcut
 shortCut=results[col]
 results[num]=results[col]+ctr
 return results[1:]
print max(k(1000000))

the idea is that if you start with 16 for example

You get to 8 after one iteration

But we have already calculated the sequence length for 8, so why do it again?

So we save all lengths and return the steps it took to get to 8 + the length of the sequence from 8 on.

This should run in around O(n) but it is hard to say with Collatz

(If Collatz is wrong it doesn't ever finish, but I don't think you have enough memory to reach those numbers)

StephenTG StephenTG 1212 bronze badges · Answer 3 · 2016-03-03 21:13:55Z

Faster thanks to the help of dynamic programming, at the cost of some extra memory usage. We save the lengths of each chain we calculate, and reuse them so we only need to calculate them once. This implementation only saves lengths for values below the number we're going up to, but since the collatz function will go above this, we could raise the range of values we save if we don't mind increasing space used

def k(upto):
 def collatz(n):
 if n < upto and lst[n] > 0:
 return lst[n]
 if n % 2 == 0:
 val = collatz(n/2) + 1
 else:
 val = collatz((3*n + 1)/2) + 2
 if n < upto:
 lst[n] = val
 return val
 lst = [0]*upto
 lst[1] = 1
 lst[0] = 1
 for i in range(upto):
 collatz(i)
 return max(lst)

Wow that's beautifully short. Might I suggest using return collatz((3*n+1)/2) +2 instead of return collatz(3*n + 1) + 1, as 3n+1 is always divisible by 2? @StephenTG
@JeD I suppose. Since if 3n+1 is in our list, so is 3n+1 / 2, right?
Yep, It's even more probable that 3n+1/2 is in the list since it is smaller. @StephenTG

Stack Exchange Network

Longest Collatz sequence in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Longest Collatz sequence in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions