Project Euler 78: Coin Partitions in Python

Question 1

I was trying to solve Project Euler Problem 78: Coin Partitions.

Let p(n) represent the number of different ways in which n coins can be separated into piles. For example, five coins can be separated into piles in exactly seven different ways, so p(5)=7. Find the least value of n for which p(n) is divisible by one million.

My solution is in Python. It uses the formula described here in the Wikipedia article, a simple formula really. It takes around 12 seconds, but ideally I'd get it around two or three seconds if not quicker.

Here's my code. I would appreciate any comments or help people could offer.

from timeit import default_timer as timer
start = timer()
partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
n = 11
partition = 42
pentagonals = []
for k in range(1, 2000):
 pentagonals.append(k*(3*k-1)/2)
 k *= -1
 pentagonals.append(k*(3*k-1)/2)
while partition != 0:
 partition = 0
 pentagonal = 0
 index = 0
 temp = 0
 is_positive = False
 while pentagonal <= n:
 if is_positive: partition += temp
 else: partition -= temp
 pentagonal = pentagonals[index]
 temp = partitions[n - pentagonal]
 if index % 2 == 0: is_positive = not is_positive
 index += 1
 partition %= 1000000
 partitions.append(partition)
 n += 1
ans = n - 1
elapsed_time = (timer() - start) * 1000
print "Found %d as the answer in %d ms." % (ans, elapsed_time)

Question 2

Have you tested this to see what values it is getting in the partitions list?

Question 3

@GarethRees yes, it returns a very large number, something like 6.5 * 10^240

Question 4

And what values are in the partitions list?

Question 5

@GarethRees the first few numbers in the list are correct values, up to around six or seven digit numbers is what I can verify. [124754, 147273, 173525] ... etc

Question 6

Are you sure you get 124754? I don't—when I run your code I get 124754.0.

Question 7

1. Debugging

The first version of your code was broken because you used the expression

(-1)**(k-1)

where k is sometimes negative. The documentation says:

For int operands, the result has the same type as the operands unless the second argument is negative; in that case, all arguments are converted to float and a float result is delivered.

(my emphasis) so when k was negative, this expression evaluated to a float, either 1.0 or -1.0, and then the usual rules of arithmetic conversion caused all the other intermediate results to become floats too. Floats have limited precision, so eventually these were approximate rather than exact.

How did I discover this? Well, I ran the program under the Python debugger:

$ python -m pdb cr93311.py 
> cr93311.py(1)<module>()
-> partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]

I wanted to see what values actually showed up in the partitions list, so I set a breakpoint on line 17 (just after the partitions.append):

(Pdb) break 17
Breakpoint 1 at cr93311.py:17

And then I continued to the breakpoint and printed the partitions array:

(Pdb) continue
> cr93311.py(17)<module>()
-> n += 1
(Pdb) p partitions
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56.0]

The 56.0 stands out immediately as an error: it should be 56. (Possibly this was not obvious to you: in which case you might appreciate David Goldberg's article "What Every Computer Scientist Should Know About Floating-Point Arithmetic" or the beginner's version thereof.)

How did this floating-point number get there? Let's find out by restarting the program and setting a breakpoint on line 14, just after the computation of pentagonal and temp:

(Pdb) run
Restarting cr93311.py with arguments:
 cr93311.py
> cr93311.py(1)<module>()
-> partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
(Pdb) break 14
Breakpoint 2 at cr93311.py:14
(Pdb) continue
> cr93311.py(14)<module>()
-> if k > 0: k *= -1
(Pdb) p pentagonal, temp
(1, 42)

Nothing wrong there; let's try the next iteration of the loop:

(Pdb) continue
> cr93311.py(14)<module>()
-> if k > 0: k *= -1
(Pdb) p pentagonal, temp
(2, 30.0)

So it's the computation of temp that wrongly converted to float. Let's check the sub-expressions of that computation:

(Pdb) p k, (-1)**(k - 1), pentagonal, partitions[n - pentagonal]
(-1, 1.0, 2, 30)

So it's the (-1)**(k - 1) that's the culprit.

2. Review

It is easier to test and measure code if it is organized into functions. Functions can be tested and timed independently of each other, making it possible to isolate an error or a performance problem.
Functions should be given docstrings describing what they do, so that it's possible to check the implementation against the documentation.
When a function produces a sequence of results it's usually convenient in Python to generate those results using the yield statement. That way, we don't have to commit in advance as to how many results we are going to compute (as we would if the function returned a list), and we don't have to store all the results in memory at once.
It is easier to test functions if they have adjustable parameters (instead of fixed constants like 1000000) so that we can run and check them on small inputs.

So I would break up the code into functions, like this:

def pentagonal():
 """Generate the generalized pentagonal numbers."""
def partitions_modulo(m):
 """Generate the partition numbers modulo m."""
def problem78(m=10**6):
 """Return the least value of n for which the partition number p(n) is
 divisible by m."""

This makes the code easy to test on small instances:

>>> from itertools import islice
>>> list(islice(pentagonal(), 16))
[1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, 77, 92, 100]
>>> list(islice(partitions_modulo(100), 16))
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 1, 35, 76]
>>> problem78(23)
32

and easy to measure:

>>> from timeit import timeit
>>> timeit(problem78, number=1)
9.393578507006168

3. Performance

Your best bet is to use PyPy — this kind of simple numerical code is where PyPy really shines. I get more than ten times speedup simply from running the code in the post under PyPy.

If you're determined to stick with CPython, then here are some techniques I found that were helpful:

Only generating as many generalized pentagonal numbers as necessary: it turns out that we only need the first 200 or so, not the first 2000.
The generalized pentagonal numbers are $$ 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, \dotsc $$ If we take the differences between consective elements, we get the sequence $$ 1, 1, 3, 2, 5, 3, 7, 4, 9, 5, 11, 6, 13, \dotsc $$ whose \$n\$th item is \$n\$ if \$n\$ is odd, or \$n\over 2\$ if \$n\$ is even. This means that the generalized pentagonal numbers can be generated using itertools.count and itertools.accumulate, like this:
```
accumulate(k if k % 2 else k // 2 for k in count(1))
```
The signs repeat in the cycle $$ 1, 1, -1, -1, 1, 1, -1, -1, \dotsc $$ and so they can be generated using itertools.cycle, like this:
```
cycle((1, 1, -1, -1))
```
Using sum instead of explicit accumulation.

But the revised code is only about twice as fast as the original:

from itertools import accumulate, count, cycle, islice
def pentagonal():
 """Generate the generalized pentagonal numbers starting at 1.
 >>> list(islice(pentagonal(), 16)) # http://oeis.org/A001318
 [1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, 77, 92, 100]
 """
 return accumulate(k if k % 2 else k // 2 for k in count(1))
def partitions_modulo(m):
 """Generate the partition numbers modulo m.
 >>> list(islice(partitions_modulo(1000), 16)) # http://oeis.org/A000041
 [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 101, 135, 176]
 """
 signs = 1, 1, -1, -1
 partitions = [1] # Partition numbers up to p(n)
 pentagonals = [] # Generalized pentagonal numbers <= n
 pentagonal_it = pentagonal() # Iterator over generalized pentagonal numbers
 next_pentagonal = next(pentagonal_it)
 yield 1
 for n in count(1):
 if next_pentagonal <= n:
 pentagonals.append(next_pentagonal)
 next_pentagonal = next(pentagonal_it)
 p = sum(sign * partitions[-pent]
 for sign, pent in zip(cycle(signs), pentagonals)) % m
 partitions.append(p)
 yield p
def problem78(m=10**6):
 """Return the least value of n for which the partition number p(n) is
 divisible by m.
 >>> problem78(23)
 32
 """
 return next(i for i, p in enumerate(partitions_modulo(m)) if p % m == 0)

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2015-06-16 22:00:42Z

1. Debugging

The first version of your code was broken because you used the expression

(-1)**(k-1)

where k is sometimes negative. The documentation says:

For int operands, the result has the same type as the operands unless the second argument is negative; in that case, all arguments are converted to float and a float result is delivered.

(my emphasis) so when k was negative, this expression evaluated to a float, either 1.0 or -1.0, and then the usual rules of arithmetic conversion caused all the other intermediate results to become floats too. Floats have limited precision, so eventually these were approximate rather than exact.

How did I discover this? Well, I ran the program under the Python debugger:

$ python -m pdb cr93311.py 
> cr93311.py(1)<module>()
-> partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]

I wanted to see what values actually showed up in the partitions list, so I set a breakpoint on line 17 (just after the partitions.append):

(Pdb) break 17
Breakpoint 1 at cr93311.py:17

And then I continued to the breakpoint and printed the partitions array:

(Pdb) continue
> cr93311.py(17)<module>()
-> n += 1
(Pdb) p partitions
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56.0]

The 56.0 stands out immediately as an error: it should be 56. (Possibly this was not obvious to you: in which case you might appreciate David Goldberg's article "What Every Computer Scientist Should Know About Floating-Point Arithmetic" or the beginner's version thereof.)

How did this floating-point number get there? Let's find out by restarting the program and setting a breakpoint on line 14, just after the computation of pentagonal and temp:

(Pdb) run
Restarting cr93311.py with arguments:
 cr93311.py
> cr93311.py(1)<module>()
-> partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
(Pdb) break 14
Breakpoint 2 at cr93311.py:14
(Pdb) continue
> cr93311.py(14)<module>()
-> if k > 0: k *= -1
(Pdb) p pentagonal, temp
(1, 42)

Nothing wrong there; let's try the next iteration of the loop:

(Pdb) continue
> cr93311.py(14)<module>()
-> if k > 0: k *= -1
(Pdb) p pentagonal, temp
(2, 30.0)

So it's the computation of temp that wrongly converted to float. Let's check the sub-expressions of that computation:

(Pdb) p k, (-1)**(k - 1), pentagonal, partitions[n - pentagonal]
(-1, 1.0, 2, 30)

So it's the (-1)**(k - 1) that's the culprit.

2. Review

It is easier to test and measure code if it is organized into functions. Functions can be tested and timed independently of each other, making it possible to isolate an error or a performance problem.
Functions should be given docstrings describing what they do, so that it's possible to check the implementation against the documentation.
When a function produces a sequence of results it's usually convenient in Python to generate those results using the yield statement. That way, we don't have to commit in advance as to how many results we are going to compute (as we would if the function returned a list), and we don't have to store all the results in memory at once.
It is easier to test functions if they have adjustable parameters (instead of fixed constants like 1000000) so that we can run and check them on small inputs.

So I would break up the code into functions, like this:

def pentagonal():
 """Generate the generalized pentagonal numbers."""
def partitions_modulo(m):
 """Generate the partition numbers modulo m."""
def problem78(m=10**6):
 """Return the least value of n for which the partition number p(n) is
 divisible by m."""

This makes the code easy to test on small instances:

>>> from itertools import islice
>>> list(islice(pentagonal(), 16))
[1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, 77, 92, 100]
>>> list(islice(partitions_modulo(100), 16))
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 1, 35, 76]
>>> problem78(23)
32

and easy to measure:

>>> from timeit import timeit
>>> timeit(problem78, number=1)
9.393578507006168

3. Performance

Your best bet is to use PyPy — this kind of simple numerical code is where PyPy really shines. I get more than ten times speedup simply from running the code in the post under PyPy.

If you're determined to stick with CPython, then here are some techniques I found that were helpful:

Only generating as many generalized pentagonal numbers as necessary: it turns out that we only need the first 200 or so, not the first 2000.
The generalized pentagonal numbers are $$ 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, \dotsc $$ If we take the differences between consective elements, we get the sequence $$ 1, 1, 3, 2, 5, 3, 7, 4, 9, 5, 11, 6, 13, \dotsc $$ whose \$n\$th item is \$n\$ if \$n\$ is odd, or \$n\over 2\$ if \$n\$ is even. This means that the generalized pentagonal numbers can be generated using itertools.count and itertools.accumulate, like this:
```
accumulate(k if k % 2 else k // 2 for k in count(1))
```
The signs repeat in the cycle $$ 1, 1, -1, -1, 1, 1, -1, -1, \dotsc $$ and so they can be generated using itertools.cycle, like this:
```
cycle((1, 1, -1, -1))
```
Using sum instead of explicit accumulation.

But the revised code is only about twice as fast as the original:

from itertools import accumulate, count, cycle, islice
def pentagonal():
 """Generate the generalized pentagonal numbers starting at 1.
 >>> list(islice(pentagonal(), 16)) # http://oeis.org/A001318
 [1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, 77, 92, 100]
 """
 return accumulate(k if k % 2 else k // 2 for k in count(1))
def partitions_modulo(m):
 """Generate the partition numbers modulo m.
 >>> list(islice(partitions_modulo(1000), 16)) # http://oeis.org/A000041
 [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 101, 135, 176]
 """
 signs = 1, 1, -1, -1
 partitions = [1] # Partition numbers up to p(n)
 pentagonals = [] # Generalized pentagonal numbers <= n
 pentagonal_it = pentagonal() # Iterator over generalized pentagonal numbers
 next_pentagonal = next(pentagonal_it)
 yield 1
 for n in count(1):
 if next_pentagonal <= n:
 pentagonals.append(next_pentagonal)
 next_pentagonal = next(pentagonal_it)
 p = sum(sign * partitions[-pent]
 for sign, pent in zip(cycle(signs), pentagonals)) % m
 partitions.append(p)
 yield p
def problem78(m=10**6):
 """Return the least value of n for which the partition number p(n) is
 divisible by m.
 >>> problem78(23)
 32
 """
 return next(i for i, p in enumerate(partitions_modulo(m)) if p % m == 0)

Stack Exchange Network

Project Euler 78: Coin Partitions in Python

1 Answer 1

1. Debugging

2. Review

3. Performance

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Project Euler 78: Coin Partitions in Python

1 Answer 1

1. Debugging

2. Review

3. Performance

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions