I was trying to solve Project Euler Problem 78: Coin Partitions.
Let p(n) represent the number of different ways in which n coins can be separated into piles. For example, five coins can be separated into piles in exactly seven different ways, so p(5)=7. Find the least value of n for which p(n) is divisible by one million.
My solution is in Python. It uses the formula described here in the Wikipedia article, a simple formula really. It takes around 12 seconds, but ideally I'd get it around two or three seconds if not quicker.
Here's my code. I would appreciate any comments or help people could offer.
from timeit import default_timer as timer
start = timer()
partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
n = 11
partition = 42
pentagonals = []
for k in range(1, 2000):
pentagonals.append(k*(3*k-1)/2)
k *= -1
pentagonals.append(k*(3*k-1)/2)
while partition != 0:
partition = 0
pentagonal = 0
index = 0
temp = 0
is_positive = False
while pentagonal <= n:
if is_positive: partition += temp
else: partition -= temp
pentagonal = pentagonals[index]
temp = partitions[n - pentagonal]
if index % 2 == 0: is_positive = not is_positive
index += 1
partition %= 1000000
partitions.append(partition)
n += 1
ans = n - 1
elapsed_time = (timer() - start) * 1000
print "Found %d as the answer in %d ms." % (ans, elapsed_time)
1 Answer 1
1. Debugging
The first version of your code was broken because you used the expression
(-1)**(k-1)
where k
is sometimes negative. The documentation says:
For
int
operands, the result has the same type as the operands unless the second argument is negative; in that case, all arguments are converted to float and a float result is delivered.
(my emphasis) so when k
was negative, this expression evaluated to a float, either 1.0
or -1.0
, and then the usual rules of arithmetic conversion caused all the other intermediate results to become floats too. Floats have limited precision, so eventually these were approximate rather than exact.
How did I discover this? Well, I ran the program under the Python debugger:
$ python -m pdb cr93311.py
> cr93311.py(1)<module>()
-> partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
I wanted to see what values actually showed up in the partitions
list, so I set a breakpoint on line 17 (just after the partitions.append
):
(Pdb) break 17
Breakpoint 1 at cr93311.py:17
And then I continued to the breakpoint and printed the partitions
array:
(Pdb) continue
> cr93311.py(17)<module>()
-> n += 1
(Pdb) p partitions
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56.0]
The 56.0
stands out immediately as an error: it should be 56
. (Possibly this was not obvious to you: in which case you might appreciate David Goldberg's article "What Every Computer Scientist Should Know About Floating-Point Arithmetic" or the beginner's version thereof.)
How did this floating-point number get there? Let's find out by restarting the program and setting a breakpoint on line 14, just after the computation of pentagonal
and temp
:
(Pdb) run
Restarting cr93311.py with arguments:
cr93311.py
> cr93311.py(1)<module>()
-> partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
(Pdb) break 14
Breakpoint 2 at cr93311.py:14
(Pdb) continue
> cr93311.py(14)<module>()
-> if k > 0: k *= -1
(Pdb) p pentagonal, temp
(1, 42)
Nothing wrong there; let's try the next iteration of the loop:
(Pdb) continue
> cr93311.py(14)<module>()
-> if k > 0: k *= -1
(Pdb) p pentagonal, temp
(2, 30.0)
So it's the computation of temp
that wrongly converted to float. Let's check the sub-expressions of that computation:
(Pdb) p k, (-1)**(k - 1), pentagonal, partitions[n - pentagonal]
(-1, 1.0, 2, 30)
So it's the (-1)**(k - 1)
that's the culprit.
2. Review
It is easier to test and measure code if it is organized into functions. Functions can be tested and timed independently of each other, making it possible to isolate an error or a performance problem.
Functions should be given docstrings describing what they do, so that it's possible to check the implementation against the documentation.
When a function produces a sequence of results it's usually convenient in Python to generate those results using the
yield
statement. That way, we don't have to commit in advance as to how many results we are going to compute (as we would if the function returned a list), and we don't have to store all the results in memory at once.It is easier to test functions if they have adjustable parameters (instead of fixed constants like
1000000
) so that we can run and check them on small inputs.
So I would break up the code into functions, like this:
def pentagonal():
"""Generate the generalized pentagonal numbers."""
def partitions_modulo(m):
"""Generate the partition numbers modulo m."""
def problem78(m=10**6):
"""Return the least value of n for which the partition number p(n) is
divisible by m."""
This makes the code easy to test on small instances:
>>> from itertools import islice
>>> list(islice(pentagonal(), 16))
[1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, 77, 92, 100]
>>> list(islice(partitions_modulo(100), 16))
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 1, 35, 76]
>>> problem78(23)
32
and easy to measure:
>>> from timeit import timeit
>>> timeit(problem78, number=1)
9.393578507006168
3. Performance
Your best bet is to use PyPy — this kind of simple numerical code is where PyPy really shines. I get more than ten times speedup simply from running the code in the post under PyPy.
If you're determined to stick with CPython, then here are some techniques I found that were helpful:
Only generating as many generalized pentagonal numbers as necessary: it turns out that we only need the first 200 or so, not the first 2000.
The generalized pentagonal numbers are $$ 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, \dotsc $$ If we take the differences between consective elements, we get the sequence $$ 1, 1, 3, 2, 5, 3, 7, 4, 9, 5, 11, 6, 13, \dotsc $$ whose \$n\$th item is \$n\$ if \$n\$ is odd, or \$n\over 2\$ if \$n\$ is even. This means that the generalized pentagonal numbers can be generated using
itertools.count
anditertools.accumulate
, like this:accumulate(k if k % 2 else k // 2 for k in count(1))
The signs repeat in the cycle $$ 1, 1, -1, -1, 1, 1, -1, -1, \dotsc $$ and so they can be generated using
itertools.cycle
, like this:cycle((1, 1, -1, -1))
Using
sum
instead of explicit accumulation.
But the revised code is only about twice as fast as the original:
from itertools import accumulate, count, cycle, islice
def pentagonal():
"""Generate the generalized pentagonal numbers starting at 1.
>>> list(islice(pentagonal(), 16)) # http://oeis.org/A001318
[1, 2, 5, 7, 12, 15, 22, 26, 35, 40, 51, 57, 70, 77, 92, 100]
"""
return accumulate(k if k % 2 else k // 2 for k in count(1))
def partitions_modulo(m):
"""Generate the partition numbers modulo m.
>>> list(islice(partitions_modulo(1000), 16)) # http://oeis.org/A000041
[1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 101, 135, 176]
"""
signs = 1, 1, -1, -1
partitions = [1] # Partition numbers up to p(n)
pentagonals = [] # Generalized pentagonal numbers <= n
pentagonal_it = pentagonal() # Iterator over generalized pentagonal numbers
next_pentagonal = next(pentagonal_it)
yield 1
for n in count(1):
if next_pentagonal <= n:
pentagonals.append(next_pentagonal)
next_pentagonal = next(pentagonal_it)
p = sum(sign * partitions[-pent]
for sign, pent in zip(cycle(signs), pentagonals)) % m
partitions.append(p)
yield p
def problem78(m=10**6):
"""Return the least value of n for which the partition number p(n) is
divisible by m.
>>> problem78(23)
32
"""
return next(i for i, p in enumerate(partitions_modulo(m)) if p % m == 0)
Explore related questions
See similar questions with these tags.
partitions
list? \$\endgroup\$6.5 * 10^240
\$\endgroup\$partitions
list? \$\endgroup\$[124754, 147273, 173525]
... etc \$\endgroup\$124754
? I don't—when I run your code I get124754.0
. \$\endgroup\$