I've written an A* search to find the optimal solution to the equation
$$ \dfrac{4}{n} = \dfrac{1}{x} + \dfrac{1}{y} + \dfrac{1}{z} $$
with \$n, x, y, z\$ positive integers, and \$n\$ given; the optimal solution being the one with smallest sum \$ x + y + z \$.
(The Erdős–Straus conjecture states that there are solutions to this equation for all \$n>2\$.)
My working version:
def evaluate(option):
return sum(option)
def generate_options(option):
def option_valid(o):
prev = o[0]
for el in o[1:]:
if el < prev:
return False
prev = el
return True
x, y, z = option
opt = (x + 1, y, z)
if option_valid(opt): yield opt
opt = (x, y + 1, z)
if option_valid(opt): yield opt
opt = (x, y, z + 1)
if option_valid(opt): yield opt
def is_solution(current, n):
x, y, z = current
if (4 / n) == (1 / x + 1 / y + 1 / z):
return True
return False
def A(option, generate_options, evaluate, is_solution):
done = set()
paths = PriorityQueue()
paths.put((evaluate(option), option))
done.add(option)
while not paths.empty():
price, current = paths.get()
if is_solution(current):
return price, current
for gp in generate_options(current):
if gp not in done:
done.add(gp)
paths.put((evaluate(gp), gp))
return None
def main():
def solution(x):
return is_solution(x, n=18)
print(A(
(1, 1, 1),
generate_options=generate_options,
evaluate=evaluate,
is_solution=solution))
The problem I see with this code would be that since I store all the checked solutions into a set, I could potentially run out of memory for an \$n\$ that would require a large amount of checked options. In my mind, I could fix this by clearing the entire set every \$m\$ iterations, but that could lead back to some options being checked more than once.
I'm fairly new to Python, and I also wonder if there's a better way to write my generate_options
function. I do know there was no need to make it into a generator, but I did it that way for the sake of seeing how they work.
-
\$\begingroup\$ It found (12, 12, 18) with the pasted version when I tried it just now. \$\endgroup\$Pavlin– Pavlin2015年11月04日 15:25:28 +00:00Commented Nov 4, 2015 at 15:25
-
\$\begingroup\$ Yes it's python 3.5. Sorry for not specifying that explicitly, I never worked with python 2.7. \$\endgroup\$Pavlin– Pavlin2015年11月04日 15:28:22 +00:00Commented Nov 4, 2015 at 15:28
3 Answers 3
There's a bug in
is_solution
:>>> is_solution((2, 3, 6), 4) False
even though $$ {1\over2} + {1\over3} + {1\over6} = 1 = {4\over4}. $$
This is because \${1\over3}\$ and \${1\over6}\$ can't be exactly represented as floating-point numbers:
>>> 1/2 + 1/3 + 1/6 0.9999999999999999
The way to solve this is to represent fractions using the
fractions.Fraction
class:>>> Fraction(1, 2) + Fraction(1, 3) + Fraction(1, 6) == Fraction(4, 4) True
The question claims the algorithm is "A*", but that's not true. The A* search algorithm tries to find the least-cost path to an identifiable goal in a directed graph. But here you can't identify the goal (because you're looking for a minimum, you won't know what it is until the search is complete), and you don't care about the cost of the path taken to get there.
It would be better to call the algorithm "best-first search".
You're basically searching over all triples of integers looking for one that happens to add up to your target. This is a waste of time, because after you've picked \$y\$ and \$z\,ドル say, then you can know that $$ x = {1 \over {4\over n} - {1\over y} - {1\over z}} $$ and all you have to do is compute this and check that it's an integer.
So a (slightly) better strategy looks like this:
from collections import namedtuple from fractions import Fraction from itertools import chain, count, takewhile from math import floor Triple = namedtuple('Triple', 'x y z') def egyptian_triples(n): """All solutions to 1/x + 1/y + 1/z = 4/n in lexicographic order.""" tz = Fraction(4, n) for z in count(floor(1 / tz) + 1): ty = tz - Fraction(1, z) for y in range(floor(1 / ty) + 1, z + 1): tx = ty - Fraction(1, y) if tx.numerator == 1 and tx.denominator <= y: yield Triple(tx.denominator, y, z) def min_egyptian_triple(n): """Return the solution to 1/x + 1/y + 1/z = 4/n with minimum x + y + z.""" triples = egyptian_triples(n) # First solution in lexicographic order. first = next(triples) max_z = sum(first) - 2 # Return minimum solution. return min(chain((first,), takewhile(lambda t: t.z <= max_z, triples)), key=sum)
This is substantially faster:
>>> from timeit import timeit >>> timeit(lambda:main(115), number=1) (290, (60, 92, 138)) 19.941934717004187 >>> timeit(lambda:print(min_egyptian_triple(115)), number=1) Triple(x=60, y=92, z=138) 0.46365413605235517
-
\$\begingroup\$ Thank you, this is very helpful! I forgot that most programming languages handle floating point numbers like that. This was a simple assignment we got in AI while talking about A*, so naturally I didn't think much of it. A A* value gets calculated by adding the current path cost and the heuristic estimate. If we were to regard this implementation as A*, what would the summation of x, y and z here be classified as? It's not really a heuristic estimate, but I don't think it could be said it's the path travelled up to that point. You clearly know this well, what would this be classified as? \$\endgroup\$Pavlin– Pavlin2015年11月04日 19:03:49 +00:00Commented Nov 4, 2015 at 19:03
-
\$\begingroup\$ I've rephrased my objection to describing this algorithm as "A*". \$\endgroup\$Gareth Rees– Gareth Rees2015年11月04日 19:36:38 +00:00Commented Nov 4, 2015 at 19:36
-
\$\begingroup\$ Thank you very much for your time to do this. It helps a great deal. \$\endgroup\$Pavlin– Pavlin2015年11月05日 07:09:22 +00:00Commented Nov 5, 2015 at 7:09
Immediately evaluate
seems redundant, it's basically just calling sum
with a more generic unclear name. Just use sum
unless you intend to change this later or there could be more complicated evaluations. But if that's the case, note it in a comment or docstring. All of your functions could benefit from docstrings, as they're complicated and unclear about what problem they solve. See this PEP about good docstring practices.
Don't use inline if
statements. They may feel neater but they're less readable and just confusing. Instead
if option_valid(opt):
yield opt
Also instead of manually repeating the code, you can use a loop. I know you have different parameters, but you could loop over a tuple of them, like so:
for opt in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
if option_valid(opt):
yield opt
In is_solution
you can just return the boolean result of your condition rather than having to return True
or False
.
def is_solution(current, n):
x, y, z = current
return (4 / n) == (1 / x + 1 / y + 1 / z)
You don't need to add return None
to the end of A
. Python defaults to returning None
in the absence of any other return value anyway.
With your memory issues, sets are expensive on space. A set of range(10)
is nearly three and half times as big as a list. They definitely speed up testing if a variable exists in them, but with significant memory usage.
If you want to stick with using a set you could test the length of done
and write it to a file when it reaches a certain size, then clear out the set within the script. Reading a file can be slow but if you store it intelligently it would be quicker. For example, store them based on the first value of the tuple. Then you know you can look up solutions in that store. A rudimentary way of doing this is have different files for each initial value, eg. 1
, 2
etc. That way you ignore any values that don't at least match the first value. As you're more familiar with the data input you might have more ideas about how to organise these.
-
\$\begingroup\$ Thank you, all those suggestions are very helpful. I tried to separate out the evaluation function so that other ones can be used. Do you have any idea of how I might solve the memory problem with the checked options? \$\endgroup\$Pavlin– Pavlin2015年11月04日 10:09:06 +00:00Commented Nov 4, 2015 at 10:09
-
\$\begingroup\$ @Pavlin I added a bit to my answer about that. I'm not sure if you were using the set knowing it's a memory cost for the sake of speed, but I'm unsure how you could reduce that memory without sacrificing the speed. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年11月04日 10:33:28 +00:00Commented Nov 4, 2015 at 10:33
-
\$\begingroup\$ The speed of the algorithm without marking solutions as done is a lot slower. Thank you for your answer. \$\endgroup\$Pavlin– Pavlin2015年11月04日 11:03:41 +00:00Commented Nov 4, 2015 at 11:03
-
\$\begingroup\$ @Pavlin I'm a little confused by that comment, my suggestions were to change to using either a list or a file for storing the values of
done
, I wasn't suggesting you remove it entirely. Also I recommend waiting before accepting an answer here. Code Review can take a while for good answers to appear, someone might have much better suggestions for the memory management. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年11月04日 11:06:52 +00:00Commented Nov 4, 2015 at 11:06 -
\$\begingroup\$ Oh, my bad. I understood your suggestion, I was simply commenting the memory / speed trade-off; before I put in the set to store checked options, finding some solutions took twice as long in some cases. Thank you for info, I'm completely new here, and don't know how long it usually takes. \$\endgroup\$Pavlin– Pavlin2015年11月04日 11:15:33 +00:00Commented Nov 4, 2015 at 11:15
is_solution
is not reliable, as explained by Gareth Rees. The floating point issues can be avoided by not using division. Multiplying both sides of the equation byn*x*y*z
leads to an equivalent form4*x*y*z == n*(x*y + x*z + y*z)
.- Note that
queue.PriorityQueue
has some overhead from being thread-safe. In single-threaded code you could leverage theheapq
module instead. - Since the items you add to the queue always increment the sum by one on
the items you pull from the queue, you could use a simple FIFO queue instead.
See
collections.deque
for that. - You can avoid both the
done
set and thepaths
queue by generating the options according to this approach:- Loop
i
through positive integers - For each
i
, generate all combinationsx,y,z
that sum toi
- Loop
My solution:
import itertools
def triples_to_sum(sum_xyz):
'''All (x,y,z) tuples where 1 <= x <= y <= z
and x+y+x = sum_xyz
'''
for x in range(1, sum_xyz // 3 + 1):
sum_yz = sum_xyz - x
for y in range(x, sum_yz // 2 + 1):
z = sum_yz - y
yield x,y,z
def triples():
'''All (x,y,z) tuples where 1 <= x <= y <= z
ordered by x+y+z
'''
for i in itertools.count():
yield from triples_to_sum(i)
def egyptian_triples(n):
'''All solutions to 1/x + 1/y + 1/z = 4/n
where 1 <= x <= y <= z, ordered by x+y+z
'''
for x,y,z in triples():
if 4*x*y*z == n*(x*y + x*z + y*z):
yield x,y,z
if __name__ == '__main__':
print(next(egyptian_triples(18)))
-
\$\begingroup\$ Thank you, this is helpful. I didn't know about heapq and it looks very useful. I don't know why anyone would use the standard priority queue knowing that exists. \$\endgroup\$Pavlin– Pavlin2015年11月05日 07:14:21 +00:00Commented Nov 5, 2015 at 7:14