Finding Egyptian fractions with A* search

Question 1

I've written an A* search to find the optimal solution to the equation

$$ \dfrac{4}{n} = \dfrac{1}{x} + \dfrac{1}{y} + \dfrac{1}{z} $$

with \$n, x, y, z\$ positive integers, and \$n\$ given; the optimal solution being the one with smallest sum \$ x + y + z \$.

(The Erdős–Straus conjecture states that there are solutions to this equation for all \$n>2\$.)

My working version:

def evaluate(option):
 return sum(option)
def generate_options(option):
 def option_valid(o):
 prev = o[0]
 for el in o[1:]:
 if el < prev:
 return False
 prev = el
 return True
 x, y, z = option
 opt = (x + 1, y, z)
 if option_valid(opt): yield opt
 opt = (x, y + 1, z)
 if option_valid(opt): yield opt
 opt = (x, y, z + 1)
 if option_valid(opt): yield opt
def is_solution(current, n):
 x, y, z = current
 if (4 / n) == (1 / x + 1 / y + 1 / z):
 return True
 return False
def A(option, generate_options, evaluate, is_solution):
 done = set()
 paths = PriorityQueue()
 paths.put((evaluate(option), option))
 done.add(option)
 while not paths.empty():
 price, current = paths.get()
 if is_solution(current):
 return price, current
 for gp in generate_options(current):
 if gp not in done:
 done.add(gp)
 paths.put((evaluate(gp), gp))
 return None
def main():
 def solution(x):
 return is_solution(x, n=18)
 print(A(
 (1, 1, 1),
 generate_options=generate_options,
 evaluate=evaluate,
 is_solution=solution))

The problem I see with this code would be that since I store all the checked solutions into a set, I could potentially run out of memory for an \$n\$ that would require a large amount of checked options. In my mind, I could fix this by clearing the entire set every \$m\$ iterations, but that could lead back to some options being checked more than once.

I'm fairly new to Python, and I also wonder if there's a better way to write my generate_options function. I do know there was no need to make it into a generator, but I did it that way for the sake of seeing how they work.

Question 2

It found (12, 12, 18) with the pasted version when I tried it just now.

Question 3

Yes it's python 3.5. Sorry for not specifying that explicitly, I never worked with python 2.7.

Question 4

There's a bug in is_solution:
```
>>> is_solution((2, 3, 6), 4)
False
```
even though $$ {1\over2} + {1\over3} + {1\over6} = 1 = {4\over4}. $$

This is because \${1\over3}\$ and \${1\over6}\$ can't be exactly represented as floating-point numbers:
```
>>> 1/2 + 1/3 + 1/6
0.9999999999999999
```
The way to solve this is to represent fractions using the fractions.Fraction class:
```
>>> Fraction(1, 2) + Fraction(1, 3) + Fraction(1, 6) == Fraction(4, 4)
True
```
The question claims the algorithm is "A*", but that's not true. The A* search algorithm tries to find the least-cost path to an identifiable goal in a directed graph. But here you can't identify the goal (because you're looking for a minimum, you won't know what it is until the search is complete), and you don't care about the cost of the path taken to get there.

It would be better to call the algorithm "best-first search".

You're basically searching over all triples of integers looking for one that happens to add up to your target. This is a waste of time, because after you've picked \$y\$ and \$z\,ドル say, then you can know that $$ x = {1 \over {4\over n} - {1\over y} - {1\over z}} $$ and all you have to do is compute this and check that it's an integer.

So a (slightly) better strategy looks like this:

from collections import namedtuple
from fractions import Fraction
from itertools import chain, count, takewhile
from math import floor
Triple = namedtuple('Triple', 'x y z')
def egyptian_triples(n):
 """All solutions to 1/x + 1/y + 1/z = 4/n in lexicographic order."""
 tz = Fraction(4, n)
 for z in count(floor(1 / tz) + 1):
 ty = tz - Fraction(1, z)
 for y in range(floor(1 / ty) + 1, z + 1):
 tx = ty - Fraction(1, y)
 if tx.numerator == 1 and tx.denominator <= y:
 yield Triple(tx.denominator, y, z)
def min_egyptian_triple(n):
 """Return the solution to 1/x + 1/y + 1/z = 4/n with minimum x + y + z."""
 triples = egyptian_triples(n)
 # First solution in lexicographic order.
 first = next(triples)
 max_z = sum(first) - 2
 # Return minimum solution.
 return min(chain((first,), takewhile(lambda t: t.z <= max_z, triples)),
 key=sum)

This is substantially faster:

>>> from timeit import timeit
>>> timeit(lambda:main(115), number=1)
(290, (60, 92, 138))
19.941934717004187
>>> timeit(lambda:print(min_egyptian_triple(115)), number=1)
Triple(x=60, y=92, z=138)
0.46365413605235517

Question 5

Thank you, this is very helpful! I forgot that most programming languages handle floating point numbers like that. This was a simple assignment we got in AI while talking about A*, so naturally I didn't think much of it. A A* value gets calculated by adding the current path cost and the heuristic estimate. If we were to regard this implementation as A*, what would the summation of x, y and z here be classified as? It's not really a heuristic estimate, but I don't think it could be said it's the path travelled up to that point. You clearly know this well, what would this be classified as?

Question 6

I've rephrased my objection to describing this algorithm as "A*".

Question 7

Thank you very much for your time to do this. It helps a great deal.

Question 8

Immediately evaluate seems redundant, it's basically just calling sum with a more generic unclear name. Just use sum unless you intend to change this later or there could be more complicated evaluations. But if that's the case, note it in a comment or docstring. All of your functions could benefit from docstrings, as they're complicated and unclear about what problem they solve. See this PEP about good docstring practices.

Don't use inline if statements. They may feel neater but they're less readable and just confusing. Instead

if option_valid(opt):
 yield opt

Also instead of manually repeating the code, you can use a loop. I know you have different parameters, but you could loop over a tuple of them, like so:

for opt in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
 if option_valid(opt):
 yield opt

In is_solution you can just return the boolean result of your condition rather than having to return True or False.

def is_solution(current, n):
 x, y, z = current
 return (4 / n) == (1 / x + 1 / y + 1 / z)

You don't need to add return None to the end of A. Python defaults to returning None in the absence of any other return value anyway.

With your memory issues, sets are expensive on space. A set of range(10) is nearly three and half times as big as a list. They definitely speed up testing if a variable exists in them, but with significant memory usage.

If you want to stick with using a set you could test the length of done and write it to a file when it reaches a certain size, then clear out the set within the script. Reading a file can be slow but if you store it intelligently it would be quicker. For example, store them based on the first value of the tuple. Then you know you can look up solutions in that store. A rudimentary way of doing this is have different files for each initial value, eg. 1, 2 etc. That way you ignore any values that don't at least match the first value. As you're more familiar with the data input you might have more ideas about how to organise these.

Question 9

Thank you, all those suggestions are very helpful. I tried to separate out the evaluation function so that other ones can be used. Do you have any idea of how I might solve the memory problem with the checked options?

Question 10

@Pavlin I added a bit to my answer about that. I'm not sure if you were using the set knowing it's a memory cost for the sake of speed, but I'm unsure how you could reduce that memory without sacrificing the speed.

Question 11

The speed of the algorithm without marking solutions as done is a lot slower. Thank you for your answer.

Question 12

@Pavlin I'm a little confused by that comment, my suggestions were to change to using either a list or a file for storing the values of done, I wasn't suggesting you remove it entirely. Also I recommend waiting before accepting an answer here. Code Review can take a while for good answers to appear, someone might have much better suggestions for the memory management.

Question 13

Oh, my bad. I understood your suggestion, I was simply commenting the memory / speed trade-off; before I put in the set to store checked options, finding some solutions took twice as long in some cases. Thank you for info, I'm completely new here, and don't know how long it usually takes.

Question 14

is_solution is not reliable, as explained by Gareth Rees. The floating point issues can be avoided by not using division. Multiplying both sides of the equation by n*x*y*z leads to an equivalent form 4*x*y*z == n*(x*y + x*z + y*z).
Note that queue.PriorityQueue has some overhead from being thread-safe. In single-threaded code you could leverage the heapq module instead.
Since the items you add to the queue always increment the sum by one on the items you pull from the queue, you could use a simple FIFO queue instead. See collections.deque for that.
You can avoid both the done set and the paths queue by generating the options according to this approach:
- Loop i through positive integers
- For each i, generate all combinations x,y,z that sum to i

My solution:

import itertools
def triples_to_sum(sum_xyz):
 '''All (x,y,z) tuples where 1 <= x <= y <= z
 and x+y+x = sum_xyz
 '''
 for x in range(1, sum_xyz // 3 + 1):
 sum_yz = sum_xyz - x
 for y in range(x, sum_yz // 2 + 1):
 z = sum_yz - y
 yield x,y,z
def triples():
 '''All (x,y,z) tuples where 1 <= x <= y <= z
 ordered by x+y+z
 '''
 for i in itertools.count():
 yield from triples_to_sum(i)
def egyptian_triples(n):
 '''All solutions to 1/x + 1/y + 1/z = 4/n
 where 1 <= x <= y <= z, ordered by x+y+z
 ''' 
 for x,y,z in triples():
 if 4*x*y*z == n*(x*y + x*z + y*z):
 yield x,y,z
if __name__ == '__main__':
 print(next(egyptian_triples(18)))

Question 15

Thank you, this is helpful. I didn't know about heapq and it looks very useful. I don't know why anyone would use the standard priority queue knowing that exists.

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2015-11-04 18:08:06Z

There's a bug in is_solution:
```
>>> is_solution((2, 3, 6), 4)
False
```
even though $$ {1\over2} + {1\over3} + {1\over6} = 1 = {4\over4}. $$

This is because \${1\over3}\$ and \${1\over6}\$ can't be exactly represented as floating-point numbers:
```
>>> 1/2 + 1/3 + 1/6
0.9999999999999999
```
The way to solve this is to represent fractions using the fractions.Fraction class:
```
>>> Fraction(1, 2) + Fraction(1, 3) + Fraction(1, 6) == Fraction(4, 4)
True
```
The question claims the algorithm is "A*", but that's not true. The A* search algorithm tries to find the least-cost path to an identifiable goal in a directed graph. But here you can't identify the goal (because you're looking for a minimum, you won't know what it is until the search is complete), and you don't care about the cost of the path taken to get there.

It would be better to call the algorithm "best-first search".

You're basically searching over all triples of integers looking for one that happens to add up to your target. This is a waste of time, because after you've picked \$y\$ and \$z\,ドル say, then you can know that $$ x = {1 \over {4\over n} - {1\over y} - {1\over z}} $$ and all you have to do is compute this and check that it's an integer.

So a (slightly) better strategy looks like this:

from collections import namedtuple
from fractions import Fraction
from itertools import chain, count, takewhile
from math import floor
Triple = namedtuple('Triple', 'x y z')
def egyptian_triples(n):
 """All solutions to 1/x + 1/y + 1/z = 4/n in lexicographic order."""
 tz = Fraction(4, n)
 for z in count(floor(1 / tz) + 1):
 ty = tz - Fraction(1, z)
 for y in range(floor(1 / ty) + 1, z + 1):
 tx = ty - Fraction(1, y)
 if tx.numerator == 1 and tx.denominator <= y:
 yield Triple(tx.denominator, y, z)
def min_egyptian_triple(n):
 """Return the solution to 1/x + 1/y + 1/z = 4/n with minimum x + y + z."""
 triples = egyptian_triples(n)
 # First solution in lexicographic order.
 first = next(triples)
 max_z = sum(first) - 2
 # Return minimum solution.
 return min(chain((first,), takewhile(lambda t: t.z <= max_z, triples)),
 key=sum)

This is substantially faster:

>>> from timeit import timeit
>>> timeit(lambda:main(115), number=1)
(290, (60, 92, 138))
19.941934717004187
>>> timeit(lambda:print(min_egyptian_triple(115)), number=1)
Triple(x=60, y=92, z=138)
0.46365413605235517

Thank you, this is very helpful! I forgot that most programming languages handle floating point numbers like that. This was a simple assignment we got in AI while talking about A*, so naturally I didn't think much of it. A A* value gets calculated by adding the current path cost and the heuristic estimate. If we were to regard this implementation as A*, what would the summation of x, y and z here be classified as? It's not really a heuristic estimate, but I don't think it could be said it's the path travelled up to that point. You clearly know this well, what would this be classified as?
I've rephrased my objection to describing this algorithm as "A*".
Thank you very much for your time to do this. It helps a great deal.

Stack Exchange Network

Finding Egyptian fractions with A* search

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding Egyptian fractions with A* search

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions