I am currently working on a Python 3.4.3 project which includes solving the Traveling Salesman Problem using different algorithms. I have just written a brute force algorithm and I would love some feedback. Now I understand that the number of operations increases by the factorial of the route length so brute forcing is not considered feasible, but for my project I need a standard brute force.
My overarching question is: How can I make this as fast and efficient as possible?
import itertools
def distance(p1, p2):
#calculates distance from two points
d = (((p2[0] - p1[0])**2) + ((p2[1] - p1[1])**2))**.5
return int(d)
def calCosts(routes, nodes):
travelCosts = []
for route in routes:
travelCost = 0
#Sums up the travel cost
for i in range(1,len(route)):
#takes an element of route, uses it to find the corresponding coords and calculates the distance
travelCost += distance(nodes[str(route[i-1])], nodes[str(route[i])])
travelCosts.append(travelCost)
#pulls out the smallest travel cost
smallestCost = min(travelCosts)
shortest = (routes[travelCosts.index(smallestCost)], smallestCost)
#returns tuple of the route and its cost
return shortest
def genRoutes(routeLength):
#lang hold all the 'alphabet' of nodes
lang = [ x for x in range(2,routeLength+1) ]
#uses built-in itertools to generate permutations
routes = list(map(list, itertools.permutations(lang)))
#inserts the home city, must be the first city in every route
for x in routes:
x.insert(0,1)
return routes
def main(nodes=None, instanceSize=5):
#nodes and instanceSize are passed into main() using another program
#I just gave them default values for this example
#The Node lookup table.
Nodes = {
'1': (565.0, 575.0),
'2': (25.0, 185.0),
'3': (345.0, 750.0),
'4': (945.0, 685.0),
'5': (845.0, 655.0),
'6': (880.0, 660.0),
'7': (25.0, 230.0),
'8': (525.0, 1000.0),
'9': (580.0, 1175.0),
'10': (650.0, 1130.0),
'11': (1605.0, 620.0),
'12': (1220.0, 580.0),
'13': (1465.0, 200.0),
'14': (1530.0, 5.0),
'15': (845.0, 680.0)
}
routes = genRoutes(instanceSize)
shortest = calCosts(routes, Nodes)
print("Shortest Route: ", shortest[0])
print("Travel Cost: ", shortest[1])
if __name__ == '__main__':
main()
-
\$\begingroup\$ You present us with a list of 15 nodes, but instanceSize/routeLength of only 5. But if I'm not mistaken, route generator only generates routes from the first five nodes. Is that intended? Or should it generate routes between 5 nodes inbetween all 15 nodes (though always including node 1)? \$\endgroup\$holroy– holroy2015年11月09日 20:13:27 +00:00Commented Nov 9, 2015 at 20:13
-
\$\begingroup\$ @holroy Yes that is intended, I set the default value for 5 because the program crashes at about 13 nodes. So it will only generate permutations for 2-5. \$\endgroup\$Mr. Foots– Mr. Foots2015年11月09日 21:24:14 +00:00Commented Nov 9, 2015 at 21:24
3 Answers 3
Lets have a little style review, before some code refactoring, and finish off with some performance comparison.
Style and code review
I suggest you read up on PEP8, which is the official style guide for Python.
- Not using
snake_case
for variable and function names - You are using mostlycamelCase
, and in some cases one letter variables - Good use of singular vs plural – I like combinations like
for route in routes
. They make sense - Good use of
if __name__ ...
- This is good construct - Many temporary variables – You have a little too many temporary variables, which are only used once. These can often be skipped
- Avoid using indexes in loops – In general, it is better to avoid using indexes in loops, rather than looping on the elements themself. Especially bad is it when you use
str(i)
to get the index, when theNodes
list should have used pure integers directly as keys. - Avoid building large lists in memory – Your code builds the
routes
list in memory. When calling this for a route length of 9, your original code uses approx 6.7MB to store the routes. My optimised routine below, uses 0.02MB because it uses generators instead... - Avoid recalculating distances – Python can memoize the output from a function for each given input by using decorators. This means that you can calculate the distance between two points once, and the next time you need it the memoize function picks your previous calculation of the result.
- Use docstrings instead of ordinary comment to document functions – If you use docstrings, some of the editors will provide interactive help when you use your functions, and it is according the style guide.
Performance comparison
I made some variants of the code, both of your original, the one by SuperBiasedMan (with some correction as his code currently has a bug), some versions using only generators, itertools, sum
and min
, and finally a version using for
loops instead of list comprehension and a slightly alternate optimisation to find the minimum.
I used the following version of memoize, which most likely can be further optimised:
def memoize(f):
""" Memoization decorator for functions taking one or more arguments. """
class memodict(dict):
def __init__(self, f):
self.f = f
def __call__(self, *args):
return self[args]
def __missing__(self, key):
ret = self[key] = self.f(*key)
return ret
return memodict(f)
@memoize
def distance(p1, p2):
"""Calculates distance between two points, memoizes result"""
d = (((p2[0] - p1[0])**2) + ((p2[1] - p1[1])**2)) **.5
return int(d)
One version taking it to the extreme regarding list comprehension and generators are this one:
def route_generator(route_length):
"""Generate all possible routes of length route_length, starting at 1."""
for route in permutations(xrange(2, route_length+1)):
yield (1, ) + route
def main_holroy_v2(instance_size=INSTANCE_SIZE):
print min((sum(distance(NODES[start], NODES[stop])
for start, stop in pairwise(route)), route)
for route in route_generator(instance_size))
I made the Nodes
into a global dict with integers as key, named NODES
. After some testing I found that the extreme variant was slower than expected, so I fumbled around and found this to be the fastest in my environment:
def find_shortest_route3(nodes, route_length):
"""Find shortest route of length route_length from nodes."""
minimum_distance = None
for route in permutations(xrange(2, route_length+1)):
current_distance = 0
prev = nodes[1]
for next in route:
current_distance += distance(prev, nodes[next])
prev = nodes[next]
if minimum_distance and current_distance > minimum_distance:
break
else:
if not minimum_distance or current_distance < minimum_distance:
minimum_distance = current_distance
minimum_route = route
return minimum_distance, minimum_route
def main_minimum3(instance_size=INSTANCE_SIZE):
distance.clear()
cost, route = find_shortest_route3(NODES, instance_size)
print('Shortest route: {}'.format((1, ) + route))
print('Travel cost : {}'.format(cost))
Some comments on this code:
- It turns out that list comprehensions are slightly slower than plain for loops in this case
- Instead of completing the sum for a route, I break out of the inner loop if the sum has crossed an earlier minimum. The current route is then not interesting any more
- The actual route returned does not include the starting point. This can easily be added, as done in the
print
statement. This eliminates the need for either adding in the starting point to all routes, or to remove all routes not starting at the correct point. In short, it is more efficient- If the inner
for
loop completes ordinarily (no breaks), then it goes into theelse
block of thefor
loop. A slightly strange construct, but for this purpose ideal, as we then can compare the minimum distance versus the current route distance - The
distance.clear()
is used to reset the memoisation between test run done using timeit in IPython. Memory profiling was done using a module namememory_profiler
.
- If the inner
Here are some of the timing results I got:
In [325]: %timeit main_org_str(9) # Your original code
1 loops, best of 3: 529 ms per loop
In [326]: %timeit main_org(9) # Using int as keys
1 loops, best of 3: 304 ms per loop
In [327]: %timeit main_holroy_v2(9)
1 loops, best of 3: 420 ms per loop
In [328]: %timeit main_superbiasedman(9)
1 loops, best of 3: 362 ms per loop
In [330]: %timeit main_minimum3(9)
10 loops, best of 3: 146 ms per loop
In [330]: %timeit main_minimum3(9) # After removing `**0.5`
10 loops, best of 3: 116 ms per loop
As can be seen here your original version took around 0.5 seconds, but only changing into using ints as keys and memoisation, shaved off 0.2 seconds. My extreme version and version by SuperBiasedMan, actually uses longer time than your version when optimised. But still the fastest version is the minimum3
version, clocking in at 146 milliseconds. (The last number, 116 ms, comes if you remove the **0.5
in distance()
. Taking the square root is somewhat expensive.)
So there you have it, your code is not too bad timewize, but it used way too much memory. When coding according to the most pythonic ways, using generators, list comprehensions and sum
and min
you lower the memory consumption, changes the readability, but don't always gain that much in reduced time.
Of the alternatives I tested, the optimal combination for a brute force approach runs in approximately 30% of the time, and uses almost none memory. Some other timings for my optimal method: (10: 1.26 seconds, 11: 11.7 seconds, 12: 1 min 54 seconds). So better algorithms should be applied if wanting to calculate the shortest route for routes longer than 11/12 nodes.
Performance optimisations
In calCosts
you have nested for loops. Python is a pretty slow language, so there's a lot of time to save here if you arrange it right. Python has functions that will run loops in C source code, speeding them up considerably. One of these functions you can use here is sum
. It takes an iterable and returns the same of each value of the iterable. In your case, you can immediately set travelCost
by creating a generator expression and passing it to sum
. A generator expression is basically a for loop like expression that will evaluate all iterations of the loop. Here's how yours would look:
for route in routes:
travelCosts.append(sum(distance(nodes[str(route[i-1])], nodes[str(route[i])])
for i in range(1, len(route)))
Now instead of a slow Python loop you have a quick sum
calculation. However you could make it even faster by using a list comprehension. They're just like generator expressions except that they evaluate to a list instead of just an iterable expression. Here's how you could build travelCosts
in one line:
travelCosts = [sum(distance(nodes[str(route[i-1])], nodes[str(route[i])])
for i in range(1, len(route))
for route in routes]
This speeds up both loops considerably. One small note, you can just return a tuple directly, no need to build it at all.
#returns tuple of the route and its cost
return routes[travelCosts.index(smallestCost)], smallestCost
In genRoutes
you don't need to assign range
with a list comprehension. Instead call list
on it to create a list.
lang = list(range(2,routeLength+1))
You could also call map
on routes
to add 1 to the start of all of them. You can pass a lambda to map
, and a lambda is basically a small function. You wouldn't be able to pass x.insert
since it modifies in place, but you could use a concatenation expression which would be faster than your insert
anyway. I'd then only convert routes
to a list when returning it.
#uses built-in itertools to generate permutations
routes = map(list, itertools.permutations(lang))
#inserts the home city, must be the first city in every route
routes = map(lambda elem: [1] + elem, routes)
return list(routes)
Readability and style notes
Using ** 0.5
is a bit confusing as some people might forget that's the equivalent of getting the square root. It's worth leaving a comment explaining what that's doing.
Your naming doesn't match the Python style. Names should be in snake_case, constants should be in UPPER_SNAKE_CASE. So you should have names like instance_size
, gen_routes
etc.
When returning a tuple it makes more sense to use Python's multiple assignment. So you could do this:
shortest, cost = calCosts(routes, Nodes)
print("Shortest Route: ", shortest)
print("Travel Cost: ", cost)
-
\$\begingroup\$ Thank you, but one more question: After a certain number of routes the list crashes with a
MemoryError
, is there anyway to avoid this? \$\endgroup\$Mr. Foots– Mr. Foots2015年11月09日 18:39:54 +00:00Commented Nov 9, 2015 at 18:39 -
1\$\begingroup\$ @Mr.Foots Avoid having the list of all permutations, use generators instead. \$\endgroup\$Janne Karila– Janne Karila2015年11月09日 18:46:19 +00:00Commented Nov 9, 2015 at 18:46
-
\$\begingroup\$ @JanneKarila Aha! I will try those out \$\endgroup\$Mr. Foots– Mr. Foots2015年11月09日 18:49:57 +00:00Commented Nov 9, 2015 at 18:49
-
\$\begingroup\$ Have you actually tested your code this time @SuperBiasedMan? I have trouble getting it compile when patching together the different parts of your answer... \$\endgroup\$holroy– holroy2015年11月09日 22:39:56 +00:00Commented Nov 9, 2015 at 22:39
The performance-critical part is the loop in calCosts
. A couple of suggestions:
- Precompute distances between all pairs of nodes and store them in a dict. The brute-force search needs all of the distances many times.
- Avoid the
str
conversion in the loop by usingint
keys in the dict.
To avoid running out of memory, do not store all routes in a list. Make genRoutes
a generator instead.
Explore related questions
See similar questions with these tags.