I'm trying to write a pathfinding algorithm for my project. I need it for a simulation, and I need it to be fast! I watched a few videos explaining the algorithm online and came up with the code below.
However, the performance varies a lot depending on the start/finish points, and my project will have grids upwards of 500x500.
A very slow example, with a very small grid, is shown below. I have no idea what's making the algorithm so slow:
Dir_Map = {
    "1": (1, 0),
    "2": (0, 1),
    "3": (-1, 0),
    "4": (0, -1),
    "5": (1, 1),
    "6": (-1, -1),
    "7": (1, -1),
    "8": (-1, 1)
}

class Node():
    def __init__(self, pos):
        self.x = pos[0]
        self.y = pos[1]
        self.g = 0
        self.h = 0
        self.f = 0
        self.beggining = 1
        self.parent = None

    def update(self, final):
        self.h = manhattan(self, final)
        self.f = self.f + self.g

class Child(Node):
    def __init__(self, x, y, parent):
        Node.__init__(self, (parent.x + x, parent.y + y))
        self.g = abs(x) + abs(y) + parent.g
        self.beggining = 0
        self.came_from = (parent.x, parent.y)
        self.parent = parent

def sortbyF(node):
    return node.f

def manhattan(current, final):
    return abs(current.x - final.x) + abs(current.y - final.y)

def check_List(list, successor):
    for ind, val in enumerate(list):
        if (val.x, val.y) == (successor.x, successor.y):
            if val.f < successor.f:
                return 1
    return 0

openlist = []
closedlist = []

def recursion(node):
    if node.beggining is 0:
        closedlist.append(node.came_from)
        recursion(node.parent)

def pfind(grid, start, finish):
    start_node = Node((start[0], start[1]))
    finish_node = Node((finish[0], finish[1]))
    Stop = 0
    openlist.append(start_node)
    while len(openlist) is not 0:
        current = openlist.pop(0)
        for dirs in Dir_Map.values():
            suc = Child(dirs[0], dirs[1], current)
            if grid[suc.x][suc.y] is not 1 and suc.x <= len(grid[:][1]) and suc.y <= len(grid[1][:]):
                suc.update(finish_node)
                if (suc.x, suc.y) == finish:
                    #Added so it adds the last step.
                    #New suc.came_from = finish
                    suc = Child(dirs[0], dirs[1], suc)
                    #retrace the path.
                    recursion(suc)
                    closedlist.reverse()
                    return closedlist
                if check_List(openlist, suc):
                    continue
                openlist.append(suc)
            else:
                continue
        openlist.sort(key=sortbyF)
    return None

grid = [[0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0]]

print(pfind(grid, (0,0), (4,3)))
3 Answers
There is some reasonable general algorithmic and Python-specific advice in the other answers. The lesson that needs to precede everything is: until you know where you're spending your time, you can't work out how to speed things up. It's well worth learning to use Python's cProfile or another profiling tool.
Here's a run of your example under the profiler:
C:\Users\Josiah\Desktop>python -m cProfile -s cumtime astar.py
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 3)]
6911416 function calls (6911411 primitive calls) in 8.103 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 8.103 8.103 {built-in method builtins.exec}
1 0.000 0.000 8.103 8.103 astar.py:2(<module>)
1 0.042 0.042 8.102 8.102 astar.py:59(pfind)
15538 6.918 0.000 6.918 0.000 astar.py:43(check_List)
2082 0.645 0.000 1.087 0.001 {method 'sort' of 'list' objects}
6719169 0.442 0.000 0.442 0.000 astar.py:37(sortbyF)
16658 0.019 0.000 0.031 0.000 astar.py:28(__init__)
15539 0.008 0.000 0.018 0.000 astar.py:22(update)
16660 0.010 0.000 0.010 0.000 astar.py:13(__init__)
15539 0.008 0.000 0.010 0.000 astar.py:40(manhattan)
64394 0.004 0.000 0.004 0.000 {built-in method builtins.abs}
33161 0.003 0.000 0.003 0.000 {built-in method builtins.len}
2083 0.002 0.000 0.002 0.000 {method 'pop' of 'list' objects}
8494 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
2083 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
2 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__}
6/1 0.000 0.000 0.000 0.000 astar.py:53(recursion)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 astar.py:12(Node)
1 0.000 0.000 0.000 0.000 {method 'reverse' of 'list' objects}
1 0.000 0.000 0.000 0.000 astar.py:27(Child)
cumtime is the amount of time spent in a function and all functions it calls.
tottime is the amount of time just spent in the function.
What you can see is that the whole run takes 8.1 seconds, and 6.9 of those are spent directly inside check_List. That's 85% of the whole program's runtime, and basically the only place worth optimising: even if you got everything else to take no time at all, you'd only reduce the runtime by 15%. Your only options are to call check_List less or to make it faster.

The purpose of check_List is to determine whether successor is the fastest way seen thus far to get to wherever successor goes. You've done that by looping over a list. It would be faster to maintain and check a dictionary of all the places you've seen thus far. That is, change check_List to check_dict:
def check_dict(openDict, successor):
    if (successor.x, successor.y) in openDict:
        return openDict[(successor.x, successor.y)].f < successor.f
    return False
and instead of

if check_List(openlist, suc):
    continue
openlist.append(suc)

you'd want

if check_dict(openDict, suc):
    continue
openlist.append(suc)
openDict[(suc.x, suc.y)] = suc
My variation, which changes nothing except replacing check_List with check_dict as above, runs in 0.025 seconds instead of 8.1.

You might notice that this is significantly better than an 85% speed-up, which seems odd, because 15% of the runtime wasn't actually in check_List at all. As it happens, my variation fixes a subtle bug that I didn't see initially. As above, check_List is meant to prevent you from revisiting nodes that you've already found a way to. It does that by exhaustively checking all the nodes that you are currently considering for a next step. However, that's not the same thing! Because of current = openlist.pop(0), there are some nodes that you have found routes to but that are no longer part of openlist. Since nothing tells you to avoid going back to them, you bounce back and forth among your early nodes. The dictionary, in contrast, never forgets a square: entries are added when a node is appended, but never removed.

Patching that bug by separately storing each node you pop off into current, and searching those as well in check_List, gets almost as good performance as using the dictionary. Well, it doesn't quite: it takes 0.125 seconds on my machine, which is a whole five times slower than the dictionary, and liable to get worse as the graph gets bigger. But it's a good sight better than 8 seconds!
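For illustration, here is a minimal self-contained sketch of that idea (my own naming, not Josiah's exact patch): one dictionary remembers the best f recorded for every square ever reached, so squares that have already been popped off the open list are still recognised.

# Sketch only: `seen` and `already_better` are illustrative names, not from the answer.
seen = {}  # (x, y) -> best f recorded for that square so far

def already_better(seen, successor):
    """True if this square was already reached with a strictly lower f."""
    best = seen.get((successor.x, successor.y))
    return best is not None and best < successor.f

# In the search loop, instead of check_List(openlist, suc):
#     if already_better(seen, suc):
#         continue
#     seen[(suc.x, suc.y)] = suc.f
#     openlist.append(suc)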
- Martin Bonner supports Monica (Jun 27, 2019): I upvoted you as soon as I got to "measure first", but that is a fantastic answer (and a performance improvement of x300 is not to be sneezed at!)
- André Rocha (Jun 27, 2019): This answer was amazing. Thank you very much! I had no idea I could check how long each function took. I changed to a dictionary like you said, and instead of an open list I used an open set. Instead of sorting, I did min(openset, key=sortbyF), which was also faster.
- Frank Hopkins (Jun 27, 2019): And this, along with the question, is why knowing your data structures is important. Nice pointer for discussions ^^
- Eric Lippert (Jun 27, 2019): Great answer. Summing up: the original poster should first have checked that they actually implemented the algorithm correctly, which they did not. Second, measure against a goal. Third, if your goal is not met, use your measurements to find the hot spots. Fourth, optimize the hot spots by understanding why they are slow. Fifth, measure again, and repeat the process.
Avoid recursion when an iterative solution is readily available. First rewrite recursion in a tail-recursive fashion:

def recursion(node):
    if node.beggining != 0:
        return
    closedlist.append(node.came_from)
    recursion(node.parent)

and then eliminate the tail-recursion completely:

def recursion(node):
    while node.beggining == 0:
        closedlist.append(node.came_from)
        node = node.parent

As an added benefit, it is now clear that recursion just traces the path from the node upwards. Name it appropriately:

def trace_path(node):
    while node.beggining == 0:
        closedlist.append(node.came_from)
        node = node.parent
Handling of openlist looks very suboptimal:

- list.pop(0) has linear time complexity.
- Along the same line, check_List, which is also a linear pass over openlist, is performed at every iteration.
- You sort openlist on every iteration of the while len(openlist) is not 0 loop. Meanwhile, each iteration adds very few nodes to it (at most 8, if I am not mistaken), which means you are constantly re-sorting an almost-sorted list. Usually, that is not the best strategy.

All that said, a sorted dictionary looks more promising than a list.
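For a concrete picture, here is a minimal self-contained sketch (my illustration, not part of this answer) of the priority-queue route with Python's heapq module, so the lowest-f entry is popped in O(log n) with no re-sorting:

import heapq
import itertools

# Sketch only: Item is a stand-in for the question's Node class.
class Item:
    def __init__(self, name, f):
        self.name = name
        self.f = f

tie = itertools.count()  # tie-breaker: equal-f entries never compare Item objects
openheap = []
for item in (Item("a", 3), Item("b", 1), Item("c", 2)):
    heapq.heappush(openheap, (item.f, next(tie), item))  # O(log n) insert

while openheap:
    f, _, item = heapq.heappop(openheap)  # O(log n) pop of the lowest f
    print(item.name, f)  # prints: b 1, then c 2, then a 3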
- André Rocha (Jun 26, 2019): You mean a sorted dictionary with the f cost as key? That makes sense, I guess, but wouldn't extra code be needed for the cases where multiple nodes have the same cost? Popping from lists is easy, but doing it from a dictionary is a pain.
- AlexV (Jun 26, 2019): @AndréRocha Because that's what priority queues are made for.
- Josiah (Jun 26, 2019): Without disputing the principle in general, Python specifically uses "Timsort", which is better optimised to handle near-sorted lists than, say, quicksort.
Since you want to go for performance and your environment is tightly constrained in terms of which nodes are neighbors, you should not use a generic graph structure. E.g. you can use a grid to store the cost of each node you have computed, as well as a similar grid to store a reference to each node's parent.

Also, don't use a plain list for the openlist, since you will have to sort it repeatedly if you follow the naive approach, as you do at the moment. Python has the heapq module and queue.PriorityQueue exactly for that purpose. You would then use the node's cost plus its estimated heuristic cost as the priority value; Python's convention is that the lower the value, the higher the priority. Furthermore, it's also possible to implement A* without using a closedlist, which would eliminate another lump of data you would have to care for.

There is also no need to look at all the elements of the openlist once you have found a path to the goal. IIRC you can terminate early once the goal is part of the openlist (under the assumption of an admissible heuristic).

I still stand by what I said about the heuristic function in a comment under your code: you will usually want to use the Euclidean distance for grid worlds like yours. Python will happily support you in computing this distance with hypot from the math module. Using hypot(dx, dy) will likely be at least a little bit faster than sqrt(dx*dx + dy*dy), and it is also numerically more stable. The comments also describe heuristics that might be even more suitable for this case.
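As a quick illustration of the hypot point (my example, not from the answer):

from math import hypot, sqrt

print(hypot(3.0, 4.0))              # 5.0
print(sqrt(3.0 * 3.0 + 4.0 * 4.0))  # also 5.0 for ordinary magnitudes
print(hypot(1e200, 1e200))          # about 1.414e200, still finite
print(sqrt(1e200 * 1e200 + 1e200 * 1e200))  # inf: the squares overflow before sqrt sees them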
Addendum

I cobbled together some code that still ignores some of the recommendations above and reuses some of the structural ideas (like a modified version of the Node class), but is considerably faster while producing the same result. Also note that this is an implementation that does not use a closedlist. Your original implementation took about 12 s on my machine; the version below runs in less than 1 ms.
from heapq import heappush, heappop
from math import hypot, sqrt

SQRT2 = sqrt(2.0)

DIRECTIONS = ((1, 0, 1.0), (0, 1, 1.0), (-1, 0, 1.0), (0, -1, 1.0),
              (1, 1, SQRT2), (-1, -1, SQRT2), (1, -1, SQRT2), (-1, 1, SQRT2))


class Node:

    def __init__(self, x, y, cost=float("inf"), h=0, parent=None):
        self.x = x
        self.y = y
        self.cost = cost
        self.h = h
        self.parent = parent

    def update(self, new_parent, new_cost):
        self.parent = new_parent
        self.cost = new_cost

    def __repr__(self):
        return "Node(x={x}, y={y}, cost={cost}, h={h}, parent={parent})".format(**self.__dict__)

    @property
    def priority(self):
        return self.cost + self.h

    @property
    def pos(self):
        return self.x, self.y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __lt__(self, other):
        """This allows Node to be used in the priority queue directly"""
        return self.priority < other.priority


def make_grid(n_rows, n_cols, value):
    """Make an n_rows x n_cols grid filled with an initial value"""
    return [[value for _ in range(n_cols)] for _ in range(n_rows)]


def euclidean_distance(node1, node2):
    """Compute the Euclidean/L2 distance between two nodes"""
    return hypot(node1[0] - node2[0], node1[1] - node2[1])


def is_valid(x, y, grid, x_max, y_max):
    """Check the bounds and free space in the map"""
    if 0 <= x < x_max and 0 <= y < y_max:
        return grid[x][y] == 0
    return False


def pfind_new(grid, start, goal):
    x_max, y_max = len(grid), len(grid[0])
    # None will later be used as sentinel for "no node here (yet)"
    nodes = make_grid(x_max, y_max, None)

    start_node = Node(*start, cost=0, h=euclidean_distance(start, goal))
    nodes[start_node.x][start_node.y] = start_node
    goal_node = Node(*goal)
    nodes[goal_node.x][goal_node.y] = goal_node

    # openlist will be used as a priority queue and has to be accessed using
    # heappush and heappop from the heapq module. The Node class was modified
    # to work well with this kind of data structure.
    openlist = []
    heappush(openlist, start_node)

    found = False
    while openlist and not found:
        # get the node with the least overall cost (actual + heuristic)
        current = heappop(openlist)
        for direction in DIRECTIONS:
            # compute new coordinates
            x_n, y_n = current.x + direction[0], current.y + direction[1]
            if not is_valid(x_n, y_n, grid, x_max, y_max):
                continue
            # we have valid coordinates
            if nodes[x_n][y_n] is None:
                nodes[x_n][y_n] = Node(
                    x_n, y_n, h=euclidean_distance((x_n, y_n), goal)
                )
            # the new cost is made up of the current cost + the transition cost
            new_cost = nodes[current.x][current.y].cost + direction[2]
            if new_cost < nodes[x_n][y_n].cost:
                # cool, we have found a faster path to this node, let's update
                # its predecessor
                nodes[x_n][y_n].update(current.pos, new_cost)
                heappush(openlist, nodes[x_n][y_n])
                if nodes[x_n][y_n] == goal_node:
                    # we're done, get out of here
                    found = True
                    break

    # openlist ran dry without reaching the goal: there is nothing we can do here
    if not found:
        return []

    # backtracking
    path = []
    current = goal_node
    # this is a little bit weird because I decided to store only the
    # coordinates instead of the parent itself. Why? Because repr(node) is way
    # more readable that way ;-)
    while True:
        path.append(current.pos)
        if current.parent is not None:
            current = nodes[current.parent[0]][current.parent[1]]
        else:
            break
    # the path is built by backtracking from the goal, so we have to reverse it
    return path[::-1]


if __name__ == "__main__":
    import timeit

    grid = [[0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 1, 1, 0],
            [0, 0, 1, 0, 0, 0, 1, 0],
            [0, 0, 1, 0, 0, 0, 1, 0],
            [0, 0, 1, 1, 1, 1, 1, 0],
            [0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0]]
    start = (0, 0)
    goal = (4, 3)

    t_start = timeit.default_timer()
    path = pfind_new(grid, start, goal)
    print(f"took: {timeit.default_timer() - t_start}")
    assert path == [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3)]
- Josiah (Jun 26, 2019): I just want to highlight the last comment about heuristic functions. Correctness is more important than speed, and using a heuristic that doesn't properly bound the cost can give an incorrect result.
- user555045 (Jun 27, 2019): Euclidean distance isn't good for grid worlds at all; it underestimates the distance between almost any pair of points (on an open plain), causing unnecessary node expansion. Chebyshev distance or octile distance (depending on the diagonal movement cost) are appropriate when diagonal movement is allowed; then the only underestimation is caused by obstacles. (See the sketch after these comments.)
- AlexV (Jun 27, 2019): @harold I come from a robotics background, and it's usually deemed appropriate for global path planning on grid maps. But this is not to be read as if I wanted to question your expertise.
- user555045 (Jun 27, 2019): Interesting, is movement at any angle used in that context?
- On self.f = self.f + self.g: methinks one of those fs should be an h (i.e., self.f = self.h + self.g).
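Following up on the octile-distance comment above, here is a minimal sketch (mine, assuming straight steps cost 1 and diagonal steps cost sqrt(2), as in the addendum code):

from math import sqrt

SQRT2 = sqrt(2.0)

def octile(a, b):
    """Octile distance between grid cells a and b, given as (x, y) tuples."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    # take min(dx, dy) diagonal steps, then go straight for the remainder
    return SQRT2 * min(dx, dy) + abs(dx - dy)

print(octile((0, 0), (4, 3)))  # 3 diagonal steps + 1 straight step, about 5.24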