Optimisation of A* in C++

Question 1

I'm working on an A* pathfinding algorithm to solve a programming problem, and part of the constraint is the runtime. I need to find a path in a 2D grid with blocked or non-blocked tiles. You can't move diagonally.

I have succeeded to make a working prototype of the function which can and will solve the problems encountered, but it lacks optimisation.

I think that I need to optimise my heuristic function. For now, I calculate candidly: the distance between the current tile and the goal if no tile were blocked, plus the current cost.

n.distGoal = abs(n.x - nTargetX) + abs(n.y - nTargetY) + n.cost;

Interestingly, I've found an answer that does not include the cost when calculating its heuristic. When trying it out, my algorithm doesn't work anymore.

Have you any clues on how I could make the function more optimised?

Here are a few more snippets of my code, to help understand and maybe spot an error that causes long runtime:

Finding the neighbours:

set<tile> neighbours;
 for(int k=-1;k<2;k+=2){
 if(!(cur.x+k < 0|| cur.x+k>=nMapWidth)){ //if we're out of bonds, there's nowhere to go
 struct tile neigh;
 neigh.x = cur.x+k;
 neigh.y = cur.y; //yum
 neigh.mapWidth = nMapWidth;
 int positionInMap = neigh.x + (neigh.y*nMapWidth);
 if(!(pMap[positionInMap] == 0)){ //if we can walk there
 neigh.cost = cur.cost + 1;
 neighbours.insert(neigh);
 }
 }
 if(!(cur.y+k < 0 || cur.y+k>=nMapHeight)){ //if we're out of bonds, there's nowhere to go
 struct tile neigh;
 neigh.x = cur.x;
 neigh.y = cur.y+k; 
 neigh.mapWidth = nMapWidth;
 int positionInMap = neigh.x + (neigh.y*nMapWidth);
 if(!(pMap[positionInMap] == 0)){ //if we can walk there
 neigh.cost = cur.cost + 1;
 neighbours.insert(neigh);
 }
 }
 }

Reconstructing the path when we reached the end:

void buildPath(map<tile,tile> cameFrom, tile cur, int* pOutBuffer){ //Reconstructs the path
 int i = cur.distGoal - 1;
 pOutBuffer[i] = cur.x + cur.y*cur.mapWidth;
 map<tile,tile>::iterator it = cameFrom.find(cur);
 while (it != cameFrom.end()){
 i--;
 if(i<0){
 break;
 }
 cur = it->second;
 pOutBuffer[i] = cur.x + cur.y*cur.mapWidth;
 it = cameFrom.find(cur);
 }
}

Question 2

The cost is definitely needed. The answer you linked to only mentions the estimated distance from the current node to the goal for brevity, since those two parts are usually separated. I don't think you will get huge gains with another distance. Is your code really that slow? Did you profile it? Do you want to use it somewhere specific or is just an exercise?

Question 3

Here is a paste to my entire code, so it's easier to understand. pastebin.com/FF5Dm0d5 I'm trying to enter a code challenge to improve my C++ skills, but my program sadly doesn't meet the runtime criteria : so too slow! I'm trying to profile it with chrono, which you can see in the paste. My cost is calculated as neigh.cost = cur.cost + 1, as all tiles have either a cost of 1 or are impassable.

Question 4

If you look at the pseudocode on the A* article on Wikipedia you'll see that you're missing the the heuristics part. Without the heuristics it's not A*.

Question 5

I've added heuristics and changed from using a map to using a vector. Also, tiles now have their parent coordinates as an attribute, aswell as a visited boolean. The code is significantly faster! But apparently, it doesn't solve corner cases. Thanks for your help, I'm back to work I guess!

Question 6

Interestingly, I've found an answer that does not include the cost when calculating its heuristic. When trying it out, my algorithm doesn't work anymore.

Let's make sure that you understand the algorithm. It is:

We have a queue where the entries in the queue are a possible path and an estimate of the cost of that path.
The queue keeps itself sorted by cost, lowest cost path first, highest cost path last

Basically the idea is to assume that the lowest-cost path is the one to explore first; only if that does not work out do we try the next-lowest.

So now let's talk about the cost. The estimated cost is the sum of:

The known cost of the path so far
The estimated cost of getting to the goal from the end of the path

If you've already gone 100 squares (known) and you think you have 50 more to go (estimated), the total estimated cost is 150.

You MUST include the known cost in with the estimated cost of the remainder; if you did not then you would, say, take a path known to be at least 100 long and put it in the queue as though it had a cost of 50!

So you must compute the estimated cost in two parts: the known cost of the path, and the estimated cost of the remainder. It is the second part that is usually called the "heuristic", because it's a guess. The known cost can be computed exactly.

I think that I need to optimise my heuristic function. For now, I calculate candidly : the distance between the current tile and the goal if no tile were blocked, plus the current cost.

The key feature of the heuristic is that it must never overestimate. A heuristic which has this property is called admissible.

It sounds like your heuristic is admissible. If you can only move orthogonally, not diagonally, then the Manhattan distance is admissible.

Your heuristic is called the Manhattan distance because it's the distance between two points in New York City, if you're constrained to moving on the grid rather than going diagonally.

If you can move diagonally then the Manhattan distance can overestimate, and is not admissible. If diagonal movement is allowed then use the Euclidean distance: the straight-line distance between two points.

One way you can test your algorithm is to use zero as your heuristic; just say that the estimated cost is the known cost plus nothing. That is an admissible heuristic because it always under-estimates. A-star with a zero heuristic is also called Dijkstra's Algorithm.

You say you want to optimize your heuristic. The best way to optimize the heuristic is to make it as accurate as possible without ever overestimating, while at the same time making it fast to compute. The Manhattan distance is a great heuristic because it is very fast to compute, is often exact, and never overestimates.

The more accurate the heuristic is, the fewer paths the algorithm has to consider before finding the best one. So accuracy is important.

I would stick to using the Manhattan distance and profile your code to figure out where the performance problem is. Likely the problem is not with your heuristic at all, but with maintaining the heap invariant in the priority queue, or with figuring out which nodes are already visited.

If you're interested in this subject you might want to read my series of articles from 2007 on implementing a-star in C#.

https://blogs.msdn.microsoft.com/ericlippert/tag/astar/

Eric Lippert Eric Lippert 15k4 gold badges39 silver badges57 bronze badges · Accepted Answer · 2016-09-22 17:57:24Z

Interestingly, I've found an answer that does not include the cost when calculating its heuristic. When trying it out, my algorithm doesn't work anymore.

Let's make sure that you understand the algorithm. It is:

We have a queue where the entries in the queue are a possible path and an estimate of the cost of that path.
The queue keeps itself sorted by cost, lowest cost path first, highest cost path last

Basically the idea is to assume that the lowest-cost path is the one to explore first; only if that does not work out do we try the next-lowest.

So now let's talk about the cost. The estimated cost is the sum of:

The known cost of the path so far
The estimated cost of getting to the goal from the end of the path