Improving generic A* algorithm performance

Question 1

Here is my A* algorithm. I tried hard to implement it generically, but come up only with one idea to use lambdas for heuristic and next nodes (neighbours / successors) functions. I know that using good heuristic function is the key to speed up, but here i'm trying to squeeze everything from current implementation not taking into account any external factors.


/**
 * @brief generic a* algorithm
 *
 * @tparam T any type with operator< and operator==
 * @tparam Heuristic function, returns uint32_t
 * @tparam Mutator function returns iterable of all next states of node
 * @param initial initial state of node
 * @param expected expected state of node
 * @param heuristic function like manhattan or euclidean distance
 * @param next next mutator
 * @return std::vector<T> representing path from expected to initial
 */
template <class T, class Heuristic, class Mutator>
std::vector<T> path(const T& initial, const T& expected,
 Heuristic heuristic, Mutator next) {
 struct Node {
 using ref = std::shared_ptr<Node>;
 explicit Node(const T& data, ref parent, uint32_t g, uint32_t h)
 : data(data), parent(parent), g(g), h(h), f(g + h) {}
 const T data; // actual data
 const ref parent{}; // pointer to parent
 // scores
 const uint32_t g, h, f;
 };
 static auto path = [](auto n) {
 std::vector<T> result;
 for (; n; n = n->parent)
 result.push_back(n->data);
 return result;
 };
 // comparator for nodes::ref's to make heap
 static auto comp = [](auto&& lhs, auto&& rhs) { return lhs->f > rhs->f; };
 // std::vector<T> closed;
 std::set<T> closed; // <- required operator<
 // vector will be used as priority_queue
 std::vector<typename Node::ref> opened;
 // initial state with g score = 0
 opened.push_back(std::make_shared<Node>(initial, nullptr, 0,
 heuristic(initial, expected)));
 while (not opened.empty()) {
 // pop opened into curr
 const auto curr = opened.front();
 // if goal found
 if (curr->data == expected) return path(curr);
 // poping priority queue
 std::pop_heap(opened.begin(), opened.end(), comp);
 opened.pop_back();
 // foreach node in successors
 for (auto&& child : next(curr->data)) {
 // if was in closed
 if (!closed.insert(child).second) continue;
 // setting up child node
 const auto node = std::make_shared<Node>(child, curr, curr->g + 1,
 heuristic(child, expected));
 // find the node with the same data as child
 auto it = std::find_if(opened.begin(), opened.end(), [&child](auto&& node) {
 return node->data == child;
 });
 // if opened does not contains child data push and continue
 if (it == opened.end()) {
 opened.push_back(std::move(node));
 std::push_heap(opened.begin(), opened.end(), comp);
 continue;
 }
 // if current child node is better than found node
 // replace previos node with current
 if (node->f < (*it)->f) *it = node;
 }
 }
 return {}; // impossible to reach
}

Is there any way to improve performance?

Question 2

The not operator confused me for a second until I remembered about alternative operators. I'm not sure if it's considered good practice to use them?

Question 3

@jdt Why exactly 'not' itstead of ! is a bad practice?

Question 4

I’m not saying it is bad practice, it’s just something that you don't see very often. Generally, it’s a good idea to stick to a convention. If you had while (not opened.empty()) I would have expected to see the same in if (!closed.insert(child).second).

Question 5

Please don't change the question after it has been answered, especially something mentioned in the answer.

Question 6

I mean maybe 'not' has some overhead and that is why it is bad practice. But you're right about convention. I just like the way 'and', 'not' and 'or' sounds in code, makes it sound like a Shakespear's poem. (Wish I could say the same for the 'eq', 'not_eq', 'or_eq'...)

Question 7

Linear search through the whole heap

// find the node with the same data as child
auto it = std::find_if(opened.begin(), opened.end(), [&child](auto&& node) {
 return node->data == child;
});

This is a clear issue. It may seem like there is no way to avoid it, but there actually is. Unfortunately, it will mean having to re-implement the heap functions. The trick is this:

Maintain an unordered_map (or as secondary choice, map) from T to the index in the heap at which the node with a given data is located.
All heap-related functions must update the indices in that map when they move nodes. This is why they need to be re-implemented. It won't change their time complexity though.
"find node with the same data as child" can implemented as a lookup in the map.

"Re-parenting" a node changes its `f` score

*it = node; is not sufficient to update a node when a shortcut has been found to it. Its f score changed, that's why this is being done in the first place. Leaving it in same position may mean that it violates the heap property, which will have to be restored, otherwise future heap operations become unreliable.

Reachable unreachable code

The return in the end, return {}; // impossible to reach can be reached, namely when the goal is unreachable from the start and the amount of search space which is reachable from the start is finite.

Shared pointers

Nodes do not really need shared pointers to their parents, conceptually they do not own their parent after all, the parent pointers are just to say "go there to follow the path". What is really needed is these two things:

There must be some way to go from a node to its parent.
.. and that must be possible when the path finding is done, so the lifetime of this data must be until after the path has been traced.

That can be accomplished with shared pointers, but that has some disadvantages:

Creating nodes is non-trivial, requiring individual dynamic allocation.
Nodes have some additional, hidden, memory overhead (from being shared, and also from being individually allocated).
Getting rid of the nodes that still exist when the function returns is non-trivial.

There are various alternatives. For example (these can be combined/mixed to some extent):

All nodes could be owned by one big vector which is destroyed all in one go when the function returns (the destructor of Node can be trivial as long as T has a trivial destructor, then all the nodes in the vector just poof out of existence without ceremony). References to nodes can be replaced with indexes into that vector, or if you really want, you could still dynamically allocate the individual nodes and refer to them by non-owning pointer (except in the vector, which would own them with a unique pointer), but that negates some of the benefit of doing this.
The parent relationships could be recorded outside the nodes, in a map from T to T. This way the nodes are only needed as part of the Open set (the only thing needed from a closed node is its parent, but that isn't in the node anymore in this scheme), which could own them by value. This may be bad if T is a large type, because it creates a lot more Ts.
By the way, you are already using a similar approach for the closed set, which in some implementations is implemented as a bool isClosed in the Node class.
If the search space is a grid, temporary grids can be used to store the nodes (or its separate constituents). Nodes could be referred to by their position in the grid, parents specifically can also be recorded by their direction.

Question 8

OK, thanks a lot, i almost lost hope for good answers, and here is a marvelous one. Thanks, i'll try your suggestions tomorrow.

Question 9

@David let me know how it goes, this site doesn't require it I'm just curious

Question 10

Sorry, today was actually really busy day, finifshing my term work, i will inform you as soon as algorithm gets better. Btw i use two kinds of tests for it, i'm trying to solve real path between two vectors, and to solve 15 game. At the moment it takes nearly 13k generations and 120 depth level with 400-800 ms on debug to achieve goal with vectors like (0, 0) -> (80, 80) with step +-1 on all directions, but 15 game appears to be unsolvable.

Question 11

Avoid many small allocations.
Avoid shared ownership.
Avoid linear search.

All of those are expensive and a redesign of your data-structures will get rid of them:

A std::vector to store your nodes. You can stably refer to them by index now.
A std::unordered_map (or at least a std::map if the external index cannot be hashed) to map from the external identifiers to internal indices into the vector.

You should use a custom allocator for this, considering your specific use-case a simple linear allocator would work wonders.
A std::priority_queue to store the workitems.

Only store the data you need:
While all of f,g and h are used in explaining the algorithm, you really only need the estimated cost. Only if the heuristic is expensive should you consider storing that separately.

Don't hard-code a specific type needlessly. You can derive the type for the estimated cost of using a node from the return-value of the heuristic (and the edge-weights if you add those).

If on exploring a node you find a new node, calculate its priority and add a workitem.
If you find a better way to a node instead, update its priority and add a workitem.
Obsolete workitems can be discarded when you get to them, because the priority in the workitem doesn't match the node any longer. Fiddling with items somewhere in the middle of a heap easily destroys the heap, or at least gets expensive.

This rewrite also gets rid of the closed list, allowing you to use any admissible heuristic, not just a monotone (aka consistent) one. A restriction which you didn't document.

Avoid passing small trivial types by constant reference. The indirection might inhibit optimization.

What about weighted edges? Currently, your generic algorithm has a fixed edge-weight of 1.

If you add the endpoint to the path, you can distinguish no path found (empty) from trivial path found (end is start).

As an untested example:

template <class T, class F1, class F2>
std::vector<T> path(T start, T target, F1 heuristic, F2 next) {
 using Cost = decltype(heuristic(start, target)
 + std::get<0>(next(start).begin()));
 struct Node {
 const T data;
 std::size_t parent;
 Cost cost;
 };
 struct Open {
 Cost cost;
 std::size_t id;
 bool operator<(const Open& rhs) { return cost > rhs.cost; };
 };
 std::vector<Node> nodes {{ start, 0, heuristic(start, target)}};
 std::unordered_map<T, std::size_t> ids {{ nodes[0].data, 0}};
 // ^^ Find or write a simple linear allocator for all those nodes
 std::priority_queue<Open> open;
 open.emplace(nodes[0].cost, 0);
 while (!open.empty()) {
 auto [cost, id] = open.top();
 open.pop();
 if (nodes[id].cost != cost)
 continue;
 if (nodes[id].data == target) {
 std::vector<T> r;
 for (; id; id = nodes[id].parent)
 r.push_back(nodes[id].data);
 r.push_back(nodes[id].data);
 std::reverse(r.begin(), r.end());
 return r;
 }
 cost -= heuristic(nodes[id].data, target);
 for (auto&& [weight, data] : next(nodes[id].data)) {
 auto [p, added] = ids.try_emplace(data, nodes.size());
 auto estimate = cost + weight + heuristic(data, target);
 if (added)
 nodes.emplace_back(data, id, estimate);
 else if (estimate < nodes[p->second()].cost)
 nodes[p->second()].cost = estimate;
 else
 continue;
 open.emplace(estimate, p->second());
 }
 }
 return {};
}

Question 12

Changes

Ok so after @harold and @Deduplicator gave me a couple pieces of advise (Big thanks btw) I ended up with the following implementation. Performance win is not huge but there is one, about 100-300 ms for 8game solve and 50 ms for two dimensional vectors.


namespace {
// the reason why this structure is not in closure
// is because of operator< has to be friend
// and friend is only available outside closure
struct Open {
 size_t id; // vector stack id instead of ptr
 uint32_t g, h; // still separate variables
 explicit Open(size_t id, uint32_t g, uint32_t h) : id(id), g(g), h(h) {}
 // comparator for priority queue
 friend bool operator<(Open lhs, Open rhs) { return lhs.g + lhs.h > rhs.g + rhs.h; }
};
} // namespace
/**
 * @brief generic a* algorithm
 *
 * @tparam T any type with operator< and operator==
 * @tparam Heuristic function, returns uint32_t
 * @tparam Mutator function returns iterable of all next states of node
 * @param initial initial state of node
 * @param expected expected state of node
 * @param heuristic function like manhattan or euclidean distance
 * @param next next mutator
 * @return std::vector<T> representing path from expected to initial
 */
template <class T, class Heuristic, class Mutator>
std::vector<T> path(const T& initial, const T& expected,
 Heuristic heuristic, Mutator next, size_t nextWeight = 1) {
 struct Node {
 const T data; // actual data
 size_t parent; // index to parent instead of ptr
 uint32_t f;
 explicit Node(const T& data, size_t parent, uint32_t f)
 : data(data), parent(parent), f(f) {
 }
 };
 std::vector<Node> nodes{Node(initial, 0, heuristic(initial, expected))}; // as was suggested
 // now pq, there is no need to iterate through underlying vector anymore
 std::priority_queue<Open> opened;
 // map because i don't want to force user implement hash function for their data
 std::map<T, size_t> closed{{initial, 0}}; 
 // initial state
 opened.emplace(0, 0, nodes[0].f);
 while (not opened.empty()) {
 // pop opened into curr
 const auto curr = opened.top();
 opened.pop();
 // if goal found
 if (nodes[curr.id].data == expected) {
 // no lambda for better performance?
 std::vector<T> result;
 result.reserve(curr.g / nextWeight); // using g for our advantage
 auto id = curr.id;
 for (; id; id = nodes[id].parent)
 result.push_back(std::move(nodes[id].data)); // move because we don't need this data anymore in closure
 result.push_back(std::move(nodes[id].data));
 return result; // not reversing this is users responsibility
 }
 // foreach node in successors
 for (auto&& child : next(nodes[curr.id].data)) {
 // trying to insert into closed map
 const auto [it, inserted] = closed.try_emplace(child, nodes.size());
 // precomputing g and h values
 const auto g = curr.g + nextWeight;
 const auto h = heuristic(child, expected);
 // data wasn't in closed creating new node in nodes vector
 if (inserted)
 nodes.emplace_back(child, curr.id, g + h);
 // if f cost of existed node is worse than current just updating cost
 else if (g + h < nodes[it->second].f)
 nodes[it->second].f = g + h;
 else 
 continue;
 
 // finally constructing new open node in open queue
 opened.emplace(it->second, g, h);
 }
 }
 return {}; // path not found
}

Question 13

Did you try plugging a linear allocator into the map? (not included in the standard library)

score 4 · Answer 1 · 2021-11-27 18:55:07Z

Linear search through the whole heap

// find the node with the same data as child
auto it = std::find_if(opened.begin(), opened.end(), [&child](auto&& node) {
 return node->data == child;
});

This is a clear issue. It may seem like there is no way to avoid it, but there actually is. Unfortunately, it will mean having to re-implement the heap functions. The trick is this:

Maintain an unordered_map (or as secondary choice, map) from T to the index in the heap at which the node with a given data is located.
All heap-related functions must update the indices in that map when they move nodes. This is why they need to be re-implemented. It won't change their time complexity though.
"find node with the same data as child" can implemented as a lookup in the map.

"Re-parenting" a node changes its `f` score

*it = node; is not sufficient to update a node when a shortcut has been found to it. Its f score changed, that's why this is being done in the first place. Leaving it in same position may mean that it violates the heap property, which will have to be restored, otherwise future heap operations become unreliable.

Reachable unreachable code

The return in the end, return {}; // impossible to reach can be reached, namely when the goal is unreachable from the start and the amount of search space which is reachable from the start is finite.

Shared pointers

Nodes do not really need shared pointers to their parents, conceptually they do not own their parent after all, the parent pointers are just to say "go there to follow the path". What is really needed is these two things:

There must be some way to go from a node to its parent.
.. and that must be possible when the path finding is done, so the lifetime of this data must be until after the path has been traced.

That can be accomplished with shared pointers, but that has some disadvantages:

Creating nodes is non-trivial, requiring individual dynamic allocation.
Nodes have some additional, hidden, memory overhead (from being shared, and also from being individually allocated).
Getting rid of the nodes that still exist when the function returns is non-trivial.

There are various alternatives. For example (these can be combined/mixed to some extent):

All nodes could be owned by one big vector which is destroyed all in one go when the function returns (the destructor of Node can be trivial as long as T has a trivial destructor, then all the nodes in the vector just poof out of existence without ceremony). References to nodes can be replaced with indexes into that vector, or if you really want, you could still dynamically allocate the individual nodes and refer to them by non-owning pointer (except in the vector, which would own them with a unique pointer), but that negates some of the benefit of doing this.
The parent relationships could be recorded outside the nodes, in a map from T to T. This way the nodes are only needed as part of the Open set (the only thing needed from a closed node is its parent, but that isn't in the node anymore in this scheme), which could own them by value. This may be bad if T is a large type, because it creates a lot more Ts.
By the way, you are already using a similar approach for the closed set, which in some implementations is implemented as a bool isClosed in the Node class.
If the search space is a grid, temporary grids can be used to store the nodes (or its separate constituents). Nodes could be referred to by their position in the grid, parents specifically can also be recorded by their direction.

OK, thanks a lot, i almost lost hope for good answers, and here is a marvelous one. Thanks, i'll try your suggestions tomorrow.
@David let me know how it goes, this site doesn't require it I'm just curious
Sorry, today was actually really busy day, finifshing my term work, i will inform you as soon as algorithm gets better. Btw i use two kinds of tests for it, i'm trying to solve real path between two vectors, and to solve 15 game. At the moment it takes nearly 13k generations and 120 depth level with 400-800 ms on debug to achieve goal with vectors like (0, 0) -> (80, 80) with step +-1 on all directions, but 15 game appears to be unsolvable.

score 4 · Answer 2 · 2021-11-28 21:13:16Z

Avoid many small allocations.
Avoid shared ownership.
Avoid linear search.

All of those are expensive and a redesign of your data-structures will get rid of them:

A std::vector to store your nodes. You can stably refer to them by index now.
A std::unordered_map (or at least a std::map if the external index cannot be hashed) to map from the external identifiers to internal indices into the vector.

You should use a custom allocator for this, considering your specific use-case a simple linear allocator would work wonders.
A std::priority_queue to store the workitems.

Only store the data you need:
While all of f,g and h are used in explaining the algorithm, you really only need the estimated cost. Only if the heuristic is expensive should you consider storing that separately.

Don't hard-code a specific type needlessly. You can derive the type for the estimated cost of using a node from the return-value of the heuristic (and the edge-weights if you add those).

If on exploring a node you find a new node, calculate its priority and add a workitem.
If you find a better way to a node instead, update its priority and add a workitem.
Obsolete workitems can be discarded when you get to them, because the priority in the workitem doesn't match the node any longer. Fiddling with items somewhere in the middle of a heap easily destroys the heap, or at least gets expensive.

This rewrite also gets rid of the closed list, allowing you to use any admissible heuristic, not just a monotone (aka consistent) one. A restriction which you didn't document.

Avoid passing small trivial types by constant reference. The indirection might inhibit optimization.

What about weighted edges? Currently, your generic algorithm has a fixed edge-weight of 1.

If you add the endpoint to the path, you can distinguish no path found (empty) from trivial path found (end is start).

As an untested example:

template <class T, class F1, class F2>
std::vector<T> path(T start, T target, F1 heuristic, F2 next) {
 using Cost = decltype(heuristic(start, target)
 + std::get<0>(next(start).begin()));
 struct Node {
 const T data;
 std::size_t parent;
 Cost cost;
 };
 struct Open {
 Cost cost;
 std::size_t id;
 bool operator<(const Open& rhs) { return cost > rhs.cost; };
 };
 std::vector<Node> nodes {{ start, 0, heuristic(start, target)}};
 std::unordered_map<T, std::size_t> ids {{ nodes[0].data, 0}};
 // ^^ Find or write a simple linear allocator for all those nodes
 std::priority_queue<Open> open;
 open.emplace(nodes[0].cost, 0);
 while (!open.empty()) {
 auto [cost, id] = open.top();
 open.pop();
 if (nodes[id].cost != cost)
 continue;
 if (nodes[id].data == target) {
 std::vector<T> r;
 for (; id; id = nodes[id].parent)
 r.push_back(nodes[id].data);
 r.push_back(nodes[id].data);
 std::reverse(r.begin(), r.end());
 return r;
 }
 cost -= heuristic(nodes[id].data, target);
 for (auto&& [weight, data] : next(nodes[id].data)) {
 auto [p, added] = ids.try_emplace(data, nodes.size());
 auto estimate = cost + weight + heuristic(data, target);
 if (added)
 nodes.emplace_back(data, id, estimate);
 else if (estimate < nodes[p->second()].cost)
 nodes[p->second()].cost = estimate;
 else
 continue;
 open.emplace(estimate, p->second());
 }
 }
 return {};
}

David DavidDavid 514 bronze badges · Answer 3 · 2021-11-29 18:48:36Z

Changes

Ok so after @harold and @Deduplicator gave me a couple pieces of advise (Big thanks btw) I ended up with the following implementation. Performance win is not huge but there is one, about 100-300 ms for 8game solve and 50 ms for two dimensional vectors.


namespace {
// the reason why this structure is not in closure
// is because of operator< has to be friend
// and friend is only available outside closure
struct Open {
 size_t id; // vector stack id instead of ptr
 uint32_t g, h; // still separate variables
 explicit Open(size_t id, uint32_t g, uint32_t h) : id(id), g(g), h(h) {}
 // comparator for priority queue
 friend bool operator<(Open lhs, Open rhs) { return lhs.g + lhs.h > rhs.g + rhs.h; }
};
} // namespace
/**
 * @brief generic a* algorithm
 *
 * @tparam T any type with operator< and operator==
 * @tparam Heuristic function, returns uint32_t
 * @tparam Mutator function returns iterable of all next states of node
 * @param initial initial state of node
 * @param expected expected state of node
 * @param heuristic function like manhattan or euclidean distance
 * @param next next mutator
 * @return std::vector<T> representing path from expected to initial
 */
template <class T, class Heuristic, class Mutator>
std::vector<T> path(const T& initial, const T& expected,
 Heuristic heuristic, Mutator next, size_t nextWeight = 1) {
 struct Node {
 const T data; // actual data
 size_t parent; // index to parent instead of ptr
 uint32_t f;
 explicit Node(const T& data, size_t parent, uint32_t f)
 : data(data), parent(parent), f(f) {
 }
 };
 std::vector<Node> nodes{Node(initial, 0, heuristic(initial, expected))}; // as was suggested
 // now pq, there is no need to iterate through underlying vector anymore
 std::priority_queue<Open> opened;
 // map because i don't want to force user implement hash function for their data
 std::map<T, size_t> closed{{initial, 0}}; 
 // initial state
 opened.emplace(0, 0, nodes[0].f);
 while (not opened.empty()) {
 // pop opened into curr
 const auto curr = opened.top();
 opened.pop();
 // if goal found
 if (nodes[curr.id].data == expected) {
 // no lambda for better performance?
 std::vector<T> result;
 result.reserve(curr.g / nextWeight); // using g for our advantage
 auto id = curr.id;
 for (; id; id = nodes[id].parent)
 result.push_back(std::move(nodes[id].data)); // move because we don't need this data anymore in closure
 result.push_back(std::move(nodes[id].data));
 return result; // not reversing this is users responsibility
 }
 // foreach node in successors
 for (auto&& child : next(nodes[curr.id].data)) {
 // trying to insert into closed map
 const auto [it, inserted] = closed.try_emplace(child, nodes.size());
 // precomputing g and h values
 const auto g = curr.g + nextWeight;
 const auto h = heuristic(child, expected);
 // data wasn't in closed creating new node in nodes vector
 if (inserted)
 nodes.emplace_back(child, curr.id, g + h);
 // if f cost of existed node is worse than current just updating cost
 else if (g + h < nodes[it->second].f)
 nodes[it->second].f = g + h;
 else 
 continue;
 
 // finally constructing new open node in open queue
 opened.emplace(it->second, g, h);
 }
 }
 return {}; // path not found
}

Did you try plugging a linear allocator into the map? (not included in the standard library)

Stack Exchange Network

Improving generic A* algorithm performance

3 Answers 3

Linear search through the whole heap

"Re-parenting" a node changes its `f` score

Reachable unreachable code

Shared pointers

Changes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Improving generic A* algorithm performance

3 Answers 3

Linear search through the whole heap

"Re-parenting" a node changes its f score

Reachable unreachable code

Shared pointers

Changes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

"Re-parenting" a node changes its `f` score