Here is my A* algorithm. I tried hard to implement it generically, but came up with only one idea: using lambdas for the heuristic and next-nodes (neighbours / successors) functions. I know that a good heuristic function is the key to speed, but here I'm trying to squeeze everything out of the current implementation without taking any external factors into account.
#include <algorithm>
#include <cstdint>
#include <memory>
#include <set>
#include <vector>

/**
* @brief generic a* algorithm
*
* @tparam T any type with operator< and operator==
* @tparam Heuristic function, returns uint32_t
* @tparam Mutator function returns iterable of all next states of node
* @param initial initial state of node
* @param expected expected state of node
* @param heuristic function like manhattan or euclidean distance
* @param next next mutator
* @return std::vector<T> representing path from expected to initial
*/
template <class T, class Heuristic, class Mutator>
std::vector<T> path(const T& initial, const T& expected,
                    Heuristic heuristic, Mutator next) {
    struct Node {
        using ref = std::shared_ptr<Node>;
        explicit Node(const T& data, ref parent, uint32_t g, uint32_t h)
            : data(data), parent(parent), g(g), h(h), f(g + h) {}
        const T data;       // actual data
        const ref parent{}; // pointer to parent
        // scores
        const uint32_t g, h, f;
    };
    static auto path = [](auto n) {
        std::vector<T> result;
        for (; n; n = n->parent)
            result.push_back(n->data);
        return result;
    };
    // comparator for Node::ref's to make a heap
    static auto comp = [](auto&& lhs, auto&& rhs) { return lhs->f > rhs->f; };
    // std::vector<T> closed;
    std::set<T> closed; // <- requires operator<
    // vector will be used as a priority_queue
    std::vector<typename Node::ref> opened;
    // initial state with g score = 0
    opened.push_back(std::make_shared<Node>(initial, nullptr, 0,
                                            heuristic(initial, expected)));
    while (not opened.empty()) {
        // pop opened into curr
        const auto curr = opened.front();
        // if goal found
        if (curr->data == expected) return path(curr);
        // popping the priority queue
        std::pop_heap(opened.begin(), opened.end(), comp);
        opened.pop_back();
        // foreach node in successors
        for (auto&& child : next(curr->data)) {
            // skip if it was already in closed
            if (!closed.insert(child).second) continue;
            // setting up the child node
            const auto node = std::make_shared<Node>(child, curr, curr->g + 1,
                                                     heuristic(child, expected));
            // find the node with the same data as child
            auto it = std::find_if(opened.begin(), opened.end(), [&child](auto&& node) {
                return node->data == child;
            });
            // if opened does not contain the child's data, push and continue
            if (it == opened.end()) {
                opened.push_back(std::move(node));
                std::push_heap(opened.begin(), opened.end(), comp);
                continue;
            }
            // if the current child node is better than the found node,
            // replace the previous node with the current one
            if (node->f < (*it)->f) *it = node;
        }
    }
    return {}; // impossible to reach
}
Is there any way to improve performance?
3 Answers
Linear search through the whole heap
// find the node with the same data as child
auto it = std::find_if(opened.begin(), opened.end(), [&child](auto&& node) {
    return node->data == child;
});
This is a clear issue. It may seem like there is no way to avoid it, but there actually is. Unfortunately, it will mean having to re-implement the heap functions. The trick is this:
- Maintain an unordered_map (or, as a secondary choice, a map) from T to the index in the heap at which the node with a given data is located.
- All heap-related functions must update the indices in that map when they move nodes. This is why they need to be re-implemented. It won't change their time complexity though.
- "Find the node with the same data as child" can then be implemented as a lookup in the map, as in the sketch below.
"Re-parenting" a node changes its f
score
*it = node;
is not sufficient to update a node when a shortcut has been found to it. Its f score changed; that's why this is being done in the first place. Leaving it in the same position may mean that it violates the heap property, which will have to be restored, otherwise future heap operations become unreliable.
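Continuing the IndexedHeap sketch from above (again untested), a decrease-key that updates the score and repairs the heap in one go could look like this:

// Lowering f can only move an entry toward the root, so a single
// sift_up restores the heap property after the update.
template <class T>
void decrease_key(IndexedHeap<T>& h, const T& data, uint32_t new_f) {
    std::size_t i = h.pos.at(data);    // O(1) lookup via the index map
    if (new_f >= h.heap[i].f) return;  // not actually an improvement
    h.heap[i].f = new_f;
    h.sift_up(i);                      // repair the heap from slot i upward
}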
Reachable unreachable code
The return at the end, return {}; // impossible to reach, can be reached: namely when the goal is unreachable from the start and the part of the search space that is reachable from the start is finite.
Shared pointers
Nodes do not really need shared pointers to their parents; conceptually they do not own their parent, after all. The parent pointers are just there to say "go there to follow the path". What is really needed is these two things:
- There must be some way to go from a node to its parent.
- ... and that must still be possible when the path finding is done, so this data must live until after the path has been traced.
That can be accomplished with shared pointers, but that has some disadvantages:
- Creating nodes is non-trivial, requiring individual dynamic allocation.
- Nodes have some additional, hidden, memory overhead (from being shared, and also from being individually allocated).
- Getting rid of the nodes that still exist when the function returns is non-trivial.
There are various alternatives. For example (these can be combined/mixed to some extent):
- All nodes could be owned by one big vector which is destroyed all in one go when the function returns (the destructor of Node can be trivial as long as T has a trivial destructor; then all the nodes in the vector just poof out of existence without ceremony). References to nodes can be replaced with indexes into that vector. Or, if you really want, you could still dynamically allocate the individual nodes and refer to them by non-owning pointer (except in the vector, which would own them with a unique pointer), but that negates some of the benefit of doing this.
- The parent relationships could be recorded outside the nodes, in a map from T to T (see the sketch after this list). This way the nodes are only needed as part of the Open set (the only thing needed from a closed node is its parent, but that isn't in the node anymore in this scheme), which could own them by value. This may be bad if T is a large type, because it creates a lot more Ts. By the way, you are already using a similar approach for the closed set, which in some implementations is a bool isClosed in the Node class.
- If the search space is a grid, temporary grids can be used to store the nodes (or their separate constituents). Nodes could be referred to by their position in the grid; parents specifically can also be recorded by their direction.
-
\$\begingroup\$ OK, thanks a lot, I almost lost hope for good answers, and here is a marvelous one. Thanks, I'll try your suggestions tomorrow. \$\endgroup\$ David, Nov 27, 2021 at 22:25
-
\$\begingroup\$ @David let me know how it goes; this site doesn't require it, I'm just curious. \$\endgroup\$ user555045, Nov 28, 2021 at 15:49
-
\$\begingroup\$ Sorry, today was actually a really busy day finishing my term work; I will inform you as soon as the algorithm gets better. Btw, I use two kinds of tests for it: I'm trying to solve a real path between two vectors, and to solve the 15 game. At the moment it takes nearly 13k generations and a 120 depth level, with 400-800 ms on debug, to achieve the goal with vectors like (0, 0) -> (80, 80) with step +-1 in all directions, but the 15 game appears to be unsolvable. \$\endgroup\$ David, Nov 28, 2021 at 19:45
Avoid many small allocations.
Avoid shared ownership.
Avoid linear search.
All of those are expensive and a redesign of your data-structures will get rid of them:
- A std::vector to store your nodes. You can stably refer to them by index now.
- A std::unordered_map (or at least a std::map if the external index cannot be hashed) to map from the external identifiers to internal indices into the vector. You should use a custom allocator for this; considering your specific use-case, a simple linear allocator would work wonders (see the sketch after this list).
- A std::priority_queue to store the workitems.
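About that custom allocator: as a concrete starting point (a sketch, untested; the buffer size is arbitrary), C++17's std::pmr::monotonic_buffer_resource behaves like a simple linear allocator and can back the id map directly:

#include <cstddef>
#include <memory_resource>
#include <unordered_map>

void example() {
    // bump-allocates from an upfront buffer; deallocation is a no-op and
    // everything is released at once when `arena` goes out of scope
    std::pmr::monotonic_buffer_resource arena(1 << 20);  // 1 MiB upfront
    std::pmr::unordered_map<int, std::size_t> ids(&arena);
    ids.try_emplace(42, 0);  // map nodes come from the arena, not the heap
}

One arena per path() call matches the search's lifetime: everything is dropped wholesale on return.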
Only store the data you need:
While all of f, g and h are used in explaining the algorithm, you really only need the estimated cost. Only if the heuristic is expensive should you consider storing that separately.
Don't hard-code a specific type needlessly. You can derive the type for the estimated cost of using a node from the return-value of the heuristic (and the edge-weights if you add those).
If on exploring a node you find a new node, calculate its priority and add a workitem.
If you find a better way to a node instead, update its priority and add a workitem.
Obsolete workitems can be discarded when you get to them, because the priority in the workitem doesn't match the node any longer. Fiddling with items somewhere in the middle of a heap easily destroys the heap, or at least gets expensive.
This rewrite also gets rid of the closed list, allowing you to use any admissible heuristic, not just a monotone (aka consistent) one, a restriction which you didn't document.
Avoid passing small trivial types by constant reference. The indirection might inhibit optimization.
What about weighted edges? Currently, your generic algorithm has a fixed edge-weight of 1.
If you add the endpoint to the path, you can distinguish no path found (empty) from trivial path found (end is start).
As an untested example:
#include <algorithm>
#include <cstddef>
#include <queue>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

template <class T, class F1, class F2>
std::vector<T> path(T start, T target, F1 heuristic, F2 next) {
    // derive the cost type from the heuristic and the edge weights;
    // next() yields (weight, data) pairs here
    using Cost = decltype(heuristic(start, target)
                          + std::get<0>(*next(start).begin()));
    struct Node {
        const T data;
        std::size_t parent;
        Cost cost;
    };
    struct Open {
        Cost cost;
        std::size_t id;
        bool operator<(const Open& rhs) const { return cost > rhs.cost; }
    };
    std::vector<Node> nodes{{start, 0, heuristic(start, target)}};
    std::unordered_map<T, std::size_t> ids{{nodes[0].data, 0}};
    // ^^ Find or write a simple linear allocator for all those nodes
    std::priority_queue<Open> open;
    open.push({nodes[0].cost, 0});
    while (!open.empty()) {
        auto [cost, id] = open.top();
        open.pop();
        if (nodes[id].cost != cost) // stale workitem, the node was improved since
            continue;
        if (nodes[id].data == target) {
            std::vector<T> r;
            for (; id; id = nodes[id].parent)
                r.push_back(nodes[id].data);
            r.push_back(nodes[id].data);
            std::reverse(r.begin(), r.end());
            return r;
        }
        cost -= heuristic(nodes[id].data, target); // recover g from f
        for (auto&& [weight, data] : next(nodes[id].data)) {
            auto [p, added] = ids.try_emplace(data, nodes.size());
            auto estimate = cost + weight + heuristic(data, target);
            if (added)
                nodes.push_back({data, id, estimate});
            else if (estimate < nodes[p->second].cost) {
                nodes[p->second].parent = id; // remember the better way in
                nodes[p->second].cost = estimate;
            } else
                continue;
            open.push({estimate, p->second});
        }
    }
    return {};
}
Changes
OK, so after @harold and @Deduplicator gave me a couple of pieces of advice (big thanks, btw), I ended up with the following implementation. The performance win is not huge, but it is there: about 100-300 ms for the 8-game solve and 50 ms for the two-dimensional vectors.
#include <cstddef>
#include <cstdint>
#include <map>
#include <queue>
#include <utility>
#include <vector>

namespace {
// the reason this struct is not a local class inside the function
// is that operator< has to be a friend,
// and a friend is only available outside a local class
struct Open {
    size_t id;     // index into the nodes vector instead of a ptr
    uint32_t g, h; // still separate variables
    explicit Open(size_t id, uint32_t g, uint32_t h) : id(id), g(g), h(h) {}
    // comparator for the priority queue
    friend bool operator<(Open lhs, Open rhs) { return lhs.g + lhs.h > rhs.g + rhs.h; }
};
} // namespace
/**
* @brief generic a* algorithm
*
* @tparam T any type with operator< and operator==
* @tparam Heuristic function, returns uint32_t
* @tparam Mutator function returns iterable of all next states of node
* @param initial initial state of node
* @param expected expected state of node
* @param heuristic function like manhattan or euclidean distance
* @param next next mutator
* @return std::vector<T> representing path from expected to initial
*/
template <class T, class Heuristic, class Mutator>
std::vector<T> path(const T& initial, const T& expected,
                    Heuristic heuristic, Mutator next, size_t nextWeight = 1) {
    struct Node {
        T data;        // actual data (non-const so the moves below are real moves)
        size_t parent; // index of the parent instead of a ptr
        uint32_t f;
        explicit Node(const T& data, size_t parent, uint32_t f)
            : data(data), parent(parent), f(f) {}
    };
    std::vector<Node> nodes{Node(initial, 0, heuristic(initial, expected))}; // as was suggested
    // now a pq, no need to iterate through the underlying vector anymore
    std::priority_queue<Open> opened;
    // map because I don't want to force users to implement a hash function for their data
    std::map<T, size_t> closed{{initial, 0}};
    // initial state
    opened.emplace(0, 0, nodes[0].f);
    while (not opened.empty()) {
        // pop opened into curr
        const auto curr = opened.top();
        opened.pop();
        // if goal found
        if (nodes[curr.id].data == expected) {
            // no lambda for better performance?
            std::vector<T> result;
            result.reserve(curr.g / nextWeight + 1); // using g to our advantage
            auto id = curr.id;
            for (; id; id = nodes[id].parent)
                result.push_back(std::move(nodes[id].data)); // move because we don't need this data anymore
            result.push_back(std::move(nodes[id].data));
            return result; // not reversed; reversing is the user's responsibility
        }
        // foreach node in successors
        for (auto&& child : next(nodes[curr.id].data)) {
            // trying to insert into the closed map
            const auto [it, inserted] = closed.try_emplace(child, nodes.size());
            // precomputing g and h values
            const auto g = curr.g + nextWeight;
            const auto h = heuristic(child, expected);
            // data wasn't in closed, creating a new node in the nodes vector
            if (inserted)
                nodes.emplace_back(child, curr.id, g + h);
            // if the f cost of the existing node is worse than the current one,
            // update both its cost and its parent (otherwise the returned path
            // would still run through the old, worse parent)
            else if (g + h < nodes[it->second].f) {
                nodes[it->second].f = g + h;
                nodes[it->second].parent = curr.id;
            } else
                continue;
            // finally constructing a new open node in the open queue
            opened.emplace(it->second, g, h);
        }
    }
    return {}; // path not found
}
-
\$\begingroup\$ Did you try plugging a linear allocator into the map? (not included in the standard library) \$\endgroup\$ Deduplicator, Nov 30, 2021 at 15:17
-
\$\begingroup\$ The not in while (not opened.empty()) confused me for a second until I remembered about alternative operators. I'm not sure if it's considered good practice to use them? I would have expected to see the same in if (!closed.insert(child).second). \$\endgroup\$