Here is working code to get N closest points to some reference point.
Please help to improve it, specifically by commenting on my use of std
algorithms and vectors/iterators manipulations.
Nearest.h
#pragma once
#include <iostream>
#include <algorithm>
#include <cmath>
#include <vector>
namespace nearest {
struct Point {
double x;
double y;
double z;
bool operator==(const Point& rhp) {
return x == rhp.x && y == rhp.y && z == rhp.z;
}
};
// Return N closest points to first point in vector in order
std::vector<Point> nearestN(const std::vector<Point>& points,
int N);
// Return N closest points to a reference point, in order - all
// within maximum distance threshold
std::vector<Point> nearestN(const std::vector<Point>& points,
int N,
const Point& reference,
double distanceThreshold);
}
Nearest.cpp
#include "Nearest.h"
using namespace std;
namespace nearest {
// to print Point to cout
std::ostream& operator<<(std::ostream &o, const nearest::Point& p) {
o << "(" << p.x << ", " << p.y << ", " << p.z << ")" << std::endl;
return o;
}
// defines sorting order for points
struct ByDistance {
const Point& origin;
bool operator()(const Point& lp, const Point& rp) {
return dist(origin, lp) < dist(origin, rp);
}
static double dist(const Point& lp, const Point& rp){
double xs = (lp.x - rp.x) * (lp.x - rp.x);
double ys = (lp.y - rp.y) * (lp.y - rp.y);
double zs = (lp.z - rp.z) * (lp.z - rp.z);
return sqrt(xs + ys + zs);
}
};
vector<Point> nearestN(const vector<Point>& points, int N) {
if (points.size() == 0) {
vector<Point> empty;
return empty;
}
return nearestN(points, N, points[0], INFINITY);
}
vector<Point> nearestN(const vector<Point>& points,
int N,
const Point& reference,
double distanceThreshold) {
vector<Point> temp;
temp.insert(temp.begin(), points.begin(), points.end());
// filtering vector to remove all points far then distance from reference
temp.erase(remove_if(temp.begin(),
temp.end(),
[&reference, distanceThreshold](Point& p){ return ByDistance::dist(p, reference) > distanceThreshold; }),
temp.end());
ByDistance distance = {reference};
sort(temp.begin(), temp.end(), distance);
auto sz = min(static_cast<int>(temp.size()), N);
vector<Point> result(sz);
copy_n(temp.begin(), sz, result.begin());
return result;
}
}
-
2\$\begingroup\$ How many points are usually in your input? If it's a lot and you find this code getting to be too slow, you might look into using something like a k-d tree. \$\endgroup\$user1118321– user11183212015年10月24日 04:36:01 +00:00Commented Oct 24, 2015 at 4:36
4 Answers 4
A comparison-function doesn't change either argument, so should take them by
const&
.You normally don't make comparison-operators member-functions to make sure they handle conversions equally for both arguments.
If neccessary, that might mean making themfriend
-functions, whether defined inline or not.If you provide
==
, you should also provide!=
:They model one concept, which can be trivially implemented in terms of each other (so much one wonders why the language does not implicitly define the second, with explicit overriding for strange types like floating-point), and everyone expects there to be both or none.
Consider naming your type
Point3
to clarify that this is the 3d-point-variant.struct Point3 { double x, y, z; }; bool operator==(const Point3& a, const Point3& b) { return a.x==b.x && a.y==b.y && a.z==b.z; } bool operator!=(const Point3& a, const Point3& b) { return !(a==b); }
Don't ask for
size()
if all you want to know is whether a container isempty()
. While there is no performance-difference forstd::vector
, there is one for other containers. Also, the latter is more explicit.nearestN
is severely sub-par:- Use squared distances to avoid the costly square-roots.
- Don't create copies you don't need.
- Just like there's an algorithm for removing elements fulfilling a predicate, there is one for only copying them:
std::copy_if
, coupled withstd::back_inserter
. - Anyway, you actually want to build a list of candidates, and then choose the N best.
constexpr double distance2(const Point3& a, const Point3& b) { return (a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y)+(a.z-b.z)*(a.z-b.z); } std::vector<Point3> nearestN( const std::vector<Point3>& v, int maxN, const Point3& center, double maxDistance ) { assert(maxDistance >= 0 && maxN >= 0); maxDistance *= maxDistance; std::vector<std::pair<double, const Point3*>> temp; temp.reserve(v.size()); for(auto&& x : v) { double d = distance2(x, center); if(d <= maxDistance) temp.emplace_back(d, &x); } if(maxN >= temp.size()) maxN = temp.size(); else std::sort(begin(temp), end(temp)); std::vector<Point3> r; r.reserve(maxN); for(size_t i = 0; i < maxN; ++i) r.push_back(*temp[i].second); return r; }
Using a heap (
std::push_heap
,std::pop_heap
), you can keep the list of candidates minimal and thus reduce the complexity to O(n*log(min(n, N))) (n=elements in input, N=max elements in output.
I used a customreplace_heap
(equivalent to popping the head and then pushing a replacement) adapted from How to replace top element of heap efficiently withouth re-establishing heap invariant twice? for even more efficiency:std::vector<Point3> nearestN( const std::vector<Point3>& v, std::size_t maxN, const Point3& center, double maxDistance ) { assert(maxDistance >= 0); if(maxN <= 0) return {}; maxDistance *= maxDistance; if(std::isinf(maxDistance) && v.size() <= maxN) return v; using pair = std::pair<double, const Point3*>; std::unique_ptr<pair[]> heap{new pair[maxN]}; size_t N = 0, i = v.size(); while(i && N < maxN) { double d = distance2(center, v[--i]); if(d <= maxDistance) heap[N++] = {d, &v[i]}; } if(i) std::make_heap(heap.get(), heap.get() + N); while(i) { double d = distance2(center, v[--i]); if(d < heap[0].first) { heap[0] = {d, &v[i]}; replace_heap(&heap[0], &heap[N]); } } std::vector<Point3> r; r.reserve(N); for(i = 0; i < N; i++) r.push_back(*heap[i].second); return r; }
Prefer a
std::span<const T>
over astd::vector<T> const&
. Less indirection and potentially cheaper construction lead to more efficiency.Even though it was only standardised with C++20, there are plenty implementations for even C++11.
See "What is a "span" and when should I use one?" for more details.
Faster sorting
If the only purpose of the distance computation is for comparison, then you can speed it up by removing the call to sqrt()
. You'll end up comparing the squares of the distances, which is equivalent to comparing the actual distances.
ByDistance
looks very ad-hoc. If you movedist
toPoint
(where it really belongs) as a member function,ByDistance
is not required at all, e.g:temp.erase(remove_if(temp.begin(), temp.end(), [&reference, distanceThreshold](Point& p){ return reference.dist(p > distanceThreshold; }), temp.end());
and
sort(temp.begin(), temp.end(), [&reference](const Point& p1, const Point& p2) { return reference.dist(p1) < reference.dist(p2); }); );
I don't see a need to create yet another
vector
withvector<Point> result(sz);
You may reuse
temp
withtemp.resize()
instead.Mandatory advisory against using namespace std.
-
\$\begingroup\$ Are you missing a close paren in that first code block? Something doesn't look right to me. \$\endgroup\$user1118321– user11183212015年10月24日 04:30:15 +00:00Commented Oct 24, 2015 at 4:30
-
\$\begingroup\$ Free functions should generally be preferred, especially if the public interface suffices to implement them. That counts double for functions symmetric in their arguments. \$\endgroup\$Deduplicator– Deduplicator2015年10月24日 18:10:28 +00:00Commented Oct 24, 2015 at 18:10
Your operator overloads (
operator==
andoperator()
) should beconst
since they don't modify any data members.operator<<
can just be a single line. It could also use"\n"
instead ofstd::endl
so that one isn't forced to have a buffer flush when invoking it (there is a different between the two).return o << "(" << p.x << ", " << p.y << ", " << p.z << ")\n";
This can be shortened:
if (points.size() == 0) { vector<Point> empty; return empty; }
by using
empty()
and by returning an anonymousvector
instead:if (points.empty()) { return vector<Point>(); }
Explore related questions
See similar questions with these tags.