N closest points to the reference point

Question 1

Here is working code to get N closest points to some reference point.

Please help to improve it, specifically by commenting on my use of std algorithms and vectors/iterators manipulations.

Nearest.h

#pragma once
#include <iostream>
#include <algorithm>
#include <cmath>
#include <vector>
namespace nearest {
struct Point {
 double x;
 double y;
 double z;
 bool operator==(const Point& rhp) {
 return x == rhp.x && y == rhp.y && z == rhp.z;
 }
};
// Return N closest points to first point in vector in order
std::vector<Point> nearestN(const std::vector<Point>& points,
 int N);
// Return N closest points to a reference point, in order - all
// within maximum distance threshold
std::vector<Point> nearestN(const std::vector<Point>& points,
 int N,
 const Point& reference,
 double distanceThreshold);
}

Nearest.cpp

#include "Nearest.h"
using namespace std;
namespace nearest {
 // to print Point to cout
 std::ostream& operator<<(std::ostream &o, const nearest::Point& p) {
 o << "(" << p.x << ", " << p.y << ", " << p.z << ")" << std::endl;
 return o;
 }
 // defines sorting order for points
 struct ByDistance {
 const Point& origin;
 bool operator()(const Point& lp, const Point& rp) {
 return dist(origin, lp) < dist(origin, rp);
 }
 static double dist(const Point& lp, const Point& rp){
 double xs = (lp.x - rp.x) * (lp.x - rp.x);
 double ys = (lp.y - rp.y) * (lp.y - rp.y);
 double zs = (lp.z - rp.z) * (lp.z - rp.z);
 return sqrt(xs + ys + zs);
 }
 };
 vector<Point> nearestN(const vector<Point>& points, int N) {
 if (points.size() == 0) {
 vector<Point> empty;
 return empty;
 }
 return nearestN(points, N, points[0], INFINITY);
 }
 vector<Point> nearestN(const vector<Point>& points,
 int N,
 const Point& reference,
 double distanceThreshold) {
 vector<Point> temp;
 temp.insert(temp.begin(), points.begin(), points.end());
 // filtering vector to remove all points far then distance from reference
 temp.erase(remove_if(temp.begin(), 
 temp.end(),
 [&reference, distanceThreshold](Point& p){ return ByDistance::dist(p, reference) > distanceThreshold; }),
 temp.end());
 ByDistance distance = {reference};
 sort(temp.begin(), temp.end(), distance);
 auto sz = min(static_cast<int>(temp.size()), N);
 vector<Point> result(sz);
 copy_n(temp.begin(), sz, result.begin());
 return result;
 }
}

Question 2

How many points are usually in your input? If it's a lot and you find this code getting to be too slow, you might look into using something like a k-d tree.

Question 3

A comparison-function doesn't change either argument, so should take them by const&.
You normally don't make comparison-operators member-functions to make sure they handle conversions equally for both arguments.
If neccessary, that might mean making them friend-functions, whether defined inline or not.
If you provide ==, you should also provide !=:

They model one concept, which can be trivially implemented in terms of each other (so much one wonders why the language does not implicitly define the second, with explicit overriding for strange types like floating-point), and everyone expects there to be both or none.

Consider naming your type Point3 to clarify that this is the 3d-point-variant.

 struct Point3
 {
 double x, y, z;
 };
 bool operator==(const Point3& a, const Point3& b) {
 return a.x==b.x && a.y==b.y && a.z==b.z;
 }
 bool operator!=(const Point3& a, const Point3& b) { return !(a==b); }

Don't ask for size() if all you want to know is whether a container is empty(). While there is no performance-difference for std::vector, there is one for other containers. Also, the latter is more explicit.

nearestN is severely sub-par:

Use squared distances to avoid the costly square-roots.
Don't create copies you don't need.
Just like there's an algorithm for removing elements fulfilling a predicate, there is one for only copying them: std::copy_if, coupled with std::back_inserter.
Anyway, you actually want to build a list of candidates, and then choose the N best.

constexpr double distance2(const Point3& a, const Point3& b) {
 return (a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y)+(a.z-b.z)*(a.z-b.z);
}
std::vector<Point3> nearestN(
 const std::vector<Point3>& v,
 int maxN,
 const Point3& center,
 double maxDistance
) {
 assert(maxDistance >= 0 && maxN >= 0);
 maxDistance *= maxDistance;
 std::vector<std::pair<double, const Point3*>> temp;
 temp.reserve(v.size());
 for(auto&& x : v) {
 double d = distance2(x, center);
 if(d <= maxDistance)
 temp.emplace_back(d, &x);
 }
 if(maxN >= temp.size())
 maxN = temp.size();
 else
 std::sort(begin(temp), end(temp));
 std::vector<Point3> r;
 r.reserve(maxN);
 for(size_t i = 0; i < maxN; ++i)
 r.push_back(*temp[i].second);
 return r;
}

Using a heap (std::push_heap, std::pop_heap), you can keep the list of candidates minimal and thus reduce the complexity to O(n*log(min(n, N))) (n=elements in input, N=max elements in output.
I used a custom replace_heap (equivalent to popping the head and then pushing a replacement) adapted from How to replace top element of heap efficiently withouth re-establishing heap invariant twice? for even more efficiency:

std::vector<Point3> nearestN(
 const std::vector<Point3>& v,
 std::size_t maxN,
 const Point3& center,
 double maxDistance
) {
 assert(maxDistance >= 0);
 if(maxN <= 0) return {};
 maxDistance *= maxDistance;
 if(std::isinf(maxDistance) && v.size() <= maxN) return v;
 using pair = std::pair<double, const Point3*>;
 std::unique_ptr<pair[]> heap{new pair[maxN]};
 size_t N = 0, i = v.size();
 while(i && N < maxN) {
 double d = distance2(center, v[--i]);
 if(d <= maxDistance)
 heap[N++] = {d, &v[i]};
 }
 if(i)
 std::make_heap(heap.get(), heap.get() + N);
 while(i) {
 double d = distance2(center, v[--i]);
 if(d < heap[0].first) {
 heap[0] = {d, &v[i]};
 replace_heap(&heap[0], &heap[N]);
 }
 }
 std::vector<Point3> r;
 r.reserve(N);
 for(i = 0; i < N; i++)
 r.push_back(*heap[i].second);
 return r;
}

Prefer a std::span<const T> over a std::vector<T> const&. Less indirection and potentially cheaper construction lead to more efficiency.

Even though it was only standardised with C++20, there are plenty implementations for even C++11.
See "What is a "span" and when should I use one?" for more details.

Question 4

Faster sorting

If the only purpose of the distance computation is for comparison, then you can speed it up by removing the call to sqrt(). You'll end up comparing the squares of the distances, which is equivalent to comparing the actual distances.

Question 5

ByDistance looks very ad-hoc. If you move dist to Point (where it really belongs) as a member function, ByDistance is not required at all, e.g:

 temp.erase(remove_if(temp.begin(), 
 temp.end(),
 [&reference, distanceThreshold](Point& p){
 return reference.dist(p > distanceThreshold;
 }),
 temp.end());

and

 sort(temp.begin(), temp.end(),
 [&reference](const Point& p1, const Point& p2) {
 return reference.dist(p1) < reference.dist(p2);
 });
 );

I don't see a need to create yet another vector with
```
 vector<Point> result(sz);
```
You may reuse temp with temp.resize() instead.
Mandatory advisory against using namespace std.

Question 6

Are you missing a close paren in that first code block? Something doesn't look right to me.

Question 7

Free functions should generally be preferred, especially if the public interface suffices to implement them. That counts double for functions symmetric in their arguments.

Question 8

Your operator overloads (operator== and operator()) should be const since they don't modify any data members.
operator<< can just be a single line. It could also use "\n" instead of std::endl so that one isn't forced to have a buffer flush when invoking it (there is a different between the two).
```
return o << "(" << p.x << ", " << p.y << ", " << p.z << ")\n";
```

This can be shortened:

if (points.size() == 0) {
 vector<Point> empty;
 return empty;
}

by using empty() and by returning an anonymous vector instead:

if (points.empty()) {
 return vector<Point>();
}

Deduplicator Deduplicator 19.6k1 gold badge32 silver badges65 bronze badges · Accepted Answer · 2015-10-24 16:41:40Z

A comparison-function doesn't change either argument, so should take them by const&.
You normally don't make comparison-operators member-functions to make sure they handle conversions equally for both arguments.
If neccessary, that might mean making them friend-functions, whether defined inline or not.
If you provide ==, you should also provide !=:

They model one concept, which can be trivially implemented in terms of each other (so much one wonders why the language does not implicitly define the second, with explicit overriding for strange types like floating-point), and everyone expects there to be both or none.

Consider naming your type Point3 to clarify that this is the 3d-point-variant.

 struct Point3
 {
 double x, y, z;
 };
 bool operator==(const Point3& a, const Point3& b) {
 return a.x==b.x && a.y==b.y && a.z==b.z;
 }
 bool operator!=(const Point3& a, const Point3& b) { return !(a==b); }

Don't ask for size() if all you want to know is whether a container is empty(). While there is no performance-difference for std::vector, there is one for other containers. Also, the latter is more explicit.

nearestN is severely sub-par:

Use squared distances to avoid the costly square-roots.
Don't create copies you don't need.
Just like there's an algorithm for removing elements fulfilling a predicate, there is one for only copying them: std::copy_if, coupled with std::back_inserter.
Anyway, you actually want to build a list of candidates, and then choose the N best.

constexpr double distance2(const Point3& a, const Point3& b) {
 return (a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y)+(a.z-b.z)*(a.z-b.z);
}
std::vector<Point3> nearestN(
 const std::vector<Point3>& v,
 int maxN,
 const Point3& center,
 double maxDistance
) {
 assert(maxDistance >= 0 && maxN >= 0);
 maxDistance *= maxDistance;
 std::vector<std::pair<double, const Point3*>> temp;
 temp.reserve(v.size());
 for(auto&& x : v) {
 double d = distance2(x, center);
 if(d <= maxDistance)
 temp.emplace_back(d, &x);
 }
 if(maxN >= temp.size())
 maxN = temp.size();
 else
 std::sort(begin(temp), end(temp));
 std::vector<Point3> r;
 r.reserve(maxN);
 for(size_t i = 0; i < maxN; ++i)
 r.push_back(*temp[i].second);
 return r;
}

Using a heap (std::push_heap, std::pop_heap), you can keep the list of candidates minimal and thus reduce the complexity to O(n*log(min(n, N))) (n=elements in input, N=max elements in output.
I used a custom replace_heap (equivalent to popping the head and then pushing a replacement) adapted from How to replace top element of heap efficiently withouth re-establishing heap invariant twice? for even more efficiency:

std::vector<Point3> nearestN(
 const std::vector<Point3>& v,
 std::size_t maxN,
 const Point3& center,
 double maxDistance
) {
 assert(maxDistance >= 0);
 if(maxN <= 0) return {};
 maxDistance *= maxDistance;
 if(std::isinf(maxDistance) && v.size() <= maxN) return v;
 using pair = std::pair<double, const Point3*>;
 std::unique_ptr<pair[]> heap{new pair[maxN]};
 size_t N = 0, i = v.size();
 while(i && N < maxN) {
 double d = distance2(center, v[--i]);
 if(d <= maxDistance)
 heap[N++] = {d, &v[i]};
 }
 if(i)
 std::make_heap(heap.get(), heap.get() + N);
 while(i) {
 double d = distance2(center, v[--i]);
 if(d < heap[0].first) {
 heap[0] = {d, &v[i]};
 replace_heap(&heap[0], &heap[N]);
 }
 }
 std::vector<Point3> r;
 r.reserve(N);
 for(i = 0; i < N; i++)
 r.push_back(*heap[i].second);
 return r;
}

Prefer a std::span<const T> over a std::vector<T> const&. Less indirection and potentially cheaper construction lead to more efficiency.

Even though it was only standardised with C++20, there are plenty implementations for even C++11.
See "What is a "span" and when should I use one?" for more details.

Stack Exchange Network

N closest points to the reference point

4 Answers 4

Faster sorting

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

N closest points to the reference point

4 Answers 4

Faster sorting

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions