I'm learning C++ as well as algorithms. Here's my implementation of finding the closest pair problem. I tried to minimize memory by using only iterators.
And points are being read from points.txt
file content:
2,1
3,5
8,3
5,8
9,1
5,2
3,3
4,5
6,5
1,9
Output is:
Closest pair are:
3, 5
4, 5
Here's the code:
#include <iostream>
#include <array>
#include <algorithm>
#include <fstream>
#include <cfloat>
#include <cmath>
struct Point {
Point(double x = 0, double y = 0) {
x_coordinate = x;
y_coordinate = y;
}
double x_coordinate;
double y_coordinate;
static bool sortByX(const Point &lhs, const Point &rhs) {
return lhs.x_coordinate < rhs.x_coordinate;
}
static bool sortByY(const Point &lhs, const Point &rhs) {
return lhs.y_coordinate < rhs.y_coordinate;
}
};
template<std::size_t SIZE>
using p_iterator = typename std::array<Point, SIZE>::iterator;
template<std::size_t SIZE>
using p_iterators_array = std::array<typename std::array<Point, SIZE>::iterator, SIZE>;
void initialize_points(std::array<Point, 10> &points) {
double x, y;
int i = 0;
char c;
std::ifstream infile("./points.txt");
while((infile >> x >> c >> y) && (c == ',') && (i < 11)) {
points[i++] = Point(x, y);
}
}
double calculate_distance(Point &p1, Point &p2) {
return std::sqrt(std::pow(p1.x_coordinate - p2.x_coordinate, 2) + std::pow(p1.y_coordinate - p2.y_coordinate , 2));
}
template<typename T>
p_iterators_array<2> eucledian_closest(T &points, int size) {
p_iterators_array<2> closest_arr;
double closest_distance = DBL_MAX, distance = 0.0;
for(int i = 0; i < size - 1; i++){
for(int j = i + 1; j < size; j++) {
distance = calculate_distance(points[i], points[j]);
if(distance < closest_distance && distance != 0) {
closest_distance = distance;
closest_arr[0] = points + i;
closest_arr[1] = points + j;
}
}
}
return closest_arr;
}
template<std::size_t SIZE>
p_iterators_array<2> closest_split_pair(p_iterator<SIZE> points_iterator, p_iterators_array<2> &closest_side_pairs) {
std::vector<p_iterator<SIZE>> split_pairs;
p_iterators_array<2> final_result;
double closest_distance = DBL_MAX, distance = 0.0;
typename std::array<Point, SIZE>::iterator midpoint = points_iterator + (SIZE/2);
//filtering points to only points in sigma-2sigma rectangle
for (size_t i = 0; i < SIZE; i++) {
if(std::abs(points_iterator[i].x_coordinate - midpoint->x_coordinate) < calculate_distance(*(closest_side_pairs[0]), *(closest_side_pairs[1]))){
split_pairs.push_back(points_iterator + i);
}
}
//finding closest pair in split_pairs
for (size_t i = 0; i < split_pairs.size() - 1; i++) {
for (size_t j = i+1; (j < 7) && (j < split_pairs.size()) ; j++) {
distance = calculate_distance(*(split_pairs[i]), *(split_pairs[j]));
if(distance < closest_distance && distance != 0) {
closest_distance = distance;
final_result[0] = split_pairs[i];
final_result[1] = split_pairs[j];
}
}
}
//comparing split paris distance and side pairs distance
if(calculate_distance(*(closest_side_pairs.front()), *(closest_side_pairs.back())) < calculate_distance(*(final_result.front()), *(final_result.back()))) {
final_result = closest_side_pairs;
}
return final_result;
}
template<std::size_t SIZE>
p_iterators_array<2> closest_side_pair(p_iterator<SIZE> points_iterator, p_iterator<SIZE> x_arr_iterator, p_iterator<SIZE> y_arr_iterator) {
std::size_t delimeter = SIZE / 2 ;
if(delimeter <= 3) {
return eucledian_closest(points_iterator, delimeter);
}
p_iterators_array<2> closest_left, closest_right, result;
closest_left = closest_side_pair<SIZE / 2>(points_iterator, x_arr_iterator, y_arr_iterator);
closest_right = closest_side_pair<SIZE / 2>(points_iterator + delimeter, x_arr_iterator + delimeter, y_arr_iterator + delimeter);
if(calculate_distance(*(closest_left.front()), *(closest_left.back())) < calculate_distance(*(closest_right.front()), *(closest_right.back()))) {
result = closest_left;
} else {
result = closest_right;
}
return closest_split_pair<SIZE>(points_iterator, result);
}
int main()
{
std::array<Point, 10> points;
initialize_points(points);
std::array<Point, 10> x_p = points;
std::array<Point, 10> y_p = points;
std::sort(x_p.begin(), x_p.end(), Point::sortByX);
std::sort(y_p.begin(), y_p.end(), Point::sortByY);
p_iterators_array<2> closest_result = closest_side_pair<10>(points.begin(), x_p.begin(), y_p.begin());
closest_result = closest_split_pair<10>(points.begin(), closest_result);
std::cout << "Closest pair are: " << std::endl;
for(auto p: closest_result) {
std::cout << p->x_coordinate << ", " << p->y_coordinate << std::endl;
}
}
1 Answer 1
Here are some observations that may help you improve your code.
Use all required #include
s
The code uses std::vector
which means that it should #include <vector>
. It was not difficult to infer, but it helps reviewers if the code is complete.
Use better names
The Point
class is well named and easy to understand, but member functions sortByX
and sortByY
are not as good because they only return the result of a comparison and don't actually sort anything by themselves. I might call them Xless
and Yless
which would be a better explanation of what they do.
Eliminate "magic numbers"
There are a few numbers in the code, such as 2
and 10
that have a specific meaning in their particular context. Generally it's better to avoid that and give such constants meaningful names. That way, if anything ever needs to be changed, you won't have to go hunting through the code for all instances of "10" and then trying to determine if this particular 10 is relevant to the desired change or if it is some other constant that happens to have the same value.
Don't hardcode file names
The points.txt
might be something that a user of this program wants to place elsewhere or name something else, so both hardcoding the name and burying it in the middle of the program are unhelpful.
Write more flexible code
Right now, only sets of exactly ten 2-D points can be evaluated. What if I have twelve points? What if I want to compare 3D points? The code could be made much more flexible in both cases by rethinking both the algorithms and the data structures. I'd recommend using a std::array
for each point and a std::vector
of those points for the main array.
Use idiomatic C++
The initialize_points
is an odd standalone function taking a very particular kind of structure as an argument. It would make more sense to write an extractor for the Point
class and then call it until the end of the list. For example, to do nothing except create 2D points from std::cin
and then echo them back to std::cout
we might have something like this:
int main()
{
std::vector<Point<2>> points;
Point<2> p{{}};
while (std::cin >> p) {
points.push_back(p);
}
for (const auto &p : points) {
std::cout << p << "\n";
}
}
The standard inserter and extractors are typically implemented as friend
functions:
template <std::size_t SIZE>
class Point : public std::array<double, SIZE> {
public:
static constexpr char DELIMITER{','};
Point(std::array<double, SIZE> lst)
: std::array<double, SIZE>{lst}
{}
friend std::ostream &operator<<(std::ostream &out, const Point &p) {
bool notfirst = false;
for (const auto &coord : p) {
if (notfirst) {
out << DELIMITER;
} else {
notfirst = true;
}
out << coord;
}
return out;
}
friend std::istream &operator>>(std::istream &in, Point &p) {
if (in >> p[0]) {
for (std::size_t i=1; i < SIZE; ++i) {
char c;
in >> c >> p[i];
if (c != DELIMITER) {
in.setstate(std::ios::failbit);
}
}
}
return in;
}
};
Fix the bug
Try putting 2,1
as the first and last coordinate pair, and you'll see that the program incorrectly identifies the closes pair as (3,5)
and (4,5)
.
Reconsider the algorithm
It's not necessary to sort on both the X and Y coordinates. See this description of a planar solution for a detailed explanation of an optimal algorithm. The existing code appears to attempt some but not all of this algorithm.
Avoid calculations that are not needed
When optimizing for performance (only after the code is correct!), it's often useful to ponder which calculations might not need to be done. In this case, it's mostly not necessary to do calculate the sqrt
to get the actual distance between points since ordering may be done on squared distances with no loss of generality or correctness.
-
\$\begingroup\$ First, thank you for taking a part of your time to write your observations. It's very helpful. I have only one question, about putting
2,1
as the first and last point, I have a check to make sure it's not the same point repeated, because that would make it a single point and not considered as the closest pair. I don't know if that's what you meant or not. But if it's not i'd be grateful if you clarified more. Thanks again. \$\endgroup\$Rafael Adel– Rafael Adel2016年10月03日 11:58:57 +00:00Commented Oct 3, 2016 at 11:58 -
\$\begingroup\$ Try
2,1
and2,1.5
and you'll see the problem. \$\endgroup\$Edward– Edward2016年10月03日 12:29:47 +00:00Commented Oct 3, 2016 at 12:29 -
\$\begingroup\$ Aha! I found out the cause. But I had a problem solving it, If you took a look at my SO question I'd be grateful. Thanks. stackoverflow.com/questions/39834257/… \$\endgroup\$Rafael Adel– Rafael Adel2016年10月03日 17:29:46 +00:00Commented Oct 3, 2016 at 17:29
Explore related questions
See similar questions with these tags.
closest_side_pair()
doesn't callclosest_split_pair()
. How can you find the closest points in one half by only recursively splitting in half without considering points split across the half? \$\endgroup\$