Finding closest pair of 2D points, using divide-and-conquer

Question 1

I'm learning C++ as well as algorithms. Here's my implementation of finding the closest pair problem. I tried to minimize memory by using only iterators.

And points are being read from points.txt file content:

2,1
3,5
8,3
5,8
9,1
5,2
3,3
4,5
6,5
1,9

Output is:

Closest pair are: 
3, 5
4, 5

Here's the code:

#include <iostream>
#include <array>
#include <algorithm>
#include <fstream>
#include <cfloat>
#include <cmath>
struct Point {
 Point(double x = 0, double y = 0) {
 x_coordinate = x;
 y_coordinate = y;
 }
 double x_coordinate;
 double y_coordinate;
 static bool sortByX(const Point &lhs, const Point &rhs) {
 return lhs.x_coordinate < rhs.x_coordinate;
 }
 static bool sortByY(const Point &lhs, const Point &rhs) {
 return lhs.y_coordinate < rhs.y_coordinate;
 }
};
template<std::size_t SIZE>
using p_iterator = typename std::array<Point, SIZE>::iterator;
template<std::size_t SIZE>
using p_iterators_array = std::array<typename std::array<Point, SIZE>::iterator, SIZE>;
void initialize_points(std::array<Point, 10> &points) {
 double x, y;
 int i = 0;
 char c;
 std::ifstream infile("./points.txt");
 while((infile >> x >> c >> y) && (c == ',') && (i < 11)) {
 points[i++] = Point(x, y);
 }
}
double calculate_distance(Point &p1, Point &p2) {
 return std::sqrt(std::pow(p1.x_coordinate - p2.x_coordinate, 2) + std::pow(p1.y_coordinate - p2.y_coordinate , 2));
}
template<typename T>
p_iterators_array<2> eucledian_closest(T &points, int size) {
 p_iterators_array<2> closest_arr;
 double closest_distance = DBL_MAX, distance = 0.0;
 for(int i = 0; i < size - 1; i++){
 for(int j = i + 1; j < size; j++) {
 distance = calculate_distance(points[i], points[j]);
 if(distance < closest_distance && distance != 0) {
 closest_distance = distance;
 closest_arr[0] = points + i;
 closest_arr[1] = points + j;
 }
 }
 }
 return closest_arr;
}
template<std::size_t SIZE>
p_iterators_array<2> closest_split_pair(p_iterator<SIZE> points_iterator, p_iterators_array<2> &closest_side_pairs) {
 std::vector<p_iterator<SIZE>> split_pairs;
 p_iterators_array<2> final_result;
 double closest_distance = DBL_MAX, distance = 0.0;
 typename std::array<Point, SIZE>::iterator midpoint = points_iterator + (SIZE/2);
 //filtering points to only points in sigma-2sigma rectangle
 for (size_t i = 0; i < SIZE; i++) {
 if(std::abs(points_iterator[i].x_coordinate - midpoint->x_coordinate) < calculate_distance(*(closest_side_pairs[0]), *(closest_side_pairs[1]))){
 split_pairs.push_back(points_iterator + i);
 }
 }
 //finding closest pair in split_pairs
 for (size_t i = 0; i < split_pairs.size() - 1; i++) {
 for (size_t j = i+1; (j < 7) && (j < split_pairs.size()) ; j++) {
 distance = calculate_distance(*(split_pairs[i]), *(split_pairs[j]));
 if(distance < closest_distance && distance != 0) {
 closest_distance = distance;
 final_result[0] = split_pairs[i];
 final_result[1] = split_pairs[j];
 }
 }
 }
 //comparing split paris distance and side pairs distance
 if(calculate_distance(*(closest_side_pairs.front()), *(closest_side_pairs.back())) < calculate_distance(*(final_result.front()), *(final_result.back()))) {
 final_result = closest_side_pairs;
 }
 return final_result;
}
template<std::size_t SIZE>
p_iterators_array<2> closest_side_pair(p_iterator<SIZE> points_iterator, p_iterator<SIZE> x_arr_iterator, p_iterator<SIZE> y_arr_iterator) {
 std::size_t delimeter = SIZE / 2 ;
 if(delimeter <= 3) {
 return eucledian_closest(points_iterator, delimeter);
 }
 p_iterators_array<2> closest_left, closest_right, result;
 closest_left = closest_side_pair<SIZE / 2>(points_iterator, x_arr_iterator, y_arr_iterator);
 closest_right = closest_side_pair<SIZE / 2>(points_iterator + delimeter, x_arr_iterator + delimeter, y_arr_iterator + delimeter);
 if(calculate_distance(*(closest_left.front()), *(closest_left.back())) < calculate_distance(*(closest_right.front()), *(closest_right.back()))) {
 result = closest_left;
 } else {
 result = closest_right;
 }
 return closest_split_pair<SIZE>(points_iterator, result);
}
int main()
{
 std::array<Point, 10> points;
 initialize_points(points);
 std::array<Point, 10> x_p = points;
 std::array<Point, 10> y_p = points;
 std::sort(x_p.begin(), x_p.end(), Point::sortByX);
 std::sort(y_p.begin(), y_p.end(), Point::sortByY);
 p_iterators_array<2> closest_result = closest_side_pair<10>(points.begin(), x_p.begin(), y_p.begin());
 closest_result = closest_split_pair<10>(points.begin(), closest_result);
 std::cout << "Closest pair are: " << std::endl;
 for(auto p: closest_result) {
 std::cout << p->x_coordinate << ", " << p->y_coordinate << std::endl;
 }
}

Question 2

This program doesn't look correct for the simple fact that closest_side_pair() doesn't call closest_split_pair(). How can you find the closest points in one half by only recursively splitting in half without considering points split across the half?

Question 3

@JS1 Fixed. Please check the edited code. Thanks.

Question 4

I would recommend to avoid any pow(x, n) call, where n is an integer. Normally pow is rather costly compared to the explicit multiplication.

Question 5

Here are some observations that may help you improve your code.

Use all required `#include`s

The code uses std::vector which means that it should #include <vector>. It was not difficult to infer, but it helps reviewers if the code is complete.

Use better names

The Point class is well named and easy to understand, but member functions sortByX and sortByY are not as good because they only return the result of a comparison and don't actually sort anything by themselves. I might call them Xless and Yless which would be a better explanation of what they do.

Eliminate "magic numbers"

There are a few numbers in the code, such as 2 and 10 that have a specific meaning in their particular context. Generally it's better to avoid that and give such constants meaningful names. That way, if anything ever needs to be changed, you won't have to go hunting through the code for all instances of "10" and then trying to determine if this particular 10 is relevant to the desired change or if it is some other constant that happens to have the same value.

Don't hardcode file names

The points.txt might be something that a user of this program wants to place elsewhere or name something else, so both hardcoding the name and burying it in the middle of the program are unhelpful.

Write more flexible code

Right now, only sets of exactly ten 2-D points can be evaluated. What if I have twelve points? What if I want to compare 3D points? The code could be made much more flexible in both cases by rethinking both the algorithms and the data structures. I'd recommend using a std::array for each point and a std::vector of those points for the main array.

Use idiomatic C++

The initialize_points is an odd standalone function taking a very particular kind of structure as an argument. It would make more sense to write an extractor for the Point class and then call it until the end of the list. For example, to do nothing except create 2D points from std::cin and then echo them back to std::cout we might have something like this:

int main()
{
 std::vector<Point<2>> points;
 Point<2> p{{}};
 while (std::cin >> p) {
 points.push_back(p);
 }
 for (const auto &p : points) {
 std::cout << p << "\n";
 }
}

The standard inserter and extractors are typically implemented as friend functions:

template <std::size_t SIZE>
class Point : public std::array<double, SIZE> {
public:
 static constexpr char DELIMITER{','};
 Point(std::array<double, SIZE> lst) 
 : std::array<double, SIZE>{lst}
 {}
 friend std::ostream &operator<<(std::ostream &out, const Point &p) {
 bool notfirst = false;
 for (const auto &coord : p) {
 if (notfirst) {
 out << DELIMITER; 
 } else {
 notfirst = true;
 } 
 out << coord;
 }
 return out;
 }
 friend std::istream &operator>>(std::istream &in, Point &p) {
 if (in >> p[0]) {
 for (std::size_t i=1; i < SIZE; ++i) {
 char c;
 in >> c >> p[i];
 if (c != DELIMITER) {
 in.setstate(std::ios::failbit);
 }
 }
 }
 return in;
 }
};

Fix the bug

Try putting 2,1 as the first and last coordinate pair, and you'll see that the program incorrectly identifies the closes pair as (3,5) and (4,5).

Reconsider the algorithm

It's not necessary to sort on both the X and Y coordinates. See this description of a planar solution for a detailed explanation of an optimal algorithm. The existing code appears to attempt some but not all of this algorithm.

Avoid calculations that are not needed

When optimizing for performance (only after the code is correct!), it's often useful to ponder which calculations might not need to be done. In this case, it's mostly not necessary to do calculate the sqrt to get the actual distance between points since ordering may be done on squared distances with no loss of generality or correctness.

Question 6

First, thank you for taking a part of your time to write your observations. It's very helpful. I have only one question, about putting 2,1 as the first and last point, I have a check to make sure it's not the same point repeated, because that would make it a single point and not considered as the closest pair. I don't know if that's what you meant or not. But if it's not i'd be grateful if you clarified more. Thanks again.

Question 7

Try 2,1 and 2,1.5 and you'll see the problem.

Question 8

Aha! I found out the cause. But I had a problem solving it, If you took a look at my SO question I'd be grateful. Thanks. stackoverflow.com/questions/39834257/…

Edward EdwardEdward 67.2k4 gold badges120 silver badges284 bronze badges · Answer 1 · 2016-10-01 21:24:53Z

Here are some observations that may help you improve your code.

Use all required `#include`s

The code uses std::vector which means that it should #include <vector>. It was not difficult to infer, but it helps reviewers if the code is complete.

Use better names

The Point class is well named and easy to understand, but member functions sortByX and sortByY are not as good because they only return the result of a comparison and don't actually sort anything by themselves. I might call them Xless and Yless which would be a better explanation of what they do.

Eliminate "magic numbers"

There are a few numbers in the code, such as 2 and 10 that have a specific meaning in their particular context. Generally it's better to avoid that and give such constants meaningful names. That way, if anything ever needs to be changed, you won't have to go hunting through the code for all instances of "10" and then trying to determine if this particular 10 is relevant to the desired change or if it is some other constant that happens to have the same value.

Don't hardcode file names

The points.txt might be something that a user of this program wants to place elsewhere or name something else, so both hardcoding the name and burying it in the middle of the program are unhelpful.

Write more flexible code

Right now, only sets of exactly ten 2-D points can be evaluated. What if I have twelve points? What if I want to compare 3D points? The code could be made much more flexible in both cases by rethinking both the algorithms and the data structures. I'd recommend using a std::array for each point and a std::vector of those points for the main array.

Use idiomatic C++

The initialize_points is an odd standalone function taking a very particular kind of structure as an argument. It would make more sense to write an extractor for the Point class and then call it until the end of the list. For example, to do nothing except create 2D points from std::cin and then echo them back to std::cout we might have something like this:

int main()
{
 std::vector<Point<2>> points;
 Point<2> p{{}};
 while (std::cin >> p) {
 points.push_back(p);
 }
 for (const auto &p : points) {
 std::cout << p << "\n";
 }
}

The standard inserter and extractors are typically implemented as friend functions:

template <std::size_t SIZE>
class Point : public std::array<double, SIZE> {
public:
 static constexpr char DELIMITER{','};
 Point(std::array<double, SIZE> lst) 
 : std::array<double, SIZE>{lst}
 {}
 friend std::ostream &operator<<(std::ostream &out, const Point &p) {
 bool notfirst = false;
 for (const auto &coord : p) {
 if (notfirst) {
 out << DELIMITER; 
 } else {
 notfirst = true;
 } 
 out << coord;
 }
 return out;
 }
 friend std::istream &operator>>(std::istream &in, Point &p) {
 if (in >> p[0]) {
 for (std::size_t i=1; i < SIZE; ++i) {
 char c;
 in >> c >> p[i];
 if (c != DELIMITER) {
 in.setstate(std::ios::failbit);
 }
 }
 }
 return in;
 }
};

Fix the bug

Try putting 2,1 as the first and last coordinate pair, and you'll see that the program incorrectly identifies the closes pair as (3,5) and (4,5).

Reconsider the algorithm

It's not necessary to sort on both the X and Y coordinates. See this description of a planar solution for a detailed explanation of an optimal algorithm. The existing code appears to attempt some but not all of this algorithm.

Avoid calculations that are not needed

When optimizing for performance (only after the code is correct!), it's often useful to ponder which calculations might not need to be done. In this case, it's mostly not necessary to do calculate the sqrt to get the actual distance between points since ordering may be done on squared distances with no loss of generality or correctness.

First, thank you for taking a part of your time to write your observations. It's very helpful. I have only one question, about putting 2,1 as the first and last point, I have a check to make sure it's not the same point repeated, because that would make it a single point and not considered as the closest pair. I don't know if that's what you meant or not. But if it's not i'd be grateful if you clarified more. Thanks again.
Aha! I found out the cause. But I had a problem solving it, If you took a look at my SO question I'd be grateful. Thanks. stackoverflow.com/questions/39834257/…

Stack Exchange Network

Finding closest pair of 2D points, using divide-and-conquer

1 Answer 1

Use all required `#include`s

Use better names

Eliminate "magic numbers"

Don't hardcode file names

Write more flexible code

Use idiomatic C++

Fix the bug

Reconsider the algorithm

Avoid calculations that are not needed

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding closest pair of 2D points, using divide-and-conquer

1 Answer 1

Use all required #includes

Use better names

Eliminate "magic numbers"

Don't hardcode file names

Write more flexible code

Use idiomatic C++

Fix the bug

Reconsider the algorithm

Avoid calculations that are not needed

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use all required `#include`s