I have a relatively small algorithm that takes up ~60% of the total run time of my scientific code (57 lines out of 3600), so I would like to find a way to optimize what I'm doing and make the code order-independent so that I can apply a `cilk_for` parallel structure.
Here's what it does, verbally: I have an `std::vector` of pointers to custom objects called `Segment` (`vector<Segment*> newSegment`). Each `Segment` contains a `std::vector` of integers (mesh indices). In this function, I would like to find any `Segment` that overlaps with any other, with overlap being defined as the member `indices` overlapping on the number line. If they do overlap, I would like to join them together (insert `A.indices` into `B.indices`) and delete one (`delete A`).
ex. 1: `A.indices = {1,2,3}`, `B.indices = {4,5,6}` → do not overlap; do nothing

ex. 2: `A.indices = {1,2,4}`, `B.indices = {3,5,6}` → do overlap; `A` is deleted, `B.indices = {1,2,3,4,5,6}`

The overlaps are sparse, but existent.
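Since overlap in this sense depends only on each segment's smallest and largest index, the check reduces to an interval-intersection test on the two `[min, max]` ranges. A minimal sketch (the helper name `rangesOverlap` is mine, not from the code below; it assumes non-empty index vectors):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two index sets overlap on the number line iff their [min, max]
// ranges intersect. Assumes both vectors are non-empty.
bool rangesOverlap(const std::vector<int>& a, const std::vector<int>& b) {
    auto [aMin, aMax] = std::minmax_element(a.begin(), a.end());
    auto [bMin, bMax] = std::minmax_element(b.begin(), b.end());
    return *aMin <= *bMax && *bMin <= *aMax;
}
```

On the two examples above, `rangesOverlap({1,2,3}, {4,5,6})` is false and `rangesOverlap({1,2,4}, {3,5,6})` is true.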
Here's the current code:
main algorithm:
```cpp
//make sure segments don't overlap
for (unsigned i = 0; i < newSegment.size(); ++i) {
    if (newSegment[i]->size() == 0) continue;
    for (unsigned j = i + 1; j < newSegment.size(); ++j) {
        if (newSegment[i]->size() == 0) continue;
        if (newSegment[j]->size() == 0) continue;
        int i1 = newSegment[i]->begin();
        int i2 = static_cast<int>(newSegment[i]->end());
        int j1 = newSegment[j]->begin();
        int j2 = static_cast<int>(newSegment[j]->end());
        int L1 = abs(i1 - i2);
        int L2 = abs(j1 - j2);
        int dist = max({i1, i2, j1, j2}) - min({i1, i2, j1, j2});
        //if overlap, fold segments together
        //copy indices from shorter segment to taller segment
        if (dist <= L1 + L2) {
            unsigned more, less;
            if (newSegment[i]->slope == newSegment[j]->slope) {
                if (value_max[i] > value_max[j]) {
                    more = i;
                    less = j;
                } else {
                    more = j;
                    less = i;
                }
            } else if (newSegment[i]->size() == 1) {
                more = j; less = i;
            } else if (newSegment[j]->size() == 1) {
                more = i; less = j;
            } else assert(1 == 0);
            while (!newSegment[less]->indices.empty()) {
                unsigned index = newSegment[less]->indices.back();
                newSegment[less]->indices.pop_back();
                newSegment[more]->indices.push_back(index);
            }
        }
    }
}//end overlap check
```
```cpp
//delete empty segments
vector<unsigned> delList;
for (unsigned i = 0; i < newSegment.size(); ++i) {
    if (newSegment[i]->size() == 0) { //delete empty
        delList.push_back(i);
        continue;
    }
}
while (delList.size() > 0) {
    unsigned index = delList.back();
    delete newSegment.at(index);
    newSegment.erase(newSegment.begin() + index);
    delList.pop_back();
}
```
Relevant `Segment` object class definition and member functions:
```cpp
class Segment {
public:
    Segment();
    ~Segment();
    unsigned size();
    int begin();
    unsigned end();
    std::vector<int> indices;
    double slope;
};

int Segment::begin() {
    if (!is_sorted(indices.begin(), indices.end())) std::sort(indices.begin(), indices.end());
    if (indices.size() == 0) return -1;
    return indices[0];
}

unsigned Segment::end() {
    if (!is_sorted(indices.begin(), indices.end())) std::sort(indices.begin(), indices.end());
    return indices.back();
}

unsigned Segment::size() {
    unsigned indSize = indices.size();
    if (indSize == 1) {
        if (indices[0] == -1) return 0;
    }
    return indSize;
}
```
Ideas:

- Since I don't care about the order of the `Segment` objects, they could be in an orderless container?
- In my algorithm, I find overlap by looking at the first and last `indices` of each segment. I do an `std::is_sorted` (and then maybe an `std::sort`) when I fetch the `indices`, because the list can change when more indices are inserted. Maybe I could put the `indices` in a `std::set` rather than a `std::vector` to save the explicit sort-checking/sorting?
- I'm pretty sure that by editing the `indices` as I go, I make the algorithm order-dependent. Perhaps I could break the code into the following organization, using the concept of an undirected graph, to make it order-independent:
  - edge discovery (without modifying `indices`)
  - join clusters of connected nodes (`Segment` objects that overlap) using a graph traversal
  - delete empty `Segment` objects
Questions
- Are either of the ideas above worthwhile or negligible to performance?
- How else can I optimize it?
- How (if not the above) can I make the algorithm order-independent?
2 Answers
The `is_sorted()` function is probably expensive, so you should avoid it. Why not sort everything in one go, right at the beginning, before entering the loops?

The best way to optimize your code is by inventing a new algorithm which avoids the nested loops over N, because those have a complexity of O(N^2) (see "big-O notation"). See Bart van Ingen Schenau's comment below on how to achieve this.
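The sort-by-starting-element idea from that comment can be sketched as follows. This is illustrative only: `mergeOverlaps` is a name I made up, and `vector<vector<int>>` stands in for the asker's `vector<Segment*>`:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the O(N log N) approach: sort segments by their smallest
// index, then one linear pass folds any segment that overlaps its
// predecessor into it. Assumes non-empty segments.
std::vector<std::vector<int>> mergeOverlaps(std::vector<std::vector<int>> segs) {
    for (auto& s : segs) std::sort(s.begin(), s.end());
    std::sort(segs.begin(), segs.end(),
              [](const auto& a, const auto& b) { return a.front() < b.front(); });

    std::vector<std::vector<int>> out;
    for (auto& s : segs) {
        if (!out.empty() && out.back().back() >= s.front()) {
            // ranges intersect: fold s into the previous segment
            out.back().insert(out.back().end(), s.begin(), s.end());
            std::sort(out.back().begin(), out.back().end());
        } else {
            out.push_back(std::move(s));
        }
    }
    return out;
}
```

On the question's ex. 2, `mergeOverlaps({{1,2,4}, {3,5,6}})` yields the single segment `{1,2,3,4,5,6}`; on ex. 1 the two segments stay separate.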
- The second `if (newSegment[i]->size() == 0) continue;` statement may, in fact, be needed, since the `indices` can be removed if `i` overlaps with an earlier `j`. What's the performance consequence of sorting everything before entering the loops and after insertion vs. making `indices` a `std::set` instead? – Stershic, Nov 8, 2015 at 18:01
- I think the overlap problem is pretty clear regardless of the rest of my code. I think the pairwise search for overlap is unavoidably O(N^2) (right?), so I want to make it as quick and parallel as possible. – Stershic, Nov 8, 2015 at 18:02
- @Stershic: The search for overlap isn't necessarily O(N^2). If you can change the order of the segments, you could order them by starting element. Then you would only have to look at adjacent segments to detect an overlap (you have an overlap if `Segment[i].end() > Segment[i+1].begin()`). – Bart van Ingen Schenau, Nov 8, 2015 at 19:50
- As for using sets, they theoretically have O(1) access time just as arrays do, but in actuality they are burdened by a non-negligible constant factor, which is not accounted for by big-O notation. In any case, I do not really see how sets could be of any use to you here. – Mike Nakis, Nov 8, 2015 at 20:33
- I would recommend sorting the indices within each segment once in the beginning, and then either re-sorting after a merge or, perhaps even better, merging the segments in such a way that no re-sorting is necessary. (It is easy to do when you know that both arrays are already sorted; see programmers.stackexchange.com/q/267406/41811.) – Mike Nakis, Nov 8, 2015 at 20:34
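The merge-without-re-sorting idea in that last comment can be sketched with `std::inplace_merge`: append one sorted vector to the other, then merge the two sorted halves in linear time instead of re-sorting. `foldSorted` is a hypothetical helper, not part of the asker's code:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Folds the sorted vector `from` into the sorted vector `into`
// without a full re-sort: append, then merge the two sorted runs.
void foldSorted(std::vector<int>& into, const std::vector<int>& from) {
    auto mid = into.size();
    into.insert(into.end(), from.begin(), from.end());
    std::inplace_merge(into.begin(), into.begin() + mid, into.end());
}
```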
I came to the same algorithm as @BartVanIngenSchenau in this comment.

Basically, sort the set of segments based on the min element of each segment. Then two adjacent segments overlap if and only if `Segment[i].max >= Segment[i+1].min`.

But I would like to add that sorting the indices looks unnecessary at all; just keep the max and the min element of each segment, and update them when merging segments: `(segment1+segment2).min = min(segment1.min, segment2.min)` and `(segment1+segment2).max = max(segment1.max, segment2.max)`. Moreover, if the segments are sorted by min element, you have `(Segment[i]+Segment[i+1]).min = Segment[i].min` (but this last thing could be premature optimization). I denote by `+` the merge of two segments.
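That min/max bookkeeping can be sketched as below. `MiniSeg` and `foldInto` are hypothetical stand-ins for the real `Segment` class and its merge step, not the asker's actual types:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in segment that caches its min and max index, so no
// per-access sorting is needed.
struct MiniSeg {
    std::vector<int> indices;
    int min;
    int max;
};

// Folds b into a, updating the cached extremes per the answer's rules:
// (a+b).min = min(a.min, b.min) and (a+b).max = max(a.max, b.max).
void foldInto(MiniSeg& a, MiniSeg& b) {
    a.indices.insert(a.indices.end(), b.indices.begin(), b.indices.end());
    a.min = std::min(a.min, b.min);
    a.max = std::max(a.max, b.max);
    b.indices.clear();
}
```

After `foldInto`, the overlap test against the next segment is just a comparison of two cached integers, with no `is_sorted`/`sort` on access.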
For cache locality, the best layout for merging might be similar to the following:

```
ptr_to_2nd_segment,
n_elt_of_1st_segment,
min_elt_of_1st_segment,
[[other_elts_of_1st_segment,]
 max_elt_of_1st_segment,]
ptr_to_3rd_segment,
n_elt_of_2nd_segment,
min_elt_of_2nd_segment,
[[other_elts_of_2nd_segment,]
 max_elt_of_2nd_segment,]
...
```

Merging two elements in this configuration would be quite simple: just update the pointer to the next element, add the numbers of elements, shift the second segment's elements, and swap the max elements if necessary. That would leave some junk after each merge (8 bytes on a 32-bit architecture and 16 bytes on a 64-bit architecture). Whether you can tolerate such junk is application-dependent (moreover, you could do a kind of garbage collection between two iterations of the algorithm).
For parallelizing: once the set of segments is sorted by min element, you can divide it into n parts and do the merges independently, then only merge at the borders of the parts. But as @MikeNakis says in this comment, since merging is quite memory-bound, it might not parallelize well.
- `std::is_sorted` (link). I don't have any mutator methods per se; I just `push_back` directly to the public `indices` vector (e.g. `newSegment[j]->indices.push_back(i)`). I only `push_back` to `indices` in two other places in the code, so I could `std::sort` after each one, as well as sort when I combine them in this algorithm, and then delete the `is_sorted`/`sort` when accessing.